[ Team LiB ] |
1.2 How to Read This Book

In this book we demonstrate important code-reading techniques and outline common programming concepts in the form they appear in practice, striving to improve your code-reading ability. Although you will find in the following chapters discussions of many important computer science and computing practice concepts such as data and control structures, coding standards, and software architectures, their treatment is by necessity cursory since the purpose of the book is to get you to examine the use of these ideas in the context of production code, rather than to introduce the ideas themselves. We have arranged the material in an order that will let you progress from the basic to the more sophisticated elements. However, the book is a reader, not a detective novel, so feel free to read it in the sequence that suits your interests.

1.2.1 Typographical Conventions

All code listings and text references to program elements (for example, function names, keywords, operators) are set in typewriter font. Some of our examples refer to command sequences executed in a Unix or Windows shell. We display the shell command prompt $ to denote Unix shell commands and the DOS command prompt C:\> to denote the Windows console prompt. Unix shell commands can span more than one line; we use > as the continuation line symbol.
The prompts and the continuation line symbol are displayed only to distinguish your input from the system output; you type only the commands after the prompt.

Figure 1.1. Example of an annotated listing.
The code examples we use in this book come from real-world programs. We identify the programs we use (such as the one appearing in Figure 1.1) in a footnote[3] giving the precise location of the program in the directory tree of the book's companion source code and the line numbers covered by the specific fragment. When a figure includes parts of different source code files (as is the case in Figure 5.17, page 169) the footnote will indicate the directory where these files reside.[4]
Sometimes we omit parts from the code we list; we indicate those with an ellipsis sign [...]. In those cases the line numbers represent the entire range covered by the listed code. Other changes you may notice when referring back to the original code are changes of most C declarations from the old "Kernighan and Ritchie" style to ANSI C and the omission of some comments, white space, and program licensing information. We hope that these changes enhance the readability of the examples we provide without overly affecting the realism of the original examples.

Nontrivial code samples are graphically annotated with comments using a custom-built software application. The use of the annotation software ensures that the examples remain correct and can be machine-verified. Sometimes we expand on an annotation in the narrative text. In those cases (Figure 1.1:1) the annotation starts with a number printed in a box; the same number, following a colon, is used to refer to the annotation from the text.

1.2.2 Diagrams

We chose UML for our design diagrams because it is the de facto industry standard. In preparing this book, we found it useful to develop an open-source declarative language for generating UML diagrams,[5] and we also made some small improvements to the code base underlying the GraphViz[6] tool. We hope you find that the resulting UML diagrams help you better understand the code we analyze.
Figure 1.2 shows examples of the notation we use in our diagrams. Keep in mind the following.
Figure 1.2. UML-based diagram notation.

All other relationships use standard UML notation.
1.2.3 Exercises

The exercises you will find at the end of most sections aim to provide you with an incentive to apply the techniques we described and to further research particularly interesting issues, or they may be starting points for in-depth discussions. In most instances you can use references to the book's CD-ROM and to "code in your environment" interchangeably. What is important is to read and examine code from real-world, nontrivial systems. If you are currently working on such a system (be it in a proprietary development effort or an open-source project), it will be more productive to target the code-reading exercises toward that system instead of the book's CD-ROM.

Many exercises begin by asking you to locate particular code sequences. This task can be automated. First, express the code you are looking for as a regular expression. (Read more about regular expressions in Chapter 10.) Then, search through the code base using a command such as the following in the Unix environment:
or using the Perl script codefind.pl[7] in the Windows environment. (Some of the files in the source code base have the same name as old MS-DOS devices, causing some Windows implementations to hang when trying to access them; the Perl script explicitly codes around this problem.)
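As an illustration of how such a search can be automated, here is a minimal Python sketch. It is not the book's codefind.pl script, only an example written under stated assumptions. It walks a directory tree, skips files whose base names coincide with legacy MS-DOS device names, and reports the files containing a line that matches a regular expression. The names find_code, is_dos_device_name, and DOS_DEVICE_NAMES are hypothetical, and the device list is our assumption about which names are worth avoiding.

```python
import os
import re

# Assumed list of legacy MS-DOS device names. A file whose base name
# matches one of these (with any extension) is skipped, never opened.
DOS_DEVICE_NAMES = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}


def is_dos_device_name(filename):
    """Return True if the part of filename before the first dot is a device name."""
    stem = filename.split(".", 1)[0]
    return stem.upper() in DOS_DEVICE_NAMES


def find_code(root, pattern):
    """Return the sorted paths of files under root with a line matching pattern."""
    regex = re.compile(pattern)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_dos_device_name(name):
                continue  # avoid touching device-like names at all
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or oddly encoded files
                # from aborting the scan with a decoding error.
                with open(path, errors="replace") as source:
                    if any(regex.search(line) for line in source):
                        matches.append(path)
            except OSError:
                continue  # unreadable file: skip it and carry on
    return sorted(matches)
```

A call such as `find_code("/path/to/source", r"malloc\(")` would list the files that call malloc; the directory shown is a placeholder for wherever the code base resides on your system.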
1.2.4 Supplementary Material

All the examples you will find in this book are based on existing open-source software code. The source code base comprises more than 53,000 files occupying over 540 MB. All references to code examples are unambiguously identified in footnotes so you can examine the referenced code in its context. In addition, you can coordinate your exploration of the source code base with the book's text in three different ways.
1.2.5 Tools

Some of the examples we provide depend on the availability of programs found under Unix-type operating systems, such as grep and find. A number of such systems (for example, FreeBSD, GNU/Linux, NetBSD, OpenBSD, and Solaris) are now freely available to download and install on a wide variety of hardware. If you do not have access to such a system, you can still benefit from these tools by using ports that have been made to other operating systems such as Windows. (Section 10.9 contains further details on tool availability.)

1.2.6 Outline

In Chapter 2 we present two complete programs and examine their workings in a step-by-step fashion. In doing so we outline some basic strategies for code reading and identify common C control structures, building blocks, idioms, and pitfalls. We leave some more advanced (and easily abused) elements of the C language to be discussed in Chapters 3 and 5. Chapter 4 examines how to read code embodying common data structures. Chapter 6 deals with code found in really large projects: geographically distributed team efforts comprising thousands of files and millions of lines of code. Large projects typically adopt common coding standards and conventions (discussed in Chapter 7) and may include formal documentation (presented in Chapter 8). Chapter 9 provides background information and advice on viewing the forest rather than the trees: the system's architecture rather than its code details. When reading code you can use a number of tools. These are the subject of Chapter 10. Finally, Chapter 11 contains a complete worked-out example: the code-reading and code-understanding techniques presented in the rest of the book are applied for locating and extracting a phase of the moon algorithm from the NetBSD source code base and adding it as an SQL function in the Java-based HSQL database engine.
In the form of appendices you will find an overview of the code that we used in the examples and that accompanies this book (Appendix A), a list of individuals and organizations whose code appears in the book's text (Appendix B), a list of all referenced source files ordered by the directory in which they occur (Appendix C), the source code licenses (Appendix D), and a list of maxims for reading code with references to the page where each one is introduced (Appendix E).

1.2.7 The Great Language Debate

Most examples in the book are based on C programs running on a POSIX character terminal environment. The reasons behind this choice have to do with the abundance of portable open-source C code and the conciseness of the examples we found compared to similar ones written in C++ or Java. (The reasons behind this phenomenon are probably mostly related to the code's age or the prevalent coding style rather than particular language characteristics.) It is unfortunate that programs based on graphical user interfaces (GUIs) are poorly represented in our samples, but reading and reasoning about such programs really deserves a separate book volume. In all cases where we mention Microsoft Windows API functions we refer to the Win32 SDK API rather than the .NET platform.
We have been repeatedly asked about the languages used to write open-source software. Table 1.1 summarizes the number of projects using each of the top ten most-used languages in the SourceForge.net repository.[8] The C language features at the top of the list and is probably underrepresented because many very large C open-source projects such as FreeBSD and GNU/Linux are independently hosted, and many projects claiming to use C++ are in fact written in C, making very little use of the C++ features. On the other hand, keep in mind that a similar list compiled for non-open-source projects would be entirely different, probably featuring COBOL, Ada, Fortran, and assorted 4GLs at the top of the list. Furthermore, if you are maintaining code (a very likely if unfashionable reason for reading code), the language you will be reading would very likely have been adopted (if you are lucky) five or ten years ago, reflecting the programming language landscape of that era.
The code structures that our examples represent apply in most cases equally well to Java and C++ programs; many also apply to Perl, Python, and PHP. However, you can safely skip