
Chapter Summary



Technical Terms


array

array initializer

array length

binary search

data structure

element

element type

insertion sort

multidimensional array

one-dimensional array

polymorphic sort method

selection sort

sequential search

sorting

subscript

two-dimensional array




Summary of Important Points


  • An array is a named collection of contiguous storage locations, each of which stores a data item of the same data type. Each element of an array is referred to by a subscript, that is, by its position in the array. If the array contains N elements, then its length is N and its indexes are 0, 1, . . ., N-1.


  • Array elements are referred to using subscript notation: arrayname[subscript], where arrayname is any valid identifier and subscript is an integer value in the range 0 to arrayname.length - 1. The array's length instance variable can be used as a bound for loops that process the array.

  • An array declaration provides the name and type of the array. An array instantiation uses the keyword new and causes the compiler to allocate memory for the array's elements:

    int arr[];          // Declare a one-dimensional array variable
    arr = new int[15]; // Allocate 15 int locations for it
  • Multidimensional arrays have arrays as their components:

    int twoDarr[][];            // Declare a two-dimensional array variable
    twoDarr = new int[10][15]; // Allocate 150 int locations
  • An array's values must be initialized by assigning values to each array location. An initializer expression may be included as part of the array declaration.

  • Insertion sort and selection sort are examples of array-sorting algorithms. Both algorithms require several passes over the array.

  • When an array is passed as an argument to a method, a reference to the array is passed rather than the entire array.

  • Swapping two elements of an array, or any two locations in memory, requires the use of a temporary variable.

  • Sequential search and binary search are examples of array-searching algorithms. Binary search requires that the array be sorted. (See the sketch after this list.)

  • For multidimensional arrays, each dimension of the array has its own length variable.

  • Inheritance and polymorphism are useful design features for developing a hierarchy of computer games.
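
The following sketch pulls several of these points together in one place: an array initializer, a loop bounded by the length variable, a swap using a temporary variable, and a sequential search. It is illustrative only; the class name, method names, and data are made up for this summary and are not taken from the chapter's listings.

    public class ArraySummaryDemo {
        // Swap two elements; a temporary variable holds one value during the exchange.
        static void swap(int[] arr, int i, int j) {
            int temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }

        // Sequential search: examine each element in turn; works on unsorted arrays.
        static int sequentialSearch(int[] arr, int key) {
            for (int k = 0; k < arr.length; k++) { // length bounds the loop
                if (arr[k] == key)
                    return k;                      // return the subscript of the match
            }
            return -1;                             // not found
        }

        public static void main(String[] args) {
            int[] grades = { 88, 72, 95, 60, 81 }; // array initializer; length is 5
            swap(grades, 0, grades.length - 1);    // valid subscripts run from 0 to length - 1
            System.out.println(sequentialSearch(grades, 95)); // prints 2
        }
    }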


















Some Tough Questions


One reason that new innovations move into practice slowly is that some of them don't work very well in practice! Not all innovations are useful, and the Early Majority, Late Majority, and Laggards have some good reasons for being cautious. When presented with an innovation, they ask tough questions like these:[23]


  • Do experimental results prove conclusively that the innovation will work in practice?

  • Are successes a result of the innovation itself, or might they be the result of the people using it?

  • Is the innovation complete, or does it need to be adapted or extended before it can be applied?

  • Does the innovation have significant overhead (training, documentation) that offsets its value in the long run?

  • If the innovation was developed in a research setting, does it apply to real-world problems?

  • Does the innovation generally slow down the programmers?

  • Can the innovation be misapplied?

  • Is information available about the risks involved with using the innovation?

  • Does the innovation include information about how to integrate it with existing practices?

  • Must the innovation be applied in its entirety to realize significant benefits?


Very few software engineering practices have had data collected and disseminated widely enough to prepare software practitioners to answer questions like these. As a result, the software engineering practices described in Table 21-1 are stuck on the left side of the chasm. Early Adopters have been using many of those techniques for 15 years or more, while the later adopter groups are largely unaware of them. The numbers from Rogers's adopter categories are close to the numbers of projects that use code-and-fix development: about 75 percent of projects are still using code and fix or its close cousins, and a similar percentage of adopters fall onto the right side of the chasm.


Why is this? In Rogers's framework, one reason that innovations are diffused to Innovators and Early Adopters more quickly is that Innovators and Early Adopters tend to have more resources; they are in better positions to afford costly mistakes. Later adopters are more cautious partly because they aren't as resource rich. As I mentioned in Chapter 12, however, the scarce resource in this case isn't money; it's time. Lagging-edge practices such as code-and-fix development are associated with significant schedule overruns. The overtime that usually accompanies such overruns doesn't leave time for investigating and adopting more effective innovations.










    Questions the SRS Answers for a Project



    The major question that the SRS addresses concerns the problem domain in which the product will exist. As shown in Figure 17-3, all software products can be viewed from three perspectives: data, process, and behavior.



    Figure 17-3. Three Perspectives of Software Products



    The data view identifies the primary data objects to be processed by the software system, defining each object and where it resides. One object's relationships to other objects and the processes that act on them are also part of the data view. For an ATM, an important piece of data would be a customer's personal identification number (PIN).



    As the data within a software system moves, it is modified by a series of transformations. The process view of the problem domain focuses on these data transformations. The transformation of data from inputs to outputs satisfies the business requirements of the system as captured in the SRS. With our ATM, a process would be the validation of the PIN.



    The behavior view of the system focuses on the states a system assumes. The state is any externally observable mode of behavior. After a PIN is entered in the ATM, the machine is in the state of validating the PIN. Processes are running and data is being transformed, but the user is only observing a message that may indicate that his or her PIN is being validated.



    The SRS is the first step in a project that moves from the recognition and definition of the problem domain to a solution domain. The solution domain fills in the three perspectives with requirements models tailored to capturing data, process, and behavioral characteristics (see Figure 17-4). The development of these individual models will be covered later in this book. The SRS provides the framework for holding the models.



    Figure 17-4. Process, Data, Behavior



    Once we look at the views of our software system, questions arise that lead to the requirements collection. These questions relate to the quality of the SRS produced and to how we eventually validate the completed SRS. Construx Software (www.construx.com) provides a series of software development checklists. IEEE 830-1998 provides the set of quality characteristics an SRS exhibits:



    1. Correctness

    2. Unambiguousness

    3. Completeness

    4. Consistency

    5. Ranking for importance and/or stability

    6. Verifiability

    7. Modifiability

    8. Traceability



    Using the checklist model and the standard, an SRS aids in answering questions for these key characteristics:



    1. Correctness

      1. Is the expected response time, from the user's point of view, specified for all necessary operations?

      2. Are other timing considerations specified, such as processing time, data transfer, and system throughput?

      3. Are all the tasks the user wants to perform specified?

      4. Does each task specify the data used in the task and data resulting from the task?

      5. Is the level of security specified?

      6. Is the reliability specified, including the consequences of software failure, vital information protected from failure, error detection, and recovery?

      7. Are acceptable trade-offs between competing attributes specified, for example, between robustness and correctness?

      8. Are external people, software, and hardware interfaces defined?

      9. Is the definition of success included? Of failure?

    2. Unambiguousness

      1. Are the requirements clear enough to be turned over to an independent group for implementation and still be understood?

      2. Are functional requirements separated from non-functional?

    3. Completeness

      1. Are all the inputs to the system specified, including their source, accuracy, range of values, and frequency?

      2. Are all the outputs from the system specified, including their destination, accuracy, range of values, frequency, and format?

      3. Are all the communication interfaces specified, including handshaking, error checking, and communication protocols?

      4. Where information isn't available before development begins, are the areas of incompleteness specified?

      5. Are the requirements complete in the sense that if a product satisfies every requirement, it will be acceptable?

      6. Is there a sense of unease about any part of the requirements? Are some parts impossible to implement and included just to please a customer or a boss?

      7. Has a complete risk analysis been done with special emphasis on any missing requirements?

    4. Consistency

      1. Do the requirements avoid specifying the design?

      2. Are the requirements at a fairly consistent level? Should any requirement be specified in more detail? Should any requirement be specified in less detail?

    5. Ranking for importance and/or stability

      1. Is each item relevant to the problem and its solution?

    6. Verifiability

      1. Are the requirements written in a language and with a vocabulary the user understands? Do the users agree?

      2. Is each requirement testable? Will it be possible for independent testing to determine whether each requirement has been satisfied?

    7. Modifiability

      1. Are all possible changes to the requirements specified including the likelihood of each change?

      2. Is the maintainability of the system specified, including the ability to respond to changes in the operating environment, interfaces with other software, accuracy, performance, and additional predicted capabilities?

    8. Traceability

      1. Do all the requirements avoid conflicts with other requirements?

      2. Can each item be traced to its origin in the problem environment?





















      5.7. Branching, Tagging, and Merging


      As I explained in earlier chapters, most version control systems allow you to create branches and tags from the main trunk of your repository. Tags allow you to mark specific points in your development (such as releases) for later referral, and branches allow you to split the development of your repository and continue development in parallel on a secondary path. Unlike many other version control systems, Subversion doesn't actually have a built-in concept of branches or tags. Instead, it just uses cheap server-side copies that preserve history, while not taking up any additional space. Subversion just marks the location of the copied file or directory and notes where it came from. It then allows you to merge the changes from a "branch" back into the main trunk of the repository at a later date. Although the current implementation of Subversion's copy-merge system does pose the occasional problem (for instance, you can quickly lose track of a file's full merge/branch history if it becomes more than trivially complex), the flexibility of the paradigm more than makes up for its shortcomings in most cases.



      5.7.1. Creating a Branch or Tag


      Creating a branch or tag in Subversion is trivially easy. All you need to do is run svn copy on the file or directory that you want to branch/tag. Generally, your repository will be set up with one or more directories named branches and tags, which will be where you want to place copies that are conceptually branches or tags. This is a convention, however, and is in no way enforced by Subversion. As far as Subversion is concerned, directories named branches or tags are identical to any other directories.


      A typical repository layout is to have three top-level directories in the repository, named trunk, branches, and tags. The trunk directory contains the main development branch of the repository, and all branches and tags are created by putting copies in their respective directories. By separating the three directories at the top level, you can then check out only the trunk directory, and deal with branches and tags only in the repository most of the time. When you need to use or edit parts of the repository that are in branches, you can use svn switch to swap them into your working copy of the trunk directory.
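
      As an illustration, the conventional layout can be created directly in the repository with svn mkdir, and the trunk can then be checked out on its own. The repository URL and working copy path here are the same hypothetical ones used in the surrounding examples, and the output line is representative:

      $ svn mkdir --message "Create trunk, branches, and tags" http://svn.example.com/repos/trunk http://svn.example.com/repos/branches http://svn.example.com/repos/tags
      Committed revision 1.
      $ svn checkout http://svn.example.com/repos/trunk ~/repos/trunk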



      Creating a Branch/Tag in the Repository

      The most efficient way to branch or tag a part of the repository is usually to perform the copy entirely in the remote repository. When Subversion performs a copy in the repository, it doesn't make a copy of the data, but instead just points the copy back to the data from the original. This allows copies of any sized file to occur in a constant amount of time, be it five kilobytes or five hundred megabytes. On the other hand, if the copy is performed in the local working copy, the repository-side copy still occurs in constant time, but you also make a copy of the data in the working copy, which is proportional to the amount of data to be copied.


      A repository-only copy is performed using svn copy with repository URLs for both source and destination. When the command is run, Subversion performs an immediate commit, which means that it opens an editor to allow you to enter a log message if you don't supply one on the command line. As an example, the following command shows how you might create a branch of your entire trunk in a typical repository.



      $ svn cp --message "Created a new branch" http://svn.example.com/repos/trunk http://svn
      .example.com/repos/branches/mybranch
      Committed revision 2353.




      Creating a Branch/Tag from the Working Copy

      Sometimes, creating a branch or tag entirely on the repository is impractical. For example, you might want to create a tag or branch that includes uncommitted local modifications, or consists of multiple mixed revisions. In such cases, you need to pass svn copy a working copy path as a source. You can also pass a working copy path as the destination of the copy, but that is generally not what you want, because it would result in the data being copied into your local working copy. Instead, if you make sure the destination is still a URL, Subversion only sends the local changes to the repository, and all other copies are done as the usual cheap repository-side copies.



      $ svn cp --message "Tagging a snapshot of the local working copy" ~/repos/trunk http://svn
      .example.com/repos/tags/mytag
      Committed revision 2193.




      Switching to the Branch/Tag

      Regardless of how you create a branch or tag, the best way to edit it locally is usually to use svn switch to change the URL that all or part of your working copy points to. If you have a working copy of your trunk already checked out, you will generally save a lot of bandwidth by using the switch command, because it only downloads the differences necessary to switch the URL. The other advantage of svn switch is that it allows you to branch only a portion of your trunk, and then switch that portion in your working copy to the branch without invalidating any relative paths to the rest of your working copy.
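
      For example, assuming you already have a working copy of the trunk checked out at ~/repos/trunk, switching it to the branch created earlier might look like the following (the changed file names in the output are hypothetical):

      $ cd ~/repos/trunk
      $ svn switch http://svn.example.com/repos/branches/mybranch
      U  Makefile
      U  hello.cpp
      Updated to revision 2353.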





      5.7.2. Merging a Branch


      As you work with a branch, you may periodically want to update it with changes from the main trunk, in order to get changes made by others. You will also usually want to merge the changes that you make in the branch back into the main trunk after they have reached a stable point. Both of these situations are handled with the svn merge command.


      The svn merge command is conceptually the hardest Subversion command to deal with, but after you understand how it works, using it is not that complicated. The most basic usage of svn merge is to run it with two revision numbers and a source, as in this example:



      $ svn merge --revision 100:150 http://svn.example.com/repos/branches/mybranch
      U goodbye.cpp
      A hello.cpp


      In this example, svn merge takes the differences between revisions 100 and 150 in the mybranch branch and applies them to the current working copy. The application of the changes occurs just as if you had done an svn update, and any conflicts are handled accordingly.


      Another way to run svn merge is to give it two URLs or working copy paths, with revision numbers, and perform a merge of their differences.



      $ svn merge branches/mybranch@1426 branches/myotherbranch@253
      D goodbye.cpp
      U hello.cpp


      As you can see, in the preceding example, there is no requirement that the first revision number should be lower than the second. In fact, regardless of which revision is lower, Subversion will always merge the differences calculated by subtracting the lefthand revision from the righthand revision. So, if a file exists in the righthand revision and not in the lefthand revision, it will be added by the merge. Conversely, if it exists in the lefthand revision but not the righthand, it will be removed.


      You will also notice in the last example that I used peg revisions to identify which revisions should be used to identify the files. Peg revisions in a merge work the same as they did for svn diff (see Section 5.4.1, "Getting Information on the Current State").



      Keeping Track of Merges

      One of the things that Subversion handles poorly is a file or directory's merge history. Subversion does not keep a record of what has been or hasn't been merged. Because of this, it is important for you to keep your own merge history, in order to avoid merging a change twice (which could cause that change to be undone). The easiest way to keep track of your merge history is to record in the commit log what files/revisions were merged.



      $ svn merge --revision 305:356 branches/mybranch
      U Makefile
      $ svn commit --message "Merged in revisions 305 to 356 of mybranch"
      Committed revision 1398.


      Then, the next time you want to merge the changes from mybranch into trunk, you can use svn log with grep to quickly see which revisions have already been merged in.



      $ svn log | grep 'Merged'
      Merged in revisions 305 to 356 of mybranch
      Merged in revisions 284 to 305 of mybranch
      Merged in revisions 246 to 320 of myotherbranch
      $ svn merge --revision 356:423 branches/mybranch




      Reverting Changes with Merge

      Another use for the merge command is to revert changes that were applied in a previous revision. Say, for instance, that you removed a couple of functions (named doFoo() and getBar()) from your source file, but now realize that you actually need them. Assuming that they were both removed in discrete commits (i.e., nothing else was changed in the source file at the same time those functions were removed), merging them back into the HEAD revision is quite simple.


      First, you'll want to check the logs to find out which revisions the function removals actually took place in. If the file is an active one, with a long log, you might want to use the grep command to pare the output down to something a little more manageable. As an example, the following command will search for any lines containing the word "removed" as well as lines that begin with "r" and a number (most likely log entry headers).



      $ svn log | grep -E 'removed|^r[0-9]'
      r15 | bill | 2004-06-14 19:44:06 -0500 (Sat, 14 Jun 2004) | 98 lines
      r76 | bill | 2004-07-17 12:35:24 -0500 (Sat, 17 Jul 2004) | 32 lines
      removed doFoo()
      r85 | bill | 2004-07-23 10:23:19 -0500 (Fri, 23 Jul 2004) | 15 lines
      r97 | bill | 2004-08-05 15:54:29 -0500 (Thu, 05 Aug 2004) | 145 lines
      removed getBar()


      After you know which revisions the functions were removed in, you can revert the removals by performing a merge with a range that goes backwards, from the revision where the removal took place, to the revision immediately preceding the removal. This will merge the removed sections back into your current working copy, where you can make any necessary modifications and then commit.



      $ svn merge --revision 76:75 foobar.c
      U foobar.c
      $ svn merge --revision 97:96 foobar.c
      U foobar.c
      $ svn status
      M foobar.c
      $ svn commit --message "Reverted changes from revs 76 & 97 to foobar.c"
      Sending foobar.c
      Transmitting file data .
      Committed revision 245.




      Looking before You Merge

      Merges can cause a lot of changes to be applied to the files in your working copy, and undoing those changes can be difficult if there are a lot of local changes. It can be helpful, then, to find out exactly which files Subversion will change (as well as what conflicts will occur) before actually performing the merge. Subversion allows you to do just that with the --dry-run option. When svn merge is invoked with --dry-run, Subversion will perform all of the necessary calculations for the merge and output a list of files that will be modified (as well as how those files will be modified), but it will not actually change any of your local files.
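
      For instance, rehearsing the earlier merge with --dry-run reports what would be changed without modifying anything (representative output shown); running the same command without the option then performs the merge for real.

      $ svn merge --dry-run --revision 305:356 branches/mybranch
      U  Makefile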




















        15.2 Requirements and Qualities


        For new products to be derived from an organizational repository, they must be structured so that they can share modules. As we discussed in Chapter 14, this means that there must be a standard set of modules, with agreements about each module's responsibility, behavior, performance, interface, locality of function, communication and coordination mechanisms, and other properties. This familywide structure, the modules it comprises, and the properties of each that are constant across all members of the product line constitute the product line architecture.


        As we have seen throughout this book, the primary purpose of an architecture is to achieve a system that meets its behavioral and quality requirements. The architecture for each SS2000 product line member was no exception. The most important of these requirements were:



        • Performance.
          Command-and-control systems must respond in real time to continuously arriving sensor inputs and be able to control weapons under tight deadlines.



        • Modifiability.
          The architecture needs to be robust with respect to computing platforms, operating systems, addition or replacement of sensor and weapon systems, human-computer interface requirements, communication protocols, and the like.



        • Safety, reliability, and availability.
          The system must be available when needed, provide the correct data and commands to weapon systems, and fire only under the correct conditions.



        • Testability.
          Each system must be integrable and testable so that errors (if any) are quickly found, isolated, and corrected.




        Besides these single-system requirements, the SS2000 architecture carried the additional burden of application to an entire class of systems. Thus its requirements included the ability to replace one module with another tailored to a particular system without disrupting the rest of the architecture.


        OPERATING ENVIRONMENT AND PHYSICAL ARCHITECTURE


        The requirements of modern shipboard systems influence design solutions in profound ways. Sensors and weapons systems are deployed all over the ship; crew members interact with them via a multitude of separately housed workstations. The HCI must be highly tuned to facilitate rapid information flow and command acceptance and must be tailored to the operational mission of the vessel and the cultural idiosyncrasies of its crew. The likelihood of component failure dictates a fault-tolerant design.


        Figure 15.12 is a physical view of a typical system. A redundant LAN is the communications backbone, connecting from 30 to 70 different, cooperating processors. Nodes on the LAN can total around 30. A node is the end of a communication run and may correspond to a crew station, a weapons platform, or a sensor suite, all located in various parts of the ship and widely dispersed. It may host up to six processors. The LAN is a dual Ethernet. Device-interface modules send and receive data to and from the system's peripherals, primarily sensors, and the weapons systems being controlled.


        Figure 15.12. Typical physical architecture of an SS2000 product











          Mixed Types as Parameters


          Classes Complex and Rational are good examples of programmer-defined types that emulate the properties of built-in C++ numeric types. The objects of these types can be operated on by using the same set of operators as that used for an ordinary numeric variable. This supports the C++ goal of treating programmer-defined types in the same way as built-in types.



          This analogy, however, is not complete. You can apply a number of operators to variables of numeric types that you cannot apply to Complex or Rational objects. Examples of such operators are modulo division, bitwise logical operators, and shifts. Of course, you can overload these operators similarly to the arithmetic and comparison operators, but the meaning of these operators (whatever meaning you decide to implement, the compiler will go along) will not be intuitively clear to the client programmer or to the maintainer. An example of such arbitrary assignment of meaning is the Complex::operator+() that I implemented to display the values of the data members of the Complex object. Intuitively, it is not clear at all what the expression +x should do to the Complex variable x.



          Another problem with treating the objects of built-in types and programmer-defined types equally is the problem of implicit type conversions. C++ supports type conversions without reservation. For example, these expressions are syntactically and semantically correct for any numeric types.





          c += b; // ok for b and c of any built-in numeric types
          c += 4; // ok for c of any built-in numeric type



          Whatever the numeric type of variable b, it is implicitly converted to the numeric type of variable c; whatever the numeric type of variable c, integer 4 is implicitly converted to that type. If variables b and c in the code snippet above are of type Rational, the second line above is in error. For this line to be syntactically correct, one of these functions should be available in the scope of the client code.





          void Rational::operator+=(int x); // c+=4; is c.operator+=(4);
          void operator+=(Rational &r, int x); // c+=4; is operator+=(c,4);



          None of these functions is implemented in Listing 10.5, and that results in the syntax error in the client code. Here is an example of a member function that eliminates the error.





          void Rational::operator += (int x) // target object changes
          { nmr = nmr + x * dnm; // n1/d1 + n = (n1+n*d1)/d1
          this->normalize(); }



          Notice that if both functions are available, a member function and a global function with these interfaces, the second line in the code snippet above is still in error. This time around it is an ambiguity of the function call. Since either of these functions can foot the bill (the bill here is the interpretation of the statement c += 4;) the compiler does not know which function to call.



          However, if variables b and c in the code snippet above are of type Complex, both lines in the snippet are syntactically correct. Why? Because I implemented two versions of operator+=() in Listing 10.4.





          void Complex::operator += (const Complex &b);
          void Complex::operator += (int b);



          In the client code, the first function is called for the first line of code, and the second function is called for the second line of code.





          c += b; // c.Complex::operator+=(b); Complex argument
          c += 4; // c.Complex::operator+=(4); integer argument



          This resolves the problem of using operands of mixed types in expressions. The second overloaded operator function works not only for integer arguments, but also for characters, short integers, long integers, floating point, and double floating point arguments. According to the rules of argument conversion, a value of each of these built-in types can be converted to an integer. There is no need to overload the function operator +=() for each of these built-in types. One function would suffice.



          But do not sigh with relief yet. What about other operators such as -=, *=, /=? Each of these operators requires yet another overloaded operator function with a numeric parameter. And what about other arithmetic operator functions such as operator+(), operator-(), operator*(), and operator/()? Consider the following snippet for object instances of class Rational.





          c = a + b; // c = a.operator+(b);
          c = a + 5; // ?? incompatible types ??



          Again, the second line results in a syntax error because the overloaded operator expects an object of the type Rational as the actual argument, not a value of a built-in numeric type. Meanwhile, none of these expressions is a product of inflamed imagination. Numeric values are mixed with complex numbers and rational numbers in algorithms. And what about comparisons? You should be able to compare Rational objects with integers, and this poses yet more problems.



          The solution that I have used so far is legitimate but boring. For each operator function with a Rational (or any other class) object as an argument, I have to write yet another operator function with a long integer value as an argument. (An integer might not be sufficient on 16-bit machines.)
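
          For example, supporting mixed expressions without conversions means writing pairs of overloads like the following for every operator (a sketch of the pattern, not code from the listings):

          Rational Rational::operator + (const Rational &x) const // Rational + Rational
          { return Rational(nmr*x.dnm + x.nmr*dnm, dnm*x.dnm); }

          Rational Rational::operator + (long n) const // Rational + whole number
          { return Rational(nmr + n*dnm, dnm); } // n1/d1 + n = (n1+n*d1)/d1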



          Can anything be done about that? And here C++ offers you a beautiful tool that allows you to get away with only one set of operator functions (with class parameter) and force these operator functions to accept actual arguments of built-in numeric types.



          What is this tool? It is a tool that allows you to cast a numeric value to a value of the class. Let us start with a very simple example.





          Rational c = 5; // incompatible types ??



          It goes without saying that this line is in error. In Chapter 3, "Working with C++ Data and Expressions," I discussed the concept of the cast that converts the value of one built-in numeric type to the value of another built-in numeric type. Of course, these casts are available only between built-in types, not between built-in types and the programmer-defined type Rational. But if a cast between built-in types and type Rational existed, how would it look? Its syntax would be the same as for numeric types: the type name in parentheses. And the type name would be the name of the type to which the value is converted.





          Rational c = (Rational)5; // this is how the cast would look



          If you remember, in Chapter 3 you saw two syntactic forms for the cast: one that comes from C (the form I used in the line above) and the other is the C++ function-like cast.





          Rational c = Rational(5); // this is how the cast could look



          Doesn't this line look like a constructor call? Now, what do you call the function that produces the value of the class type? Don't you call it a constructor? So this function looks like a constructor and behaves like a constructor. The conclusion is that it is a constructor.



          Next question: what constructor? This is simple. In Chapter 9, we called a constructor with one parameter of nonclass type a conversion constructor. Now you should understand why this name is used. This constructor converts a value of its parameter type into the value of the class type. To make the line above syntactically correct, you have to write a constructor with one parameter.



          What should this constructor do with its single parameter? If the value of the parameter is, say, 5, the value of the Rational object should be set to 5 or to 5/1. If the value of the parameter is 7, the value of the object should be set to 7/1. Hence, the value of the parameter should be used to initialize the numerator, and the denominator should be set to 1 for any value of the actual argument. This results in the following constructor.





          Rational::Rational(long n) // conversion constructor
          { nmr = n; dnm = 1; } // initialize to a whole number



          This constructor is called every time that a function that expects a Rational parameter is called with a numeric actual argument. The class Rational now should look this way.





          class Rational {
          long nmr, dnm; // private data
          void normalize(); // private member function
          public:
          Rational() // default constructor: zero value 0/1
          { nmr = 0; dnm = 1; }
          Rational(long n) // conversion constructor: whole value n/1
          { nmr = n; dnm = 1; }
          Rational(long n, long d) // general constructor: fraction as n/d
          { nmr = n; dnm = d;
          this->normalize(); }
          Rational operator + (const Rational &x) const
          { return Rational(nmr*x.dnm + x.nmr*dnm, dnm*x.dnm); }
          // THE REST OF CLASS Rational
          } ;



          Some programmers dislike writing several constructors if one constructor with default parameters can do the job. A popular constructor that can be used as general constructor, conversion constructor, and default constructor will look this way.





          Rational(long n=0, long d=1) // general, conversion, default constructor
          { nmr = n; dnm = d;
          this->normalize(); }



          Make sure that you see that this constructor is called when the client code supplies two arguments, one argument, or no arguments for object initialization; default values are used instead of the missing arguments when defining Rational objects.





          Rational a(1,4); // Rational a = Rational(1,4); - two arguments
          Rational b(2); // Rational b = Rational(2,1); - one argument
          Rational c; // Rational c = Rational(0,1); - no arguments



          Notice that the actual arguments supplied in this example are of type int but the constructor expects arguments of type long. This is not a problem: implicit built-in conversion from int to long is available by default, as it is between all built-in numeric types. In function calls, the compiler allows no more than one built-in conversion and no more than one class-defined conversion (a conversion constructor call).



          In compiling expressions with Rational operands, the compiler converts the int arguments first to long and then to Rational; after this conversion, the compiler generates the call to the appropriate operator.





          c = a.operator+(Rational((long)5)); // real meaning of c = a + 5;



          Now the client code above compiles without the operator Rational::operator+(long). A temporary Rational object is created by the conversion constructor, then operator+() is called, and then the Rational destructor destroys the temporary.



          Now you can write client code with numeric values as the second operand, while the first operand is of type Rational.





          int main()
          { Rational a(1,4), b(3,2), c, d;
          c = a + 5; // c = a.operator+(Rational((long)5));
          d = b - 1; // d = b.operator-(Rational((long)1));
          c = a * 7; // c = a.operator*(Rational((long)7));
          d = b / 2; // d = b.operator/(Rational((long)2));
          c += 3; // c.operator+=(Rational((long)3));
          d *= 2; // d.operator*=(Rational((long)2));
          if (b < 2) // if (b.operator<(Rational((long)2)))
          cout << "Everything works\n";
          return 0; }




          Listing 10.6 shows a new version of class Rational that supports mixed types in binary expressions. The output of the program run is shown in Figure 10-6.





          Figure 10-6. Output for program in Listing 10.6.










          Listing 10.6. Class Rational that supports mixed types in expressions.


          #include <iostream>
          using namespace std;

          class Rational {
          long nmr, dnm; // private data
          void normalize(); // private member function
          public:
          Rational(long n=0, long d=1) // general, conversion, default
          { nmr = n; dnm = d;
          this->normalize(); }
          Rational operator + (const Rational &x) const; // const target
          Rational operator - (const Rational &x) const;
          Rational operator * (const Rational &x) const;
          Rational operator / (const Rational &x) const;
          void operator += (const Rational &x); // target changes
          void operator -= (const Rational &x);
          void operator *= (const Rational &x);
          void operator /= (const Rational &x);

          bool operator == (const Rational &other) const; // const target
          bool operator < (const Rational &other) const;
          bool operator > (const Rational &other) const;
          void show() const;
          } ; // end of class specification

          Rational Rational::operator + (const Rational &x) const
          { return Rational(nmr*x.dnm + x.nmr*dnm, dnm*x.dnm); }
          Rational Rational::operator - (const Rational &x) const
          { return Rational(nmr*x.dnm - x.nmr*dnm, dnm*x.dnm); }

          Rational Rational::operator * (const Rational &x) const
          { return Rational(nmr * x.nmr, dnm * x.dnm); }

          Rational Rational::operator / (const Rational &x) const
          { return Rational(nmr * x.dnm, dnm * x.nmr); }

          void Rational::operator += (const Rational &x)
          { nmr = nmr * x.dnm + x.nmr * dnm; // 3/8+3/2=(6+24)/16=15/8
          dnm = dnm * x.dnm; // n1/d1+n2/d2 = (n1*d2+n2*d1)/(d1*d2)
          this->normalize(); }

          void Rational::operator -= (const Rational &x)
          { nmr = nmr * x.dnm - x.nmr * dnm; // 3/8-3/2=(6-24)/16=-9/8
          dnm = dnm * x.dnm; // n1/d1-n2/d2 = (n1*d2-n2*d1)/(d1*d2)
          this->normalize(); }

          void Rational::operator *= (const Rational &x)
          { nmr = nmr * x.nmr; dnm = dnm * x.dnm;
          this->normalize(); }

          void Rational::operator /= (const Rational &x)
          { nmr = nmr * x.dnm; dnm = dnm * x.nmr;
          this->normalize(); }

          bool Rational::operator == (const Rational &other) const
          { return (nmr * other.dnm == dnm * other.nmr); }

          bool Rational::operator < (const Rational &other) const
          { return (nmr * other.dnm < dnm * other.nmr); }

          bool Rational::operator > (const Rational &other) const
          { return (nmr * other.dnm > dnm * other.nmr); }

          void Rational::show() const
          { cout << " " << nmr << "/" << dnm; }

          void Rational::normalize() // private member function
          { if (nmr == 0) { dnm = 1; return; }
          int sign = 1;
          if (nmr < 0) { sign = -1; nmr = -nmr; }
          if (dnm < 0) { sign = -sign; dnm = -dnm; }
          long gcd = nmr, value = dnm; // greatest common divisor
          while (value != gcd) { // stop when the GCD is found
          if (gcd > value)
          gcd = gcd - value; // subtract smaller number from greater
          else value = value - gcd; }
          nmr = sign * (nmr/gcd); dnm = dnm/gcd; } // denominator is positive

          int main()
          { cout << endl << endl;
          Rational a(1,4), b(3,2), c, d;
          c = a + 5; // I'll discuss c = 5 + a; later
          a.show(); cout << " + " << 5 << " ="; c.show(); cout << endl;
          d = b - 1;
          b.show(); cout << " - " << 1 << " ="; d.show(); cout << endl;
          c = a * 7;
          a.show(); cout << " * " << 7 << " ="; c.show(); cout << endl;
          d = b / 2;
          b.show(); cout << " / " << 2 << " ="; d.show(); cout << endl;
          c.show();
          c += 3;
          cout << " += " << 3 << " ="; c.show(); cout << endl;
          d.show();
          d *= 2;
          cout << " *= " << 2 << " ="; d.show(); cout << endl;
          if (b < 2)
          { b.show(); cout << " < " << 2 << endl; }
          return 0;
          }


          Remember that the conversions to type Rational, however implicit (silent), are function calls to the conversion constructor. When the function terminates, the temporary object created for this conversion is destroyed with the call to the destructor. (For this class, it is a default destructor supplied by the compiler.) Remember that story about two functions here and two functions there? (Actually, the story was about two dollars here and two dollars there.) Hence, this version of class Rational is somewhat slower than the version that does not rely on argument conversions and provides a separate overloaded operator for each type of the argument.



          This implicit use of conversion constructors is supported not only for overloaded operators, but also for any function, member or global, that has object parameters. As I mentioned in Chapter 9, conversion constructors deal a blow to the C++ system of strong typing. If you intentionally use a numeric value instead of an object, fine. If you use it by mistake, the compiler does not tell you that you are making a mistake.
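
          A small illustration of this point (hypothetical code, not from the listings): any function with a Rational parameter accepts a plain numeric argument through the same conversion constructor.

          void printSquare(const Rational &r) // global function with a Rational parameter
          { (r * r).show(); cout << endl; }

          printSquare(Rational(1,3)); // explicit object argument: prints 1/9
          printSquare(3); // implicit conversion: printSquare(Rational((long)3)); prints 9/1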



          C++ offers a wonderful technique for preventing errors and for forcing the designer of client code to tell the maintainer what is going on. This technique consists of using the keyword explicit with the constructor.





          explicit Rational(long n=0, long d=1) // cannot be called implicitly
          { nmr = n; dnm = d;
          this->normalize(); }



          By declaring a constructor explicit, you make any implicit call to this constructor a syntax error.





          Rational a(1,4), b(3,2), c, d;
          c = a + 5; // syntax error: implicit call
          c = a + Rational(5); // ok: explicit call
          d = b - 1; // syntax error: implicit call
          d = b - (Rational)1; // ok: explicit call
          if (b < 2) // syntax error: implicit call
          if (b < Rational(2)) // ok: explicit call
          cout << "Everything is fine\n";



          This is a very good idea because it gives the class designer better control over the way the class objects are used by the client programmer.



          Programmer-defined classes like Complex and Rational do indeed have to emulate the behavior of built-in numeric types as much as possible. Using numeric variables instead of objects in expressions is not an error but a legitimate technique for implementing computational algorithms. In the code snippet above, I marked some lines as ok and other lines as syntax errors. Given a choice, every programmer would prefer to write code like the lines marked as syntax errors. The need to spell out the casts every time a numeric operand is used is an imposition on the client programmer and results in code that is less aesthetically pleasing.



          From that point of view, the use of the keyword explicit for the constructors of classes like Complex and Rational is probably overkill.



          NOTE



          Do not use the keyword explicit for constructors of numeric classes that implement overloaded operator functions. Utilize it for classes where using a built-in argument instead of the class object in a function call is an error, not a legitimate way of using the class.


















          URL Abbreviation with mod_rewrite


          URL abbreviation is one of the most effective techniques you can use to optimize your HTML. First seen on Yahoo!'s home page, URL abbreviation substitutes short redirect URLs (like r/ci) for longer ones (like Computers_and_Internet/) to save space. The Apache and IIS web servers, and Manila (http://www.userland.com) and Zope (http://www.zope.org) all support this technique. In Apache, the mod_rewrite module transparently handles URL expansion. For IIS, ISAPI filters handle URL rewrites. Here are some IIS rewriting filters:



          • ISAPI_Rewrite (http://www.isapirewrite.com/)


          • OpCode's OpURL (http://www.opcode.co.uk/components/rewrite.asp)


          • Qwerksoft's IISRewrite based on mod_rewrite (http://www.qwerksoft.com/products/iisrewrite/)




          Cocoon


          Many of these fancy browser detection and performance techniques can be handled elegantly with Cocoon. Cocoon is an open source web-application platform based on XML and XSLT. Actually a big Java servlet, Cocoon runs on most servlet engines. Cocoon can automatically transform XML into (X)HTML and other formats using XSLT. You can set up a style sheet and an output format for certain types of browsers (WAP, DOM, etc.), and Cocoon does the rest. It makes the difficult task of separating content, layout, and logic trivial. Cocoon can handle the following:



          • Server-side programming


          • URL rewriting


          • Browser detection


          • PDF, legacy file formats, and image generation


          • Server-side compression



          You can read more about Cocoon at http://xml.apache.org/cocoon/.



          URL abbreviation is especially effective for home or index pages, which typically have a lot of links. As you will discover in Chapter 19, "Case Studies: Yahoo.com and WebReference.com," URL abbreviation can save anywhere from 20 to 30 percent off of your HTML file size. The more links you have, the more you'll save.


          NOTE



          As with most of these techniques, there's always a tradeoff. Using abbreviated URLs can lower search engine relevance, although you can alleviate this somewhat with clever expansions with mod_rewrite.




          The popular Apache web server[9] has an optional module, mod_rewrite, that enables your server to automatically rewrite URLs.[10] Created by Ralf Engelschall, this versatile module has been called the "Swiss Army knife of URL manipulation."[11] mod_rewrite can handle everything from URL layout and load balancing to access restriction. We'll be using only a small portion of this module's power by substituting expanded URLs for abbreviations with regular expressions.


          [9] Netcraft, Ltd, "Netcraft Web Server Survey" [online], (Bath, UK: Netcraft, Ltd., October 2002 [cited 13 November 2002]), available from the Internet at http://www.netcraft.com/survey/. For active sites, over 65 percent use Apache.


          [10] Ralf S. Engelschall, "Apache 1.3 URL Rewriting Guide" [online], (Forest Hill, MD: Apache Software Foundation, 1997 [cited 13 November 2002]), available from the Internet at http://httpd.apache.org/docs/misc/rewriteguide.html. Ralf Engelschall is the author of mod_rewrite, the module used for URL rewriting. See also http://www.engelschall.com/pw/apache/rewriteguide/.


          [11] Apache Software Foundation, "Apache Module mod_rewrite" [online], (Forest Hill, MD: Apache Software Foundation, 2001 [cited 13 November 2002]), available from the Internet at http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html. An URL rewriting engine.


          The module first examines each requested URL. If it matches one of the patterns you specify, the URL is rewritten according to the rule conditions you set. Essentially, mod_rewrite replaces one URL with another, allowing abbreviations and redirects.


          This URL rewriting machine manipulates URLs based on various tests including environment variables, time stamps, and even database lookups. You can have up to 50 global rewrite rules without any discernible effect on server performance.[12] Abbreviated URI expansion requires only one.


          [12] Ralf S. Engelschall and Christian Reiber, "URL Manipulation with Apache" Heise Online [online], (Hanover, Germany: Heise Zeitschriften Verlag GmbH, 2001), available from the Internet at http://www.heise.de/ix/artikel/E/1996/12/149/. English translation.


          Tuning mod_rewrite


          To install mod_rewrite on your Apache Web server, you or your IT department needs to edit one of your server configuration files. The best way to run mod_rewrite is through the httpd.conf file, as this is accessed once per server restart. Without configuration file access, you'll have to use .htaccess for each directory. Keep in mind that the same mod_include performance caveats apply; .htaccess files are slower as each directory must be traversed to read each .htaccess file for each requested URL.
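
          For reference, enabling the module typically involves directives along these lines; the module path and directory shown here are examples only and vary by platform and Apache version:

          # httpd.conf: load the module (path varies by installation)
          LoadModule rewrite_module modules/mod_rewrite.so

          # if you must use per-directory .htaccess files, allow them to
          # contain mod_rewrite directives
          <Directory "/www/htdocs">
              Options FollowSymLinks
              AllowOverride FileInfo
          </Directory>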



          The Abbreviation Challenge


          The strategy is to abbreviate the longest, most frequently accessed URLs with the shortest abbreviations. Most webmasters choose one-, two-, or three-letter abbreviations for directories. On WebReference.com, the goal was to create a mod_rewrite rule that would expand URLs like this:



          r/d

          Into this:



          dhtml/

          Like Yahoo!, r is the flag we've chosen for redirects. But why stop there? We can extend this concept into more directories. So turn this:



          r/dc

          Into this:



          dhtml/column

          and so on. Note that the lack of a trailing forward slash in this second example allows us to intelligently append column numbers.


          With the right RewriteRule, the abbreviation /r/dc/66 expands into the string /dhtml/column66/.



          The RewriteRule Solution


          To accomplish this expansion, you need to write a RewriteRule regular expression. First, you need to find the URI pattern /r/d, and then extract /d and turn it into /dhtml. Next, append a trailing slash.


          Apache requires two directives to turn on and configure the mod_rewrite module: the RewriteEngine Boolean and the RewriteRule directive. The RewriteRule is a regular expression that transforms one URI into another. The syntax is shown here:



          RewriteRule <pattern> <rewrite as>

          So to create a RewriteRule to solve this problem, you need to add the following two mod_rewrite directives to your server configuration file (httpd.conf or .htaccess):



          RewriteEngine On
          RewriteRule ^/r/d(.*) /dhtml$1

          This regular expression matches a URL that begins with /r/ (the ^ character at the beginning means to match from the beginning of the string). Following that pattern is d(.*), which matches zero or more characters after the d. Note that using /r/dodo would expand to /dhtmlodo, so you'll have to make sure anything after r/d always includes a /.
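
          One way to enforce that (a hypothetical tightening, not the rule used on WebReference.com) is to require the slash in the pattern itself:

          # bare abbreviation: /r/d becomes /dhtml/
          RewriteRule ^/r/d$ /dhtml/
          # deeper paths must start with a slash, so /r/dodo no longer matches
          RewriteRule ^/r/d(/.*) /dhtml$1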


          So when a request comes in for the URI <a href="/r/d/diner/">DHTML Diner</a>, this rule expands this abbreviated URI into <a href="/dhtml/diner/">DHTML Diner</a>.



          The RewriteMap Solution for Multiple Abbreviations


          The RewriteRule solution would work well for a few abbreviations, but what if you want to abbreviate a large number of links? That's where the RewriteMap directive comes in. This feature allows you to group multiple lookup keys (abbreviations) and their corresponding expanded values into one tab-delimited file. Here's an example map file (at /www/misc/redir/abbr_webref.txt):



          d dhtml/
          dc dhtml/column
          pg programming/
          h html/
          ht html/tools/

          The MapName specifies a mapping function between keys and values for a rewriting rule using the following syntax:



          ${ MapName : LookupKey | DefaultValue }

          When you are using a mapping construct, you generalize the RewriteRule regular expression. Instead of a hard-coded value, the MapName is consulted, and the LookupKey accessed. If there is a key match, the mapping function substitutes the expanded value into the regular expression. If there is no match, the rule substitutes the default value or a blank string.


          To use this external map file, we'll add the RewriteMap directive and tweak the regular expression correspondingly. The following httpd.conf commands turn rewriting on, show where to look for your rewrite map, and show the definition of the RewriteRule:



          RewriteEngine On
          RewriteMap abbr txt:/www/misc/redir/abbr_webref.txt
          RewriteRule ^/r/([^/]*)/?(.*) ${abbr:$1}$2 [redirect=permanent,last]

          The first directive turns on rewrites as before. The second points the rewrite module to the text version of our map file. The third tells the processor to look up the value of the matching expression in the map file. Note that the RewriteRule has a permanent redirect (301 instead of 302) and last flags appended to it. Once an abbreviation is found for this URL, no further rewrite rules are processed for it, which speeds up lookups.


          Here we've set the rewrite MapName to abbr and the map file location (text format) to the following:



          /www/misc/redir/abbr_webref.txt

          The RewriteRule processes requested URLs using the regular expression:



          ^/r/([^/]*)/?(.*) ${abbr:$1}$2

          This regular expression matches a URL that begins with /r/. (The ^ character at the beginning means to match from the beginning of the string.) Then the regular expression ([^/]*) matches as many non-slash characters as it can up to the next slash or the end of the string. This effectively pulls out the first string between two slashes following the /r. For example, in the URL /r/pg/javascript/, this portion of the regular expression matches pg. It also will match ht in /r/ht. (Because there are no slashes following, it just continues until it reaches the end of the URL.)


          The rest of the pattern /?(.*) matches 0 or 1 forward slashes / with any characters that follow. These two parenthesized expressions will be used in the replacement pattern.


          The Replacement Pattern

          The substitution (${abbr:$1}$2) is the replacement pattern that will be used in the building of the new URL. The $1 and $2 variables refer back (backreferences) to the first and second patterns found in the supplied URL. They represent the first set of parentheses and the second set of parentheses in the regular expression, respectively. Thus for /r/pg/javascript/, $1 = "pg" and $2 = "javascript/". Replacing these in the example produces the following:



          ${abbr:pg}javascript/

          The ${abbr:pg} is a mapping directive that says, "Refer to the map abbr (recall our map command, RewriteMap abbr txt:/www/misc/redir/abbr_webref.txt), look up the key pg, and return the corresponding data value for that key." In this case, that value is programming/. Thus the abbreviated URL, /r/pg/javascript, is replaced by the following:



          /programming/javascript/

          Voila! So you've effectively created an abbreviation expander using a regular expression and a mapping file. Using the preceding rewrite map file, the following URL expansions would occur:



          "r/dc" becomes "dhtml/column"
          "r/pg" becomes "programming/"

          The server, upon seeing a matching abbreviation in the map file, will automatically rewrite the URL to the longer value.


          But what happens if you have many keys in your RewriteMap file? Scanning a long text file every time a user clicks a link can slow down lookups. That's where binary hash files come in handy.



          Binary Hash RewriteMap

          For maximum speed, convert your text RewriteMap file into a binary DBM hash file. This binary hash version of your key and value pairs is optimized for maximum lookup speed. Convert your text file with a DBM tool or the txt2dbm Perl script provided at http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html.
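
          For example, on Apache 2.x installations that include the httxt2dbm utility, the conversion can be done with a single command (using the map file path from the earlier examples):

          $ httxt2dbm -i /www/misc/redir/abbr_webref.txt -o /www/misc/redir/abbr_webref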


          NOTE



          Note that this example is specific to Apache on Unix. Your platform may vary.




          Next, change the RewriteMap directive to point to your optimized DBM hash file:



          RewriteMap abbr dbm:/www/misc/redir/abbr_webref

          That's the abbreviated version of how you set up link abbreviation on an Apache server. It is a bit of work, but once you've got your site hierarchy fixed, you can do this once and forget it. This technique saves space by allowing abbreviated URLs on the client side and shunting the longer actual URLs to the server. The delay using this technique is hardly noticeable. (If Yahoo! can do it, anyone can.) Done correctly (as an internal rewrite rather than an external redirect), the rewriting can be transparent to the client: the abbreviated URL is requested, the server expands it, and it serves back the content at the expanded location without telling the browser what it has done. You also can use the /r/ prefix or the RewriteLog directive to track click-throughs in your server logs.


          This technique works well for sites that don't change very often: You would manually abbreviate your URIs to match your RewriteMap abbreviations stored on your server. But what about sites that are updated every day, or every hour, or every minute? Wouldn't it be nice if you could make the entire abbreviation process automatic? That's where the magic of Perl and cron jobs (or the Schedule Tasks GUI in Windows) comes in.




          Automatic URL Abbreviation


          You can create a Perl or shell script (insert your favorite CGI scripting language here) to look for URLs that match the lookup keys in your map file and automatically abbreviate your URLs. We use this technique on WebReference.com's home page. To make it easy for other developers to auto-abbreviate their URLs, we've created an open source script called shorturls.pl. It is available at http://www.webreference.com/scripts/.


          NOTE



          XSLT gives you another way to abbreviate URLs automatically. Just create the correct templates to abbreviate all the local links in your files.




          The shorturls.pl script allows you to abbreviate URLs automatically and exclude portions of your HTML code from optimization with simple XML tags (<NOABBREV> ...</NOABBREV>).


          Using this URL abbreviation technique, we saved over 20 percent (5KB) off our 24KB hand-optimized front page. We could have saved even more space, but for various reasons, we excluded some URLs from abbreviation.


          This gives you an idea of the link abbreviation process, but what about all the other areas of WebReference? Here is a truncated version of our abbreviation file to give you an idea of what it looks like (the full version is available at http://www.webreference.com/scripts/):



          b dlab/
          d dhtml/
          g graphics/
          h html/
          p perl/
          x xml/

          3c 3d/lesson
          dd dhtml/dynomat/
          ddd dhtml/dynomat/dialogs/
          dc dhtml/column
          ...
          i http://www.internet.com/
          ic http://www.internet.com/corporate/
          ...
          jsc http://www.javascript.com/
          jss http://www.javascriptsource.com/
          jsm http://www.justsmil.com/
          ...

          Note that we use two- and three-letter abbreviations to represent longer URLs on WebReference.com. Yahoo! uses two-letter abbreviations throughout its home page. How brief you make your abbreviations depends on how many links you need to abbreviate and how descriptive you want the URLs to be.


          The URL Abbreviation/Expansion Process: Step by Step

          In order to enable automatic link abbreviation (with shorturls.pl) and expansion (with mod_rewrite), do the following:










          1. Create an abbreviation map file (RewriteMap) with short abbreviations that correspond to frequently used and longer directories separated by tabs. For example:



            d dhtml/
            g graphics/
            dc dhtml/column
            gc graphics/column
            ...


          2. Add the following lines to your httpd.conf file to enable the mod_rewrite engine:



            RewriteEngine On
            RewriteMap abbr txt:/www/misc/redir/abbr_yrdomain.txt
            RewriteRule ^/r/([^/]*)/?(.*) ${abbr:$1}$2 [redirect=permanent,last]


          3. Try some abbreviated URLs (type in /r/d, etc.). If they work, move on to step 4; otherwise, check your map and your rewrite directives. If all else fails, contact your system administrator.



          4. Convert your RewriteMap text file to a binary hash file. See http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html for the txt2dbm Perl script.



          5. Change the preceding RewriteMap directive to point to this optimized *DBM hash file:



            RewriteMap abbr dbm:/www/misc/redir/abbr_yrdomain


          6. Now your rewrite engine is set up. To automate URL abbreviation, point shorturls.pl to the text version of your RewriteMap file, input your home page template and output your home page, and schedule the job with cron on UNIX/Linux, or the Schedule Tasks GUI in Windows:



            echo "\nBuilding $YRPAGE from $YRTEMPLATE\n"
            /www/yrdomain/cgi-bin/shorturls.pl $YRTEMPLATE $YRPAGE


          That's it. Now any new content that appears on your home page will be automatically abbreviated according to the RewriteMap file that you created, listing the abbreviations you want.



          Use Short URLs


          You could name your directories using these short, cryptic abbreviations. Using descriptive names for directories and file names has advantages, however, in usability and search engine positioning. Using URL abbreviation, you can have the best of both worlds for high-traffic pages like home pages.


          For front page or frequently referenced objects like single-pixel GIFs, logos, navigation bars, and site-wide rollovers, however, you can use short URLs by placing them high in your site's file structure, and using short filenames. For example:



          /i.gif (internet.com logo)
          /t.gif (transparent single pixel gif)

          I've seen some folks carry the descriptive-names-at-all-cost idea to extremes. Here's a surreal-world example:



          transparent-single-pixel-gif1x1.gif (actual file name)

          Some search engine positioning firms sprinkle keywords wherever they are legal, and in some places where they're not. Again, it's a tradeoff. Bulking up your pages with keyword-filled alt values and object names may increase your rankings, but with the advent of backlink-based search engines like Google and Teoma, these practices are fading in effectiveness.


          You could even use content negotiation or your srm.conf file to abbreviate file type suffixes. This technique is pretty extreme, seldom used, but perfectly valid. Here's an example:



          i.g (.g = .gif, srm.conf directive of AddType image/gif g)
          i (content negotiation resolves to i.gif, could later use i.png)




















            Cryptographic Algorithm Usage


            This section focuses on how different algorithms should be approached in new and earlier code. The SDL requirements dictate that


            • New code uses only algorithms and key lengths from the rightmost column.

            • Algorithms listed in the middle column are to be used only for backward compatibility.

            • Algorithms and key lengths listed in the left column are not to be used in shipping products without an exception from the central security team.


            Using any cryptographic algorithms that are not listed in the middle or right-hand columns requires an exception from your central security team. Be aware that the United States federal government mandates the use of specific cryptographic algorithms (NIST 2005).




            Symmetric Block Ciphers and Key Lengths


            For symmetric block encryption algorithms, a minimum key length of 128 bits is required for new code (KeyLength 2006). The only block encryption algorithm recommended for new code is AES. (AES-128, AES-192, and AES-256 are all acceptable.) Two-key (112-bit) or three-key (168-bit) 3DES are currently acceptable if already in use in existing code. However, transitioning to AES is highly recommended. DES, DESX, RC2, and SKIPJACK are no longer considered secure; continued use of these algorithms should be for opt-in backward compatibility only.


            Best Practices

            For projects using symmetric block ciphers, AES is required for new code, and two- or three-key 3DES is permissible for backward compatibility. All other symmetric block cipher usage, including RC2, DES, DESX, and SKIPJACK, can be used only for decrypting old data.





            Symmetric Stream Ciphers and Key Lengths


            For symmetric stream ciphers, there is currently no recommended algorithm; you should use a block cipher, such as AES, with at least 128 bits of key material. Existing code that uses RC4 should be using a key size of at least 128 bits, and your application's use of RC4 should be reviewed by a cryptographer. This last point is very important: there are numerous subtle errors that can arise when using stream ciphers such as RC4. Refer to the "References" section of this chapter for other material outlining some of the common errors.


            Best Practices

            The RC4 stream cipher should be used with extreme caution, and any use of the algorithm should be reviewed by a cryptographer.



            Best Practices

            All stream cipher usage must undergo a security review. RC4 with a key length of 128 bits or greater is permissible, but only after a security review. All other usage, including RC4 with keys shorter than 128 bits, is permissible only for decrypting old data.





            Symmetric Algorithm Modes


            Symmetric algorithms can operate in a number of modes, most of which link together the encryption operations on successive blocks of plaintext and ciphertext. The electronic code book (ECB) mode of operation should not be used without signoff from the central security team. Cipher-block-chaining (CBC) is the recommended mode of operation for block ciphers. If, for interoperability reasons, you believe that you need to use another chaining mode, you should talk to the security team.


            Best Practices

            Projects using symmetric encryption algorithms must use CBC.
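            To make the CBC recommendation concrete, here is a minimal sketch (not from the SDL text) of AES-256-CBC encryption using OpenSSL's EVP interface. The function name and buffer handling are illustrative: the caller supplies a 32-byte key and a ciphertext buffer with room for one block of padding, and a real design would also pair the ciphertext with an integrity check such as an HMAC.

            #include <openssl/evp.h>
            #include <openssl/rand.h>

            /* Sketch: encrypt plaintext with AES-256 in CBC mode via OpenSSL's EVP API.
               Returns the ciphertext length, or -1 on error. A fresh 16-byte IV is
               generated per message and must be stored or sent with the ciphertext. */
            int encrypt_aes256_cbc(const unsigned char *key,
                                   const unsigned char *plaintext, int plaintext_len,
                                   unsigned char *iv, unsigned char *ciphertext)
            {
                EVP_CIPHER_CTX *ctx;
                int len, ciphertext_len;

                if (RAND_bytes(iv, 16) != 1)            /* random IV, never reused */
                    return -1;

                if ((ctx = EVP_CIPHER_CTX_new()) == NULL)
                    return -1;

                if (EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv) != 1)
                    goto err;
                if (EVP_EncryptUpdate(ctx, ciphertext, &len, plaintext, plaintext_len) != 1)
                    goto err;
                ciphertext_len = len;
                if (EVP_EncryptFinal_ex(ctx, ciphertext + len, &len) != 1)  /* padding */
                    goto err;
                ciphertext_len += len;

                EVP_CIPHER_CTX_free(ctx);
                return ciphertext_len;

            err:
                EVP_CIPHER_CTX_free(ctx);
                return -1;
            }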





            Asymmetric Algorithms and Key Lengths


            For RSA-based asymmetric encryption and digital signatures, the minimum acceptable key length is 1024 bits, and 1024-bit signature keys should be used only for signatures with validity periods of one year or less. New code should use RSA keys of at least 2048 bits in length.


            For DSA-based digital signatures, only 1024-bit keys should be used (the maximum allowed by the DSA standard) and then only for short-lived signatures (less than one year).


            For key exchange and digital signatures that are based on elliptic curve cryptography (ECC), the three NIST-approved curves (P-256, P-384, and P-521) are all acceptable.


            For key agreement, Diffie-Hellman is recommended, with 2048-bit keys for new code and 1024-bit keys for backward compatibility. Keys of 512 bits or fewer are not to be used at all.


            Best Practices

            For projects using asymmetric algorithms, ECC with >=256-bit keys or RSA with >=2048-bit keys is required for new code. RSA with >=1024-bit keys is permissible for backward compatibility. RSA keys shorter than 1024 bits can be used only for decrypting old data. ECC-based key exchange and digital signatures must use one of the three NIST-approved curves; P-256, P-384, and P-521 are all acceptable. For key agreement, Diffie-Hellman is recommended, with >=2048-bit keys for new code, >=1024-bit keys for backward compatibility, and no keys shorter than 1024 bits.





            Hash Functions


            No new code should use the MD4 or MD5 hash algorithms because hash collisions have been demonstrated for both algorithms, which effectively "breaks" them in the eyes of the cryptographic community. Continued use of SHA-1 is permissible in existing code for backward compatibility purposes and, as described in the next Best Practices reader aid, for new code running on certain down-level platforms. The SHA-2 family of hash functions (SHA-256, SHA-384, or SHA-512) is currently the only group that is generally recommended. The SHA-2 hash functions are available in .NET code and in unmanaged Microsoft Win32 code targeting Windows Server 2003 SP1 and Windows Vista.


            Note that hash function agility (the ability to switch to another hash function without updating your code) is part of the cryptographic agility requirement discussed earlier in this chapter. Absent a backward compatibility requirement, code that uses SHA-1 must migrate to SHA-2 once SHA-2 is available on the platform.


            Best Practices

            For .NET code, use of a SHA-2 hash function is required. For new native Win32 code shipping to Windows Server 2003 SP1 or Windows Vista, use of a SHA-2 hash function is required. For new native Win32 code shipping to earlier operating systems (including Windows 95, Windows 98, Microsoft Windows NT 4, and Windows 2000), use of SHA-1 is permitted. This exemption automatically expires if a service pack containing SHA-2 support ships on the platform in question. Continued use of SHA-1 is permissible for backward compatibility. All other hash functions, including MD2, MD4, and MD5, should not be used.





            Message Authentication Codes


            The most common and well-known message authentication code (MAC) function is the HMAC, which uses a hash function and secret MAC key for message authentication. It uses an underlying hash function (MD5, SHA-1, or SHA-2) and a secret key of a specified length. The strength of an HMAC relies on the strength of the underlying hash function and the length of the secret.
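            As an illustration (not part of the requirements text), computing an HMAC with SHA-256 is a single call in OpenSSL; the wrapper function below is a hypothetical helper.

            #include <stddef.h>
            #include <openssl/hmac.h>
            #include <openssl/evp.h>

            /* Sketch: compute HMAC-SHA256 over msg with a secret key of at least
               16 bytes (128 bits), per the key-length guidance above. Writes up to
               EVP_MAX_MD_SIZE bytes into out and returns the MAC length. */
            unsigned int hmac_sha256(const unsigned char *key, int key_len,
                                     const unsigned char *msg, size_t msg_len,
                                     unsigned char *out)
            {
                unsigned int out_len = 0;
                HMAC(EVP_sha256(), key, key_len, msg, msg_len, out, &out_len);
                return out_len;   /* 32 bytes for SHA-256 */
            }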


            Best Practices

            For HMAC usage, SHA-2 with >=128-bit keys is required for new code. SHA-1 with >=128-bit keys is permissible for backward compatibility. All other key lengths (<112 bits) or hash functions, including MD2, MD4, and MD5, should not be used.
























            [Page 752 (continued)]

            15.10. Java Network Security Restrictions


            One of the most attractive features of Java is the extensive effort to make it a secure language. This is especially important for a language that makes it so easy to implement networking applications. After all, nobody wants to download a Java applet that proceeds to erase the hard disk. Such an applet might be written by a cyber-terrorist deliberately aiming to cause severe damage, or it might be written by a cyber-doofus who inadvertently writes code that does severe damage.


            What are some of Java's techniques for guarding against deliberate or inadvertent insecure code? One level of security is Java's bytecode verification process, which the Java Virtual Machine performs on any "untrusted" code it receives. Java checks every class that it loads into memory to make sure it doesn't contain illegal or insecure code. Another line of defense is the so-called sandbox security model, which refers to the practice of restricting the kinds of things that certain programs can do. For example, the "sandbox" environment for Java applets restricts them from having any access whatsoever to the local file system.



            Code verification





            Limited privileges




            Another restriction imposed on applets limits their networking capabilities. For example, a Java applet cannot create a network connection to any computer except the one from which its code was downloaded. Also, a Java applet cannot listen for, or accept, connections on privileged ports (those numbered below 1024). Together, these two restrictions severely limit the kinds of client/server programs that can be built as applets.



            Limited network access




            Java sets aside certain locations as repositories for trusted code. The Java class libraries are placed in such a location, as are the directories where your Java programs are stored. Any class loaded from some other directory is considered untrusted. By this definition, applets downloaded over the Internet would be considered untrusted code.


            In addition to the restrictions for applets, which apply to all untrusted code, Java defines a number of other limitations:


            • Untrusted code cannot make use of certain system facilities, such as System.exit() and classes in the java.security package.

            • Untrusted code cannot make use of certain AWT methods, such as methods that access the system clipboard. Another AWT restriction is that any window created by untrusted code must display a message informing the user that it is untrusted. You may have seen such messages on windows opened from applets.

            • Untrusted code is limited in the kinds of threads it can create.



            Trusted code




            JDK 1.2 introduced security enhancements based on the concepts of "permission" and "policy." Code is assigned "permissions" based on the security policy currently in effect. Each permission specifies the type of access allowed for a particular resource (such as "read" and "write" access to a specified file or directory, or "connect" access to a given host and port). The policy that controls permissions can be initialized from an external configurable policy file. Unless a permission is explicitly granted to code, it cannot access the resource guarded by the permission. These new enhancements offer a more fine-grained and extensible approach to security for both applets and applications.


            [Page 753]

            As this brief overview illustrates, the Java Virtual Machine is designed with security as one of its primary concerns. This does not guarantee 100 percent security, but it is a big improvement over some of the languages and systems that preceded Java. Moreover, security is an ongoing concern of the Java development process. Flaws in the existing security system are fixed very quickly. Advanced methods are constantly being developed and incorporated into the system. One such enhancement is the use of encryption to guarantee the integrity of classes transferred over the network.






















            Interprocess Communication


            UNIX systems provide several mechanisms for processes to communicate with each other to share information or synchronize their activities. These mechanisms are typically used for transactions across multiple processes, sharing data elements, and coordinating resource sharing. Naturally, the power that IPC primitives afford also presents a potential for vulnerability in applications that use these mechanisms haphazardly.




            Pipes


            Pipes are a simple mechanism for IPC in UNIX. A pipe is a unidirectional pair of file descriptors; one descriptor is used for writing information, and the other is used for reading information. A process can write data to the write side of the pipe, and another process can read that data from the read side of the pipe. The pipe descriptors are created at the same time by the pipe() system call, so they are useful for setting up IPC in advance, typically by handing one side of the pipe to a child process via a fork().


            Not surprisingly, pipes are the underlying mechanism shell programs use when you link programs by using pipe characters. Say you run a command like this:


            echo hi | more


            The shell creates a pipe and gives the write end to a child process that uses it as its standard output descriptor (which is file descriptor 1, if you recall). The read end is handed to a different child process that uses it as its standard input. Then one process runs echo hi and the other process runs the more program, and communication takes place across that pipe.
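            The following is a rough sketch (not from the original text) of what a shell might do for echo hi | more; error handling and waiting for the children are omitted for brevity.

            #include <unistd.h>

            /* Sketch: roughly what a shell does for "echo hi | more".
               One child writes to the pipe via stdout; the other reads it as stdin. */
            int run_pipeline(void)
            {
                int fds[2];

                if (pipe(fds) < 0)             /* fds[0] = read end, fds[1] = write end */
                    return -1;

                if (fork() == 0) {             /* child 1: echo hi */
                    dup2(fds[1], 1);           /* write end becomes stdout (fd 1) */
                    close(fds[0]);
                    close(fds[1]);
                    execlp("echo", "echo", "hi", (char *)0);
                    _exit(127);
                }

                if (fork() == 0) {             /* child 2: more */
                    dup2(fds[0], 0);           /* read end becomes stdin (fd 0) */
                    close(fds[0]);
                    close(fds[1]);
                    execlp("more", "more", (char *)0);
                    _exit(127);
                }

                close(fds[0]);                 /* parent closes both ends; a real
                                                  shell would now wait for the children */
                close(fds[1]);
                return 0;
            }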


            You've already looked at a library function based on the use of pipes, popen(). It creates a pipe and hands one end of it to the child process running the requested program. In this way, it can read from the standard output of the subprogram or write to its standard input.


            One interesting feature of a pipe is that writing to a pipe whose read end has been closed causes your program to receive a SIGPIPE signal, whose default behavior is to terminate the process. If the process catches or ignores SIGPIPE, the write call instead fails and sets errno to EPIPE.
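            For example, a writer that prefers an error return over termination can ignore SIGPIPE and test errno; this is a minimal sketch, not code from the original text.

            #include <errno.h>
            #include <signal.h>
            #include <sys/types.h>
            #include <unistd.h>

            /* Sketch: ignore SIGPIPE so that writing to a pipe with no reader
               returns -1 with errno set to EPIPE instead of killing the process. */
            ssize_t write_to_pipe(int wfd, const void *buf, size_t len)
            {
                signal(SIGPIPE, SIG_IGN);      /* or install a handler instead */

                ssize_t n = write(wfd, buf, len);
                if (n < 0 && errno == EPIPE) {
                    /* the read end has been closed; handle the broken pipe here */
                }
                return n;
            }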




            Named Pipes


            Named pipes (also called "FIFOs" because of their first-in, first-out nature) are pipes that exist on the file system and can be opened just like normal files. Software can use named pipes to set up IPC with a process it isn't related to. Pipes are typically created with mkfifo() or mknod() and then opened with open(). Like regular files, named pipes have associated file permissions specified in the creation call, and they are modified by the umask. Therefore, an application that creates a FIFO needs to ensure that it applies appropriate permissions to the new object. In this context, "appropriate" means using a restrictive set of permissions that only allows specific applications access to the pipe.


            Pipes have an interesting behavior in how they're opened by a process that might prove useful in an attack. If a process opens a pipe for reading, the open blocks until another process opens the same pipe for writing. So open() doesn't return until a peer process has joined the game. Similarly, opening a pipe for writing causes a program to block until another process opens the pipe for reading. Opening a pipe in read/write mode (O_RDWR) is undefined behavior, but it usually results in the pipe being opened as a reader without blocking. You can open pipes in nonblocking mode if you want to avoid the blocking behavior. Programs expecting regular files could instead be passed a named pipe that triggers this blocking behavior. Although this isn't a security problem in itself, it could be used to stall the program while an attacker attempts some other TOCTOU-based attack. In addition to open() blocking, attackers can cause reads from the pipe to block whenever they choose if they are the only writer attached to the other end of the pipe, thus providing additional control over process execution. In fact, Michal Zalewski (a researcher we have noted previously in this chapter) demonstrated this attack when exploiting a race condition in the GNU C Compiler (GCC). It's more of an exploitation technique but is worth mentioning because race conditions that might have seemed infeasible become more readily exploitable (the technique is detailed at http://seclists.org/bugtraq/1998/Feb/0077.html).


            There are also quirks in writing to named pipes. If you try to write to a named pipe with no attached reader, you get the same result as with a normal pipe: a SIGPIPE signal and the EPIPE error from the write system call.


            Another potential problem when dealing with pipes is nonsecure use of mkfifo() and mknod(). Unlike open(), these two functions don't return a file descriptor upon successful creation of a resource; instead, they return a value of 0 indicating success. Therefore, a program that creates a named pipe must subsequently call open() on the created pipe to use it. This situation creates the potential for a race condition; if the pipe is deleted and a new file is created in its place between the time mkfifo() is used and open() is called, the program might inadvertently open something it didn't intend to. Here's an example of vulnerable code:


            #include <sys/types.h>
            #include <sys/stat.h>
            #include <fcntl.h>

            int open_pipe(char *pipename)
            {
                int rc;

                rc = mkfifo(pipename, S_IRWXU);   /* create the FIFO */

                if (rc == -1)
                    return 1;

                /* race window: the FIFO could be removed and replaced
                   before this open() runs */
                return open(pipename, O_WRONLY);
            }


            In this case, if the process can be interrupted between mkfifo() and open(), it might be possible to delete the created file and create a symlink to a system file or perform a similar attack.
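            One way to narrow that window, sketched below under the assumption that O_NOFOLLOW is available on the platform, is to refuse to follow symlinks on the open and then verify with fstat() that the descriptor really refers to a FIFO owned by the current user; the function name is hypothetical and not from the original text.

            #include <sys/types.h>
            #include <sys/stat.h>
            #include <fcntl.h>
            #include <unistd.h>

            /* Sketch: create a FIFO, open it without following symlinks, and verify
               that what was opened is really a FIFO owned by us. */
            int open_pipe_checked(const char *pipename)
            {
                struct stat sb;
                int fd;

                if (mkfifo(pipename, S_IRUSR | S_IWUSR) == -1)
                    return -1;

                /* O_NOFOLLOW refuses a symlink planted in the FIFO's place;
                   this open may still block until a reader attaches, as described above */
                fd = open(pipename, O_WRONLY | O_NOFOLLOW);
                if (fd == -1)
                    return -1;

                if (fstat(fd, &sb) == -1 || !S_ISFIFO(sb.st_mode) ||
                    sb.st_uid != geteuid()) {
                    close(fd);                 /* not the FIFO we created: bail out */
                    return -1;
                }

                return fd;
            }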


            From a code-auditing standpoint, the existence of named pipes introduces three potential issues in UNIX-based applications:


            • Named pipes created with insufficient privileges might result in unauthorized clients performing some sort of data exchange, potentially leading to compromise via unauthorized (or forged) data messages.

            • Applications that are intended to deal with regular files might unwittingly find themselves interacting with named pipes. This allows attackers to cause applications to stall in unlikely situations or cause error conditions in unexpected places. When auditing an application that deals with files, if it fails to determine the file type, consider the implications of triggering errors during file accesses and blocking the application at those junctures.

            • The use of mknod() and mkfifo() might introduce a race condition between the time the pipe is created and the time it's opened.




            System V IPC


            System V IPC mechanisms are primitives that allow unrelated processes to communicate with each other or achieve some level of synchronization. Three IPC mechanisms in System V IPC are message queues, semaphores, and shared memory.


            Message queues are a simple stateless messaging system that allows processes to send each other unspecified data. The kernel keeps messages until the message queue is destroyed or a process receives the messages. Unlike file system access, message queue permissions are checked for each operation rather than only at open time. The functions for using message queues are msgget(), msgctl(), msgrcv(), and msgsnd().
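            A minimal sketch of these calls (illustrative only; the function name and message layout are our own) creates a private queue with owner-only permissions, sends one message, and receives it back.

            #include <sys/types.h>
            #include <sys/ipc.h>
            #include <sys/msg.h>

            /* Sketch: send and receive one message on a private queue. */
            struct my_msg {
                long mtype;            /* message type, must be > 0 */
                char mtext[64];        /* application data */
            };

            int msg_demo(void)
            {
                struct my_msg out = { 1, "hello" }, in;
                int qid;

                qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);   /* owner read/write only */
                if (qid == -1)
                    return -1;

                if (msgsnd(qid, &out, sizeof(out.mtext), 0) == -1)
                    return -1;

                if (msgrcv(qid, &in, sizeof(in.mtext), 1, 0) == -1)   /* type 1 */
                    return -1;

                return msgctl(qid, IPC_RMID, NULL);            /* destroy the queue */
            }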


            Semaphores are a synchronization mechanism that processes can use to control the sequence of activities that occur between them. The semaphore primitives provide the capability to manipulate semaphore sets, which are a series of semaphores that can be operated on independently. The functions for manipulating semaphores are semget(), semop(), and semctl().
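            For example, the following sketch (again illustrative; on Linux the caller must define union semun) creates a single semaphore and uses it as a simple lock.

            #include <sys/types.h>
            #include <sys/ipc.h>
            #include <sys/sem.h>

            /* Sketch: a one-semaphore set used as a mutex around a critical section. */
            union semun { int val; struct semid_ds *buf; unsigned short *array; };

            int sem_demo(void)
            {
                struct sembuf lock   = { 0, -1, SEM_UNDO };   /* P: decrement */
                struct sembuf unlock = { 0,  1, SEM_UNDO };   /* V: increment */
                union semun arg;
                int semid;

                semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
                if (semid == -1)
                    return -1;

                arg.val = 1;
                if (semctl(semid, 0, SETVAL, arg) == -1)      /* start unlocked */
                    return -1;

                if (semop(semid, &lock, 1) == -1)             /* acquire */
                    return -1;
                /* ... critical section ... */
                semop(semid, &unlock, 1);                     /* release */

                return semctl(semid, 0, IPC_RMID);            /* destroy the set */
            }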


            Finally, shared memory segments are a mechanism whereby a memory segment can be mapped to more than one process simultaneously. By reading or writing to the memory in this segment, processes can exchange information or maintain state and variables among a number of processes. Shared memory segments can be created and manipulated with shmget(), shmctl(), shmat(), and shmdt().
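            A brief sketch of the shared memory calls, using an owner-only mode so that only the creating user can attach (the function name is our own, not from the original text):

            #include <string.h>
            #include <sys/types.h>
            #include <sys/ipc.h>
            #include <sys/shm.h>

            /* Sketch: create a 4KB segment readable and writable only by the owner
               (mode 0600), attach it, write to it, then detach and remove it. */
            int shm_demo(void)
            {
                int shmid;
                char *mem;

                shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
                if (shmid == -1)
                    return -1;

                mem = shmat(shmid, NULL, 0);           /* map the segment into this process */
                if (mem == (char *)-1)
                    return -1;

                strcpy(mem, "shared state");           /* any attached process sees this */

                shmdt(mem);                            /* detach */
                return shmctl(shmid, IPC_RMID, NULL);  /* mark for removal */
            }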


            The System V IPC mechanisms have their own namespace in kernel memory that isn't tied to the file system, and they implement their own simple permissions model. In practice, these mechanisms are rarely used in applications; however, you should know about them in case you encounter code that does use them. The most important issue is the permissions associated with an IPC entity. Each IPC object has its own mode field that describes the requirements for accessing it. This field is nine bits: three bits describing the owner's privileges, three bits describing the group privileges (of the group the owner belongs to), and three bits describing the permissions for everybody else. The bits represent whether the object can be read from or written to by the corresponding class of user (with one extra bit that's reserved).


            These permissions are a simplified version of how file system permissions work (except IPC mechanisms don't have the execute permission). Obviously, programs that set these permissions inappropriately are vulnerable to attacks in which arbitrary processes interfere with a communication channel being used by a more privileged process. The consequences can range from simple denial-of-service attacks to memory corruption vulnerabilities to logic errors resulting in privilege escalation. Recently, a denial-of-service vulnerability was found in the Apache Web server related to shared memory access by users who could run code with the privileges of the Apache user (that is, users who could write scripts for the Web server to run). In an article at www.securityfocus.com/archive/1/294026, Zen-parse noted that running scripts in this context allowed users to access the HTTPd scoreboard, which was stored in a shared memory segment. He describes several attacks that resulted in Apache spawning endless numbers of processes or being able to send signals to arbitrary processes as root.


            Another issue when dealing with shared memory segments is that when a process forks, both the child and parent receive a copy of the mapped shared memory segment. This means if one of the processes is compromised to a level that user-malleable code can be run, each process can access shared memory segments with the permissions it was mapped in with. If an exec() occurs, the shared memory segment is detached.


            Finally, the use of shared resources might introduce the possibility of race conditions, particularly in shared memory segments. Because the data segment can be mapped into multiple processes simultaneously, any of those processes that can write to the segment might be able to cause race conditions by modifying data after another process has read it but before the data has been acted on. Of course, there are also complications if multiple writers are acting at the same time. Synchronization issues are covered in more depth in Chapter 13, "Synchronization and State."




            UNIX Domain Sockets


            UNIX domain sockets are similar to pipes, in that they allow processes on a local system to communicate with each other. Like pipes, UNIX domain sockets can be named or anonymous. Anonymous domain sockets are created by using the socketpair() function. It works similarly to the pipe() function; it creates a pair of unnamed endpoints that a process can use to communicate information. Anonymous domain sockets are typically used when a process intends to fork and needs a communication channel between a parent and a child.


            Named domain sockets provide a general-purpose mechanism for exchanging data in a stream-based or record-based fashion. They use the socket API functions to create and manage a connection over a domain socket. In essence, the code to implement connection management and data exchange over named pipes is almost identical to networked applications, although the security implications of using local domain sockets are quite different. Named sockets are implemented by using special socket device files, created automatically when a server calls bind(). The location of the filename is specified in the socket address structure passed to the bind() function. A socket device file is created with permissions (777 & ~umask). Therefore, if a setuid program creates a socket, setting the umask to 0 before starting the program creates the socket file with full read, write, and execute privileges for everyone, meaning any user on the system could connect to the socket and write arbitrary data to the process that bound the socket. An example of a dangerous socket creation is shown:


            #include <string.h>
            #include <strings.h>
            #include <sys/types.h>
            #include <sys/socket.h>
            #include <sys/un.h>

            int create_sock(char *path)
            {
                struct sockaddr_un sun;
                int s;

                bzero(&sun, sizeof(sun));

                sun.sun_family = AF_UNIX;
                strncpy(sun.sun_path, path, sizeof(sun.sun_path) - 1);

                s = socket(AF_UNIX, SOCK_STREAM, 0);

                if (s < 0)
                    return s;

                /* the socket file is created with mode (777 & ~umask) */
                if (bind(s, (struct sockaddr *)&sun, sizeof(sun)) < 0)
                    return -1;

                return s;
            }


            Assuming this code is running in a privileged context, it could be dangerous because it doesn't explicitly set the umask before creating the socket. Therefore, the calling user might be able to clear the umask and write to a socket that's not intended to receive connections from arbitrary clients. It's easy to overlook file permissions in this situation because they aren't addressed in the socket functions (as opposed to pipe functions such as mkfifo(), which have a mode argument for creating a new pipe).
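            A more defensive variant, sketched here rather than taken from the original text, forces owner-only permissions regardless of the inherited umask and removes any stale socket file before binding; it still assumes the directory containing the socket is not writeable by untrusted users.

            #include <string.h>
            #include <sys/types.h>
            #include <sys/socket.h>
            #include <sys/stat.h>
            #include <sys/un.h>
            #include <unistd.h>

            /* Sketch: create a UNIX domain socket without trusting the caller's umask. */
            int create_sock_safe(const char *path)
            {
                struct sockaddr_un sun;
                mode_t old_umask;
                int s;

                memset(&sun, 0, sizeof(sun));
                sun.sun_family = AF_UNIX;
                strncpy(sun.sun_path, path, sizeof(sun.sun_path) - 1);

                s = socket(AF_UNIX, SOCK_STREAM, 0);
                if (s < 0)
                    return s;

                unlink(path);                          /* remove a stale socket file, if any */

                old_umask = umask(0077);               /* force owner-only permissions */
                if (bind(s, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
                    umask(old_umask);
                    close(s);
                    return -1;
                }
                umask(old_umask);                      /* restore the caller's umask */

                return s;
            }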


            Of course, if users can specify any part of the pathname generated to store the socket or if any writeable directories are used in the path, race attacks could be performed to intercept traffic between a client and server. Specifically, consider the following code:


            int create_sock(void)
            {
                struct sockaddr_un sun;
                char *path = "/data/fifo/sock1";
                int s;

                bzero(&sun, sizeof(sun));

                sun.sun_family = AF_UNIX;
                strncpy(sun.sun_path, path, sizeof(sun.sun_path) - 1);

                s = socket(AF_UNIX, SOCK_STREAM, 0);

                if (s < 0)
                    return s;

                if (bind(s, (struct sockaddr *)&sun, sizeof(sun)) < 0)
                    return -1;

                return s;
            }


            This slightly modified example shows that a socket is created in /data/fifo. If the /data directory is writeable, you could let the server create the socket, and then unlink the /fifo directory or symlink it to some other location where another socket named sock1 has been created. Any client connecting to this socket would then be connecting to the wrong program unwittingly. This might have security implications if sensitive data is being transmitted or if resources are being passed across the socket, such as file descriptors.


            Auditing code that uses UNIX domain sockets requires paying attention to the manner in which the socket is created. Because socket files need to be opened explicitly with the socket API, they can't be passed as arguments to setuid programs in an attempt to manipulate the speed at which a process is running, as described for named pipes. There is one exception: when the socket has already been opened and a new process inherits the descriptor via a call to execve().


            Note



            Also, bear in mind that when a server closes a socket, the socket file isn't deleted from the file system; it needs to be deleted with the unlink() function. Failure to do so by the server might result in it being unable to bind again when it needs to be restarted (if a static pathname is used) or a directory being continually filled up with unused socket files. This isn't so much a security issue as an availability one, but it can prevent the application from binding its socket when it needs to.