Some Tough Questions
One reason that new innovations move into practice slowly is that some of them don't work very well in practice! Not all innovations are useful, and the Early Majority, Late Majority, and Laggards have some good reasons for being cautious. When presented with an innovation, they ask tough questions like these:[23]
Very few software engineering practices have had data collected and disseminated widely enough to prepare software practitioners to answer questions like these. As a result, the software engineering practices described in Table 21-1 are stuck on the left side of the chasm. Early Adopters have been using many of those techniques for 15 years or more, while the later adopter groups are largely unaware of them. The numbers from Rogers's adopter categories are close to the numbers of projects that use code-and-fix development: about 75 percent of projects are still using code and fix or its close cousins, and a similar percentage of adopters fall onto the right side of the chasm.

Why is this? In Rogers's framework, one reason that innovations are diffused to Innovators and Early Adopters more quickly is that Innovators and Early Adopters tend to have more resources; they are in better positions to afford costly mistakes. Later adopters are more cautious partly because they aren't as resource rich. As I mentioned in Chapter 12, however, the scarce resource in this case isn't money; it's time. Lagging-edge practices such as code-and-fix development are associated with significant schedule overruns. The overtime that usually accompanies such overruns doesn't leave time for investigating and adopting more effective innovations.
Questions the SRS Answers for a Project
The major question that the SRS addresses concerns the problem domain in which the product will exist. As shown in Figure 17-3, all software products can be viewed from three perspectives: data, process, and behavior.

Figure 17-3. Three Perspectives of Software Products

The data view looks at identifying the primary data objects to be processed by the software system. The definition of each object, as well as where it resides, is identified. One object's relationship to other objects and the processes that act on them are also part of the data view. For an ATM, an important piece of data would be a customer's personal identification number (PIN). As the data within a software system moves, it is modified by a series of transformations. The process view of the problem domain focuses on these data transformations. The transformation of data from inputs to outputs satisfies the business requirements of the system as captured in the SRS. With our ATM, a process would be the validation of the PIN. The behavior view of the system focuses on the states a system assumes. The state is any externally observable mode of behavior. After a PIN is entered in the ATM, the machine is in the state of validating the PIN. Processes are running and data is being transformed, but the user only observes a message that may indicate that his or her PIN is being validated.

The SRS is the first step in a project that moves from the recognition and definition of the problem domain to a solution domain. The solution domain fills in the three perspectives with requirements models tailored to capturing data, process, and behavioral characteristics (see Figure 17-4). The development of these individual models will be covered later in this book. The SRS provides the framework for holding the models.

Figure 17-4. Process, Data, Behavior

Once we look at the views of our software system, questions arise that lead to the requirements collection. These questions relate to the quality of the SRS produced and to how we eventually validate the completed SRS. Construx Software (www.construx.com) provides a series of software development checklists. IEEE 830-1998 provides the set of quality characteristics an SRS exhibits:
Using the checklist model and the standard, an SRS aids in answering questions for these key characteristics:
Section 5.7. Branching, Tagging, and Merging
As I explained in earlier chapters, most version control systems allow you to create branches and tags from the main trunk of your repository. Tags allow you to mark specific points in your development (such as releases) for later referral, and branches allow you to split the development of your repository and continue development in parallel on a secondary path. Unlike many other version control systems, Subversion doesn't actually have a built-in concept of branches or tags. Instead, it just uses cheap server-side copies that preserve history while not taking up any additional space. Subversion just marks the location of the copied file or directory and notes where it came from. It then allows you to merge the changes from a "branch" back into the main trunk of the repository at a later date. Although the current implementation of Subversion's copy-merge system does pose the occasional problem (for instance, you can quickly lose track of a file's full merge/branch history if it becomes more than trivially complex), the flexibility of the paradigm more than makes up for its shortcomings in most cases.

5.7.1. Creating a Branch or Tag

Creating a branch or tag in Subversion is trivially easy. All you need to do is run svn copy on the file or directory that you want to branch/tag. Generally, your repository will be set up with one or more directories named branches and tags, which will be where you want to place copies that are conceptually branches or tags. This is a convention, however, and is in no way enforced by Subversion. As far as Subversion is concerned, directories named branches or tags are identical to any other directories. A typical repository layout is to have three top-level directories in the repository, named trunk, branches, and tags. The trunk directory contains the main development branch of the repository, and all branches and tags are created by putting copies in their respective directories. By separating the three directories at the top level, you can then check out only the trunk directory, and deal with branches and tags only in the repository most of the time. When you need to use or edit parts of the repository that are in branches, you can use svn switch to swap them into your working copy of the trunk directory.

Creating a Branch/Tag in the Repository

The most efficient way to branch or tag a part of the repository is usually to perform the copy entirely in the remote repository. When Subversion performs a copy in the repository, it doesn't make a copy of the data, but instead just points the copy back to the data from the original. This allows a copy of any sized file to occur in a constant amount of time, be it five kilobytes or five hundred megabytes. On the other hand, if the copy is performed in the local working copy, the repository-side copy still occurs in constant time, but you also make a copy of the data in the working copy, which is proportional to the amount of data to be copied. A repository-only copy is performed using svn copy with repository URLs for both source and destination. When the command is run, Subversion performs an immediate commit, which means that it opens an editor to allow you to enter a log message if you don't supply one on the command line. As an example, the following command shows how you might create a branch of your entire trunk in a typical repository.
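A minimal sketch of such a command (the repository URL and branch name are hypothetical):

svn copy http://svn.example.com/repos/myproject/trunk \
         http://svn.example.com/repos/myproject/branches/mybranch \
         -m "Creating a branch of trunk."

Because both the source and the destination are URLs, the copy happens entirely on the server and is committed immediately with the supplied log message.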
Creating a Branch/Tag from the Working Copy

Sometimes, creating a branch or tag entirely on the repository is impractical. For example, you might want to create a tag or branch that includes uncommitted local modifications, or consists of multiple mixed revisions. In such cases, you need to pass svn copy a working copy path as the source. You can also pass a working copy path as the destination of the copy, but that is generally not what you want, because it would result in the data being duplicated in your local working copy. Instead, if you make sure the destination is still a URL, Subversion only sends the local changes to the repository, and all other copies are done as the usual cheap repository-side copies.
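For example, you might tag the current state of a working copy, local modifications included, by copying a working copy path to a repository URL (a sketch; the URL and tag name are hypothetical):

svn copy trunk \
         http://svn.example.com/repos/myproject/tags/pre-release-snapshot \
         -m "Tagging the working copy state, including uncommitted changes."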
Switching to the Branch/Tag

Regardless of how you create a branch or tag, the best way to edit it locally is usually to use svn switch to change the URL that all or part of your working copy points to. If you have a working copy of your trunk already checked out, you will generally save a lot of bandwidth by using the switch command, because it only downloads the differences necessary to switch the URL. The other advantage of svn switch is that it allows you to branch only a portion of your trunk, and then switch that portion in your working copy to the branch without invalidating any relative paths to the rest of your working copy.

5.7.2. Merging a Branch

As you work with a branch, you may periodically want to update it with changes from the main trunk, in order to get changes made by others. You will also usually want to merge the changes that you make in the branch back into the main trunk after they have reached a stable point. Both of these situations are handled with the svn merge command. The svn merge command is conceptually the hardest Subversion command to deal with, but after you understand how it works, using it is not that complicated. The most basic usage of svn merge is to run it with two revision numbers and a source, as in this example:
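A sketch consistent with the discussion that follows (the repository URL is hypothetical):

svn merge -r 100:150 http://svn.example.com/repos/myproject/branches/mybranch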
In this example, svn merge takes the differences between revisions 100 and 150 in the mybranch branch and applies them to the current working copy. The application of the changes occurs just as if you had done an svn update, and any conflicts are handled accordingly. Another way to run svn merge is to give it two URLs or working copy paths, with revision numbers, and perform a merge of their differences.
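For instance, a sketch of the two-URL form, using peg revisions (the URLs are hypothetical):

svn merge http://svn.example.com/repos/myproject/trunk@100 \
          http://svn.example.com/repos/myproject/branches/mybranch@150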
As you can see, in the preceding example, there is no requirement that the first revision number should be lower than the second. In fact, regardless of which revision is lower, Subversion will always merge the differences calculated by subtracting the lefthand revision from the righthand revision. So, if a file exists in the righthand revision and not in the lefthand revision, it will be added by the merge. Conversely, if it exists in the lefthand revision but not the righthand, it will be removed. You will also notice in the last example that I used peg revisions to identify which revisions should be used to identify the files. Peg revisions in a merge work the same as they did for svn diff (see Section 5.4.1, "Getting Information on the Current State").

Keeping Track of Merges

One of the things that Subversion handles poorly is a file or directory's merge history. Subversion does not keep a record of what has been or hasn't been merged. Because of this, it is important for you to keep your own merge history, in order to avoid merging a change twice (which could cause that change to be undone). The easiest way to keep track of your merge history is to record in the commit log what files/revisions were merged.
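For example, a commit performed after a merge might record the source and range directly in its log message (a sketch; the range is hypothetical):

svn commit -m "Merged revisions 100:150 of branches/mybranch into trunk."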
Then, the next time you want to merge the changes from mybranch into trunk, you can use svn log with grep to quickly see which revisions have already been merged in.
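A sketch of that lookup (the word you search for depends on the convention used in your log messages):

svn log http://svn.example.com/repos/myproject/trunk | grep "Merged"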
Reverting Changes with Merge

Another use for the merge command is to revert changes that were applied in a previous revision. Say, for instance, that you removed a couple of functions (named doFoo() and getBar()) from your source file, but now realize that you actually need them. Assuming that they were both removed in discrete commits (i.e., nothing else was changed in the source file at the same time those functions were removed), merging them back into the HEAD revision is quite simple. First, you'll want to check the logs to find out which revisions the function removals actually took place in. If the file is an active one, with a long log, you might want to use the grep command to pare the output down to something a little more manageable. As an example, the following command will search for any lines containing the word "removed" as well as lines that begin with "r" and a number (most likely log entry headers).
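One way to express that search (a sketch; the filename is hypothetical):

svn log source.c | grep -E "^r[0-9]|removed"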
After you know which revisions the functions were removed in, you can revert the removals by performing a merge with a range that goes backwards, from the revision where the removal took place, to the revision immediately preceding the removal. This will merge the removed sections back into your current working copy, where you can make any necessary modifications and then commit.
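For example, if the functions were removed in revisions 317 and 322 (hypothetical numbers), reverse merges such as the following would restore them in the working copy:

svn merge -r 317:316 source.c
svn merge -r 322:321 source.c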
Looking before You Merge

Merges can cause a lot of changes to be applied to the files in your working copy, and undoing those changes can be difficult if there are a lot of local changes. It can be helpful, then, to find out exactly which files Subversion will change (as well as what conflicts will occur) before actually performing the merge. Subversion allows you to do just that with the --dry-run option. When svn merge is invoked with --dry-run, Subversion will perform all of the necessary calculations for the merge and output a list of files that will be modified (as well as how those files will be modified), but it will not actually change any of your local files.
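A dry-run sketch of the earlier merge (the URL is again hypothetical):

svn merge --dry-run -r 100:150 http://svn.example.com/repos/myproject/branches/mybranch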
15.2 Requirements and Qualities
For new products to be derived from an organizational repository, they must be structured so that they can share modules. As we discussed in Chapter 14, this means that there must be a standard set of modules, with agreements about each module's responsibility, behavior, performance, interface, locality of function, communication and coordination mechanisms, and other properties. This familywide structure, the modules it comprises, and the properties about each that are constant across all members of the product line constitute the product line architecture. As we have seen throughout this book, the primary purpose of an architecture is to achieve a system that meets its behavioral and quality requirements. The architecture for each SS2000 product line member was no exception. The most important of these requirements were:
Besides these single-system requirements, the SS2000 architecture carried the additional burden of application to an entire class of systems. Thus its requirements included the ability to replace one module with another tailored to a particular system without disrupting the rest of the architecture.

OPERATING ENVIRONMENT AND PHYSICAL ARCHITECTURE

The requirements of modern shipboard systems influence design solutions in profound ways. Sensors and weapons systems are deployed all over the ship; crew members interact with them via a multitude of separately housed workstations. The HCI must be highly tuned to facilitate rapid information flow and command acceptance and must be tailored to the operational mission of the vessel and the cultural idiosyncrasies of its crew. The likelihood of component failure dictates a fault-tolerant design. Figure 15.12 is a physical view of a typical system. A redundant LAN is the communications backbone, connecting from 30 to 70 different, cooperating processors. Nodes on the LAN can total around 30. A node is the end of a communication run and may correspond to a crew station, a weapons platform, or a sensor suite, all located in various parts of the ship and widely dispersed. It may host up to six processors. The LAN is a dual Ethernet. Device-interface modules send and receive data to and from the system's peripherals, primarily sensors, and the weapons systems being controlled.

Figure 15.12. Typical physical architecture of an SS2000 product
Mixed Types as Parameters
Classes Complex and Rational are good examples of programmer-defined types that emulate the properties of built-in C++ numeric types. The objects of these types can be operated on by using the same set of operators as that used for an ordinary numeric variable. This supports the C++ goal of treating programmer-defined types in the same way as built-in types.
This analogy, however, is not complete. You can apply a number of operators to variables of numeric types that you cannot apply to Complex or Rational objects. Examples of such operators are modulo division, bitwise logical operators, and shifts. Of course, you can overload these operators similarly to the arithmetic and comparison operators, but the meaning of these operators (whatever meaning you decide to implement, the compiler will go along) will not be intuitively clear to the client programmer or to the maintainer. An example of such arbitrary assignment of meaning is the Complex::operator+() that I implemented to display the values of the data members of the Complex object. Intuitively, it is not clear at all what the expression +x should do to the Complex variable x.
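A minimal sketch of that kind of arbitrary overload (the data member names re and im are assumptions; the original listing is not reproduced here):

void Complex::operator + () const                      // unary +: meaning assigned arbitrarily
{ cout << "re = " << re << " im = " << im << endl; }   // merely prints the data members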
Another problem with treating the objects of built-in types and programmer-defined types equally is the problem of implicit type conversions. C++ supports type conversions without reservation. For example, these expressions are syntactically and semantically correct for any numeric types.
c += b; // ok for b and c of any built-in numeric types
c += 4; // ok for c of any built-in numeric type
Whatever the numeric type of variable b, it is implicitly converted to the numeric type of variable c; whatever the numeric type of variable c, integer 4 is implicitly converted to that type. If variables b and c in the code snippet above are of type Rational, the second line above is in error. For this line to be syntactically correct, one of these functions should be available in the scope of the client code.
void Rational::operator+=(int x); // c+=4; is c.operator+=(4);
void operator+=(Rational &r, int x); // c+=4; is operator+=(c,4);
Neither of these functions is implemented in Listing 10.5, and that results in the syntax error in the client code. Here is an example of a member function that eliminates the error.
void Rational::operator += (int x) // target object changes
{ nmr = nmr + x * dnm; // n1/d1 + n = (n1+n*d1)/d1
this->normalize(); }
Notice that if both functions are available, a member function and a global function with these interfaces, the second line in the code snippet above is still in error. This time around it is an ambiguity of the function call. Since either of these functions can foot the bill (the bill here is the interpretation of the statement c += 4;) the compiler does not know which function to call.
However, if variables b and c in the code snippet above are of type Complex, both lines in the snippet are syntactically correct. Why? Because I implemented two versions of operator+=() in Listing 10.4.
void Complex::operator += (const Complex &b);
void Complex::operator += (int b);
In the client code, the first function is called for the first line of code, and the second function is called for the second line of code.
c += b; // c.Complex::operator+=(b); Complex argument
c += 4; // c.Complex::operator+=(4); integer argument
This resolves the problem of using operands of mixed types in expressions. The second overloaded operator function works not only for integer arguments, but also for characters, short integers, long integers, floating point, and double floating point arguments. According to the rules of argument conversion, a value of each of these built-in types can be converted to an integer. There is no need to overload the function operator +=() for each of these built-in types. One function would suffice.
But do not sigh with relief yet. What about other operators such as -=, *=, /=? Each of these operators requires yet another overloaded operator function with a numeric parameter. And what about other arithmetic operator functions such as operator+(), operator-(), operator*(), and operator/()? Consider the following snippet for object instances of class Rational.
c = a + b; // c = a.operator+(b);
c = a + 5; // ?? incompatible types ??
Again, the second line results in a syntax error because the overloaded operator expects an object of type Rational as the actual argument, not a value of a built-in numeric type. Meanwhile, expressions like these are not a product of inflamed imagination: numeric values are routinely mixed with complex numbers and rational numbers in algorithms. And what about comparisons? You should be able to compare Rational objects with integers, and this poses yet additional problems.
The solution that I used so far is legitimate but boring. For each operator function with a Rational (or any other class) object as an argument, I have to write yet another operator function with a long integer value as an argument. (An integer might not be sufficient on 16-bit machines.)
Can anything be done about that? And here C++ offers you a beautiful tool that allows you to get away with only one set of operator functions (with class parameter) and force these operator functions to accept actual arguments of built-in numeric types.
What is this tool? It is a tool that allows you to cast a numeric value to a value of the class. Let us start with a very simple example.
Rational c = 5; // incompatible types ??
It goes without saying that this line is in error. In Chapter 3, "Working with C++ Data and Expressions,"
I discussed the concept of the cast that converts the value of one built-in numeric type to the value of another built-in numeric type. Of course, these casts are available only between built-in types, not between built-in types and the programmer-defined type Rational. But if a cast between built-in types and type Rational existed, how would it look? Its syntax would be the same as for numeric types: the type name in parentheses. And the type name would be the name of the type to which the value is converted.
Rational c = (Rational)5; // this is how the cast should look
If you remember, in Chapter 3 you saw two syntactic forms for the cast, one that comes from C (the form I used in the line above) and another is the C++ function-like cast.
Rational c = Rational(5); // this is how the cast could look
Doesn't this line look like a constructor call? Now, what do you call the function that produces the value of the class type? Don't you call it a constructor? So this function looks like a constructor and behaves like a constructor. The conclusion is that it is a constructor.
Next question: what constructor? This is simple. In Chapter 9, we called a constructor with one parameter of nonclass type a conversion constructor. Now you should understand why this name is used. This constructor converts a value of its parameter type into a value of the class type. To make the line above syntactically correct, you have to write a constructor with one parameter.
What should this constructor do with its single parameter? If the value of the parameter is, say, 5, the value of the Rational object should be set to 5 or to 5/1. If the value of the parameter is 7, the value of the object should be set to 7/1. Hence, the value of the parameter should be used to initialize the numerator, and the denominator should be set to 1 for any value of the actual argument. This results in the following constructor.
Rational::Rational(long n) // conversion constructor
{ nmr = n; dnm = 1; } // initialize to a whole number
This constructor is called every time that a function that expects a Rational parameter is called with a numeric actual argument. The class Rational now should look this way.
class Rational {
long nmr, dnm; // private data
void normalize(); // private member function
public:
Rational() // default constructor: zero value 0/1
{ nmr = 0; dnm = 1; }
Rational(long n) // conversion constructor: whole value n/1
{ nmr = n; dnm = 1; }
Rational(long n, long d) // general constructor: fraction as n/d
{ nmr = n; dnm = d;
this->normalize(); }
Rational operator + (const Rational &x) const
{ return Rational(nmr*x.dnm + x.nmr*dnm, dnm*x.dnm); }
// THE REST OF CLASS Rational
} ;
Some programmers dislike writing several constructors if one constructor with default parameters can do the job. A popular constructor that can be used as general constructor, conversion constructor, and default constructor will look this way.
Rational(long n=0, long d=1) // general, conversion, default constructor
{ nmr = n; dnm = d;
this->normalize(); }
Make sure that you see that this constructor is called whether the client code supplies two arguments for object initialization, one argument, or no arguments; default values are used in place of the missing arguments when defining Rational objects.
Rational a(1,4); // Rational a = Rational(1,4); - two arguments
Rational b(2); // Rational b = Rational(2,1); - one argument
Rational c; // Rational c = Rational(0,1); - no arguments
Notice that the actual arguments supplied in this example are of type int but the constructor expects arguments of type long. This is not a problem: an implicit built-in conversion from int to long is available by default, as it is between all built-in numeric types. In function calls, the compiler allows no more than one built-in conversion and no more than one class-defined conversion (a conversion constructor call).
In compiling expressions with Rational operands, the compiler converts the int arguments first to long and then to Rational; after this conversion, the compiler generates the call to the appropriate operator.
c = a.operator+(Rational((long)5)); // real meaning of c = a + 5;
Now the client code above compiles without the operator Rational::operator+(long). A temporary Rational object is created, the conversion constructor is then called, then the operator+(), and then the Rational destructor.
Now you can write client code with numeric values as the second operand, while the first operand is of type Rational.
int main()
{ Rational a(1,4), b(3,2), c, d;
c = a + 5; // c = a.operator+(Rational((long)5));
d = b - 1; // d = b.operator-(Rational((long)1));
c = a * 7; // c = a.operator*(Rational((long)7));
d = b / 2; // d = b.operator/(Rational((long)2));
c += 3; // c.operator+=(Rational((long)3));
d *= 2; // d.operator*=(Rational((long)2));
if (b < 2) // if (b.operator<(Rational((long)2))
cout << "Everything works\n";
return 0; }
Listing 10.6 shows a new version of class Rational that supports mixed types in binary expressions. The output of the program run is shown in Figure 10-6.
Figure 10-6. Output for program in Listing 10.6.
Example 10.6. Class Rational that supports mixed types in expressions.
#include <iostream>
using namespace std;
class Rational {
long nmr, dnm; // private data
void normalize(); // private member function
public:
Rational(long n=0, long d=1) // general, conversion, default
{ nmr = n; dnm = d;
this->normalize(); }
Rational operator + (const Rational &x) const; // const target
Rational operator - (const Rational &x) const;
Rational operator * (const Rational &x) const;
Rational operator / (const Rational &x) const;
void operator += (const Rational &x); // target changes
void operator -= (const Rational &x);
void operator *= (const Rational &x);
void operator /= (const Rational &x);
bool operator == (const Rational &other) const; // const target
bool operator < (const Rational &other) const;
bool operator > (const Rational &other) const;
void show() const;
} ; // end of class specification
Rational Rational::operator + (const Rational &x) const
{ return Rational(nmr*x.dnm + x.nmr*dnm, dnm*x.dnm); }
Rational Rational::operator - (const Rational &x) const
{ return Rational(nmr*x.dnm - x.nmr*dnm, dnm*x.dnm); }
Rational Rational::operator * (const Rational &x) const
{ return Rational(nmr * x.nmr, dnm * x.dnm); }
Rational Rational::operator / (const Rational &x) const
{ return Rational(nmr * x.dnm, dnm * x.nmr); }
void Rational::operator += (const Rational &x)
{ nmr = nmr * x.dnm + x.nmr * dnm; // 3/8+3/2=(6+24)/16=15/8
dnm = dnm * x.dnm; // n1/d1+n2/d2 = (n1*d2+n2*d1)/(d1*d2)
this->normalize(); }
void Rational::operator -= (const Rational &x)
{ nmr = nmr * x.dnm - x.nmr * dnm; // 3/8-3/2=(6-24)/16=-9/8
dnm = dnm * x.dnm; // n1/d1+n2/d2 = (n1*d2-n2*d1)/(d1*d2)
this->normalize(); }
void Rational::operator *= (const Rational &x)
{ nmr = nmr * x.nmr; dnm = dnm * x.dnm;
this->normalize(); }
void Rational::operator /= (const Rational &x)
{ nmr = nmr * x.dnm; dnm = dnm * x.nmr;
this->normalize(); }
bool Rational::operator == (const Rational &other) const
{ return (nmr * other.dnm == dnm * other.nmr); }
bool Rational::operator < (const Rational &other) const
{ return (nmr * other.dnm < dnm * other.nmr); }
bool Rational::operator > (const Rational &other) const
{ return (nmr * other.dnm > dnm * other.nmr); }
void Rational::show() const
{ cout << " " << nmr << "/" << dnm; }
void Rational::normalize() // private member function
{ if (nmr == 0) { dnm = 1; return; }
int sign = 1;
if (nmr < 0) { sign = -1; nmr = -nmr; }
if (dnm < 0) { sign = -sign; dnm = -dnm; }
long gcd = nmr, value = dnm; // greatest common divisor
while (value != gcd) { // stop when the GCD is found
if (gcd > value)
gcd = gcd - value; // subtract smaller number from greater
else value = value - gcd; }
nmr = sign * (nmr/gcd); dnm = dnm/gcd; } // denominator is positive
int main()
{ cout << endl << endl;
Rational a(1,4), b(3,2), c, d;
c = a + 5; // I'll discuss c = 5 + a; later
a.show(); cout << " + " << 5 << " ="; c.show(); cout << endl;
d = b - 1;
b.show(); cout << " - " << 1 << " ="; d.show(); cout << endl;
c = a * 7;
a.show(); cout << " * " << 7 << " ="; c.show(); cout << endl;
d = b / 2;
b.show(); cout << " / " << 2 << " ="; d.show(); cout << endl;
c.show();
c += 3;
cout << " += " << 3 << " ="; c.show(); cout << endl;
d.show();
d *= 2;
cout << " *= " << 2 << " ="; d.show(); cout << endl;
if (b < 2)
{ b.show(); cout << " < " << 2 << endl; }
return 0;
}
Remember that the conversions to type Rational, however implicit (silent), are function calls to the conversion constructor. When the function terminates, the temporary object created for this conversion is destroyed with the call to the destructor. (For this class, it is a default destructor supplied by the compiler.) Remember that story about two functions here and two functions there? (Actually, the story was about two dollars here and two dollars there.) Hence, this version of class Rational is somewhat slower than the version that does not rely on argument conversions and provides a separate overloaded operator for each type of the argument.
This implicit use of conversion constructors is supported not only for overloaded operators, but also for any function, member or global, that has object parameters. As I mentioned in Chapter 9, conversion constructors deal a blow to the C++ system of strong typing. If you intentionally use a numeric value instead of an object, fine. If you use it by mistake, the compiler does not tell you that you are making a mistake.
C++ offers a wonderful technique for preventing errors and for forcing the designer of client code to tell the maintainer what is going on. This technique consists of using the keyword explicit with the constructor.
explicit Rational(long n=0, long d=1) // cannot be called implicitly
{ nmr = n; dnm = d;
this->normalize(); }
By declaring a constructor explicit, you make any implicit call to this constructor a syntax error.
Rational a(1,4), b(3,2), c, d;
c = a + 5; // syntax error: implicit call
c = a + Rational(5); // ok: explicit call
d = b - 1; // syntax error: implicit call
d = b - (Rational)1; // ok: explicit call
if (b < 2) // syntax error: implicit call
if (b < Rational(2)) // ok: explicit call
cout << "Everything is fine\n";
This is a very good idea because it gives the class designer a better control over the way the class objects are used by the client programmer.
Programmer-defined classes like Complex and Rational do indeed have to emulate the behavior of built-in numeric types as much as possible. Using numeric variables instead of objects in expressions is not an error but a legitimate technique to implement computational algorithms. In the code snippet above, I marked some lines as ok and other lines as syntax error. Given a choice, every programmer would prefer to write code like in the lines marked as syntax errors. The need to spell out the casts every time a numeric operand is used is an imposition on the client programmer and results in code that is less aesthetically pleasing.
From that point of view, the use of the keyword explicit for the constructors of classes like Complex and Rational is probably overkill.
NOTE
Do not use the keyword explicit for constructors of numeric classes that implement overloaded operator functions. Utilize it for classes where using a built-in argument instead of the class object in a function call is an error, not a legitimate way of using the class.
URL Abbreviation with mod_rewrite
URL abbreviation is one of the most effective techniques you can use to optimize your HTML. First seen on Yahoo!'s home page, URL abbreviation substitutes short redirect URLs (like r/ci) for longer ones (like Computers_and_Internet/) to save space. The Apache and IIS web servers, and Manila (http://www.userland.com) and Zope (http://www.zope.org), all support this technique. In Apache, the mod_rewrite module transparently handles URL expansion. For IIS, ISAPI filters handle URL rewrites. Here are some IIS rewriting filters:
URL abbreviation is especially effective for home or index pages, which typically have a lot of links. As you will discover in Chapter 19, "Case Studies: Yahoo.com and WebReference.com," URL abbreviation can save anywhere from 20 to 30 percent off of your HTML file size. The more links you have, the more you'll save.

NOTE

As with most of these techniques, there's always a tradeoff. Using abbreviated URLs can lower search engine relevance, although you can alleviate this somewhat with clever expansions with mod_rewrite.

The popular Apache web server[9] has an optional module, mod_rewrite, that enables your server to automatically rewrite URLs.[10] Created by Ralf Engelschall, this versatile module has been called the "Swiss Army knife of URL manipulation."[11] mod_rewrite can handle everything from URL layout to load balancing to access restriction. We'll be using only a small portion of this module's power by substituting expanded URLs with regular expressions.
The module first examines each requested URL. If it matches one of the patterns you specify, the URL is rewritten according to the rule conditions you set. Essentially, mod_rewrite replaces one URL with another, allowing abbreviations and redirects. This URL rewriting machine manipulates URLs based on various tests including environment variables, time stamps, and even database lookups. You can have up to 50 global rewrite rules without any discernible effect on server performance.[12] Abbreviated URI expansion requires only one.
Tuning mod_rewrite

To install mod_rewrite on your Apache Web server, you or your IT department needs to edit one of your server configuration files. The best way to run mod_rewrite is through the httpd.conf file, as this is accessed once per server restart. Without configuration file access, you'll have to use .htaccess for each directory. Keep in mind that the same mod_include performance caveats apply; .htaccess files are slower, as each directory must be traversed to read each .htaccess file for each requested URL.

The Abbreviation Challenge

The strategy is to abbreviate the longest, most frequently accessed URLs with the shortest abbreviations. Most webmasters choose one-, two-, or three-letter abbreviations for directories. On WebReference.com, the goal was to create a mod_rewrite rule that would expand URLs like this:
Into this:
Like Yahoo!, r is the flag we've chosen for redirects. But why stop there? We can extend this concept into more directories. So turn this:
Into this:
and so on. Note that the lack of a trailing forward slash in this second example allows us to intelligently append column numbers. With the right RewriteRule, the abbreviation of /r/c/66 expands into the string /dhtml/column66/.

The RewriteRule Solution

To accomplish this expansion, you need to write a RewriteRule regular expression. First, you need to find the URI pattern /r/d, and then extract /d and turn it into /dhtml. Next, append a trailing slash. Apache requires two directives to turn on and configure the mod_rewrite module: the RewriteEngine Boolean and the RewriteRule directive. The RewriteRule is a regular expression that transforms one URI into another. The syntax is shown here:
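As documented for mod_rewrite, the general form is:

RewriteRule Pattern Substitution [flags]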
So to create a RewriteRule to solve this problem, you need to add the following two mod_rewrite directives to your server configuration file (httpd.conf or .htaccess):
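A sketch of the two directives, consistent with the behavior described next (the exact flags are an assumption):

RewriteEngine On
RewriteRule ^/r/d(.*) /dhtml$1 [R,L]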
This regular expression matches a URL that begins with /r/ (the ^ character at the beginning means to match from the beginning of the string). Following that pattern is d(.*), which matches any characters after the d. Note that using /r/dodo would expand to /dhtmlodo, so you'll have to make sure anything after r/d always includes a /. So when a request comes in for the URI <a href="/r/d/diner/">DHTML Diner</a>, this rule expands this abbreviated URI into <a href="/dhtml/diner/">DHTML Diner</a>.

The RewriteMap Solution for Multiple Abbreviations

The RewriteRule solution would work well for a few abbreviations, but what if you want to abbreviate a large number of links? That's where the RewriteMap directive comes in. This feature allows you to group multiple lookup keys (abbreviations) and their corresponding expanded values into one tab-delimited file. Here's an example map file (at /www/misc/redir/abbr_webref.txt):
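A sketch of what such a tab-delimited map might contain (only the pg entry is taken from the discussion that follows; the other keys and expansions are hypothetical):

# abbreviation    expansion
d                 dhtml/
pg                programming/
ht                html/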
The MapName specifies a mapping function between keys and values for a rewriting rule using the following syntax:
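Its general form, as documented for mod_rewrite, is:

RewriteMap MapName MapType:MapSource

and a rule consults the map with a construct of the form ${MapName:LookupKey|DefaultValue}.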
When you are using a mapping construct, you generalize the RewriteRule regular expression. Instead of a hard-coded value, the MapName is consulted, and the LookupKey accessed. If there is a key match, the mapping function substitutes the expanded value into the regular expression. If there is no match, the rule substitutes the default value or a blank string. To use this external map file, we'll add the RewriteMap directive and tweak the regular expression correspondingly. The following httpd.conf commands turn rewriting on, show where to look for your rewrite map, and show the definition of the RewriteRule:
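A sketch of those three directives, reconstructed from the description that follows (the flag spellings are assumptions):

RewriteEngine On
RewriteMap abbr txt:/www/misc/redir/abbr_webref.txt
RewriteRule ^/r/([^/]*)/?(.*) ${abbr:$1}$2 [redirect=permanent,last]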
The first directive turns on rewrites as before. The second points the rewrite module to the text version of our map file. The third tells the processor to look up the value of the matching expression in the map file. Note that the RewriteRule has a permanent redirect (301 instead of 302) and last flags appended to it. Once an abbreviation is found for this URL, no further rewrite rules are processed for it, which speeds up lookups. Here we've set the rewrite MapName to abbr and the map file location (text format) to the following:
The RewriteRule processes requested URLs using the regular expression:
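Based on the description that follows, that expression is:

^/r/([^/]*)/?(.*)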
This regular expression matches a URL that begins with /r/. (The ^ character at the beginning means to match from the beginning of the string.) Then the regular expression ([^/]*) matches as many non-slash characters as it can to the end of the string. This effectively pulls out the first string between two slashes following the /r. For example, in the URL /r/pg/javascript/, this portion of the regular expression matches pg. It also will match ht in /r/ht. (Because there are no slashes following, it just continues until it reaches the end of the URL.) The rest of the pattern, /?(.*), matches zero or one forward slash / with any characters that follow. These two parenthesized expressions will be used in the replacement pattern.

The Replacement Pattern

The substitution (${abbr:$1}$2) is the replacement pattern that will be used in the building of the new URL. The $1 and $2 variables refer back (backreferences) to the first and second patterns found in the supplied URL. They represent the first set of parentheses and the second set of parentheses in the regular expression, respectively. Thus for /r/pg/javascript/, $1 = "pg" and $2 = "javascript/". Replacing these in the example produces the following:
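That is, the substitution becomes:

${abbr:pg}javascript/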
The ${abbr:pg} is a mapping directive that says, "Refer to the map abbr (recall our map command, RewriteMap abbr txt:/www/misc/redir/abbr_webref.txt), look up the key pg, and return the corresponding data value for that key." In this case, that value is programming/. Thus the abbreviated URL, /r/pg/javascript, is replaced by the following:
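Assuming the expansion is applied from the site root, the resulting URL is:

/programming/javascript/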
Voila! So you've effectively created an abbreviation expander using a regular expression and a mapping file. Using the preceding rewrite map file, the following URL expansions would occur:
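For instance, with the sketch map shown earlier (all but the pg entry are hypothetical):

/r/pg/javascript/   becomes   /programming/javascript/
/r/d/diner/         becomes   /dhtml/diner/
/r/ht/              becomes   /html/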
The server, upon seeing a matching abbreviation in the map file, will automatically rewrite the URL to the longer value. But what happens if you have many keys in your RewriteMap file? Scanning a long text file every time a user clicks a link can slow down lookups. That's where binary hash files come in handy.

Binary Hash RewriteMap

For maximum speed, convert your text RewriteMap file into a binary *DBM hash file. This binary hash version of your key and value pairs is optimized for maximum lookup speed. Convert your text file with a DBM tool or the txt2dbm Perl script provided at http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html.

NOTE

Note that this example is specific to Apache on Unix. Your platform may vary.

Next, change the RewriteMap directive to point to your optimized DBM hash file:
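For example (using the same hypothetical location, minus the extension):

RewriteMap abbr dbm:/www/misc/redir/abbr_webref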
That's the abbreviated version of how you set up link abbreviation on an Apache server. It is a bit of work, but once you've got your site hierarchy fixed, you can do this once and forget it. This technique saves space by allowing abbreviated URLs on the client side and shunting the longer actual URLs to the server. The delay using this technique is hardly noticeable. (If Yahoo! can do it, anyone can.) Done correctly, the rewriting can be transparent to the client. The abbreviated URL is requested, the server expands it, and serves back the content at the expanded location without telling the browser what it has done. You also can use the /r/ flag or the RewriteLog directive to track click-throughs in your server logs. This technique works well for sites that don't change very often: You would manually abbreviate your URIs to match your RewriteMap abbreviations stored on your server. But what about sites that are updated every day, or every hour, or every minute? Wouldn't it be nice if you could make the entire abbreviation process automatic? That's where the magic of Perl and cron jobs (or the Scheduled Tasks GUI in Windows) comes in.

Automatic URL Abbreviation

You can create a Perl or shell script (insert your favorite CGI scripting language here) to look for URLs that match the lookup keys in your map file and automatically abbreviate your URLs. We use this technique on WebReference.com's home page. To make it easy for other developers to auto-abbreviate their URLs, we've created an open source script called shorturls.pl. It is available at http://www.webreference.com/scripts/.

NOTE

XSLT gives you another way to abbreviate URLs automatically. Just create the correct templates to abbreviate all the local links in your files.

The shorturls.pl script allows you to abbreviate URLs automatically and exclude portions of your HTML code from optimization with simple XML tags (<NOABBREV> ... </NOABBREV>). Using this URL abbreviation technique, we saved over 20 percent (5KB) off our 24KB hand-optimized front page. We could have saved even more space, but for various reasons, we excluded some URLs from abbreviation. This gives you an idea of the link abbreviation process, but what about all the other areas of WebReference? Here is a truncated version of our abbreviation file to give you an idea of what it looks like (the full version is available at http://www.webreference.com/scripts/):
Note that we use two- and three-letter abbreviations to represent longer URLs on WebReference.com. Yahoo! uses two-letter abbreviations throughout its home page. How brief you make your abbreviations depends on how many links you need to abbreviate, and how descriptive you want the URLs to be.

The URL Abbreviation/Expansion Process: Step by Step

In order to enable automatic link abbreviation (with shorturls.pl) and expansion (with mod_rewrite), do the following:
That's it. Now any new content that appears on your home page will be automatically abbreviated according to the RewriteMap file that you created, listing the abbreviations you want.
Cryptographic Algorithm Usage
This section focuses on how different algorithms should be approached in new and earlier code. The SDL requirements dictate that
Using any cryptographic algorithms that are not listed in the middle or right-hand columns requires an exception from your central security team. Be aware that the United States federal government mandates the use of specific cryptographic algorithms (NIST 2005).

Symmetric Block Ciphers and Key Lengths

For symmetric block encryption algorithms, a minimum key length of 128 bits is required for new code (KeyLength 2006). The only block encryption algorithm recommended for new code is AES. (AES-128, AES-192, and AES-256 are all acceptable.) Two-key (112-bit) or three-key (168-bit) 3DES are currently acceptable if already in use in existing code. However, transitioning to AES is highly recommended. DES, DESX, RC2, and SKIPJACK are no longer considered secure; continued use of these algorithms should be for opt-in backward compatibility only.

Best Practices
Symmetric Stream Ciphers and Key Lengths

For symmetric stream ciphers, there is currently no recommended algorithm; you should use a block cipher, such as AES, with at least 128 bits of key material. Existing code that uses RC4 should be using a key size of at least 128 bits, and your application's use of RC4 should be reviewed by a cryptographer. This last point is very important: there are numerous subtle errors that can arise when using stream ciphers such as RC4. Refer to the "References" section of this chapter for other material outlining some of the common errors.

Best Practices
Best Practices
Symmetric Algorithm Modes

Symmetric algorithms can operate in a number of modes, most of which link together the encryption operations on successive blocks of plaintext and ciphertext. The electronic code book (ECB) mode of operation should not be used without signoff from the central security team. Cipher-block-chaining (CBC) is the recommended mode of operation for block ciphers. If, for interoperability reasons, you believe that you need to use another chaining mode, you should talk to the security team.

Best Practices
Asymmetric Algorithms and Key Lengths

For RSA-based asymmetric encryption and digital signatures, the minimum acceptable key length is 1024 bits, and 1024-bit signature keys should be used only for signatures with validity periods of one year or less. New code should use RSA keys of at least 2048 bits in length. For DSA-based digital signatures, only 1024-bit keys should be used (the maximum allowed by the DSA standard) and then only for short-lived signatures (less than one year). For key exchange and digital signatures that are based on elliptic curve cryptography (ECC), the three NIST-approved curves (P-256, P-384, and P-521) are all acceptable. For key agreement, Diffie-Hellman is recommended, with 2048-bit keys for new code and 1024-bit keys for backward compatibility. Keys of 512 bits or fewer are not to be used at all.

Best Practices
Hash Functions

No new code should use the MD4 or MD5 hash algorithms because hash collisions have been demonstrated for both algorithms, which effectively "breaks" them in the eyes of the cryptographic community. Continued use of SHA-1 is permissible in existing code for backward compatibility purposes and, as described in the next Best Practices reader aid, for new code running on certain down-level platforms. The SHA-2 family of hash functions (SHA-256, SHA-384, or SHA-512) is currently the only group that is generally recommended. The SHA-2 hash functions are available in .NET code and in unmanaged Microsoft Win32 code targeting Windows Server 2003 SP1 and Windows Vista. Note that hash function agility (the ability to switch to another hash function without updating your code) is part of the cryptographic agility requirement discussed earlier in this chapter. Absent a backward compatibility requirement, code that uses SHA-1 must migrate to SHA-2 once SHA-2 is available on the platform.

Best Practices
Message Authentication Codes

The most common and well-known message authentication code (MAC) function is the HMAC, which uses a hash function and a secret MAC key for message authentication. It uses an underlying hash function (MD5, SHA-1, or SHA-2) and a secret key of a specified length. The strength of an HMAC relies on the strength of the underlying hash function and the length of the secret.

Best Practices
Section 15.10. Java Network Security Restrictions
One of the most attractive features of Java is the extensive effort to make it a secure language. This is especially important for a language that makes it so easy to implement networking applications. After all, nobody wants to download a Java applet that proceeds to erase the hard disk. Such an applet might be written by a cyber-terrorist deliberately aiming to cause severe damage, or it might be written by a cyber-doofus who inadvertently writes code that does severe damage. What are some of Java's techniques for guarding against deliberate or inadvertent insecure code?

One level of security is Java's bytecode verification process, which the Java Virtual Machine performs on any "untrusted" code it receives. Java checks every class that it loads into memory to make sure it doesn't contain illegal or insecure code.

Code verification

Another line of defense is the so-called sandbox security model, which refers to the practice of restricting the kinds of things that certain programs can do. For example, the "sandbox" environment for Java applets restricts them from having any access whatsoever to the local file system.

Limited privileges

Another restriction imposed on applets limits their networking capabilities. For example, a Java applet cannot create a network connection to any computer except the one from which its code was downloaded. Also, a Java applet cannot listen for, or accept, connections on privileged ports (those numbered below 1024). Together, these two restrictions severely limit the kinds of client/server programs that can be built as applets.

Limited network access

Java sets aside certain locations as repositories for trusted code. The Java class libraries are placed in such a location, as are the directories where your Java programs are stored. Any class loaded from some other directory is considered untrusted. By this definition, applets downloaded over the Internet would be considered untrusted code. In addition to the restrictions for applets, which apply to all untrusted code, Java defines a number of other limitations:
Trusted code

JDK 1.2 introduced security enhancements based on the concepts of "permission" and "policy." Code is assigned "permissions" based on the security policy currently in effect. Each permission specifies the type of access allowed for a particular resource (such as "read" and "write" access to a specified file or directory, or "connect" access to a given host and port). The policy that controls permissions can be initialized from an external configurable policy file. Unless a permission is explicitly granted to code, it cannot access the resource guarded by the permission. These new enhancements offer a more fine-grained and extensible approach to security for both applets and applications.

As this brief overview illustrates, the Java Virtual Machine is designed with security as one of its primary issues. This does not guarantee 100 percent security, but it is a big improvement over some of the languages and systems that preceded Java. Moreover, security is an ongoing concern of the Java development process. Flaws in the existing security system are fixed very quickly. Advanced methods are constantly being developed and incorporated into the system. One such enhancement is the use of encryption to guarantee the integrity of classes transferred over the network.
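A sketch of what a policy file entry granting such permissions might look like (the code source URL, host, port range, and file path are hypothetical):

grant codeBase "http://www.example.com/classes/" {
    permission java.net.SocketPermission "www.example.com:1024-", "connect,accept";
    permission java.io.FilePermission "/tmp/applet-data.txt", "read";
};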
Interprocess Communication
UNIX systems provide several mechanisms for processes to communicate with each other to share information or synchronize their activities. These mechanisms are typically used for transactions across multiple processes, sharing data elements, and coordinating resource sharing. Naturally, the power that IPC primitives afford also presents a potential for vulnerability in applications that use these mechanisms haphazardly.

Pipes

Pipes are a simple mechanism for IPC in UNIX. A pipe is a unidirectional pair of file descriptors; one descriptor is used for writing information, and the other is used for reading information. A process can write data to the write side of the pipe, and another process can read that data from the read side of the pipe. The pipe descriptors are created at the same time by the pipe() system call, so they are useful for setting up IPC in advance, typically by handing one side of the pipe to a child process via a fork(). Not surprisingly, pipes are the underlying mechanism shell programs use when you link programs by using pipe characters. Say you run a command like this:

echo hi | more

The shell creates a pipe and gives the write end to a child process that uses it as its standard output descriptor (which is file descriptor 1, if you recall). The read end is handed to a different child process that uses it as its standard input. Then one process runs echo hi and the other process runs the more program, and communication takes place across that pipe. You've already looked at a library function based on the use of pipes, popen(). It creates a pipe and hands one end of it to the child process running the requested program. In this way, it can read from the standard output of the subprogram or write to the standard input of the subprogram. One interesting feature of a pipe is that writing to a pipe with a closed read end causes your program to receive a SIGPIPE, which has a default behavior of terminating the process. If the process handles the SIGPIPE, the write call returns a failure code of EPIPE to the program.

Named Pipes

Named pipes (also called "FIFOs" because of their first-in, first-out nature) are pipes that exist on the file system and can be opened just like normal files. Software can use named pipes to set up IPC with a process it isn't related to. Named pipes are typically created with mkfifo() or mknod() and then opened with open(). Like regular files, named pipes have associated file permissions specified in the creation call, and they are modified by the umask. Therefore, an application that creates a FIFO needs to ensure that it applies appropriate permissions to the new object. In this context, "appropriate" means using a restrictive set of permissions that only allows specific applications access to the pipe. Pipes have an interesting behavior in how they're opened by a process that might prove useful in an attack. If a process opens a pipe for reading, the open blocks until another process opens the same pipe for writing. So open() doesn't return until a peer process has joined the game. Similarly, opening a pipe for writing causes a program to block until another process opens the pipe for reading. Opening a pipe in read/write mode (O_RDWR) is undefined behavior, but it usually results in the pipe being opened as a reader without blocking occurring. You can open pipes in nonblocking mode if you want to avoid the blocking behavior.
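A minimal sketch of the pattern just described: the parent creates a pipe with pipe(), forks, writes data into the write end, and the child reads it from the read end.

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    char buf[32];

    if (pipe(fds) < 0)
        return 1;

    if (fork() == 0) {                /* child: reads from the pipe */
        close(fds[1]);
        ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child read: %s\n", buf);
        }
        return 0;
    }

    close(fds[0]);                    /* parent: writes to the pipe */
    write(fds[1], "hi", 2);
    close(fds[1]);
    wait(NULL);                       /* reap the child */
    return 0;
}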
Programs expecting regular files could instead be passed a named pipe that causes the blocking behavior. Although this isn't a security problem in itself, it could slow down the program when attempting to perform some other TOCTOU-based attack. In addition to open() blocking, attackers can cause the read pipe to block whenever they choose if they are the only writer attached to the other end of the pipe, thus providing additional control over process execution. In fact, Michael Zalewski (a researcher we have noted previously in this chapter) demonstrated this attack when exploiting a race condition in the GNU C Compiler (GCC). It's more of an exploitation technique but is worth mentioning because race conditions that might have seemed infeasible become more readily exploitable (the technique is detailed at http://seclists.org/bugtraq/1998/Feb/0077.html).

There are also quirks in writing to named pipes. If you try to write to a named pipe with no attached reader, you get the same result as with a normal pipe: a SIGPIPE signal and the EPIPE error from the write system call.

Another potential problem when dealing with pipes is nonsecure use of mkfifo() and mknod(). Unlike open(), these two functions don't return a file descriptor upon successful creation of a resource; instead, they return a value of 0 indicating success. Therefore, a program that creates a named pipe must subsequently call open() on the created pipe to use it. This situation creates the potential for a race condition; if the pipe is deleted and a new file is created in its place between the time mkfifo() is used and open() is called, the program might inadvertently open something it didn't intend to. Here's an example of vulnerable code:

int open_pipe(char *pipename)

In this case, if the process can be interrupted between mkfifo() and open(), it might be possible to delete the created file and create a symlink to a system file or perform a similar attack. From a code-auditing standpoint, the existence of named pipes introduces three potential issues in UNIX-based applications:
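The open_pipe() function referenced above might look something like this (a sketch; the mode and error handling are assumptions). The vulnerable window is between mkfifo() and open():

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int open_pipe(char *pipename)
{
    int rc;

    rc = mkfifo(pipename, S_IRWXU);      /* create the named pipe */
    if (rc != 0)
        return -1;

    return open(pipename, O_WRONLY);     /* race: pipename may have been replaced by now */
}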
System V IPC

System V IPC mechanisms are primitives that allow unrelated processes to communicate with each other or achieve some level of synchronization. The three System V IPC mechanisms are message queues, semaphores, and shared memory.

Message queues are a simple, stateless messaging system that allows processes to send each other arbitrary data. The kernel keeps messages until the message queue is destroyed or a process receives them. Unlike file system access, message queue permissions are checked for each operation instead of only once when the object is opened. The functions for using message queues are msgget(), msgctl(), msgrcv(), and msgsnd().

Semaphores are a synchronization mechanism that processes can use to control the sequence of activities that occur between them. The semaphore primitives provide the capability to manipulate semaphore sets, which are series of semaphores that can be operated on independently. The functions for manipulating semaphores are semget(), semop(), and semctl().

Finally, shared memory segments are a mechanism whereby a memory segment can be mapped into more than one process simultaneously. By reading or writing the memory in this segment, processes can exchange information or maintain state and variables among a number of processes. Shared memory segments can be created and manipulated with shmget(), shmctl(), shmat(), and shmdt().

The System V IPC mechanisms have their own namespace in kernel memory that isn't tied to the file system, and they implement their own simple permissions model. In practice, these mechanisms are rarely used in applications; however, you should know about them in case you encounter code that does use them. The most important issue is the permissions associated with an IPC object. Each IPC object has a mode field that describes the requirements for accessing it. This field is nine bits: three bits describing the owner's privileges, three bits describing the privileges of the owner's group, and three bits describing the permissions for everybody else. The bits represent whether the object can be read from or written to by the corresponding class of user. These permissions are a simplified version of file system permissions, except that IPC objects have no execute permission (that bit in each triad is unused).

Obviously, programs that set these permissions inappropriately are vulnerable to attacks in which arbitrary processes interfere with a communication channel used by a more privileged process. The consequences can range from simple denial-of-service attacks to memory corruption vulnerabilities to logic errors resulting in privilege escalation. For example, a denial-of-service vulnerability was found in the Apache Web server related to shared memory access by users who could run code with the privileges of the Apache user (that is, users who could write scripts for the Web server to run). In an article at www.securityfocus.com/archive/1/294026, Zen-parse noted that running scripts in this context allowed users to access the httpd scoreboard, which was stored in a shared memory segment. He describes several attacks that resulted in Apache spawning endless numbers of processes or sending signals to arbitrary processes as root.
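The permission bits are supplied in the flags argument of the *get() functions, so that argument is usually where inappropriate modes show up during a review. The following sketch, with an illustrative function name, creates a shared memory segment with owner-only access; a mode such as 0666 in the same position would let any local user attach to and tamper with the segment.

#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

void *create_segment(key_t key, size_t size)
{
    /* IPC_CREAT | 0600: create the segment readable and writeable
       by the owner only */
    int id = shmget(key, size, IPC_CREAT | 0600);
    if (id < 0)
        return NULL;

    /* map the segment into this process's address space */
    void *seg = shmat(id, NULL, 0);
    return (seg == (void *)-1) ? NULL : seg;
}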
Another issue when dealing with shared memory segments is that when a process forks, both the parent and the child end up with the shared memory segment mapped. This means that if either process is compromised to the point that attacker-controlled code can run, that code can access the segment with whatever permissions it was originally mapped with. If an exec() occurs, however, the shared memory segment is detached.

Finally, the use of shared resources might introduce the possibility of race conditions, particularly with shared memory segments. Because the segment can be mapped into multiple processes simultaneously, any of those processes that can write to the segment might be able to cause race conditions by modifying data after another process has read it but before the data has been acted on. Of course, there are also complications if multiple writers are acting at the same time. Synchronization issues are covered in more depth in Chapter 13, "Synchronization and State."

UNIX Domain Sockets

UNIX domain sockets are similar to pipes in that they allow processes on a local system to communicate with each other. Like pipes, UNIX domain sockets can be named or anonymous. Anonymous domain sockets are created with the socketpair() function. It works much like pipe(): it creates a pair of unnamed endpoints that processes can use to exchange information. Anonymous domain sockets are typically used when a process intends to fork and needs a communication channel between parent and child.

Named domain sockets provide a general-purpose mechanism for exchanging data in a stream-based or record-based fashion. Programs use the socket API functions to create and manage connections over a domain socket. In essence, the code that implements connection management and data exchange over named domain sockets is almost identical to that of networked applications, although the security implications of using local domain sockets are quite different. Named sockets are implemented with special socket files, created automatically when a server calls bind(). The location of the file is specified in the socket address structure passed to bind(). The socket file is created with permissions (0777 & ~umask). Therefore, if a setuid program creates a socket, setting the umask to 0 before starting the program creates the socket file with full read, write, and execute privileges for everyone, meaning any user on the system could connect to the socket and write arbitrary data to the process that bound it. An example of a dangerous socket creation is shown:

int create_sock(char *path)

Assuming this code is running in a privileged context, it could be dangerous because it doesn't explicitly set the umask before creating the socket. Therefore, the calling user might be able to clear the umask and write to a socket that isn't intended to receive connections from arbitrary clients. It's easy to overlook file permissions in this situation because they aren't addressed in the socket functions (as opposed to pipe functions such as mkfifo(), which take a mode argument for creating a new pipe).

Of course, if users can specify any part of the pathname used to store the socket, or if any writeable directories appear in that path, race attacks could be performed to intercept traffic between a client and a server. Specifically, consider the following code (a sketch of the pattern appears below):

int create_sock(void)

This slightly modified example creates a socket in /data/fifo. If the /data directory is writeable, an attacker could let the server create the socket, then remove the fifo directory and replace it with a symbolic link to some other location where another socket named sock1 has been created. Any client connecting to this socket would then unwittingly be connecting to the wrong program. This might have security implications if sensitive data is being transmitted or if resources, such as file descriptors, are being passed across the socket.
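Here is a rough sketch of the kind of create_sock() routine being described. The body is reconstructed for illustration rather than taken from the original listing; only the /data/fifo location and the sock1 name come from the example above.

#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int create_sock(void)
{
    struct sockaddr_un addr;
    int s;

    s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/data/fifo/sock1", sizeof(addr.sun_path) - 1);

    /* Problem 1: the umask is inherited from the caller, so the socket file
       can end up world-writable unless the program sets a restrictive umask
       (for example, umask(0077)) or chmod()s the file before accepting
       connections. Problem 2: /data is attacker-writable in this example,
       so the fifo directory can be swapped for a symlink between bind()
       and the time clients connect. */
    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(s);
        return -1;
    }

    if (listen(s, 5) < 0) {
        close(s);
        return -1;
    }

    return s;
}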
Auditing code that uses UNIX domain sockets requires paying attention to how the socket is created. Because socket files must be opened explicitly through the socket API, they can't be passed as arguments to setuid programs in an attempt to manipulate the speed at which a process runs, as described for named pipes. There is one exception: when the socket has already been opened and a new process inherits the descriptor via a call to execve().

Note

Bear in mind that when a server closes a socket, the socket file isn't deleted from the file system; it must be removed with the unlink() function. If the server fails to do this, it might be unable to bind again when it's restarted (if a static pathname is used), or a directory might gradually fill up with stale socket files. This is more a robustness issue than a security issue, but it can leave the application unable to bind its sockets when it needs to.
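As a brief illustration of the cleanup the note describes (the path and function name here are hypothetical):

#include <unistd.h>

#define SOCK_PATH "/var/run/exampled.sock"   /* hypothetical static pathname */

void shutdown_listener(int s)
{
    close(s);            /* closing the descriptor does not remove the file */
    unlink(SOCK_PATH);   /* delete the socket file so a later bind() succeeds */
}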