Saturday, December 19, 2009

Information Hiding




I l@ve RuBoard




The concept of information hiding is also related to the principle of separation of concerns. Normally, without information hiding, the programmer who writes the code (or the person who maintains the code) has to keep in mind two sets of design decisions, or two sets of knowledge, simultaneously. One set of knowledge and concerns is about the design of data (e.g., type Cylinder), and another set of knowledge is about the application-related manipulation of data (setting fields, comparing volumes, scaling sizes, etc.).



With information hiding, the areas of concern are separate. The programmer who writes (or maintains) the client code is concerned only with application-related manipulations of data, not with data design. The programmer who writes (or maintains) the data access functions is concerned only with data design, not with application-related manipulations of data.



If this sounds similar to the concept of data encapsulation, you have it right. I have to admit that most definitions of information hiding I have read are vague and nonoperational; they do not explain how to distinguish information hiding from encapsulation, how to recognize the lack of information hiding, or how to implement information hiding. Most people tacitly assume that information hiding is the same as encapsulation.



The concept of encapsulation is narrower: we want the names and types of data fields to be encapsulated from the client code, so that the client code does not explicitly mention the names of the underlying data fields. In our example, this means that the client code shall not mention c1.radius, c1.height, and so on explicitly, as I did in the snippet of code above. Encapsulation through the use of access functions improves the quality of code: its readability and the independence of its components.



How is information hiding different from encapsulation? Before I answer this question, let us consider an example of encapsulation that is not very effective. Let us try to implement encapsulation by introducing server functions that perform operations on a Cylinder object, for example, returning the values of Cylinder fields or setting the Cylinder dimensions. These server functions are also called access functions because they access cylinder data on behalf of their client code. The term "access" does not distinguish between different types of access: these functions can either retrieve field values or modify them.





void setRadius(Cylinder &c, double r) // modifier function
{ c.radius = r; }

void setHeight(Cylinder &c, double h) // modifier function
{ c.height = h; }

double getRadius(const Cylinder& c) // selector function
{ return c.radius; }

double getHeight(const Cylinder& c) // selector function
{ return c.height; }



The main() function does not have to use the names of cylinder components; if they change, it is the functions setRadius(), setHeight(), getRadius(), and getHeight() that have to change, not main() or any other client of Cylinder. Listing 8.8 shows the use of these access functions. The output of this program is the same as the output of the code in Listing 8.6: I changed the design of the code but not its functionality.
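To make this concrete, consider a hypothetical redesign (not part of the original example) in which Cylinder stores its diameter instead of its radius. In the sketch below, only the four access functions change; their signatures stay the same, so client code written against them, represented here by a small helper named clampToDefaults() (also a hypothetical name), needs no edits.

```cpp
// Hypothetical redesign: Cylinder now stores the diameter, not the radius.
struct Cylinder {
    double diameter, height;
};

// The access functions absorb the change; their interfaces are unchanged.
void setRadius(Cylinder &c, double r) { c.diameter = 2 * r; }
void setHeight(Cylinder &c, double h) { c.height = h; }
double getRadius(const Cylinder &c) { return c.diameter / 2; }
double getHeight(const Cylinder &c) { return c.height; }

// Client code: the same validation logic as in Listing 8.8, expressed only
// through access functions, so it compiles unchanged after the redesign.
void clampToDefaults(Cylinder &c)
{
    if (getRadius(c) < 0) setRadius(c, 10);
    if (getHeight(c) < 0) setHeight(c, 20);
}
```

The client never learns that the stored field changed from a radius to a diameter; that knowledge stays inside setRadius() and getRadius().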





Listing 8.8. Example of ineffective encapsulation.


#include <iostream> // awkward encapsulation
using namespace std;

struct Cylinder { // data structure to access
double radius, height; } ;

void setRadius(Cylinder &c, double r) // modifier
{ c.radius = r; }

void setHeight(Cylinder &c, double h) // modifier
{ c.height = h; }

double getRadius(const Cylinder& c) // accessor
{ return c.radius; }

double getHeight(const Cylinder& c) // accessor
{ return c.height; }

int main()
{
Cylinder c1, c2; double radius, height; // program data
cout << "Enter radius and height of the first cylinder: ";
cin >> radius >> height; // initialize data
setRadius(c1,radius); setHeight(c1,height);
if (getRadius(c1)<0) setRadius(c1,10); // verify data
if (getHeight(c1)<0) setHeight(c1,20);
cout << "Enter radius and height of the second cylinder: ";
cin >> radius >> height; // initialize data
setRadius(c2,radius); setHeight(c2,height);
if (getRadius(c2)<0) setRadius(c2,10); // verify data
if (getHeight(c2)<0) setHeight(c2,20);
if (getHeight(c1)*getRadius(c1)*getRadius(c1)*3.141593
< getHeight(c2)*getRadius(c2)*getRadius(c2)*3.141593)
{ setRadius(c1,getRadius(c1)*1.2);
setHeight(c1,getHeight(c1)*1.2); // scale up
cout << "\nFirst cylinder changed size\n"; // print new size
cout << "radius: " << getRadius(c1) << " height: " << getHeight(c1) << endl; } // use accessors, not field names
else // otherwise do nothing
cout << "No change in first cylinder size" << endl;
return 0;
}


As you can see, the main() function is indeed encapsulated from the Cylinder data field names. If these names change in the process of redesign, only a limited and easily identified set of access functions has to be changed. No other place in the program, even if the program is very large, has to be modified or even inspected. It has to be recompiled, but this is a different story. Figure 8-8 shows the object diagram for this design. Similar to the object diagram that I introduced in Chapter 1, "Object-Oriented Approach: What's So Good About It?" (Figure 1-7), this diagram demonstrates that the server functions setRadius(), setHeight(), getRadius(), and getHeight() conceptually belong together. They access the Cylinder fields radius and height on behalf of the client code. The client code accesses server data only through calls to the server access functions, not directly.





Figure 8-8. Object diagram for program in Listing 8.8.







However, the encapsulation here is awkward. Actually, it is useless. The design principles listed at the beginning of this chapter are not used. The access functions do little to achieve the goals of the client code: the responsibility for data manipulation is not pushed to the server functions; it remains with the client. Despite the use of access functions, the main() client code mixes access to data (for example, calls to getRadius()) with data manipulation, so the meaning of the computations (computing volume, scaling the size) is not easy to grasp. If the number of fields of the programmer-defined type Cylinder changes, the number of access functions changes too, and the client code has to be modified as well.



To choose the set of server access functions correctly, you have to take into account the responsibilities of the client code. In this example, the client code is responsible for initializing cylinder objects, validating object data, computing cylinder volume, scaling cylinder size, and displaying cylinder attributes. Let us design access functions that do exactly that: enterData(), validateCylinder(), getVolume(), scaleCylinder(), and printCylinder().



With these access functions, you push responsibility down from the client code to the server code. It is the server functions that set cylinder fields, validate cylinder data, compute volume, change size, and display cylinder data. The client code only requests these operations. As a result, the operations in main() are expressed in terms of function calls to servers.
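The five-function design is described here without a listing, so what follows is only a sketch under several assumptions. The input function is written as enterData(), the name the later discussion uses for it. It reads from a std::istream, and the printing function writes to a std::ostream, so that the design can be tried without a keyboard (the book's functions use cin and cout directly). The client function compareAndScale() and the wrapper runWithInput() are hypothetical; compareAndScale() stands in for main().

```cpp
#include <iostream>
#include <sstream>
#include <string>

struct Cylinder {
    double radius, height;
};

// Server functions; these signatures are assumptions, not taken from the text.
void enterData(Cylinder &c, std::istream &in)       // set cylinder fields
{ in >> c.radius >> c.height; }

void validateCylinder(Cylinder &c)                  // defaults for corrupted data
{ if (c.radius < 0) c.radius = 10;
  if (c.height < 0) c.height = 20; }

double getVolume(const Cylinder &c)                 // geometry lives here
{ return c.height * c.radius * c.radius * 3.141593; }

void scaleCylinder(Cylinder &c, double factor)      // change all dimensions
{ c.radius *= factor; c.height *= factor; }

void printCylinder(const Cylinder &c, std::ostream &out)
{ out << "radius: " << c.radius << " height: " << c.height << '\n'; }

// Hypothetical client standing in for main(): it only requests operations,
// but it still receives volume values and compares them itself.
void compareAndScale(std::istream &in, std::ostream &out)
{
    Cylinder c1, c2;
    enterData(c1, in); validateCylinder(c1);
    enterData(c2, in); validateCylinder(c2);
    if (getVolume(c1) < getVolume(c2)) {
        scaleCylinder(c1, 1.2);
        out << "First cylinder changed size\n";
        printCylinder(c1, out);
    } else {
        out << "No change in first cylinder size\n";
    }
}

// Convenience wrapper for trying the sketch with canned input.
std::string runWithInput(const std::string &text)
{
    std::istringstream in(text);
    std::ostringstream out;
    compareAndScale(in, out);
    return out.str();
}
```

For example, runWithInput("1 2 3 4") scales the first cylinder to radius 1.2 and height 2.4. The client still has to call validateCylinder() after every enterData() and still receives a raw volume to compare; these are the two points the information-hiding discussion below takes up.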



The mix of access to data with data manipulation disappears. The client code specifies what should be done (set data fields, compute volume, etc.). The server code specifies how this is done. The Cylinder data representation is encapsulated: if the field names change, the client code is not affected. If you add more fields to the Cylinder design, the client code is not affected. (Well, for that to be entirely true, the input operations have to be encapsulated as well.)
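The claim about adding fields can be checked with a small sketch. It assumes a hypothetical third field, thickness (the wall thickness), that is not part of the original design. Only the server function scaleCylinder() learns about the new field; the client function doubleSize(), also a hypothetical name, is exactly the code it would be without that field.

```cpp
// Hypothetical redesign: a wall-thickness field is added to Cylinder.
struct Cylinder {
    double radius, height, thickness;
};

// Server function: the only place that must know about the new field.
void scaleCylinder(Cylinder &c, double factor)
{
    c.radius *= factor;
    c.height *= factor;
    c.thickness *= factor;
}

// Client code: identical to what it was before the field was added.
void doubleSize(Cylinder &c)
{
    scaleCylinder(c, 2.0);
}
```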



The knowledge shared by client designers and server designers is limited to the names and interfaces of server functions. The areas of concern for client programmers and server programmers are separate: one encompasses high-level application-related operations, and the other is limited to data field names and low-level computations.



Even for this tiny example, you see the advantages of using access functions. The client code is expressed in terms of meaningful application-related operations. What does c1.height*c1.radius*c1.radius*3.141593 mean in Listing 8.6? The maintenance programmer has to figure that out. The same questions arise for the statements c1.radius*=1.2; and c1.height*=1.2; in that listing: do all dimensions of the cylinder change? Is the factor the same for all dimensions? In the printing statements, are all cylinder dimensions displayed or only some? When access to data and application-related operations are intermixed, it is more difficult to figure out the meaning of the processing.



Using access functions also makes validation of user input easier: the main() function is not cluttered by the details of validation. If the data representation (the cylinder design or just the field names) changes, it is the server functions that have to change. As I mentioned earlier, this is not just a matter of the labor needed for maintenance. It is a matter of attention span. Without access functions, the potential area of change is the whole program (cylinders could be used anywhere). With access functions, the potential area of change is well defined: it includes only the functions that access the cylinder data representation.



This approach promotes reusability. Without access functions, any algorithm that uses cylinder objects has to be written and verified from scratch. With access functions, new algorithms can be written in terms of calls to these functions, and each of these operations has to be verified only once.
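As an illustration of this kind of reuse, here is a sketch of a new algorithm that is not in the original program: scaling every cylinder in a collection whose volume falls below a threshold. The function scaleSmallOnes() and its parameters are hypothetical. It is written entirely in terms of the access functions getVolume() and scaleCylinder(), so it never touches the radius or height fields and relies on behavior that has already been verified.

```cpp
#include <cstddef>
#include <vector>

struct Cylinder {
    double radius, height;
};

// Access functions assumed to be already written and tested.
double getVolume(const Cylinder &c)
{ return c.height * c.radius * c.radius * 3.141593; }

void scaleCylinder(Cylinder &c, double factor)
{ c.radius *= factor; c.height *= factor; }

// New algorithm built only from access-function calls: scales every cylinder
// whose volume is below the threshold and returns how many were scaled.
std::size_t scaleSmallOnes(std::vector<Cylinder> &cylinders,
                           double threshold, double factor)
{
    std::size_t count = 0;
    for (Cylinder &c : cylinders) {
        if (getVolume(c) < threshold) {
            scaleCylinder(c, factor);
            ++count;
        }
    }
    return count;
}
```

Only the loop and the threshold test are new here; the geometry and the scaling rules come from functions that were verified once.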



The drawback of this approach is that you have to write and test more source code. One can argue, however, that this is actually an additional advantage. In the total balance of time, typing code takes a small fraction. All other development steps require reading the code: debugging, testing, integration, and maintenance. Writing client code in terms of calls to access functions (which are already written and tested) makes these steps easier, less error prone, and less expensive.



So, what does the criterion of information hiding add to data encapsulation? Let us look again at the server functions validateCylinder() and getVolume(). The first function encapsulates the validation operations, default values, and the like. This is good, because the client code does not need to know the details of validation; it is enough to know that validation is done. The second function encapsulates the geometrical computations. This is also good, because the client code need not be concerned with the rules of geometry; it is enough for it to know that the cylinder volume is computed.



However, both of these functions are poor from the point of view of information hiding. They expand the client designer's knowledge about the design of the server, enlarge the client designer's attention span, and bring information to the client code for manipulation instead of manipulating it in the server code.



The first function, validateCylinder(), betrays the need for data validation, and that need should not be within the attention span of the client code designer and maintainer. This can be eliminated by redesign, that is, by changing the list of functions and their responsibilities. A good solution to this problem is to merge the functions validateCylinder() and enterData().





void enterData(Cylinder &c, const char number[]) // const: called with string literals
{ cout << "Enter radius and height of the ";
cout << number << " cylinder: ";
cin >> c.radius >> c.height; // initialize cylinder
if (c.radius < 0) c.radius = 10; // defaults for corrupted data
if (c.height < 0) c.height = 20; }



As you see again and again, the criteria of cohesion, coupling, encapsulation, and information hiding are not operational. They signal the existence of a design drawback, but they do not indicate in what directions you should change the design to eliminate the drawback. The principles listed at the beginning of this chapter are operational: They indicate how to change the design. In this example, information hiding is improved by pushing responsibility to server functions. Instead of forcing the client code to call two server functions, enterData() and validateCylinder(), this design requires the client code to call only one access function.
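If you want to exercise the merged function without typing at the keyboard, one option (a variation, not the book's code) is to pass the streams as parameters. In the sketch below, the extra std::istream& and std::ostream& parameters are assumptions; the prompt, the validation rules, and the default values are the ones from enterData() above. The client makes a single call and never sees the validation step.

```cpp
#include <iostream>
#include <sstream>  // lets callers pass std::istringstream / std::ostringstream

struct Cylinder {
    double radius, height;
};

// Merged input-and-validation function; the stream parameters are an
// assumption that lets it read from any source, not just cin.
void enterData(Cylinder &c, const char number[],
               std::istream &in, std::ostream &out)
{
    out << "Enter radius and height of the " << number << " cylinder: ";
    in >> c.radius >> c.height;        // initialize cylinder
    if (c.radius < 0) c.radius = 10;   // defaults for corrupted data
    if (c.height < 0) c.height = 20;
}
```

For example, passing std::istringstream("-3 7") as the input stream yields a radius of 10 (the default) and a height of 7; calling enterData(c, "first", std::cin, std::cout) gives the interactive behavior of the original.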



The function getVolume() violates the principle of pushing responsibilities to server functions by giving the client code more information than it needs. The client code needs to know whether one cylinder is larger than another. Instead of serving this client need, the server code returns the computed value of the volume of the cylinder and lets the client code do with this value whatever the client likes. Information about the cylinder volume should be hidden from the client code. To serve this client need, I should change the design, introducing, for example, function firstIsSmaller().





bool firstIsSmaller(const Cylinder& c1, const Cylinder& c2)
{ if (c1.height*c1.radius*c1.radius*3.141593 // compare volumes
< c2.height*c2.radius*c2.radius*3.141593)
return true;
else
return false; }
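The same function can also be arranged so that the volume formula is not even callable from client code. The sketch below is one possible arrangement, not the book's: the helper volume() (a hypothetical name) is placed in an unnamed namespace, which gives it internal linkage, so in a multi-file program only the server's source file can use it. Returning the result of < directly is equivalent to the if/else above.

```cpp
struct Cylinder {
    double radius, height;
};

namespace {  // internal linkage: not visible outside this source file
double volume(const Cylinder &c)
{
    return c.height * c.radius * c.radius * 3.141593;
}
}  // namespace

// The only thing clients learn is which cylinder is smaller.
bool firstIsSmaller(const Cylinder &c1, const Cylinder &c2)
{
    return volume(c1) < volume(c2);
}
```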




Listing 8.9 shows the version of the source code that combines proper encapsulation with information hiding. Notice that the functionality of the code remained the same for all versions of the program. It is the design that I changed, and it is the design that affects the quality of the code. The output of the program is the same as the output of the program in Listing 8.6.





Listing 8.9. Combining encapsulation and information hiding.


#include <iostream>
using namespace std;

struct Cylinder { // data structure to access
double radius, height; } ;

void enterData(Cylinder &c, const char number[]) // const: called with string literals
{ cout << "Enter radius and height of the ";
cout << number << " cylinder: ";
cin >> c.radius >> c.height; // initialize cylinder
if (c.radius < 0) c.radius = 10; // defaults for corrupted data
if (c.height < 0) c.height = 20; }

bool firstIsSmaller(const Cylinder& c1, const Cylinder& c2)
{ if (c1.height*c1.radius*c1.radius*3.141593 // compare volumes
< c2.height*c2.radius*c2.radius*3.141593)
return true;
else
return false; }

void scaleCylinder(Cylinder &c, double factor)
{ c.radius *= factor; c.height *= factor; } // scale dimensions

void printCylinder(const Cylinder &c) // print object state
{ cout << "radius: " <<c.radius << " height: " <<c.height <<endl; }

int main() // pushing responsibility to server functions
{
Cylinder c1, c2; // program data
enterData(c1,"first"); // initialize first cylinder
enterData(c2,"second"); // initialize second cylinder
if (firstIsSmaller(c1,c2))
{ scaleCylinder(c1,1.2); // scale it up and
cout << "\nFirst cylinder changed size\n"; // print new size
printCylinder(c1); }
else // otherwise do nothing
cout << "\nNo change in first cylinder size" << endl;
return 0;
}



Figure 8-9 shows the object diagram for this design. Similar to the previous figure, it shows that the functions enterData(), firstIsSmaller(), scaleCylinder(), and printCylinder() belong together. Here, the server serves the client code better because the access functions do the work for the client code rather than bring information to the client for further manipulation.





Figure 8-9. Object diagram for program in Listing 8.9.











