12.2 Design and Complexity Metrics

Classes and methods are the basic constructs of OO technology. The amount of function that a piece of OO software provides can be estimated from the number of identified classes and methods or their variants. It is therefore natural that the basic OO metrics relate to classes and methods, and to the size (logical lines of code, or LOC) or function points of those classes and methods. For design and complexity measures, the metrics have to deal with specific OO characteristics such as inheritance, instance variables, and coupling.

12.2.1 Lorenz Metrics and Rules of Thumb

Based on his experience in OO software development, Lorenz (1993) proposed eleven OO design metrics. He also provided rules of thumb for some of the metrics, which we summarize in Table 12.1. As the table shows, some of these metrics are guidelines for OO design and development rather than metrics in the sense of quantitative measurements. Although most of the eleven metrics are related to OO design and implementation, metric 8 is a statement of good programming practice, metric 9 is a quality indicator, and metric 11 is a metric for validating the OO development process.

With regard to average method size, a large number may indicate poor OO design and therefore function-oriented coding. For the average number of methods per class, a large number is desirable from the standpoint of code reuse, because subclasses tend to inherit a larger number of methods from superclasses. However, if the number of methods per class grows too large, extensibility suffers, and testing is likely to become more complicated as a result of the increased complexity. Too many methods in a single class, not counting inherited methods, is also a warning that too much responsibility is being placed in one type of object; there are probably other, undiscovered classes.
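The methods-per-class metric just discussed is straightforward to collect automatically. As a minimal sketch, the following uses Python's `ast` module to count the methods defined directly on each class in a source file; the `Account` class and its method names are made up for illustration:

```python
import ast

# Hypothetical source snippet; the class and method names are illustrative only.
SOURCE = """
class Account:
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        self.balance -= amount
"""

def methods_per_class(source):
    """Count methods defined directly on each class (inherited methods excluded)."""
    tree = ast.parse(source)
    counts = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            counts[node.name] = sum(
                isinstance(child, ast.FunctionDef) for child in node.body
            )
    return counts

print(methods_per_class(SOURCE))  # {'Account': 2}
```

Averaging these counts over all classes in a project, and comparing the result with a rule-of-thumb threshold such as Lorenz's, is then a one-line reduction.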
Similar reasoning applies to instance variables: a large number of instance variables indicates that one class is doing more than it should; in other words, the design may need refinement. In terms of reusability via inheritance, depth in the inheritance tree is likely to be more favorable than breadth: deeper inheritance trees would seem to promote greater method sharing than broad ones. On the other hand, a deep inheritance tree may be more difficult to test than a broad one, and comprehensibility may be diminished. A deep class hierarchy may also be the result of overzealous object creation, almost the opposite concern of having too many methods or instance variables in one class. The pertinent question, therefore, is what the optimum values for OO metrics such as those just discussed should be. There may not be one correct answer, but the rules of thumb by Lorenz shown in Table 12.1 are very useful. They were derived from experience with industry OO projects, and they provide thresholds for comparison and interpretation.
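The depth-versus-breadth distinction can be made concrete with a small sketch: given a child-to-parent map of a class hierarchy (the vehicle classes below are a made-up example), the depth of a class is the length of its walk up to the root, and the tree's depth is the maximum over all classes:

```python
# Child -> parent map for a hypothetical class hierarchy; None marks the root.
PARENT = {"Sedan": "Car", "Car": "Vehicle", "Truck": "Vehicle", "Vehicle": None}

def depth(cls):
    """Number of ancestors between cls and the root of its hierarchy."""
    d = 0
    while PARENT[cls] is not None:
        cls = PARENT[cls]
        d += 1
    return d

# Depth of the whole inheritance tree: the deepest class wins.
max_depth = max(depth(c) for c in PARENT)
print(max_depth)  # 2  (Sedan -> Car -> Vehicle)
```

A broad tree would show many entries sharing the same parent but a small `max_depth`; a deep tree shows the opposite, which is the shape the rules of thumb caution against once it exceeds the recommended threshold.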
In 1994, Lorenz and Kidd (1994) expanded this work by publishing a suite of recommended OO metrics, with multiple metrics for each of the following categories: method size, method internals, class size, class inheritance, method inheritance, class internals, and class externals. They also showed, in histogram form, the frequency distribution of the number of classes for five projects along the values of some of the metrics. No numeric parameters of these metrics (e.g., mean or median) were provided, however.

12.2.2 Some Metrics Examples

In early 1993, the IBM Object Oriented Technology Council (OOTC) (1993) published a white paper on OO metrics with recommendations to the product divisions. The list included more than thirty metrics, each with a relative importance rating of high, medium, or low. All the metrics proposed by Lorenz in 1993 (Table 12.1), with the exception of metrics 5 and 6, were in the IBM OOTC list with a high importance rating. Almost all of the OOTC metrics were included in Lorenz and Kidd's (1994) comprehensive suite. This commonality was no coincidence: both Lorenz and Kidd were formerly affiliated with IBM, and Lorenz was formerly the technical lead of IBM's OOTC. As one would expect, Lorenz's OO metrics rules of thumb were the same as IBM OOTC's. The OOTC also recommended that the average depth of hierarchy be less than 4 for C++ projects. In terms of project size, the OOTC classified projects with fewer than 200 classes as small, projects with 200 to 500 classes as medium, and projects with more than 500 classes as large. Table 12.2 shows selected metrics for six OO projects developed at the IBM Rochester software development laboratory.
Project A was the lower layer of a large operating system, which interacts with hardware microcode; Project B was the development of an operating system itself; Project C was software that drives the input and output (I/O) devices of a computer system; Project D was a VisualAge application; Project E was software for a development environment, a joint project with an external alliance; and Project F was software providing graphical operations for a subsystem of an operating system. Based on the OOTC's project size categorization, Projects A, B, and C were very large, Projects E and F were medium sized, and Project D was small. Compared with the rules of thumb of Lorenz (1993) and the IBM OOTC (1993), Project E had a much higher average number of methods per class, larger classes in terms of LOC, and a larger maximum depth of inheritance tree. Project E was a joint project with an external alliance, and when code drops were delivered, IBM conducted acceptance testing. Our defect tracking during acceptance testing did show a high defect volume and a significantly higher defect rate, even when compared with other projects developed in procedural programming. This supports the observation that a deep inheritance tree may be more difficult to test than a broad one, and that comprehensibility may be diminished, thereby allowing more opportunities for error injection.
The metric values for the other projects all fell below the rule-of-thumb thresholds. The average number of methods per class for Projects A, B, and C was far below the threshold of 20, with Project B's value especially low. A smaller number of methods per class may mean larger overheads in class interfaces and a negative impact on the software's performance. Not coincidentally, none of these three projects initially met its performance targets, and all had to undergo significant performance tuning before the products were ready to ship. The performance challenges of these projects cannot be entirely attributed to this aspect of the class design, because other factors were known to be involved, but the data showed a good correlation. Indeed, our experience is that performance is a major concern that needs early action for most OO projects. The positive lesson learned from the performance tuning work on these projects is that performance tuning and improvement are easier in OO development than in procedural programming.

12.2.3 The CK OO Metrics Suite

In 1994, Chidamber and Kemerer proposed six OO design and complexity metrics, which later became the commonly cited CK metrics suite: weighted methods per class (WMC), depth of inheritance tree (DIT), number of children (NOC), coupling between object classes (CBO), response for a class (RFC), and lack of cohesion on methods (LCOM).
Chidamber and Kemerer (1994) applied these six metrics in an empirical study of two companies, one using C++ and one using Smalltalk. Site A, a software vendor, provided data on 634 classes from two C++ libraries. Site B, a semiconductor manufacturer, provided data on 1,459 Smalltalk classes. The summary statistics are shown in Table 12.3. The median weighted methods per class (WMC) for both sites was well below the threshold value for the average number of methods (20) discussed earlier. The DIT maximums exceeded the threshold of 6, but the medians seemed low, especially for the C++ site. The classes at both sites had low NOC values: the medians were zero, and 73% of Site A classes and 68% of Site B classes had no children. Indeed, the low values of DIT and NOC led the authors to observe that the designers might not have been taking advantage of reuse of methods through inheritance. Striking differences in CBO and RFC were found between the C++ and Smalltalk sites, with much higher median values at the Smalltalk site. The contrast reflects the differences between the languages with regard to OO implementation: Smalltalk places a higher emphasis on pure OO message passing and adheres more strongly to object-oriented principles (Henderson-Sellers, 1995). Last, the distribution of lack of cohesion on methods was very different at the two sites. Overall, this empirical study showed the feasibility of collecting metrics in realistic environments, and it highlighted the lack of use of inheritance. The authors also suggested that the distributions of the metric values be used to identify design outliers (i.e., classes with extreme values).
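The lack-of-cohesion metric mentioned above is commonly operationalized by counting method pairs: LCOM is the number of pairs of methods that share no instance variables minus the number of pairs that share at least one, floored at zero. A minimal sketch, with a made-up method-to-variables map:

```python
from itertools import combinations

# LCOM = max(|P| - |Q|, 0), where P = method pairs sharing no instance
# variables and Q = pairs sharing at least one. The mapping of methods
# to the instance variables they use is a made-up example.
USES = {
    "deposit":  {"balance"},
    "withdraw": {"balance"},
    "rename":   {"owner"},
}

def lcom(uses):
    p = q = 0
    for a, b in combinations(uses.values(), 2):
        if a & b:
            q += 1   # pair shares an instance variable (cohesive)
        else:
            p += 1   # disjoint pair (non-cohesive)
    return max(p - q, 0)

# (deposit, withdraw) share 'balance' -> q = 1; the two pairs involving
# 'rename' are disjoint -> p = 2; so LCOM = 2 - 1 = 1.
print(lcom(USES))
```

A high LCOM suggests the class bundles unrelated responsibilities: its methods partition into groups that touch disjoint sets of state.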
12.2.4 Validation Studies and Further Examples

To evaluate whether the CK metrics are useful for predicting the probability of detecting faulty classes, Basili and colleagues (1996) designed and conducted an empirical study over four months at the University of Maryland. The participants were students in a C++ OO software analysis and design class. The study involved 8 student teams and 180 OO classes. The independent variables were the six CK metrics; the dependent variables were whether a class was faulty and the number of faults detected during testing. The LCOM metric was operationalized as the number of pairs of member functions without shared instance variables minus the number of pairs of member functions with shared instance variables; when this subtraction was negative, the metric was set to zero. The hypotheses linked high values of the CK metrics to a higher probability of faulty classes. Key findings of the study are as follows:
This validation study provides positive confirmation of the value of the CK metrics. The authors, however, caution that several factors may limit the generalizability of the results, including the small project sizes, the limited conceptual complexity of the projects, and the student participants. In 1997, Chidamber, Darcy, and Kemerer (1997) applied the CK metrics suite to three financial application software systems and assessed the usefulness of the metrics from a managerial perspective. The three systems were all developed by one company. They are used by financial traders to assist in the buying, selling, recording, and analysis of various financial instruments such as stocks, bonds, options, derivatives, and foreign exchange positions. The summary statistics of the CK metrics for these systems are shown in Table 12.4. The first observed result was the generally small values of the depth of inheritance tree (DIT) and number of children (NOC) metrics in all three systems, indicating that developers were not taking advantage of the inheritance reuse feature of OO design. This result is consistent with the earlier findings of an empirical study by two of the authors on two separate software systems (Chidamber and Kemerer, 1994). Second, the authors found that three of the metrics, weighted methods per class (WMC), response for a class (RFC), and coupling between object classes (CBO), were highly correlated, with correlation coefficients above the .85 level. Statistically, this implies that for the three systems in the study, all three metrics were measuring something similar. This finding was in stark contrast to the validation study by Basili and colleagues (1996), in which all six CK metrics were found to be relatively independent. More empirical studies are apparently needed to clarify this question of multicollinearity versus independence among several of the CK metrics.
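The correlation check behind such a multicollinearity finding is a routine computation. As a sketch, the following computes the sample Pearson correlation between two metrics over a set of classes; the per-class WMC and RFC values below are invented, deliberately chosen to move together:

```python
from math import sqrt

def pearson(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up per-class metric values, constructed so WMC and RFC are collinear.
wmc = [5, 12, 8, 20, 3]
rfc = [11, 25, 17, 41, 7]

r = pearson(wmc, rfc)
print(round(r, 3))  # 1.0 -> the two metrics carry essentially the same information
```

A coefficient near 1 (as with the .85+ values reported in the study) means that, for modeling purposes, the correlated metrics are largely redundant with one another.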
We note that the metrics values from these three systems are more dissimilar than similar, especially with regard to the maximum values. This is also true when the two empirical data sets in Table 12.3 are included. It therefore seems that many more empirical studies will have to accumulate before preferable threshold values for the CK metrics can be determined. The authors also observed that the threshold values of these metrics cannot be determined a priori and should be derived locally from each data set. They decided to use the "80-20" principle, in the sense of using the 80th and 20th percentiles of the distributions to determine the cutoff points for "high" and "low" values of a metric. The authors further recommended that the values reported in their study not be accepted as rules; rather, practitioners should analyze local data and set thresholds appropriately.
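Deriving such local thresholds is simple in practice. The sketch below flags classes whose metric value exceeds the project's own 80th percentile, using a nearest-rank percentile (one of several common definitions); the per-class CBO values are made up:

```python
# Local "80-20" thresholds: a class is "high" on a metric when its value
# exceeds the 80th percentile of the project's own distribution.
def percentile(values, pct):
    """Nearest-rank percentile over a list of values."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1))))
    return ordered[k]

cbo = [1, 2, 2, 3, 4, 4, 5, 7, 9, 14]   # made-up per-class CBO values

hi_cut = percentile(cbo, 80)             # local cutoff, not an a priori rule
flagged = [v for v in cbo if v > hi_cut]
print(hi_cut, flagged)  # 7 [9, 14]
```

Because the cutoffs come from each project's own distribution, the same class could be flagged in one code base and pass unnoticed in another, which is exactly the point of the authors' recommendation against universal thresholds.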
The authors' major objective was to explore the effects of the CK metrics on managerial variables such as productivity, effort to make classes reusable, and design effort. Each managerial variable was evaluated using data from a different project. Productivity was defined as size divided by the number of hours required (lines of code per person-hour) and was assessed using data from software A. Assessment of the effort to make classes reusable was based on data from software B: some classes in software B were reused on another project, and the rework effort was measured as the number of hours the next project's staff spent modifying each class for use in that project. Assessment of design effort was based on data from software C, with design effort defined as the amount of time, in hours, spent specifying the high-level design of each class. Multivariate statistical techniques were employed, with the class as the unit of analysis and each managerial variable as the dependent variable. The CK metrics were the independent variables; other relevant variables, such as size and the specific developers who had superior performance, were included in the models as control variables so that the net effect of the CK metrics could be estimated. The findings indicated that of the six CK metrics, high levels of CBO (coupling between object classes) and LCOM (lack of cohesion on methods) were associated with lower productivity, higher effort to make classes reusable, and greater design effort. In other words, high values of CBO and LCOM were not good with regard to the managerial variables.
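The structure of such a model, with dummy (0/1) indicators for high-CBO, high-LCOM, and star-performer classes, can be sketched as follows. The three coefficients are the net effects reported in the study's productivity regression; the baseline value is an assumption of ours for illustration, not a figure from the study:

```python
# Net effects (LOC/person-hour) reported for the productivity regression;
# BASE is a hypothetical baseline for an ordinary class, NOT from the study.
BASE = 150.0
EFFECTS = {"high_cbo": -76.57, "high_lcom": -33.96, "staff_4": +48.11}

def predicted_productivity(high_cbo=False, high_lcom=False, staff_4=False):
    """Apply the dummy-variable deltas to the assumed baseline."""
    flags = {"high_cbo": high_cbo, "high_lcom": high_lcom, "staff_4": staff_4}
    return BASE + sum(delta for name, delta in EFFECTS.items() if flags[name])

# A class that is high on both CBO and LCOM loses over 110 LOC/hour
# relative to the baseline: 150 - 76.57 - 33.96 = 39.47.
print(predicted_productivity(high_cbo=True, high_lcom=True))
```

The sketch makes the additive interpretation explicit: each dummy variable shifts the predicted productivity by its coefficient, holding the other indicators (and, in the real model, class size) constant.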
Specifically, the final regression equation for the productivity evaluation was as follows:

The equation indicates that, controlling for the size of the classes and for the effect of a star performer (STAFF_4), productivity was much lower for classes with high CBO (coupling between object classes) and LCOM (lack of cohesion on methods) values; again, the authors used the 80th percentile as the cutoff point defining high values. Because productivity was defined as lines of code per person-hour, the regression equation indicates that productivity was 76.57 lines of code per hour lower (than for other classes) for classes with high CBO values, and 33.96 lines of code per hour lower for classes with high LCOM values. These effects are substantial. As a side note, the productivity of the classes developed by the star performer (STAFF_4) was 48.11 lines of code per hour higher. The CBO and LCOM findings are significant because they reflect the strength of the underlying concepts of coupling and cohesion. In practical use, the metrics can be used to flag outlying classes for special attention.

Rosenberg, Stapko, and Gallo (1999) discuss the metrics used for OO projects at the NASA Software Assurance Technology Center (SATC). Based on SATC experience, they recommend the six CK metrics plus three traditional metrics: cyclomatic complexity, lines of code, and comment percentage. The authors also used these metrics to flag classes with potential problems; any class that met at least two of the following criteria was flagged for further investigation:
Considerable research and discussion on OO metrics has taken place in recent years; see, for example, Li and Henry (1993), Henderson-Sellers (1996), De Champeaux (1997), Briand (1999), Kemerer (1999), Card and Scalzo (1999), and Bansiya and Davis (2002). With regard to the direction of OO metrics research, there seems to be agreement that it is far more important to focus on empirical validation (or refutation) of the proposed metrics, and on their relationships with managerial variables such as productivity, quality, and project management, than to propose new ones.