Friday, November 13, 2009

Are We Testing Enough?

Defining "Good Enough"


In Chapter 3, I presented Kano Analysis as a technique for thinking holistically in terms of satisfiers and dissatisfiers for a project, and in Chapter 4, I discussed planning an iteration. The iteration's objectives should determine the test objectives.


Although it may be hard to determine what constitutes "good enough" testing for a project as a whole, it's not that hard for an iteration. The whole team should know what "good enough" means for the current iteration, and that definition should be captured in the scenarios and QoS planned for implementation and in the risks to be addressed in the iteration. Testing should verify that the iteration accomplishes that target over its course.[9]


A value-up practice for planning "good enough" is to keep the bug backlog as close to zero as possible. Every time you defer resolving or closing a bug, you impose additional future liability on the project for three reasons: the bug itself will have to be handled multiple times; someone (usually a developer) will face a longer lag before returning to the code to analyze and fix it; and you will create a "Broken Windows" effect. The Broken Windows theory holds that in neighborhoods where small problems, such as broken windows, go unaddressed, other crimes are more likely to be ignored as well. Cem Kaner, a software testing professor and former public prosecutor, describes this well:[10]



The challenge with graffiti and broken windows is that they identify a community standard. If the community can't even keep itself moderately clean, then: (1) Problems like these are not worth reporting, and so citizens will stop reporting them. (We also see the converse of this, as a well-established phenomenon. In communities that start actually prosecuting domestic violence or rape, the reported incidence of these crimes rises substantially; presumably, the visible enforcement causes a higher probability of a report of a crime, rather than more crime). In software, many bugs are kept off the lists as not worth reporting. (2) People will be less likely to clean these bugs up on their own because their small effort won't make much of a difference. (3) Some people will feel it is acceptable (socially tolerated in this community) to commit more graffiti or to break more windows. (4) Many people will feel that if these are tolerated, there probably isn't much bandwidth available to enforce laws against more serious street crimes.



Similarly, in projects with large bug backlogs, overall attention to quality issues may decline. This is one of many reasons to keep the bug backlog as close to zero as possible.
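Some teams make this practice visible by wiring a simple backlog check into the build. The following is only a minimal sketch of such a gate; count_open_bugs() is a hypothetical stand-in for a query against whatever work item tracking system you use, and the threshold is purely illustrative.

```python
# Sketch of a "keep the bug backlog near zero" gate for a nightly build.
# count_open_bugs() is a hypothetical placeholder for a query against your
# work item tracking system (e.g., active bugs assigned to this iteration).
import sys

BACKLOG_LIMIT = 5  # tolerated open bugs before the build is flagged

def count_open_bugs() -> int:
    # Placeholder: in practice this would query the tracking system.
    return 3

def check_bug_backlog(limit: int = BACKLOG_LIMIT) -> bool:
    open_bugs = count_open_bugs()
    if open_bugs > limit:
        print(f"FAIL: {open_bugs} open bugs exceeds the agreed limit of {limit}.")
        return False
    print(f"OK: {open_bugs} open bugs (limit {limit}).")
    return True

if __name__ == "__main__":
    sys.exit(0 if check_bug_backlog() else 1)
```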




Set Iteration Test Objectives by Assigning Work Items to the Iteration


In VSTS, all work items, including scenarios, QoS, bugs, and risks, can be assigned to an iteration. This assignment creates a test target list for that iteration, or in other words, a visible bar defining good enough testing for that iteration. You can, of course, add more to that list or reschedule items to future iterations, but there is always a visible, agreed definition of the iteration test goals, and changes to it are tracked in an auditable manner.
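To give a feel for what such a test target list looks like as a query, here is a rough sketch that pulls the iteration's scenarios, QoS requirements, bugs, and risks with WIQL. It is illustrative rather than the product's own tooling: it assumes the later Azure DevOps REST endpoint (in the VSTS era you would run the equivalent query from Team Explorer or the .NET object model), and the organization, project, token, and iteration path are placeholders.

```python
# Illustrative sketch: build an iteration "test target list" from work items.
# Assumes the Azure DevOps REST WIQL endpoint; ORG, PROJECT, PAT, and the
# iteration path are placeholders you would replace for your own project.
import requests

ORG, PROJECT, PAT = "my-org", "MyProject", "<personal-access-token>"
ITERATION_PATH = r"MyProject\Iteration 2"

WIQL = {
    "query": (
        "SELECT [System.Id], [System.Title], [System.WorkItemType] "
        "FROM WorkItems "
        f"WHERE [System.IterationPath] = '{ITERATION_PATH}' "
        "AND [System.WorkItemType] IN "
        "('Scenario', 'Quality of Service Requirement', 'Bug', 'Risk')"
    )
}

def iteration_test_targets():
    url = f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/wit/wiql?api-version=7.0"
    resp = requests.post(url, json=WIQL, auth=("", PAT))
    resp.raise_for_status()
    # Each returned work item id is one entry on the iteration's test target list.
    return [item["id"] for item in resp.json()["workItems"]]

if __name__ == "__main__":
    print("Test targets for the iteration:", iteration_test_targets())
```

Rescheduling a work item to a future iteration simply drops it off this query's results, which is what keeps the definition of "good enough" for the iteration visible and auditable.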







Exploratory Testing


Most testing I've discussed so far is either automated or highly scripted manual testing. These are good for finding the things that you know to look for but weak for finding bugs or issues where you don't know to look. Exploratory testing, also called ad hoc testing, is an important mindset to bring to all of the testing that you do. In exploratory testing, the tester assumes the persona of the user and exercises the software as that persona would. Kaner, Bach, and Pettichord describe exploratory testing this way:



By exploration, we mean purposeful wandering; navigating through a space with a general mission, but without a prescripted route. Exploration involves continuous learning and experimenting. There's a lot of backtracking, repetition, and other processes that look like waste to the untrained eye.[11]



Exploratory testing can be a very important source of discovery, not just of bugs but also of unforeseen (or not yet described) scenarios and QoS requirements. Capture these in the backlog of the work item database so that you can use them in planning the current and future iterations. As a manager, plan for a certain level of exploratory testing in every iteration. Define charters for these testing sessions according to the goals of the iteration. Tune the charters and the resource level according to the value you get from these sessions. In short, plan capacity for exploratory testing.
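One lightweight way to plan that capacity is to treat each exploratory session as a small, time-boxed charter record. The sketch below is only illustrative bookkeeping; the Charter class and the sample missions are invented for the example, not a prescribed format.

```python
# Sketch: session-based exploratory testing charters as simple records,
# so capacity can be planned per iteration and findings fed back to the backlog.
from dataclasses import dataclass, field

@dataclass
class Charter:
    mission: str                 # what the session explores, tied to iteration goals
    persona: str                 # which user the tester role-plays
    time_box_minutes: int = 90
    findings: list = field(default_factory=list)  # bugs, new scenarios, QoS, risks

    def log(self, kind: str, summary: str) -> None:
        # Each finding becomes a candidate work item in the product backlog.
        self.findings.append((kind, summary))

iteration_sessions = [
    Charter("Explore checkout with an expired credit card", persona="first-time buyer"),
    Charter("Explore order history after a mid-session timeout", persona="returning customer"),
]

iteration_sessions[0].log("bug", "Error page loses the cart contents")
iteration_sessions[0].log("scenario", "Retry payment with a different card")

total_minutes = sum(s.time_box_minutes for s in iteration_sessions)
print(f"Planned exploratory capacity this iteration: {total_minutes} minutes")
```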




Testing as Discovery


Embrace testing that discovers new scenarios, QoS requirements, and risks, in addition, of course, to finding bugs. Capture the new scenarios, QoS, and risks as work items in the product backlog. This is vital information. It makes the quantitative coverage measurement a little harder, in that you're increasing the denominator, but that's a small price to pay for helping the project deliver more customer value.


A particular type of scenario test is the "soap opera." Hans Buwalda describes the technique in his article "Soap Opera Testing" as follows:



Soap operas are dramatic daytime television shows that were originally sponsored by soap vendors. They depict life in a way that viewers can relate to, but the situations portrayed are typically condensed and exaggerated. In one episode, more things happen to the characters than most of us will experience in a lifetime. Opinions may differ about whether soap operas are fun to watch, but it must be great fun to write them. Soap opera testing is similar to a soap opera in that tests are based on real life, exaggerated, and condensed.[12]



Soap operas are harsh, complex tests; they test many features using intricate and perhaps unforeseen sequences. The essence of soap operas is that they present cases that are relevant to the domain and important to the stakeholders but that cannot be tested in isolation. They are a good test of robustness in iterations where the software is mature enough to handle long test sequences.
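Purely as an illustration, a soap-opera test might read as one long, condensed sequence of exaggerated events driven through the system under test. The Store class below is a made-up stand-in, not a real product API; the point is the shape of the test, not the domain.

```python
# Illustrative soap-opera test: one exaggerated, condensed sequence of events
# against a hypothetical Store API (a stand-in, not a real system under test).

class Store:
    def __init__(self):
        self.orders, self.credit = {}, 0

    def place_order(self, order_id, qty):
        self.orders[order_id] = qty

    def change_quantity(self, order_id, qty):
        self.orders[order_id] = qty

    def cancel_order(self, order_id):
        self.credit += self.orders.pop(order_id)

def test_soap_opera_frantic_customer():
    store = Store()
    # A lifetime of activity condensed into one test: order, amend, cancel,
    # reorder, cancel again, all while credit accumulates.
    store.place_order("A1", qty=10)
    store.change_quantity("A1", qty=250)   # exaggerated quantity
    store.cancel_order("A1")
    store.place_order("A2", qty=1)
    store.change_quantity("A2", qty=0)     # boundary: zero quantity
    store.cancel_order("A2")
    assert store.orders == {} and store.credit == 250

if __name__ == "__main__":
    test_soap_opera_frantic_customer()
    print("soap opera sequence survived")
```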




False Confidence


When you have automated or highly scripted testing, and you do not balance it with exploration, you run the risk of what Boris Beizer called the "Pesticide Paradox":[13]



Every method you use to prevent or find bugs leaves a residue of subtler bugs against which those methods are ineffectual.



In other words, you can make your software immune to the tests that you already have. This risk is especially acute when the only testing being done is regression testing and the test pool is very stable. There are three ways to mitigate the Pesticide Paradox:


  1. Make sure that the tests the software faces continually include fresh ones, including good negative tests.

  2. Look at gaps in test coverage against scenarios, QoS, risks, and code. Prioritize the gaps and think about tests that can close them.

  3. Use progressively harsher tests, notably soap operas and exploratory testing, to confirm, from a knowledgeable domain expert's perspective, that the software doesn't have undiscovered vulnerabilities.


Exploratory testing, soap operas, and risk identification all guard against a false sense of confidence.
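For the first mitigation above, one inexpensive way to keep fresh negative tests flowing into the pool is to generate new inputs on every run rather than replaying only a fixed regression suite. The sketch below assumes a hypothetical parse_quantity function as the code under test; any input validator would do.

```python
# Sketch: keep the test pool fresh by generating new (including negative)
# inputs each run instead of replaying only a fixed regression suite.
# parse_quantity is a hypothetical function under test.
import random
import string

def parse_quantity(text: str) -> int:
    value = int(text)
    if value < 0:
        raise ValueError("quantity must be non-negative")
    return value

def random_negative_inputs(n: int = 20):
    # Non-numeric garbage: letters, punctuation, and spaces only.
    alphabet = string.ascii_letters + string.punctuation + " "
    for _ in range(n):
        yield "".join(random.choices(alphabet, k=random.randint(1, 8)))

def test_rejects_garbage():
    for bad in random_negative_inputs():
        try:
            parse_quantity(bad)
        except ValueError:
            continue  # rejection is the expected behavior
        raise AssertionError(f"accepted garbage input: {bad!r}")

if __name__ == "__main__":
    test_rejects_garbage()
    print("all randomized negative inputs were rejected")
```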












