Hints for Designing, Debugging, and Testing
At the risk of contradicting the many books and technical articles that stress testing and little else, my personal advice is to balance your efforts: pay attention to design, implementation, and the use of familiar programming models. The best debugging technique is not to create the bugs in the first place; this advice, of course, is easier to give than to follow. Nonetheless, when defects do occur, as they will, code inspection, balanced with debugging, is often the most effective way to find and fix the defects' root causes.
Overdependence on testing is not advisable because many serious defects will elude the most extensive and expensive testing. Testing can only reveal defects; it cannot prove they do not exist, and it shows only defect symptoms, not root causes. As a personal example, I ran a version of a multiple semaphore wait function that used the condition variable (CV) model without the finite time-out on the event variable wait. The defect, which could cause a thread to block indefinitely, did not show up in over a year of use; eventually, however, something would have failed. Simple code inspection and knowledge of the condition variable model revealed the error.
Debugging is also problematic because debuggers change timing behavior, masking the very race conditions you wish to expose. For example, debugging is unlikely to find a problem with an incorrect choice of event type (auto-reset or manual-reset) and SetEvent/PulseEvent. You have to think carefully about what you wish to achieve.
Having said all that, testing on a wide variety of platforms, including SMP, is an essential part of any multithreaded software development project.
Avoiding Incorrect Code
Every bug you don't put in your code in the first place is one more bug you won't find in testing or production. Here are some hints, most of which are taken, although rephrased, from Butenhof's Programming with POSIX Threads (PWPT).
Avoid relying on thread inertia. Threads are asynchronous, but we often assume, for example, that a parent thread will continue running after creating one or more child threads. The assumption is that the parent's "inertia" will keep it running before the children run. This assumption is especially dangerous on an SMP system, but it can also lead to problems on single-processor systems.
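A minimal sketch of the safe pattern, written with POSIX threads (the API PWPT describes) rather than the Windows calls; the names work_item, child_main, and run_child are hypothetical. The parent finishes initializing the shared data before it creates the child, and it waits for the child explicitly instead of assuming which thread runs first.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work item shared by parent and child. */
typedef struct {
    int input;   /* written by the parent before the child exists */
    int output;  /* written by the child, read by the parent after the join */
} work_item;

static void *child_main(void *arg)
{
    work_item *w = arg;
    w->output = w->input * 2;  /* safe: input was set before pthread_create */
    return NULL;
}

/* Returns the child's result, or -1 if a thread call fails. */
int run_child(int input)
{
    work_item w;
    pthread_t tid;

    w.input = input;           /* initialize BEFORE creating the thread */
    w.output = 0;
    if (pthread_create(&tid, NULL, child_main, &w) != 0)
        return -1;

    /* Make no assumption about whether the child has run yet; wait for it. */
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return w.output;
}
```

Setting w.input after the call to pthread_create would be a bet on parent inertia: the child could read the field before the parent wrote it.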
Never bet on a thread race. Nearly anything can happen in terms of thread scheduling. Your program has to assume that any ready thread can start running at any time and that any running thread can be preempted at any time. "No ordering exists between threads unless you cause ordering" (PWPT, p. 294).
Scheduling is not the same as synchronization. Scheduling policy and priorities cannot ensure proper synchronization. Use synchronization objects instead.
Sequence races can occur even when you use mutexes to protect shared data. Protecting the data gives no assurance as to the order in which different threads will access it. For example, if one thread adds money to a bank account and another makes a withdrawal, a mutex guard alone does not ensure that the deposit will be made before the withdrawal. Exercise 10-14 shows how to control thread execution order.
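One way to impose the deposit-before-withdrawal order is to add a predicate and a condition variable to the mutex. The sketch below uses POSIX threads for illustration, not the Windows event calls, and is not the exercise's solution; ordered_account, withdraw_after_deposit, run_ordered_transactions, and the amounts are all assumptions.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical account whose withdrawal must follow the deposit. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  deposit_made;  /* signaled when deposited becomes true */
    int  deposited;                /* predicate, guarded by lock */
    int  overdrawn;                /* set if a withdrawal ever ran early */
    long balance;
} ordered_account;

static void *withdraw_after_deposit(void *arg)
{
    ordered_account *a = arg;

    pthread_mutex_lock(&a->lock);
    while (!a->deposited)          /* the mutex alone would not order us */
        pthread_cond_wait(&a->deposit_made, &a->lock);
    if (a->balance < 60)
        a->overdrawn = 1;
    a->balance -= 60;
    pthread_mutex_unlock(&a->lock);
    return NULL;
}

/* Returns the final balance, or -1 on overdraft or thread failure. */
long run_ordered_transactions(void)
{
    ordered_account a;
    pthread_t withdrawer;
    long result = -1;

    pthread_mutex_init(&a.lock, NULL);
    pthread_cond_init(&a.deposit_made, NULL);
    a.deposited = 0;
    a.overdrawn = 0;
    a.balance = 0;

    if (pthread_create(&withdrawer, NULL, withdraw_after_deposit, &a) == 0) {
        /* The calling thread plays the depositor. */
        pthread_mutex_lock(&a.lock);
        a.balance += 100;
        a.deposited = 1;
        pthread_cond_signal(&a.deposit_made);
        pthread_mutex_unlock(&a.lock);

        pthread_join(withdrawer, NULL);
        result = a.overdrawn ? -1 : a.balance;
    }
    pthread_cond_destroy(&a.deposit_made);
    pthread_mutex_destroy(&a.lock);
    return result;
}
```

The withdrawer may well be scheduled first; the predicate loop, not the scheduler, is what makes it wait.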
Cooperate to avoid deadlocks. You need a well-understood lock hierarchy, used by all threads, to ensure that deadlocks will not occur.
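A lock hierarchy can be as simple as a rule that every thread acquires locks in ascending order of some fixed rank. The following is a sketch under that assumption, using POSIX mutexes for illustration; ranked_account, its id field (assumed unique per account), and the driver function are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical account; id is unique and defines the lock order. */
typedef struct {
    int id;
    pthread_mutex_t lock;
    long balance;
} ranked_account;

typedef struct {
    ranked_account *from;
    ranked_account *to;
    int rounds;
} transfer_job;

/* Always locks the lower-id account first, whatever the direction. */
void transfer(ranked_account *from, ranked_account *to, long amount)
{
    ranked_account *first, *second;

    if (from == to)
        return;
    first  = (from->id < to->id) ? from : to;
    second = (first == from) ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance   += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

static void *transfer_loop(void *arg)
{
    transfer_job *job = arg;
    int i;
    for (i = 0; i < job->rounds; i++)
        transfer(job->from, job->to, 1);
    return NULL;
}

/* Two threads transfer in opposite directions; returns the total balance. */
long run_opposing_transfers(int rounds)
{
    ranked_account a, b;
    transfer_job ab, ba;
    pthread_t t1, t2;

    a.id = 1; a.balance = 1000; pthread_mutex_init(&a.lock, NULL);
    b.id = 2; b.balance = 1000; pthread_mutex_init(&b.lock, NULL);
    ab.from = &a; ab.to = &b; ab.rounds = rounds;
    ba.from = &b; ba.to = &a; ba.rounds = rounds;

    if (pthread_create(&t1, NULL, transfer_loop, &ab) != 0)
        return -1;
    if (pthread_create(&t2, NULL, transfer_loop, &ba) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_mutex_destroy(&a.lock);
    pthread_mutex_destroy(&b.lock);
    return a.balance + b.balance;
}
```

If transfer instead locked from before to, the two threads in run_opposing_transfers could each hold one lock and wait forever for the other.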
Never share events between predicates. Each event used in a condition variable implementation should be associated with a distinct predicate. Furthermore, an event should always be used with the same mutex.
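The rule can be illustrated with a small bounded queue. This is a sketch using POSIX condition variables in place of Windows events, and every name in it is hypothetical. Each condition variable stands for exactly one predicate ("not empty" or "not full"), and both are always used with the same mutex.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4

typedef struct {
    pthread_mutex_t lock;       /* the one mutex used with both CVs */
    pthread_cond_t  not_empty;  /* predicate: count > 0 */
    pthread_cond_t  not_full;   /* predicate: count < QUEUE_CAPACITY */
    int items[QUEUE_CAPACITY];
    int head;
    int count;
} bounded_queue;

typedef struct {
    bounded_queue *q;
    int n;
} producer_job;

static void queue_put(bounded_queue *q, int value)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAPACITY)          /* recheck after each wakeup */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = value;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static int queue_get(bounded_queue *q)
{
    int value;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    value = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return value;
}

static void *produce(void *arg)
{
    producer_job *job = arg;
    int i;
    for (i = 1; i <= job->n; i++)
        queue_put(job->q, i);
    return NULL;
}

/* Producer thread sends 1..n; the caller consumes and returns the sum. */
long run_queue_demo(int n)
{
    bounded_queue q;
    producer_job job;
    pthread_t producer;
    long sum = 0;
    int i;

    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);
    q.head = 0;
    q.count = 0;
    job.q = &q;
    job.n = n;

    if (pthread_create(&producer, NULL, produce, &job) != 0)
        return -1;
    for (i = 0; i < n; i++)
        sum += queue_get(&q);
    pthread_join(producer, NULL);

    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return sum;
}
```

With a single shared condition variable, a wakeup meant for a waiting consumer could be taken by a waiting producer instead, which is the kind of confusion the hint warns against.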
Beware of sharing stacks and related memory corrupters. Always remember that when you return from a function or when a thread terminates, memory local to the function or thread is no longer valid. Memory on a thread's stack can be used by other threads, but you have to be sure that the first thread continues to exist.
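When the creating function may return before the new thread finishes, one common approach is to pass the thread's arguments in heap memory that the thread itself frees. The sketch below assumes POSIX threads; thread_args, add_one_worker, and start_worker are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical argument block, allocated on the heap by the creator. */
typedef struct {
    int  value;
    int *result;  /* owned by the caller; must outlive the thread */
} thread_args;

static void *add_one_worker(void *arg)
{
    thread_args *a = arg;
    *a->result = a->value + 1;
    free(a);      /* the worker owns the argument block and releases it */
    return NULL;
}

/*
 * Starts a worker and returns immediately. Passing the address of a local
 * thread_args here would be a bug: this function's stack frame is gone by
 * the time the worker reads it. Returns 0 on success, -1 on failure.
 */
int start_worker(pthread_t *tid, int value, int *result)
{
    thread_args *a = malloc(sizeof *a);

    if (a == NULL)
        return -1;
    a->value = value;
    a->result = result;
    if (pthread_create(tid, NULL, add_one_worker, a) != 0) {
        free(a);
        return -1;
    }
    return 0;
}
```

The caller still has to keep the result location alive until it has joined the worker, so the lifetime question moves to the caller rather than disappearing.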
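Be sure to use the volatile storage modifier. Whenever a shared variable can be changed in one thread and accessed in another, the variable should be volatile so that each thread stores and fetches the variable to and from memory, rather than assuming that the variable is held in a register that is specific to the thread. Keep in mind that volatile by itself does not make an operation atomic and does not, in general, guarantee memory ordering between processors; it complements mutexes and interlocked operations rather than replacing them.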
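In portable C11 code, the usual way to make a simple shared flag or counter visible across threads is an atomic object from stdatomic.h. That is a different technique from the volatile modifier recommended above, and it is shown here only as a sketch, with POSIX threads and hypothetical names.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared state: zero-initialized because of static storage duration. */
static atomic_int  stop_requested;  /* written by controller, read by worker */
static atomic_long iterations;      /* written by worker, read by controller */

static void *spin_worker(void *arg)
{
    (void)arg;
    /* Each pass reloads the flag from shared memory, never a stale copy. */
    while (!atomic_load(&stop_requested))
        atomic_fetch_add(&iterations, 1);
    return NULL;
}

/* Returns the number of iterations the worker completed, or -1 on failure. */
long run_stop_flag_demo(void)
{
    pthread_t tid;

    atomic_store(&stop_requested, 0);
    atomic_store(&iterations, 0);
    if (pthread_create(&tid, NULL, spin_worker, NULL) != 0)
        return -1;

    /* Wait until the worker has demonstrably made progress. */
    while (atomic_load(&iterations) == 0)
        sched_yield();

    atomic_store(&stop_requested, 1);   /* the worker is guaranteed to see this */
    pthread_join(tid, NULL);
    return atomic_load(&iterations);
}
```

Were stop_requested a plain int, the compiler would be free to read it once and keep the value in a register, and the worker loop might never end.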
Here are some additional guidelines and rules of thumb that can be helpful.
Use the condition variable model properly, being certain not to use two distinct mutexes with the same event. Understand the condition variable model on which you depend. Be certain that the invariant holds before waiting on a condition variable.
Understand your invariants and condition variable predicates, even if they are stated only informally. Be certain that the invariant always holds outside the critical code section.
Keep it simple. Multithreaded programming is complex enough without the burden of additional complex, poorly understood thread models and logic. If a program becomes overly complex, assess whether the complexity is really necessary or is the result of poor design. Careful use of standard threading models can simplify your program and make it easier to understand, and lack of a good model may be a symptom of a poorly designed program.
Test on both single-processor and multiprocessor systems and on systems with different clock rates and other characteristics. Some defects will never, or rarely, show up on a single-processor system but will occur immediately on an SMP system, and conversely. Likewise, a variety of system characteristics helps ensure that a defective program has more opportunity to fail.
Testing is necessary but not sufficient to ensure correct behavior. There have been a number of examples of programs, known to be defective, that seldom fail in routine or even extensive tests.
Be humble. After all these precautions, bugs will still occur. This is true even with single-threaded programs; threads simply give us more, different, and very interesting ways to cause problems.