Saturday, December 19, 2009

The Condition Variable Model and Safety Properties









The Condition Variable Model and Safety Properties


Threaded programs are much easier to develop, understand, and maintain if we use well-understood and familiar techniques and models. Chapter 7 discussed this and introduced the boss/worker and work crew models to establish a useful framework for understanding many threaded programs. The critical section concept is essential when using mutexes, and describing the invariants of your data structure is very useful. Finally, even defects have models, as we saw with the deadlock example. Note: Microsoft has its own set of models, such as the apartment model and free threading. These terms are most often used with COM and are discussed briefly at the end of Chapter 11.


Using Events and Mutexes Together


The next step is to describe how to use mutexes and events together, generalizing Program 8-2, where we had the following situation, which will occur over and over again. Note: This discussion applies to CRITICAL_SECTIONs as well as to mutexes.


  • The mutex and event are both associated with the message block or other data structure.

  • The mutex defines the critical code section for accessing the data structure object.

  • The event is used to signal the fact that there is a new message.

  • Generalizing, the mutex ensures the object's invariants (or safety properties), and the event signals that the object has changed state (e.g., a message has been added or removed from a message buffer), possibly being put into a known state (e.g., there is at least one message in the message buffer).

  • One thread (the producer in Program 8-2) locks the data structure, changes the object's state by creating a new message, and sets or pulses the event associated with the fact that there is a new message.

  • At least one other thread (the consumer in this example) waits on the event for the object to reach the desired state. The wait must occur outside the critical section so that the producer can access the object.

  • A consumer thread can also lock the mutex, test the object's state (e.g., is there a new message in the buffer?), and avoid the event wait if the object is already in the desired state.


The Condition Variable Model


Now let's combine all of this into a single code fragment that represents what we will call the condition variable model (CV model) with two variations, the signal and broadcast CV models. The first examples use the broadcast variation. The result is a program model that will occur many times and can be used to solve a wide variety of synchronization problems. For convenience, the example is stated in terms of a producer and a consumer.


The discussion may seem a bit abstract, but once the techniques are understood, we will be able to solve a number of synchronization problems that would be very difficult without a good model.


The code fragment has several key elements.


  • A data structure of type STATE_TYPE that contains all the data or state variables such as the messages, checksums, and counters used in Program 8-2.

  • A mutex and one or more events associated with, and usually a part of, the data structure.

  • One or more Boolean functions to evaluate the condition variable predicates, which are the conditions (states) on which a thread might wait. Examples include "a new message is ready," "there is available space in the buffer," and "the queue is not empty." A distinct event may be associated with each condition variable predicate, or one event may be used to represent simply a change of state or a combination (logical "or") of several predicates. In the latter case, individual predicate functions must be tested, with the mutex locked, to determine the actual state. If the predicate (logical expression) is simple, there is no need for a separate function.


The following code segment shows a producer and consumer using these principles, with a single event and condition variable predicate (implemented with a function, cvp, that is assumed but not shown). When the producer signals that a desired state has been reached, this example assumes that it is appropriate to release several consumer threadsthat is, the signal should be broadcast to all waiting consumers. For instance, the producer may have created several messages, and the state is changed by increasing the message count. In many situations, you want to release only a single thread, as will be discussed after the code fragment.


This code segment is designed to operate under Windows 9x as well as all NT versions. SignalObjectAndWait will then be used to simplify the solution.


Note and caution:
This example deliberately and consciously uses PulseEvent, even though some writers and some of the Microsoft documentation warn against its use (see the remarks section in the MSDN entry). The choice will be justified in the ensuing discussion and the examples, and the reader is invited to attempt to solve the problem (correctly) by using SetEvent.



typedef struct _state_t {
HANDLE Guard; /* Mutex to protect the object. */
HANDLE CvpSet; /* Manual-reset event -- cvp () holds. */
... other condition variables ...
/* State structure with counts, checksums, etc. */
struct STATE_VAR_TYPE StateVar;
} STATE_TYPE State;
...
/* Initialize State, creating the mutex and event. */
...
/* PRODUCER thread that modifies State. */
WaitForSingleObject (State.Guard, INFINITE);
/* Change state so that the CV predicate holds. */
/* Example: one or more messages are now ready. */
State.StateVar.MsgCount += N;
PulseEvent (State.CvpSet);
ReleaseMutex (State.Guard);
/* End of the interesting part of the producer. */
...
/* CONSUMER thread function waits for a particular state. */
WaitForSingleObject (State.Guard, INFINITE);
while (!cvp (&State)) {
ReleaseMutex (State.Guard);
WaitForSingleObject (State.CvpSet, TimeOut);
WaitForSingleObject (State.Guard, INFINITE);
}
/* This thread now owns the mutex and cvp (&State) holds. */
/* Take appropriate action, perhaps modifying State. */
...
ReleaseMutex (State.Guard);
/* End of the interesting part of the consumer. */


Comments on the Condition Variable Model

The essential feature in the code segment is the loop in the consumer. The loop body consists of three steps: (1) unlock the mutex that was locked prior to entering the loop, (2) wait on the event, and (3) lock the mutex again. The event wait time-out is significant, as explained later.


Pthreads, as implemented in many UNIX and other systems, combine these three steps into a single function, pthread_cond_wait, combining a mutex and a condition variable (which is similar but not identical to the Windows event). This is the reason for using the term condition variable model. There is also a timed version, which allows a time-out on the event wait.


Importantly, the single Pthreads function implements the first two steps (the mutex release and event wait) as an atomic operation so that no other thread can run before the calling thread waits on the event (or condition variable).


The Pthreads designers made a wise choice; the two functions (with and without a time-out) are the only ways to wait on a condition variable in Pthreads, so a condition variable must always be used with a mutex. Windows forces you to use two or three separate function calls, and you need to do it in just the right way to avoid problems.


Another motivation for learning the CV model, besides the fact that it simplifies programs and is essential if you ever need to use Pthreads, is that several third parties implement OS-independent thread and synchronization classes based on the CV model. With the material here, you will be able to understand those implementations quickly.


Note:
Windows NT Version 4.0 introduced a new function, SignalObjectAndWait (SOAW), that performs the first two steps atomically. The later examples assume that this function is available, which means that the programs will not run on Windows 9x. Nonetheless, the CV model introduction does not use SOAW in order to motivate its later usage, and a few examples have alternative implementations on the book's Web site that use a CS in place of a mutex. (SOAW cannot be used with a CS.) Appendix C (Table C-5) shows that SignalObjectAndWait provides significant performance advantages.


Using the Condition Variable Model

The condition variable model, when implemented properly, works as follows.


  • The producer locks the mutex, changes state, pulses the event when appropriate, and unlocks the mutex. For example, the producer pulses the event when one or more messages are ready.

  • The event should be pulsed with the mutex locked so that no other thread can modify the object, perhaps invalidating the condition variable predicate.

  • The consumer tests the condition variable predicate with the mutex locked. If the predicate holds, there is no need to wait.

  • If the predicate does not hold, the consumer must unlock the mutex before waiting on the event. Otherwise, no thread could ever modify the state and set the event.

  • The event wait must have a time-out just in case the producer pulses the event in the interval between the mutex release (step 1) and the event wait (step 2). That is, without the finite time-out, there could be a lost signal, which is another example of a race condition. APCs, described later in this chapter, can also cause lost signals. The time-out value used in the producer/consumer segment is a tunable parameter. (See Appendix C for comments on optimal values.)

  • The consumer always retests the predicate after the event wait. Among other things, this is necessary in case the event wait has timed out. Also, the state may have changed. For example, the producer may have produced two messages and then released three waiting consumers, so one of the consumers will test the state, find no more messages, and wait again. Finally, the retest protects against spurious wakeups that might result from a thread setting or pulsing the event without the mutex locked.

  • The consumer always owns the mutex when it leaves the loop, regardless of whether the loop body was executed.


Condition Variable Model Variations

Notice, first, that the preceding code fragment uses a manual-reset event and calls PulseEvent rather than SetEvent. Is this the correct choice, and could the event be used differently? The answer is "yes" to both questions.


Referring back to Table 8-1, we see that the example has the property that multiple threads will be released. This is correct in this example, where several messages are produced and there are multiple consuming threads, and we need to broadcast the change. However, if the producer creates just one message and there are multiple consuming threads, the event should be auto-reset and the producer should call SetEvent to ensure that exactly one thread is released. This variation is the signal CV model rather than the broadcast CV model. It is still essential for the released consumer thread, which will then own the mutex, to modify the object to indicate that there is no available message (that is, the condition variable predicate will no longer hold).


Of the four combinations in Table 8-1, two are useful in the condition variable model. Considering the other two combinations, auto-reset/PulseEvent would have the same effect as auto-reset/SetEvent (the signal CV model) because of the time-out, but the dependence on the time-out would reduce responsiveness. The manual-reset/SetEvent combination causes spurious signals (the condition variable predicate test offers protection, however), because some thread must reset the event, and there will be a race among the threads before the event is reset.


In summary, auto-reset/SetEvent is the signal CV model, which releases a single waiting thread, while manual-reset/PulseEvent is the broadcast CV model, which releases all waiting threads. Pthreads has the same distinction but does not require the finite time-out in the event wait for the broadcast model, whereas the time-out is essential in Windows because the mutex release and event wait are not performed atomically. This will change, however, when we introduce SignalObjectAndWait.


An Example Condition Variable Predicate

Consider the condition variable predicate:



State.StateVar.Count >= K;


In this case, a consumer thread will wait until the count is sufficiently large, and the producer can increment the count by an arbitrary amount. This shows, for example, how to implement a multiple-wait semaphore; recall that normal semaphores do not have an atomic wait for multiple units. The consumer thread would then decrement the count by K after leaving the loop but before releasing the mutex.


Notice that the broadcast CV model is appropriate in this case because a single producer may increase the count so as to satisfy several but not all of the waiting consumers.


Semaphores and the Condition Variable Model

In some cases, a semaphore would be appropriate rather than an event, and semaphores have the advantage of specifying the exact number of threads to be released. For example, if each consumer were known to consume exactly one message, the producer could call ReleaseSemaphore with the exact number of messages produced. In the more general case, however, the producer does not know how the individual consumers will modify the state variable structure, so the condition variable model can be used to solve a wider class of problems.


The CV model is powerful enough to implement semaphores. As described earlier, the basic technique is to define a predicate stating that "the semaphore count is nonzero" and create a state structure containing the count and maximum value. Exercise 1011 shows a complete solution that allows for an atomic wait for multiple units. Pthreads do not provide semaphores because condition variables are sufficiently powerful.









    No comments: