Saturday, October 31, 2009

Using a COW String: Multithreaded




I l@ve RuBoard









Using a COW String: Multithreaded


Now let's return to our friend, the Error subsystem. Here's the problematic calling code again:



// thread A
String newerr = SetError( "A" );
newerr += " (set by A)";
cout << newerr << endl;

// thread B
String newerr = SetError( "B" );
newerr += " (set by B)";
cout << newerr << endl;

As before, the issue is that there has to be some serialization (using, for example, a mutex lock) in order to avoid corrupting the Error module's internal strings and produce reasonable output. But who should be responsible for doing the serialization? Consider the two levels again:



  1. Do locking within Optimized::String member functions.
    As before, there are two problems with this. It's at the wrong granularity to solve the SetError() function's interleaving, and it can seriously degrade performance because the locking would be done at least once for every mutating (and possibly even nonmutating) String operation. What's different from before, however, is that this time, Option 1 is necessary anyway (see below).


  2. Do locking in the code that owns/manipulates an Optimized::String object.
    Again, this is necessary; otherwise we haven't solved the SetError() interleaving problem.



Alas, now neither of the preceding does the job by itself. We need something heavyweight:



  1. [NECESSARY] Do both.
    This time the story isn't so good: Option 2 is necessary, but we have to do Option 1 too because the code that owns/manipulates a copy-on-write Optimized::String can't. Calling code alone can never lock the right strings in the right way, because it has no way of knowing which visible string objects might be sharing a representation at any given time. Look again at the thread-safe Error subsystem code from Example A-2. The locking it does is still necessary but is now insufficient, because even though we serialize all access to the internal err error string object, the calling code's threads are taking and manipulating copies of that object, which copies might share representation.


    For example, thread A could be safely serialized inside its SetError() call, manipulating the Error module's err string, at the same time thread B (already past its safely serialized SetError()) is appending to its newerr string. Note that B's newerr string began life as a copy of the err string.[3] There's nothing obvious linking the internal err string and B's external newerr string. In fact, the Error subsystem has no way of knowing that any newerr strings even exist, and vice versa. But if the two strings happen to be sharing representation, which is not only possible, but likely, in this example, two threads could both be attempting to execute String member functions on the same string representation even though everything outside String itself is safely serialized.[4]

    [3] Or, equivalently, a copy of a copy of the err string if the compiler didn't perform the return value optimization.


    [4] Certain popular commercial libraries shipping today have variants of precisely this COW bug, where modifying one visible string also incorrectly modifies a different visible string because the libraries fail to perform the deep copy soon enough.





That's why it's not enough for the Error subsystem to perform its usual duty of care. For COW strings, the Optimized::String objects must, in addition, always protect themselves. This forces the string class to incur some overhead. But the overhead can be disastrously expensive if the implementation incurs needless inefficiencies.



How severe is the problem? One developer recently reported that he has a single-threaded test program that exercises his (popular) vendor's COW-based library features. The program executes in 18 to 20 seconds when the library is compiled for multithreaded use, and in 0.25 seconds when it's compiled for single-threaded use. (The difference is greater than the numbers imply, because both times include program startup.) It's important to note here that this developer's test program was only single-threaded, not multithreaded. But because the library had been compiled for possible multithreaded use, the problem affected him anyway. If 20 programs share a library, and only one program is multithreaded, the library must be built for multithreaded use�and any problem will affect all 20 programs.








    I l@ve RuBoard



    No comments: