12.4. ThreadWeaver


ThreadWeaver is now one of the KDE 4 core libraries.
It is discussed here because its genesis contrasts in many ways with that of the
Akonadi project, and thus serves as an interesting comparison. ThreadWeaver
schedules parallel operations. It was conceived at a time when the libraries
KDE builds on, namely Qt, made it technically all but impossible to implement.
A number of developers saw the need for it, but it took until the release of
Qt 4 for it to mature and become mainstream. Today, it is used in major
applications such as KOffice and KDevelop. It is typically applied in
larger-scale, more complex software systems, where the need for concurrency
and out-of-band processing becomes more pressing.


ThreadWeaver is a job scheduler for
concurrency. Its purpose is to manage and arbitrate resource usage in
multi-threaded software systems. Its second goal is to provide application
developers with a tool for implementing parallelism that is similar in approach
to the way they develop their GUI applications. These goals are high-level, and
there are secondary ones at a smaller scale: to avoid brute-force
synchronization and offer means for cooperative serialization of access to data;
to make use of the features of modern C++ libraries, such as thread-safe
implicit sharing and signal-slot connections across threads; to integrate with
the application's graphical user interface by separating the processing elements
from the delegates that represent them in the UI; to allow the work queue to be
throttled dynamically at runtime to adapt to the current system load; to stay
simple; and many more.
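
As a small, generic Qt illustration of the signal-slot connections across
threads mentioned above (this is plain Qt, not ThreadWeaver API; the worker
object and its progress(int) signal are hypothetical): a signal emitted from an
object living in a worker thread can be connected to a slot of a GUI widget,
and Qt delivers the call through the GUI thread's event loop.

connect( worker,      SIGNAL( progress( int ) ),   // 'worker' lives in another thread
         progressBar, SLOT( setValue( int ) ) );   // QProgressBar updated in the GUI thread
// Because sender and receiver live in different threads, Qt automatically uses
// a queued connection: the slot runs later in the receiver's event loop.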


The ThreadWeaver library was
developed to satisfy the needs of developers of event-driven GUI programs, but
it turned out to be more generic. Because GUI programs are driven by a central
event loop, they cannot process time-consuming operations in their main thread.
Doing so would freeze the user interface until the operation is finished. In
some windowing environments, the user interface can be drawn only from the main
thread, or the windowing system itself is single-threaded. So a natural way of
implementing responsive cross-platform GUI applications is to perform all
processing in worker threads and update the user interface from the main thread
when necessary. Surprisingly, the need for concurrency in user interfaces is
rarely as obvious as it should be, even though it was emphasized for
OS/2, Windows NT, and Solaris eons ago. Multithreaded programming is more
complicated and requires a better understanding of how the written code actually
functions. Multithreading also seems to be a topic well understood by software
architects and designers, but badly disseminated to software maintainers and
less-experienced programmers. Also, some developers seem to think that most
operations are fast enough to be executed synchronously, even reading from
mounted filesystems, which is a couple of orders of magnitude slower than
anything processed in the CPU. Such mistakes surface only under extraordinary
circumstances, such as when the system is under heavy I/O load or, more
commonly, when a mounted filesystem has been put to sleep to save power or,
heaven forbid, when the filesystem all of a sudden happens to be on the
network.


The following section will describe
the architecture of the library along with its underlying concepts. At the end
of the chapter, we will explore how it found its way into KDE 4.


12.4.1. Introduction to
ThreadWeaver: Or, How Complicated Can It Be to Load a File?


To convince programmers to make use of
concurrency, it needs to be conveniently available. Here is a typical example
for an operation performed in a GUI program—loading a file into a memory buffer
to process it and display the results. In an imperative program, where all
individual operations are blocking, it is of little complexity:






  1. Check whether the file exists and is readable.
  2. Open the file for reading.
  3. Read the file contents into memory.
  4. Process them.
  5. Display the results.


To be
user-friendly, it is sufficient to print a progress message to the command line
after every step (if the user requested verbose mode).


In a GUI program, things appear in a
different light because during all of these steps, it is necessary to be able to
update the screen, and users expect a way to cancel the operation. Although it
sounds unbelievable, even recent documentation for GUI toolkits mentions the
"check for events occasionally" approach. The idea is to periodically check for
events while processing the aforementioned steps in chunks and update or abort
if necessary. A lot of care needs to be applied in this situation because the
application state can change in unexpected ways. For example, the user might
decide to close the program, unaware that it is checking for events in the
middle of an operation's call stack. In short, this polling approach has never
worked very well and is generally out of fashion.
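
For illustration, a hedged sketch of what the polling approach typically looks
like in Qt code (the processChunk() helper and the m_cancelled flag are made up
for the example):

QFile file( path );
if ( file.open( QIODevice::ReadOnly ) ) {
    while ( !file.atEnd() && !m_cancelled ) {
        processChunk( file.read( 8192 ) );   // hypothetical helper doing the work
        QCoreApplication::processEvents();   // keep the GUI alive...
        // ...but any event handler invoked here may change application state,
        // or even delete objects this call stack still relies on.
    }
}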


A better approach is to use a thread
(of course). But without a framework to help, this often leads to weird
implementations as well. Since GUI programs are event-based, every step just
listed starts with an event, and an event notifies its completion. In the C++
world, signals are often used for the notification. Some programs look like
this:






  1. The user requests to load the file, which triggers a handler method by a signal or event.
  2. The operation to open and load the file is started and connected to a second method for notification about its completion.
  3. In this method, processing the data is started, connected to a third method.
  4. The last method finally displays the results.


This cascade of handler methods does
not track the state of the operation very well and is usually error-prone. It
also shows a lack of separation of operations and the view. Nevertheless, it is
found in many GUI applications.


This is exactly where ThreadWeaver helps. Using jobs, the implementation looks
like this:






  1. The user requests to load the file, which triggers a handler method by a signal or event.
  2. In the handler method, the programmer creates a sequence of jobs (a sequence is a job container that executes its jobs in the order they were added). He adds a job to load the file and one to process its contents. The sequence object is a job itself and sends a signal when all its contained jobs are completed. Up to this point, no processing has taken place; the programmer has only declared what needs to be done and in what order. Once the whole sequence is set up, it is queued into the application-global job queue (a lazily initialized singleton). The sequence is then executed automatically by worker threads.
  3. When the done() signal of the sequence is received, the data is ready to be displayed.


Two aspects are apparent. First, the
individual steps are all declared in one go and then executed. This alone is a
major relief to GUI programmers because the declaration is an inexpensive
operation that can easily be performed in an event handler. Second, the usual
issues of synchronization can largely be avoided by the simple convention that
the queuing thread touches the job data again only after the jobs have finished
preparing it. Since no worker thread will access the job data anymore by then,
access to the data is serialized, but in a cooperative fashion. If the
programmer wants to display progress to the user, the sequence emits signals
after the processing of every individual job (signals in Qt can be sent across
threads). The GUI remains responsive and is able to dequeue the jobs or request
cancellation of processing.
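
A hedged sketch of the three steps just described, in the style of the snippets
shown later in this section. LoadFileJob and ProcessDataJob are hypothetical
Job subclasses, Weaver::instance() is assumed to be the application-global
queue mentioned above, and the exact done() signal signature should be checked
against the ThreadWeaver headers:

// Inside the handler method triggered by the user's request:
JobSequence* sequence = new JobSequence( this );
sequence->addJob( new LoadFileJob( path, sequence ) );      // step one: read the file
sequence->addJob( new ProcessDataJob( sequence ) );         // step two: process its contents
connect( sequence, SIGNAL( done( ThreadWeaver::Job* ) ),    // step three: display when finished
         this, SLOT( displayResults( ThreadWeaver::Job* ) ) );
Weaver::instance()->enqueue( sequence );                    // returns immediately; worker
                                                            // threads execute the sequence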


Since it is much easier to implement I/O
operations this way, ThreadWeaver was quickly adopted by programmers. It solved
a problem in a nice, convenient way.


12.4.2. Core Concepts and
Features


In the previous example, job
sequences have been mentioned. Let us look at what other constructs are provided
in the library.


Sequences are a specialized form of
job collections. Job collections are containers that queue a set of jobs in an
atomic operation and notify the program about the completion of the whole set.
Job collections are composites, in that they are themselves implemented as job
classes. There is only one queuing operation in ThreadWeaver: it takes a Job
pointer. Composite jobs help keep the queue API minimal.


Job sequences use dependencies to make sure the
contained jobs are executed in the correct order. If a dependency is declared
between two jobs, the depending job can be executed only after its dependency
has finished processing. Since dependencies can be declared in an m:n fashion,
pretty much all imaginable control flows of depending operations (which are all
directed acyclic graphs, since repetition of jobs is not allowed) can be modeled
in the same declarative fashion. As long as the execution graph remains acyclic,
jobs may even queue other jobs while being processed. A typical example is that
of rendering a web page, where the anchored elements are discovered only once
the text of the HTML document itself is processed. Jobs can then be added to
retrieve and prepare all linked elements, and a final job that depends on all
these preparatory jobs renders the page for display. Still, no mutex is
necessary. Dependencies are what distinguish a scheduling system like
ThreadWeaver from mere tools for parallel processing. They relieve the
programmer of thinking about how best to distribute the individual suboperations
to threads. Even with modern concepts such as futures, the programmer usually
still needs to decide on the order of operations. With ThreadWeaver, the worker
threads eagerly execute all jobs that have no unresolved dependencies. Since the
execution of concurrent flow graphs is inherently nondeterministic, it is very
unlikely that a manually defined order is flexible enough to be the most
efficient. A scheduler can adapt much better here. Computer scientists tend to
disagree with this thesis, whereas economists, who are more used to analyzing
stochastic systems, often support it.
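
A hedged sketch of how the web-page example might be declared. The class and
method names (DependencyPolicy::instance(), addDependency()) are recalled from
the KDE 4 ThreadWeaver headers and should be treated as assumptions, the job
classes are purely illustrative, and addDependency( a, b ) is taken to mean
that a may only run after b has finished:

ParsePageJob* parse = new ParsePageJob( url, this );
RenderPageJob* render = new RenderPageJob( this );
foreach ( ThreadWeaver::Job* fetch, fetchJobsForLinkedElements ) {
    DependencyPolicy::instance().addDependency( fetch, parse );   // fetch after parsing
    DependencyPolicy::instance().addDependency( render, fetch );  // render after all fetches
}
// All jobs are enqueued as usual; the workers pick whichever job currently
// has no unresolved dependencies.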


Priorities can be used to influence the order of execution as well.
The priority system is quite simple: among all jobs in the queue, those with the
highest integer priority are handed to an available worker thread first. Since
the job base class is implemented in a way that allows writing decorators, a
job's priority can be changed externally by a decorator that bumps it without
touching the job's implementation. The combination of priorities and
dependencies can lead to interesting results, as will be shown later.


Instead of relying on direct implementations of queueing behavior, ThreadWeaver
uses queue policies. Queue policies do not immediately affect when and how a
particular job is executed. Instead, they influence the order in which jobs are
taken from the queue by the worker threads. Two standard implementations come
with ThreadWeaver. One is the dependencies discussed earlier. The other is
resource restrictions. Using resource restrictions, it can be declared that of a
certain subset of all created jobs (for example, local filesystem I/O-expensive
ones), only a certain amount can be executed at the same time. Without such a
tool, it regularly happens that some subsystems get overloaded. Resource
restrictions act much like semaphores in traditional threading, except that they
do not block a calling thread, and instead simply mark a job as not yet
executable. The thread that checked whether the job can be executed is then able
to try to get another job to execute.


Queue policies are assigned to jobs,
and the same policy object can be assigned to many jobs. Policies thus compose:
every job can be managed by any combination of the available policies.
Inheriting specialized policy-driven job base classes would not have provided
such flexibility. Also, this way, job objects that do not need any extra
policies are not affected by the performance cost of evaluating policies.
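
A tiny illustrative sketch of this composition (the job and policy variables
are made up; assignQueuePolicy() is the call shown in the thumbnail example
below):

loaderA->assignQueuePolicy( diskRestriction );    // both loaders share one limit
loaderB->assignQueuePolicy( diskRestriction );    // on concurrent disk I/O
loaderB->assignQueuePolicy( cacheRestriction );   // a job may carry several policies at once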


12.4.3. Declarative Concurrency: A
Thumbnail Viewer Example


Another example explains how these different
ThreadWeaver concepts play together. It uses jobs, job composites, resource
restrictions, priorities, and dependencies, all to render wee little thumbnail
images in a GUI program. Let us first look at what operations are required to
implement this function, how they depend on each other, and how the user expects
to be presented with the results. The example is part of the ThreadWeaver source
code.


In this example, it is assumed that
loading the thumbnail preview for a digital photo involves three operations: to
load the raw file data from disk, to convert the raw data into an image
representation without changing its size, and then to scale it to the required
size of the thumbnail. It is possible to argue that the second and third steps
could be merged into one, but that is (a) not the point of the exercise (just
like streamed loading of the image data) and (b) doing so would impose the
restriction that only image formats whose decoders support scaling during
loading could be used. It is also assumed that the files are present on the hard
disk. Since the processing of each file does not influence or depend on the
processing of any other, all files can be processed in parallel. The three
individual steps to process one file, however, need to be performed in
sequence.


But that is not all. Since this is an
example with a graphical user interface, the expectations of the user have to be
kept in mind. It is assumed that the user is interested in visual feedback,
which also gives him the impression of progress. Image previews should be shown
as soon as they are available, and progress information in the form of a
reliable progress bar would be nice. The user also expects that the program will
not grind his computer to a halt, for example by excessive I/O operations.


Different ThreadWeaver tools can be
applied to this problem. First of all, processing an individual file is a
sequence of three job implementations. The jobs are quite generic and can be
part of a toolbox of premade job classes available to the application. The jobs
are (the class names match the ones in the example source code):




  • A FileLoaderJob, which loads a file on the filesystem into an in-memory byte array
  • A QImageLoaderJob, which converts the image's raw data into the typical representation of an in-memory image in Qt applications (giving the application access to all image decoders available in the framework or registered by the application)
  • A ComputeThumbNailJob, which simply scales the image to the wanted size of the preview


All of these are added to a JobSequence,
and each of these sequences is added to a JobCollection. The composite
implementation of the collection classes allows for code that represents the
original problem very closely and therefore feels natural and canonical to the
programmer.
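
A hedged sketch, not taken from the example source, of the nesting just
described (the constructor arguments and the JobCollection API are assumed to
mirror those of the snippets shown later):

JobCollection* batch = new JobCollection( this );
foreach ( const QString& path, files ) {
    JobSequence* perImage = new JobSequence( batch );   // one sequence per image
    perImage->addJob( new FileLoaderJob( path, perImage ) );
    // ... the QImageLoaderJob and ComputeThumbNailJob are added the same way
    batch->addJob( perImage );                           // all sequences run in parallel
}
Weaver::instance()->enqueue( batch );                    // one queuing call for the whole set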


This solves part one of the problem, the
parallel processing of
the different images. It could easily lead to other problems, though. With the
given declaration of the problem to the ThreadWeaver queue, there is nothing
that prevents it from loading all files at once and only then starting to
process images. Although this is unlikely, we haven't told the system otherwise
yet. To make sure that only so many file loaders are started at the same time, a
resource restriction is used. The code for it looks like this:

#include "ResourceRestrictionPolicy.h"

...

static QueuePolicy* resourceRestriction()
{
static ResourceRestrictionPolicy policy( 4 );
return &policy;
}


File loaders simply apply the policy
in their constructor or, if generic classes are used, when they are created:

fileloader->assignQueuePolicy( resourceRestriction() );


But that still does not completely
arrange the order of execution of the jobs exactly as wanted. The queue
might now start only four file loaders at once, but it could still load all the
files first and only then calculate the previews (again, a very unlikely
behavior). It takes one more tool, and a bit of thinking against the grain, to
solve the problem, and this is where priorities come into play. The problem,
translated into ThreadWeaver lingo, is that file loader jobs have the lowest
priority but need to be executed first; image loader jobs have precedence over
file loaders, but a file loader must have finished before an image loader can be
started; and finally, thumbnail computation jobs have the highest priority, even
though they depend on the other two phases of processing. Since the three jobs
are already in a sequence, which makes sure they are executed in the right order
for every image, assigning priority one to file loaders, two to image loaders,
and three to thumbnail computers finally solves the problem. Basically, the
queue will now complete each thumbnail as soon as possible, but it will not stop
loading further files whenever slots for file loading become available. Since
the problem is mostly I/O bound, this means that the total time until the
thumbnails for all images are shown is only a little more than the time it takes
to load them from the hard disk (other factors aside, such as extremely
high-resolution RAW images). In any sequential solution, the behavior would
likely be much worse.
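
If the premade job classes did not already report these priorities, one way to
assign them would be to override the virtual priority() property that is
mentioned again below; a hedged sketch:

int FileLoaderJob::priority() const       { return 1; }  // lowest: yields to later phases
int QImageLoaderJob::priority() const     { return 2; }
int ComputeThumbNailJob::priority() const { return 3; }  // highest: finish thumbnails first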


The description of the solution might have
felt complex, so lightening it up with a bit of code is probably in order. This
is how the jobs are generated after the user has selected a couple of hundred
images for processing:

m_weaver->suspend();
for ( int index = 0; index < files.size(); ++index )
{
    SMIVItem* item = new SMIVItem( m_weaver, files.at( index ), this );
    connect( item, SIGNAL( thumbReady( SMIVItem* ) ),
             SLOT( slotThumbReady( SMIVItem* ) ) );
}
m_startTime.start();
m_weaver->resume();


To give correct progress feedback,
processing is suspended before the jobs are added. Whenever a sequence is
completed, the item object emits a signal to update the view. For every selected
file, a specialized item is created, which in turn creates the job objects for
processing one file:

m_fileloader = new FileLoaderJob( fi.absoluteFilePath(), this );
m_fileloader->assignQueuePolicy( resourceRestriction() );
// Each job is handed its predecessor, so that it can access that job's output
// once the sequence has run it.
m_imageloader = new QImageLoaderJob( m_fileloader, this );
m_thumb = new ComputeThumbNailJob( m_imageloader, this );
// m_sequence is the per-file JobSequence created by the item; adding the jobs
// in this order guarantees they run one after the other.
m_sequence->addJob( m_fileloader );
m_sequence->addJob( m_imageloader );
m_sequence->addJob( m_thumb );
weaver->enqueue( m_sequence );


The priorities are virtual properties of
the job objects and are set there. It is important to keep in mind that none of
these objects start processing until they are queued and, in this case, until
processing is explicitly resumed. So the whole operation of creating all these
sequences and jobs takes only a very short time, and the program returns control
to the user immediately, for all practical purposes. The view updates as soon as
a preview image is available.


12.4.4. From Concurrency to
Scheduling: How to Implement Expected Behavior Systematically


The previous
examples have shown how analyzing the problem completely really helps to solve
it (I hope this does not come as a surprise). To make sure concurrency is used
to write better programs, it is not enough to provide a tool to move stuff to
threads. The difference is scheduling: to be able to tell the program what
operations have to be performed and in what order. The approach is remotely
reminiscent of PROLOG programming lessons, and sometimes requires a similar way
of thinking. Once the minds involved are sufficiently assimilated, the results
can be very rewarding.


One design decision of the
central Weaver class has not been discussed yet.
There are two very distinct groups of users of the Weaver class's API. The
internal Thread objects access it to retrieve the jobs they are to process,
whereas programmers use it to manage their parallel operations. To keep the
public API minimal, a combination of decorator and facade has been applied that
limits the publicly exposed API to the functions that are intended to be used by
application programmers. Further decoupling of the internal implementation and
the API has been achieved by using the PIMPL idiom, which is generally applied
throughout the KDE APIs.
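
A minimal, hypothetical sketch of that combination (the real ThreadWeaver class
layout and names differ; this only illustrates the idea of a thin public facade
forwarding to a hidden implementation):

class Job;

class WeaverImpl                      // internal: threads, queue, scheduling;
{                                     // also offers the calls worker threads
public:                               // use to fetch their work
    void enqueue( Job* job );
    void suspend();
    void resume();
};

class Weaver                          // the facade application code sees
{
public:
    void enqueue( Job* job ) { d->enqueue( job ); }   // forwarding only
    void suspend()           { d->suspend(); }
    void resume()            { d->resume(); }
private:
    WeaverImpl* d;                    // PIMPL: internals stay out of the
};                                    // public header and can change freely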


12.4.5. A Crazy Idea


It has been mentioned earlier that
at the time ThreadWeaver started to be developed, it was not really possible to
implement all its ideas. One major obstacle was, in fact, a prohibitive one: the
use of advanced implicit sharing features, which included reference counting, in
the Qt library. Since this implicit sharing was not thread-safe, the passing of
every simple Plain Old Data object (POD) was a synchronization point. The author
assumed this to be impractical for users and therefore recommended against using
the prototype developed with Qt 3 for any production environments. The
developers of the KDEPIM suite (the same people who now develop Akonadi) thought
they really knew better and immediately imported a preliminary ThreadWeaver
version into KMail, where it is used to this day. Having run into many of the
problems ThreadWeaver promised to solve, the KMail developers eagerly embraced
it, willing to live with the shortcomings pointed out by its author, even
against his express wishes.


The fact that an imperfect version of the
library was in active use in KDE served as a motivating factor for quickly
porting it to Qt 4 when that became usable in beta versions. Thus it was
available rather early in the KDE 4 development cycle, if only in a secondary
module and not yet as part of KDELibs. Over the course of two years, the author
gave a number of presentations on the library, presenting an ever-easier and
more complete API as he kept improving it. It was, one could say, a solution
looking for a problem. The majority of developers working on KDE needed time to
realize that this library was not only academic, but could improve their
software significantly, provided they made the investment of taking a step back
and rethinking some of their architectural structures. There was no concrete
need by a group of developers driving the library's progress; it was advanced by
an individual because of his belief in the growing relevance of the problem and
the importance of making a good solution available for the KDE 4 platform.
Especially following the 2005 Akademy conference in Malaga, Spain, more programs
started to use ThreadWeaver, including KOffice and KDevelop, which created
enough momentum for it to be integrated into the main KDE 4 set of libraries.


ThreadWeaver represents the
case of an alternative solution that matured to a critical point; once the
author and the prospective user community agreed that the time had come for
developers to adopt it in their projects, it was quickly promoted to a
cornerstone of KDE 4. After that, the attitudes of community members changed
from mild amusement to appreciation and recognition of the effort that had gone
into it. This is an example of how efficient this community can be at making
technical decisions and adapting its stance when an approach proves itself in
practice. There can be no doubt that ThreadWeaver is a much better library now
than it would have been had it not taken three to four years of rubbing up
against the KDE project until its inclusion. And this includes the rogue
premature adoption by the KMail developers. There is also little doubt that,
because the library succeeded in the end, applications written for KDE 4 can
deal with concurrency a lot better and thus provide a better experience to
their users.


ThreadWeaver will be extended mostly by adding GUI components to visually
represent queue activity, and by including more predefined job classes. Another
idea is integration with operating system IPC mechanisms (to allow for
host-global resource restrictions, for example), but such ideas are hindered by
the requirement to be cross-platform, since the approaches taken by the
different operating systems are very diverse. With the public availability of
the KDE 4 line, the library became visible to a large audience. Since
ThreadWeaver is not really KDE-specific, the question of where to go next
(Freedesktop.org?) is in the air. For now, the focus remains on providing
developers of applications and the desktop with a reliable scheduler for
concurrency.


 

