F.1 Chapter [*]

Quick Quiz [*].1: 
Come on now!!! Parallel programming has been known to be exceedingly hard for many decades. You seem to be hinting that it is not so hard. What sort of game are you playing?
 
Answer:
If you really believe that parallel programming is exceedingly hard, then you should have a ready answer to the question ``Why is parallel programming hard?'' One could list any number of reasons, ranging from deadlocks to race conditions to testing coverage, but the real answer is that it is not really all that hard. After all, if parallel programming were really so horribly difficult, how could a large number of open-source projects, ranging from Apache to MySQL to the Linux kernel, have managed to master it?

A better question might be: ``Why is parallel programming perceived to be so difficult?'' To see the answer, let's go back to the year 1991. Paul McKenney was walking across the parking lot to Sequent's benchmarking center carrying six dual-80486 Sequent Symmetry CPU boards, when he suddenly realized that he was carrying several times the price of the house he had just purchased. This high cost of parallel systems meant that parallel programming was restricted to a privileged few who worked for an employer who either manufactured or could afford to purchase machines costing upwards of $100,000 -- in 1991 US dollars.

In contrast, in 2006, Paul finds himself typing these words on a dual-core x86 laptop. Unlike the dual-80486 CPU boards, this laptop also contains 2GB of main memory, a 60GB disk drive, a display, Ethernet, USB ports, wireless, and Bluetooth. And the laptop is more than an order of magnitude cheaper than even one of those dual-80486 CPU boards, even before taking inflation into account.

Parallel systems have truly arrived. They are no longer the sole domain of a privileged few, but something available to almost everyone.

The earlier restricted availability of parallel hardware is the real reason that parallel programming is considered so difficult. After all, it is quite difficult to learn to program even the simplest machine if you have no access to it. Since the age of rare and expensive parallel machines is for the most part behind us, the age during which parallel programming is perceived to be mind-crushingly difficult is coming to a close.

Quick Quiz [*].2: 
How could parallel programming ever be as easy as sequential programming?
 
Answer:
It depends on the programming environment. SQL [Int92] is an underappreciated success story, as it permits programmers who know nothing about parallelism to keep a large parallel system productively busy. We can expect more variations on this theme as parallel computers continue to become cheaper and more readily available. For example, one possible contender in the scientific and technical computing arena is MATLAB*P, which is an attempt to automatically parallelize common matrix operations.

Finally, on Linux and UNIX systems, consider the following shell command:

get_input | grep "interesting" | sort

This shell pipeline runs the get_input, grep, and sort processes in parallel. There, that wasn't so hard, now was it?

Quick Quiz [*].3: 
Oh, really??? What about correctness, maintainability, robustness, and so on?
 
Answer:
These are important goals, but they are just as important for sequential programs as they are for parallel programs. Therefore, important though they are, they do not belong on a list specific to parallel programming.

Quick Quiz [*].4: 
And if correctness, maintainability, and robustness don't make the list, why do productivity and generality?
 
Answer:
Given that parallel programming is perceived to be much harder than sequential programming, productivity is paramount and therefore must not be omitted. Furthermore, high-productivity parallel-programming environments such as SQL have been special purpose, hence generality must also be added to the list.

Quick Quiz [*].5: 
Given that parallel programs are much harder to prove correct than are sequential programs, again, shouldn't correctness really be on the list?
 
Answer:
From an engineering standpoint, the difficulty in proving correctness, either formally or informally, would be important insofar as it impacts the primary goal of productivity. So, in cases where correctness proofs are important, they are subsumed under the ``productivity'' rubric.

Quick Quiz [*].6: 
What about just having fun?
 
Answer:
Having fun is important as well, but, unless you are a hobbyist, would not normally be a primary goal. On the other hand, if you are a hobbyist, go wild!

Quick Quiz [*].7: 
Are there no cases where parallel programming is about something other than performance?
 
Answer:
There are certainly cases where the problem to be solved is inherently parallel, for example, Monte Carlo methods and some numerical computations. Even in these cases, however, there will be some amount of extra work managing the parallelism.
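For example, consider a minimal pthreads sketch of a Monte Carlo estimate of pi. This is purely illustrative code, not drawn from the book's examples, and the thread count and sample count are arbitrary choices, but it shows where the extra work goes: creating the threads, giving each one its own random-number state, and then waiting for and combining the per-thread results.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 4
#define SAMPLES_PER_THREAD 1000000

/* Per-thread state: each thread gets private PRNG state and a private
 * hit count, so the sampling itself needs no synchronization. */
struct mc_arg {
	unsigned int seed;
	long hits;
};

static void *mc_thread(void *p)
{
	struct mc_arg *arg = p;
	long hits = 0;
	long i;

	for (i = 0; i < SAMPLES_PER_THREAD; i++) {
		double x = (double)rand_r(&arg->seed) / RAND_MAX;
		double y = (double)rand_r(&arg->seed) / RAND_MAX;

		if (x * x + y * y <= 1.0)
			hits++;
	}
	arg->hits = hits;	/* store once, limiting false sharing */
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	struct mc_arg arg[NTHREADS];
	long total = 0;
	int i;

	/* The "extra work": spawn the threads and partition the samples. */
	for (i = 0; i < NTHREADS; i++) {
		arg[i].seed = i + 1;
		arg[i].hits = 0;
		pthread_create(&tid[i], NULL, mc_thread, &arg[i]);
	}

	/* More "extra work": wait for the threads and combine their results. */
	for (i = 0; i < NTHREADS; i++) {
		pthread_join(tid[i], NULL);
		total += arg[i].hits;
	}

	printf("pi is approximately %f\n",
	       4.0 * total / ((double)NTHREADS * SAMPLES_PER_THREAD));
	return 0;
}

The sampling loop is exactly what a sequential version would contain; the thread creation, joining, and result combination are the management overhead referred to above.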

Quick Quiz [*].8: 
Why all this prattling on about non-technical issues??? And not just any non-technical issue, but productivity of all things? Who cares?
 
Answer:
If you are a pure hobbyist, perhaps you don't need to care. But even pure hobbyists will often care about how much they can get done, and how quickly. After all, the most popular hobbyist tools are usually those that are the best suited for the job, and an important part of the definition of ``best suited'' involves productivity. And if someone is paying you to write parallel code, they will very likely care deeply about your productivity. And if the person paying you cares about something, you would be most wise to pay at least some attention to it!

Besides, if you really didn't care about productivity, you would be doing it by hand rather than using a computer!

Quick Quiz [*].9: 
Given how cheap parallel hardware has become, how can anyone afford to pay people to program it?
 
Answer:
There are a number of answers to this question:

  1. Given a large computational cluster of parallel machines, the aggregate cost of the cluster can easily justify substantial developer effort, because the development cost can be spread over the large number of machines.
  2. Popular software that is run by tens of millions of users can easily justify substantial developer effort, as the cost of this development can be spread over the tens of millions of users. Note that this includes things like kernels and system libraries.
  3. If the low-cost parallel machine is controlling the operation of a valuable piece of equipment, then the cost of this piece of equipment might easily justify substantial developer effort.
  4. If the software for the low-cost parallel machine produces an extremely valuable result (e.g., mineral exploration), then the valuable result might again justify substantial developer cost.
  5. Safety-critical systems protect lives, which can clearly justify very large developer effort.
  6. Hobbyists and researchers might seek knowledge, experience, fun, or glory rather than mere money.

So it is not the case that the decreasing cost of hardware renders software worthless, but rather that it is no longer possible to ``hide'' the cost of software development within the cost of the hardware, at least not unless there are extremely large quantities of hardware.

Quick Quiz [*].10: 
This is a ridiculously unachievable ideal! Why not focus on something that is achievable in practice?
 
Answer:
This is eminently achievable. The cellphone is a computer that can be used to make phone calls and to send and receive text messages with little or no programming or configuration on the part of the end user.

This might seem to be a trivial example at first glance, but if you consider it carefully you will see that it is both simple and profound. When we are willing to sacrifice generality, we can achieve truly astounding increases in productivity. Those who cling to generality will therefore fail to set the productivity bar high enough to succeed in production environments.

Quick Quiz [*].11: 
What other bottlenecks might prevent additional CPUs from providing additional performance?
 
Answer:
There are any number of potential bottlenecks:

  1. Main memory. If a single thread consumes all available memory, additional threads will simply page themselves silly.
  2. Cache. If a single thread's cache footprint completely fills any shared CPU cache(s), then adding more threads will simply thrash the affected caches.
  3. Memory bandwidth. If a single thread consumes all available memory bandwidth, additional threads will simply result in additional queuing on the system interconnect, as the sketch following this list illustrates.
  4. I/O bandwidth. If a single thread is I/O bound, adding more threads will simply result in them all waiting in line for the affected I/O resource.

Specific hardware systems may have any number of additional bottlenecks.
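
As a rough illustration of the memory-bandwidth bottleneck in item 3, the following hypothetical sketch (not taken from the book's code; the array size and thread count are arbitrary) has each thread sum a disjoint chunk of an array far larger than any CPU cache. There is no locking and no sharing of data, yet timing it with increasing values of NTHREADS will typically show the speedup leveling off once the memory system is saturated.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 8
#define NELEMS (64UL * 1024 * 1024)	/* 512MB of doubles: exceeds any cache */

static double *data;
static double partial[NTHREADS];

/* Each thread streams through its own chunk: no locking, no sharing,
 * yet speedup stalls once memory bandwidth is saturated. */
static void *summer(void *p)
{
	long id = (long)p;
	unsigned long chunk = NELEMS / NTHREADS;
	unsigned long i;
	double sum = 0.0;

	for (i = id * chunk; i < (id + 1) * chunk; i++)
		sum += data[i];
	partial[id] = sum;
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	double total = 0.0;
	long i;

	data = malloc(NELEMS * sizeof(*data));
	if (!data)
		abort();
	for (i = 0; i < (long)NELEMS; i++)
		data[i] = 1.0;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, summer, (void *)i);
	for (i = 0; i < NTHREADS; i++) {
		pthread_join(tid[i], NULL);
		total += partial[i];
	}
	printf("sum = %f\n", total);
	free(data);
	return 0;
}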

Quick Quiz [*].12: 
What besides CPU cache capacity might require limiting the number of concurrent threads?
 
Answer:
There are any number of potential limits on the number of threads:

  1. Main memory. Each thread consumes some memory (for its stack if nothing else), so that excessive numbers of threads can exhaust memory, resulting in excessive paging or memory-allocation failures. One way to bound per-thread stack consumption is sketched following this list.
  2. I/O bandwidth. If each thread initiates a given amount of mass-storage I/O or networking traffic, excessive numbers of threads can result in excessive I/O queuing delays, again degrading performance. Some networking protocols may be subject to timeouts or other failures if there are so many threads that networking events cannot be responded to in a timely fashion.
  3. Synchronization overhead. For many synchronization protocols, excessive numbers of threads can result in excessive spinning, blocking, or rollbacks, thus degrading performance.

Specific applications and platforms may have any number of additional limiting factors.
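
One way to keep the first of these limits at bay is to bound each thread's stack explicitly rather than accepting the multi-megabyte default. The following hypothetical sketch (the 64KB stack size and the thread count are arbitrary choices for illustration) uses the POSIX pthread_attr_setstacksize() call so that even a large number of threads consumes a predictable amount of stack memory, and simply stops creating threads when creation fails.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#define NTHREADS 1000
#define STACK_SIZE (64 * 1024)	/* 64KB instead of the multi-MB default */

static void *worker(void *arg)
{
	/* Keep local variables small: the stack is only STACK_SIZE bytes. */
	return arg;
}

int main(void)
{
	pthread_attr_t attr;
	pthread_t tid[NTHREADS];
	size_t stksz = STACK_SIZE;
	int i;

	if (stksz < PTHREAD_STACK_MIN)	/* respect the system's minimum */
		stksz = PTHREAD_STACK_MIN;

	pthread_attr_init(&attr);
	pthread_attr_setstacksize(&attr, stksz);

	for (i = 0; i < NTHREADS; i++) {
		if (pthread_create(&tid[i], &attr, worker, NULL) != 0) {
			fprintf(stderr, "thread %d: creation failed\n", i);
			break;
		}
	}
	/* Join only the threads that were successfully created. */
	while (--i >= 0)
		pthread_join(tid[i], NULL);

	pthread_attr_destroy(&attr);
	return 0;
}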

Quick Quiz [*].13: 
Are there any other obstacles to parallel programming?
 
Answer:
There are a great many other potential obstacles to parallel programming. Here are a few of them:

  1. The only known algorithms for a given project might be inherently sequential in nature. In this case, either avoid parallel programming (there being no law saying that your project has to run in parallel) or invent a new parallel algorithm.
  2. The project allows binary-only plugins that share the same address space, such that no one developer has access to all of the source code for the project. Because many parallel bugs, including deadlocks, are global in nature, such binary-only plugins pose a severe challenge to current software development methodologies. This might well change, but for the time being, all developers of parallel code sharing a given address space need to be able to see all of the code running in that address space.
  3. The project contains heavily used APIs that were designed without regard to parallelism. Some of the more ornate features of the System V message-queue API form a case in point. Of course, if your project has been around for a few decades, and if its developers did not have access to parallel hardware, your project undoubtedly has at least its share of such APIs.
  4. The project was implemented without regard to parallelism. Given that there are a great many techniques that work extremely well in a sequential environment, but that fail miserably in parallel environments, if your project ran only on sequential hardware for most of its lifetime, then your project undoubtedly has at least its share of parallel-unfriendly code.
  5. The project was implemented without regard to good software-development practice. The cruel truth is that shared-memory parallel environments are often much less forgiving of sloppy development practices than are sequential environments. You may be well-served to clean up the existing design and code prior to attempting parallelization.
  6. The people who originally did the development on your project have since moved on, and the people remaining, while well able to maintain it or add small features, are unable to make ``big animal'' changes. In this case, unless you can work out a very simple way to parallelize your project, you will probably be best off leaving it sequential. That said, there are a number of simple approaches that you might use to parallelize your project, including running multiple instances of it, using a parallel implementation of some heavily used library function, or making use of some other parallel project, such as a database.

One can argue that many of these obstacles are non-technical in nature, but that does not make them any less real. In short, parallelization can be a large and complex effort. As with any large and complex effort, it makes sense to do your homework beforehand.

Quick Quiz [*].14: 
Where are the answers to the Quick Quizzes found?
 
Answer:
In Appendix [*] starting on page [*].

Hey, I thought I owed you an easy one!

Quick Quiz [*].15: 
Some of the Quick Quiz questions seem to be from the viewpoint of the reader rather than the author. Is that really the intent?
 
Answer:
Indeed it is! Many are modeled after Paul--just ask anyone who has had the misfortune of being assigned to teach him. Others are quite similar to actual questions that have been asked during conference presentations and lectures covering the material in this book. Still others are from the viewpoint of the author.

Quick Quiz [*].16: 
These Quick Quizzes just are not my cup of tea. What do you recommend?
 
Answer:
There are a number of alternatives available to you:

  1. Just ignore the Quick Quizzes and read the rest of the book. You might miss out on the interesting material in some of the Quick Quizzes, but the rest of the book has lots of good material as well.
  2. If you prefer a more academic and rigorous treatment of parallel programming, you might like Herlihy's and Shavit's textbook [HS08]. This book starts with an interesting combination of low-level primitives at high levels of abstraction from the hardware, and works its way through locking and simple data structures including lists, queues, hash tables, and counters, culminating with transactional memory.
  3. If you would like an academic treatment of parallel programming that keeps to a more pragmatic viewpoint, you might be interested in the concurrency chapter from Scott's textbook [Sco06] on programming languages.
  4. If you are interested in an object-oriented patternist treatment of parallel programming focussing on C++, you might try Volumes 2 and 4 of Schmidt's POSA series [SSRB00,BHS07]. Volume 4 in particular has some interesting chapters applying this work to a warehouse application. The realism of this example is attested to by the section entitled ``Partitioning the Big Ball of Mud'', wherein the problems inherent in parallelism often take a back seat to the problems inherent in getting one's head around a real-world application.
  5. If your primary focus is scientific and technical computing, and you prefer a patternist approach, you might try Mattson et al.'s textbook [MSM05]. It covers Java, C/C++, OpenMP, and MPI. Its patterns are admirably focused first on design, then on implementation.
  6. If you are interested in POSIX Threads, you might take a look at David R. Butenhof's book [But97].
  7. If you are interested in C++, but in a Windows environment, you might try Herb Sutter's ``Effective Concurrency'' series in Dr. Dobb's Journal [Sut08]. This series does a reasonable job of presenting a commonsense approach to parallelism.
  8. If you want to try out Intel Threading Building Blocks, then perhaps James Reinders's book [Rei07] is what you are looking for.
  9. Finally, those preferring to work in Java might be well-served by Doug Lea's textbooks [Lea97,GPB+07].

In contrast, this book meshes real-world machines with real-world algorithms. If your sole goal is to find an optimal parallel queue, you might be better served by one of the above books. However, if you are interested in principles of parallel design that allow multiple such queues to operate in parallel, read on!
