3.4.1 Work Partitioning

Work partitioning is absolutely required for parallel execution: if there is but one ``glob'' of work, then it can be executed by at most one CPU at a time, which is by definition sequential execution. However, partitioning the code requires great care. For example, uneven partitioning can result in sequential execution once the small partitions have completed [Amd67]. In less extreme cases, load balancing can be used to fully utilize available hardware, thus improving performance.
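
For example, the following sketch (hypothetical code, not from the text; it assumes POSIX threads and uses illustrative names) statically partitions an array summation into equal per-thread chunks. Had the chunks been unequal, threads holding small chunks would finish early and sit idle while the largest chunk completed, which is precisely the load-balancing hazard just described.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N 1000000

    static long data[N];
    static long partial[NTHREADS];  /* One private result per thread. */

    static void *sum_chunk(void *arg)
    {
        long t = (long)arg;
        long lo = t * (N / NTHREADS);
        long hi = (t == NTHREADS - 1) ? N : lo + N / NTHREADS;
        long sum = 0;

        for (long i = lo; i < hi; i++)
            sum += data[i];
        partial[t] = sum;  /* Private store: no locking required. */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        long total = 0;

        for (long i = 0; i < N; i++)
            data[i] = i;
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, sum_chunk, (void *)t);
        for (long t = 0; t < NTHREADS; t++) {
            pthread_join(tid[t], NULL);
            total += partial[t];  /* The only cross-thread communication. */
        }
        printf("total = %ld\n", total);
        return 0;
    }

Note that each thread's chunk boundaries are computed from its own index alone, so the partitioning itself requires no synchronization; only the final accumulation communicates between threads.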

In addition, partitioning of work can complicate handling of global errors and events: a parallel program may need to carry out non-trivial synchronization in order to safely process such global events.
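
As a rough sketch of why such synchronization is needed (again hypothetical code, assuming POSIX threads and C11 atomics), consider a fatal error that requires all worker threads to quiesce before cleanup may proceed: the thread processing the error cannot simply act, but must instead signal the workers and wait for each of them to acknowledge.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    #define NWORKERS 4

    static atomic_bool stop_requested;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t all_quiesced = PTHREAD_COND_INITIALIZER;
    static int active = NWORKERS;

    static void *worker(void *arg)
    {
        (void)arg;
        while (!atomic_load(&stop_requested)) {
            /* Do one unit of work, then re-check the flag. */
        }
        pthread_mutex_lock(&lock);
        if (--active == 0)  /* Last worker out signals... */
            pthread_cond_signal(&all_quiesced);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    void handle_global_error(void)
    {
        atomic_store(&stop_requested, true);
        pthread_mutex_lock(&lock);
        while (active > 0)  /* ...the error handler, which waits. */
            pthread_cond_wait(&all_quiesced, &lock);
        pthread_mutex_unlock(&lock);
        /* All workers have quiesced: now safe to process the error. */
    }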

Each partition requires some sort of communication: after all, if a given thread did not communicate at all, it would have no effect and would thus not need to be executed. However, because communication incurs overhead, careless partitioning choices can result in severe performance degradation.
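
The cost of that communication varies enormously with the design. The following sketch (hypothetical; C11 atomics, with 64-byte cache lines assumed) contrasts a single shared counter, whose cache line bounces among CPUs on every increment, with per-thread counters that communicate only when the aggregate value is read:

    #include <stdatomic.h>

    #define NTHREADS 4

    /* Heavy communication: every increment moves this cache line
       to the incrementing CPU. */
    static atomic_long shared_count;

    /* Light communication: each thread updates only its own counter.
       The pad assumes 64-byte cache lines and prevents false sharing. */
    static struct {
        long count;
        char pad[56];
    } per_thread[NTHREADS];

    void count_shared(void) { atomic_fetch_add(&shared_count, 1); }

    void count_private(int t) { per_thread[t].count++; }

    long read_private(void)  /* Approximate if counting continues. */
    {
        long sum = 0;

        for (int t = 0; t < NTHREADS; t++)
            sum += per_thread[t].count;
        return sum;
    }

On a busy multicore system, the per-thread variant can dramatically outperform the shared counter, at the cost of a read that is only approximate while counting is still in progress.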

Furthermore, the number of concurrent threads must often be controlled, as each such thread occupies common resources, for example, space in CPU caches. If too many threads are permitted to execute concurrently, the CPU caches will overflow, resulting in a high cache-miss rate, which in turn degrades performance. On the other hand, large numbers of threads are often required to overlap computation and I/O.

Quick Quiz 3.12: What besides CPU cache capacity might require limiting the number of concurrent threads?
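
One common way to bound the number of concurrently executing threads (a sketch assuming POSIX counting semaphores; the names are illustrative) is to require each thread to acquire a semaphore slot before entering its resource-intensive phase:

    #include <semaphore.h>
    #include <pthread.h>

    #define MAX_ACTIVE 8  /* Tune to fit the working set in cache. */

    static sem_t throttle;

    void throttle_init(void)
    {
        sem_init(&throttle, 0, MAX_ACTIVE);
    }

    static void *worker(void *arg)
    {
        (void)arg;
        sem_wait(&throttle);  /* At most MAX_ACTIVE threads pass this point. */
        /* ... cache-intensive phase of the computation ... */
        sem_post(&throttle);  /* Admit another waiting thread. */
        return NULL;
    }

MAX_ACTIVE can then be tuned so that the combined working set of the active threads fits within the CPU caches, while still leaving enough threads runnable to overlap computation and I/O.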

Finally, permitting threads to execute concurrently greatly increases the program's state space, which can make the program difficult to understand, degrading productivity. All else being equal, smaller state spaces having more regular structure are more easily understood, but this is a human-factors statement as much as it is a technical or mathematical statement. Good parallel designs might have extremely large state spaces, but nevertheless be easy to understand due to their regular structure, while poor designs can be impenetrable despite having a comparatively small state space. The best designs exploit embarrassing parallelism, or transform the problem to one having an embarrassingly parallel solution. In either case, ``embarrassingly parallel'' is in fact an embarrassment of riches. The current state of the art enumerates good designs; further work is required to make more general judgments on state-space size and structure.
