Although these four capabilities are fundamental,
good engineering practice combines them rather than
applying each in isolation.
For example, the data-parallel approach first
partitions the data so as to minimize the need for
inter-partition communication, partitions the code accordingly,
and finally maps data partitions and threads so as to maximize
throughput while minimizing inter-thread communication,
as shown in
Figure .
The developer can then
consider each partition separately, greatly reducing the size
of the relevant state space, in turn increasing productivity.
Of course, some problems are non-partitionable.
On the other hand,
clever transformations into forms permitting partitioning can greatly
enhance both performance and scalability [Met99].
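
As a rough illustration of the data-parallel approach described above,
the following sketch sums an array by first partitioning the data,
then giving each partition to exactly one POSIX thread, and only
afterwards combining the per-partition results.
The names parallel_sum(), sum_partition(), struct partition, and
MAX_THREADS are hypothetical, chosen for this example only, and the
even split of elements across threads is an assumption rather than a
recommendation for any particular workload.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64	/* Hypothetical upper bound for this sketch. */

/* One partition of the data, owned by exactly one thread. */
struct partition {
	const long *base;	/* First element of this partition. */
	size_t len;		/* Number of elements in this partition. */
	long sum;		/* Result, written only by the owning thread. */
};

/* Each thread reads only its own partition: no locks, no shared updates. */
static void *sum_partition(void *arg)
{
	struct partition *p = arg;
	long sum = 0;	/* Accumulate locally, store once at the end. */

	for (size_t i = 0; i < p->len; i++)
		sum += p->base[i];
	p->sum = sum;
	return NULL;
}

/* Partition data[0..n-1] across nthreads threads and return the total. */
long parallel_sum(const long *data, size_t n, int nthreads)
{
	struct partition parts[MAX_THREADS];
	pthread_t tids[MAX_THREADS];
	int created[MAX_THREADS];
	size_t chunk, rem, start = 0;
	long total = 0;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	chunk = n / nthreads;	/* Base partition size. */
	rem = n % nthreads;	/* First rem partitions get one extra element. */

	/* Step 1: partition the data.  Step 2: map partitions to threads. */
	for (int t = 0; t < nthreads; t++) {
		parts[t].base = data + start;
		parts[t].len = chunk + ((size_t)t < rem ? 1 : 0);
		parts[t].sum = 0;
		start += parts[t].len;
		created[t] = !pthread_create(&tids[t], NULL,
					     sum_partition, &parts[t]);
		if (!created[t])	/* Fall back to the calling thread. */
			sum_partition(&parts[t]);
	}

	/* Step 3: the only inter-thread communication is this final combine. */
	for (int t = 0; t < nthreads; t++) {
		if (created[t])
			pthread_join(tids[t], NULL);
		total += parts[t].sum;
	}
	return total;
}
```

A caller might invoke this as parallel_sum(data, 1000, 4), building
with the compiler's -pthread option.
Because each thread can be reasoned about in terms of its own
partition alone, the state space the developer must consider is that
of a sequential loop, and the threads communicate only at the
pthread_join() calls.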