In the 1980s, it often took less time for a microprocessor to load a value from memory than it did to execute an instruction. In 2006, a microprocessor might be capable of executing hundreds or even thousands of instructions in the time required to access memory. This disparity arises because Moore's Law has increased CPU performance at a much greater rate than it has increased memory performance, in part because memory sizes have grown so quickly. For example, a typical 1970s minicomputer might have 4KB (yes, kilobytes, not megabytes, let alone gigabytes) of main memory, with single-cycle access. In 2008, CPU designers can still construct a 4KB memory with single-cycle access, even on systems with multi-GHz clock frequencies. And in fact they frequently do construct such memories, but they now call them "level-0 caches".
Although the large caches found on modern microprocessors can do quite a bit to combat memory-access latencies, these caches require highly predictable data-access patterns to successfully hide those latencies. Unfortunately, common operations, such as traversing a linked list, have extremely unpredictable memory-access patterns -- after all, if the pattern were predictable, we software types would not bother with the pointers, right?
Therefore, as shown in Figure , memory references are often severe obstacles for modern CPUs.
Thus far, we have only been considering obstacles that can arise during a given CPU's execution of single-threaded code. Multi-threading presents additional obstacles to the CPU, as described in the following sections.
Paul E. McKenney 2011-12-16