Although the cache structure shown in
Figure
provides good performance for repeated reads and writes from a given CPU
to a given item of data, its performance for the first write to
a given cache line is quite poor.
To see this, consider
Figure
,
which shows a timeline of a write by CPU 0 to a cache line held in
CPU 1's cache.
Since CPU 0 must wait for the cache line to arrive before it can
write to it, CPU 0 must stall for an extended period of time.
But there is no real reason to force CPU 0 to stall for so long: regardless of what data the cache line from CPU 1 contains, CPU 0 is going to unconditionally overwrite it.
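To make the cost of this stall concrete, the following is a minimal sketch of the timeline described above, written in Python purely for illustration. The `Latencies` class, the function names, the event descriptions, and every cycle count are hypothetical assumptions chosen for the example; they are not measurements of any real system, and real hardware exchanges more detailed protocol messages.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """Hypothetical cycle counts, chosen only for illustration."""
    interconnect_hop: int = 20  # assumed one-way message latency between CPUs
    remote_work: int = 5        # assumed time for CPU 1 to give up its copy
    local_store: int = 1        # assumed cost of a store that hits in cache


def write_miss_timeline(lat):
    """Return (event, cycle) pairs for CPU 0 writing a line held by CPU 1."""
    t = 0
    events = [("CPU 0 issues store and requests the cache line", t)]
    t += lat.interconnect_hop
    events.append(("CPU 1 receives the request", t))
    t += lat.remote_work
    events.append(("CPU 1 sends the cache line to CPU 0", t))
    t += lat.interconnect_hop
    events.append(("CPU 0 receives the cache line", t))
    t += lat.local_store
    events.append(("CPU 0 completes the store", t))
    return events


def stall_cycles(lat):
    """Cycles CPU 0 spends waiting before it can perform the store."""
    completion = write_miss_timeline(lat)[-1][1]
    return completion - lat.local_store


if __name__ == "__main__":
    lat = Latencies()
    for event, cycle in write_miss_timeline(lat):
        print(f"{cycle:4d}  {event}")
    print("stall:", stall_cycles(lat), "cycles")
```

With these assumed numbers, CPU 0 waits 45 cycles for a store that would take a single cycle had the line already been in its own cache. Nearly all of that time goes to fetching data that CPU 0 then overwrites.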