C.3 Stores Result in Unnecessary Stalls

Although the cache structure shown in Figure [*] provides good performance for repeated reads and writes from a given CPU to a given item of data, its performance for the first write to a given cache line is quite poor. To see this, consider Figure [*], which shows a timeline of a write by CPU 0 to a cache line held in CPU 1's cache. Since CPU 0 must wait for the cache line to arrive before it can write to it, CPU 0 must stall for an extended period of time.

Figure: Writes See Unnecessary Stalls
\includegraphics{appendix/whymb/cacheSCwrite}

But there is no real reason to force CPU 0 to stall for so long -- after all, regardless of what data happens to be in the cache line that CPU 1 sends it, CPU 0 is going to unconditionally overwrite it.
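
The cost described above can be put in rough numerical terms with a toy model. The following is only a minimal sketch: the names `line_state`, `latencies`, and `store_stall_ns()` are hypothetical, and the latency values are assumed parameters rather than measurements of any real hardware. It shows that a store to a cache line that CPU 0 does not own pays for the full inter-CPU round trip, even though the incoming data will be overwritten.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states of CPU 0's copy of the cache line. */
enum line_state { LINE_OWNED, LINE_NOT_OWNED };

/* Assumed, illustrative latencies in nanoseconds (not measured values). */
struct latencies {
	unsigned store_hit_ns;  /* store to a line CPU 0 already owns */
	unsigned round_trip_ns; /* request sent to CPU 1 plus its reply */
};

/*
 * Time CPU 0 spends on a single store in this toy model.
 * If CPU 1 holds the line, CPU 0 must wait for the whole round trip
 * before the store can complete, even though the data that arrives
 * is about to be unconditionally overwritten.
 */
unsigned store_stall_ns(enum line_state state, const struct latencies *lat)
{
	assert(lat != NULL);
	if (state == LINE_OWNED)
		return lat->store_hit_ns;
	return lat->round_trip_ns + lat->store_hit_ns;
}
```

For example, with an assumed `store_hit_ns` of 1 and `round_trip_ns` of 100, the first store to a line held by CPU 1 costs 101 ns in this model, while each later store to the same line costs 1 ns. The gap between those two figures is the stall that the text argues is unnecessary.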




Paul E. McKenney 2011-12-16