In the previous section, memory barriers were used to mark entries in both the store buffer and the invalidate queue. But in our code fragment, foo() had no reason to do anything with the invalidate queue, and bar() similarly had no reason to do anything with the store buffer.
Many CPU architectures therefore provide weaker memory-barrier instructions that do only one or the other of these two. Roughly speaking, a "read memory barrier" marks only the invalidate queue and a "write memory barrier" marks only the store buffer, while a full-fledged memory barrier does both.
The effect of this is that a read memory barrier orders only loads on the CPU that executes it, so that all loads preceding the read memory barrier will appear to have completed before any load following the read memory barrier. Similarly, a write memory barrier orders only stores, again on the CPU that executes it, and again so that all stores preceding the write memory barrier will appear to have completed before any store following the write memory barrier. A full-fledged memory barrier orders both loads and stores, but again only on the CPU executing the memory barrier.
If we update foo() and bar() to use read and write memory barriers, they appear as follows:
    void foo(void)
    {
        a = 1;
        smp_wmb();
        b = 1;
    }

    void bar(void)
    {
        while (b == 0) continue;
        smp_rmb();
        assert(a == 1);
    }
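The fragment above depends on the Linux-kernel primitives smp_wmb() and smp_rmb(), which are not available in ordinary user-space code. For readers who want to experiment, the following is a minimal user-space sketch that assumes C11 atomics and POSIX threads. It substitutes a release fence for smp_wmb() and an acquire fence for smp_rmb(). These C11 fences are not exact equivalents of the kernel primitives (they also order some load-store combinations), but they are at least strong enough for this pattern. The helper run_foo_bar() is a hypothetical name introduced only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Shared variables, both initially zero, as in the fragment above. */
static atomic_int a;
static atomic_int b;

/* Writer: the release fence stands in for smp_wmb() (an assumption). */
static void *foo(void *arg)
{
    (void)arg;
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&b, 1, memory_order_relaxed);
    return NULL;
}

/* Reader: the acquire fence stands in for smp_rmb() (an assumption). */
static void *bar(void *arg)
{
    int *seen = arg;

    while (atomic_load_explicit(&b, memory_order_relaxed) == 0)
        continue;
    atomic_thread_fence(memory_order_acquire);
    /* Record what the original fragment would assert on. */
    *seen = atomic_load_explicit(&a, memory_order_relaxed);
    return NULL;
}

/*
 * Hypothetical helper: run foo() and bar() once on separate threads
 * and return the value of a that bar() observed, or -1 on error.
 */
int run_foo_bar(void)
{
    pthread_t writer, reader;
    int seen = 0;

    atomic_store(&a, 0);
    atomic_store(&b, 0);

    if (pthread_create(&reader, NULL, bar, &seen) != 0)
        return -1;
    if (pthread_create(&writer, NULL, foo, NULL) != 0) {
        /* Release the spinning reader before reporting the failure. */
        atomic_store(&b, 1);
        pthread_join(reader, NULL);
        return -1;
    }

    pthread_join(writer, NULL);
    pthread_join(reader, NULL);
    return seen;
}
```

Under these assumptions, run_foo_bar() should return 1 whenever both threads start successfully, which corresponds to the assertion in bar() never firing. Removing either fence makes the outcome depend on the compiler and the hardware, so an occasional 0 would then be permitted on weakly ordered systems.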
Some computers have even more flavors of memory barriers, but understanding these three variants will provide a good introduction to memory barriers in general.
Paul E. McKenney 2011-12-16