5.3 Atomic Operations

Given that Figure [*] shows that the overhead of reader-writer locking is most severe for the smallest critical sections, it would be nice to have some other way to protect the tiniest of critical sections. One such way are atomic operations. We have seen one atomic operations already, in the form of the __sync_fetch_and_add() primitive on line 18 of Figure [*]. This primitive atomically adds the value of its second argument to the value referenced by its first argument, returning the old value (which was ignored in this case). If a pair of threads concurrently execute __sync_fetch_and_add() on the same variable, the resulting value of the variable will include the result of both additions.

The gcc compiler offers a number of additional atomic operations, including __sync_fetch_and_sub(), __sync_fetch_and_or(), __sync_fetch_and_and(), __sync_fetch_and_xor(), and __sync_fetch_and_nand(), all of which return the old value. If you instead need the new value, you can instead use the __sync_add_and_fetch(), __sync_sub_and_fetch(), __sync_or_and_fetch(), __sync_and_and_fetch(), __sync_xor_and_fetch(), and __sync_nand_and_fetch() primitives.

Quick Quiz 5.19: Is it really necessary to have both sets of primitives? End Quick Quiz

The classic compare-and-swap operation is provided by a pair of primitives, __sync_bool_compare_and_swap() and __sync_val_compare_and_swap(). Both of these primitive atomically update a location to a new value, but only if its prior value was equal to the specified old value. The first variant returns 1 if the operation succeeded and 0 if it failed, for example, if the prior value was not equal to the specified old value. The second variant returns the prior value of the location, which, if equal to the specified old value, indicates that the operation succeeded. Either of the compare-and-swap operation is ``universal'' in the sense that any atomic operation on a single location can be implemented in terms of compare-and-swap, though the earlier operations are often more efficient where they apply. The compare-and-swap operation is also capable of serving as the basis for a wider set of atomic operations, though the more elaborate of these often suffer from complexity, scalability, and performance problems [Her90].

The __sync_synchronize() primitive issues a ``memory barrier'', which constrains both the compiler's and the CPU's ability to reorder operations, as discussed in Section [*]. In some cases, it is sufficient to constrain the compiler's ability to reorder operations, while allowing the CPU free rein, in which case the barrier() primitive may be used, as it in fact was on line 28 of Figure [*]. In some cases, it is only necessary to ensure that the compiler avoids optimizing away a given memory access, in which case the ACCESS_ONCE() primitive may be used, as it was on line 17 of Figure [*]. These last two primitives are not provided directly by gcc, but may be implemented straightforwardly as follows:



#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
#define barrier() __asm__ __volatile__("": : :"memory")


Quick Quiz 5.20: Given that these atomic operations will often be able to generate single atomic instructions that are directly supported by the underlying instruction set, shouldn't they be the fastest possible way to get things done? End Quick Quiz

Paul E. McKenney 2011-12-16