C.9 Advice to Hardware Designers

There are any number of things that hardware designers can do to make the lives of software people difficult. Here is a list of a few such things that we have encountered in the past, presented here in the hope that it might help prevent future such problems:

  1. I/O devices that ignore cache coherence.

    This charming misfeature can result in DMAs from memory missing recent changes to the output buffer, or, just as bad, cause input buffers to be overwritten by the contents of CPU caches just after the DMA completes. To make your system work in face of such misbehavior, you must carefully flush the CPU caches of any location in any DMA buffer before presenting that buffer to the I/O device. And even then, you need to be very careful to avoid pointer bugs, as even a misplaced read to an input buffer can result in corrupting the data input!

  2. External busses that fail to transmit cache-coherence data.

    This is an even more painful variant of the above problem, but causes groups of devices--and even memory itself--to fail to respect cache coherence. It is my painful duty to inform you that as embedded systems move to multicore architectures, we will no doubt see a fair number of such problems arise. Hopefully these problems will clear up by the year 2015.

  3. Device interrupts that ignore cache coherence.

    This might sound innocent enough -- after all, interrupts aren't memory references, are they? But imagine a CPU with a split cache, one bank of which is extremely busy, therefore holding onto the last cacheline of the input buffer. If the corresponding I/O-complete interrupt reaches this CPU, then that CPU's memory reference to the last cache line of the buffer could return old data, again resulting in data corruption, but in a form that will be invisible in a later crash dump. By the time the system gets around to dumping the offending input buffer, the DMA will most likely have completed.

  4. Inter-processor interrupts (IPIs) that ignore cache coherence.

    This can be problematic if the IPI reaches its destination before all of the cache lines in the corresponding message buffer have been committed to memory.

  5. Context switches that get ahead of cache coherence.

    If memory accesses can complete too wildly out of order, then context switches can be quite harrowing. If the task flits from one CPU to another before all the memory accesses visible to the source CPU make it to the destination CPU, then the task could easily see the corresponding variables revert to prior values, which can fatally confuse most algorithms.

  6. Overly kind simulators and emulators.

    It is difficult to write simulators or emulators that force memory re-ordering, so software that runs just fine in these these environments can get a nasty surprise when it first runs on the real hardware. Unfortunately, it is still the rule that the hardware is more devious than are the simulators and emulators, but we hope that this situation changes.

Again, we encourage hardware designers to avoid these practices!

Paul E. McKenney 2011-12-16