September 1st, 2013 by Fran Litterio
We are squarely in the Symmetric Multiprocessor (SMP) era. Computer systems have ever-increasing numbers of processors and given that RTX and RTX64 support SMP execution environments, it’s natural for your application to run one or more threads on each real-time processor. One common task is sharing data between those threads. Nothing could be simpler: a thread on one processor stores data in memory and a thread on a different processor reads it.
Unfortunately, race conditions arise because of the way compilers optimize memory accesses and how processors implement memory ordering. To avoid problems caused by compiler optimizations, C and C++ programmers should make proper use of the volatile keyword, the atomic operations library defined in the C++11 standard, and (as a last resort) compiler intrinsics.
But how can programmers control processor memory ordering? The gory details are documented in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3: System Programming Guide, available from Intel’s Web site. I’ll summarize the important parts here. The IA-32 (“x86”) and Intel® 64 (“x64”) architectures offer very strong memory-ordering guarantees compared to other architectures. These include:
- Reads are never reordered with other reads.
- Writes are never reordered with other writes, with some exceptions.
- Writes are never reordered with older reads.
- Reads may be reordered with older writes to different locations but not with older writes to the same location.
- Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions (memory barriers).
The above rules apply within a single processor. Between processors in an SMP system, the rules include:
- Writes by a single processor are observed in the same order by all processors.
- Any two stores are seen in a consistent order by processors other than those performing the stores.
- Locked instructions have a total order that is seen the same by all processors.
The first one in this group is important. Many other architectures require you to use explicit memory barrier instructions to create that guarantee. Equally important is the last bullet: when combined with the earlier rule that reads and writes cannot be reordered with locked instructions, Intel offers the tools to create truly SMP-safe data sharing.
Does this mean you have to put locked instructions and memory barriers in your code to achieve SMP-safe data sharing? Not at all. Every Windows and RTX/RTX64 synchronization primitive, and every C++11 atomic operation, uses the correct instructions to prevent race conditions caused by memory-ordering optimizations.
If you make proper use of standard synchronization primitives, such as SetEvent, WaitForSingleObject, EnterCriticalSection, and the C++11 atomic operations, you will have thread-safe and SMP-safe access to shared data.
But if you decide to roll your own synchronization primitives, because you think you can do it better than your operating system or your compiler vendor … beware! The exceptions and subtleties in the above rules can bite you, unless you make proper use of memory barriers, locked instructions, and compiler intrinsics.