{"id":254259,"date":"2013-09-26T18:20:20","date_gmt":"2013-09-26T10:20:20","guid":{"rendered":"http:\/\/blog.zhenglei.net\/?p=254259"},"modified":"2013-09-26T18:20:51","modified_gmt":"2013-09-26T10:20:51","slug":"fwd-weak-vs-strong-memory-models","status":"publish","type":"post","link":"https:\/\/blog.zhenglei.net\/?p=254259","title":{"rendered":"FWD: Weak vs. Strong Memory Models"},"content":{"rendered":"<p><a href=\"http:\/\/preshing.com\/20120930\/weak-vs-strong-memory-models\/\" title=\"weak-vs-strong-memory-models\">Weak vs. Strong Memory Models<\/a><\/p>\n<p>Sep 30, 2012<br \/>\nWeak vs. Strong Memory Models<\/p>\n<p>There are many types of memory reordering, and not all types of reordering occur equally often. It all depends on processor you\u2019re targeting and\/or the toolchain you\u2019re using for development.<\/p>\n<p>A memory model tells you, for a given processor or toolchain, exactly what types of memory reordering to expect at runtime relative to a given source code listing. Keep in mind that the effects of memory reordering can only be observed when lock-free programming techniques are used.<\/p>\n<p>After studying memory models for a while \u2013 mostly by reading various online sources and verifying through experimentation \u2013 I\u2019ve gone ahead and organized them into the following four categories. Below, each memory model makes all the guarantees of the ones to the left, plus some additional ones. I\u2019ve drawn a clear line between weak memory models and strong ones, to capture the way most people appear to use these terms. Read on for my justification for doing so.<\/p>\n<p>Each physical device pictured above represents a hardware memory model. A hardware memory model tells you what kind of memory ordering to expect at runtime relative to an assembly (or machine) code listing.<\/p>\n<p>Every processor family has different habits when it comes to memory reordering, and those habits can only be observed in multicore or multiprocessor configurations. Given that multicore is now mainstream, it\u2019s worth having some familiarity with them.<\/p>\n<p>There are software memory models as well. Technically, once you\u2019ve written (and debugged) portable lock-free code in C11, C++11 or Java, only the software memory model is supposed to matter. Nonetheless, a general understanding of hardware memory models may come in handy. It can help you explain unexpected behavior while debugging, and \u2014 perhaps just as importantly \u2014 appreciate how incorrect code may function correctly on a specific processor and toolchain out of luck.<br \/>\nWeak Memory Models<\/p>\n<p>In the weakest memory model, it\u2019s possible to experience all four types of memory reordering I described using a source control analogy in a previous post. Any load or store operation can effectively be reordered with any other load or store operation, as long as it would never modify the behavior of a single, isolated thread. In reality, the reordering may be due to either compiler reordering of instructions, or memory reordering on the processor itself.<\/p>\n<p>When a processor has a weak hardware memory model, we tend to say it\u2019s weakly-ordered or that it has weak ordering. We may also say it has a relaxed memory model. The venerable DEC Alpha is everybody\u2019s favorite example of a weakly-ordered processor. There\u2019s really no mainstream processor with weaker ordering.<\/p>\n<p>The C11 and C++11 programming languages expose a weak software memory model which was in many ways influenced by the Alpha. When using low-level atomic operations in these languages, it doesn\u2019t matter if you\u2019re actually targeting a strong processor family such as x86\/64. As I demonstrated previously, you must still specify the correct memory ordering constraints, if only to prevent compiler reordering.<br \/>\nWeak With Data Dependency Ordering<\/p>\n<p>Though the Alpha has become less relevant with time, we still have several modern CPU families which carry on in the same tradition of weak hardware ordering:<\/p>\n<p>    ARM, which is currently found in hundreds of millions of smartphones and tablets, and is increasingly popular in multicore configurations.<br \/>\n    PowerPC, which the Xbox 360 in particular has already delivered to 70 million living rooms in a multicore configuration.<br \/>\n    Itanium, which Microsoft no longer supports in Windows, but which is still supported in Linux and found in HP servers.<\/p>\n<p>These families have memory models which are, in various ways, almost as weak as the Alpha\u2019s, except for one common detail of particular interest to programmers: they maintain data dependency ordering. What does that mean? It means that if you write A-&gt;B in C\/C++, you are always guaranteed to load a value of B which is at least as new as the value of A. The Alpha doesn\u2019t guarantee that. I won\u2019t dwell on data dependency ordering too much here, except to mention that the Linux RCU mechanism relies on it heavily.<br \/>\nStrong Memory Models<\/p>\n<p>Let\u2019s look at hardware memory models first. What, exactly, is the difference between a strong one and a weak one? There is actually a little disagreement over this question, but my feeling is that in 80% of the cases, most people mean the same thing. Therefore, I\u2019d like to propose the following definition:<\/p>\n<p>    A strong hardware memory model is one in which every machine instruction comes implicitly with acquire and release semantics. As a result, when one CPU core performs a sequence of writes, every other CPU core sees those values change in the same order that they were written.<\/p>\n<p>It\u2019s not too hard to visualize. Just imagine a refinement of the source control analogy where all modifications are committed to shared memory in-order (no StoreStore reordering), pulled from shared memory in-order (no LoadLoad reordering), and instructions are always executed in-order (no LoadStore reordering). StoreLoad reordering, however, still remains possible.<\/p>\n<p>Under the above definition, the x86\/64 family of processors is usually strongly-ordered. There are certain cases in which some of x86\/64\u2019s strong ordering guarantees are lost, but for the most part, as application programmers, we can ignore those cases. It\u2019s true that a x86\/64 processor can execute instructions out-of-order, but that\u2019s a hardware implementation detail \u2013 what matters is that it still keeps its memory interactions in-order, so in a multicore environment, we can still consider it strongly-ordered. Historically, there has also been a little confusion due to evolving specs.<\/p>\n<p>Apparently SPARC processors, when running in TSO mode, are another example of a strong hardware ordering. TSO stands for \u201ctotal store order\u201d, which in a subtle way, is different from the definition I gave above. It means that there is always a single, global order of writes to shared memory from all cores. The x86\/64 has this property too: See Volume 3, sections 8.2.3.6-8 of Intel\u2019s x86\/64 Architecture Specification for some examples. From what I can tell, the TSO property isn\u2019t usually of direct interest to low-level lock-free programmers, but it is a step towards sequential consistency.<br \/>\nSequential Consistency<\/p>\n<p>In a sequentially consistent memory model, there is no memory reordering. It\u2019s as if the entire program execution is reduced to a sequential interleaving of instructions from each thread. In particular, the result r1 = r2 = 0 from Memory Reordering Caught in the Act becomes impossible.<\/p>\n<p>These days, you won\u2019t easily find a modern multicore device which guarantees sequential consistency at the hardware level. However, it seems at least one sequentially consistent, dual-processor machine existed back in 1989: The 386-based Compaq SystemPro. According to Intel\u2019s docs, the 386 wasn\u2019t advanced enough to perform any memory reordering at runtime.<\/p>\n<p>In any case, sequential consistency only really becomes interesting as a software memory model, when working in higher-level programming languages. In Java 5 and higher, you can declare shared variables as volatile. In C++11, you can use the default ordering constraint, memory_order_seq_cst, when performing operations on atomic library types. If you do those things, the toolchain will restrict compiler reordering and emit CPU-specific instructions which act as the appropriate memory barrier types. In this way, a sequentially consistent memory model can be \u201cemulated\u201d even on weakly-ordered multicore devices. If you read Herlihy &amp; Shavit\u2019s The Art of Multiprocessor Programming, be aware that most of their examples assume a sequentially consistent software memory model.<br \/>\nFurther Details<\/p>\n<p>There are many other subtle details filling out the spectrum of memory models, but in my experience, they haven\u2019t proved quite as interesting when writing lock-free code at the application level. There are things like control dependencies, causal consistency, and different memory types. Still, most discussions come back the four main categories I\u2019ve outlined here.<\/p>\n<p>If you really want to nitpick the fine details of processor memory models, and you enjoy eating formal logic for breakfast, you can check out the admirably detailed work done at the University of Cambridge. Paul McKinney has written an accessible overview of some of their work and its associated tools.<\/p>\n<p>\u00ab Acquire and Release Semantics This Is Why They Call It a Weakly-Ordered CPU \u00bb<br \/>\nRecent Posts<\/p>\n<p>    Acquire and Release Fences<br \/>\n    The Synchronizes-With Relation<br \/>\n    The Happens-Before Relation<br \/>\n    Atomic vs. Non-Atomic Operations<br \/>\n    The World&#8217;s Simplest Lock-Free Hash Table<br \/>\n    A Lock-Free&#8230; Linear Search?<br \/>\n    Introducing Mintomic: A Small, Portable Lock-Free API<br \/>\n    View Your Filesystem History Using Python<br \/>\n    This Hash Table Is Faster Than a Judy Array<br \/>\n    How to Generate a Sequence of Unique Random Integers<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Weak vs. Strong Memory Models Sep 30, 20 &hellip; <a href=\"https:\/\/blog.zhenglei.net\/?p=254259\">\u7ee7\u7eed\u9605\u8bfb <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[],"class_list":["post-254259","post","type-post","status-publish","format-standard","hentry","category-software-download"],"_links":{"self":[{"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/posts\/254259","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=254259"}],"version-history":[{"count":3,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/posts\/254259\/revisions"}],"predecessor-version":[{"id":254262,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/posts\/254259\/revisions\/254262"}],"wp:attachment":[{"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=254259"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=254259"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=254259"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}