Processor consistency
Processor Consistency is one of the consistency models used in the domain of concurrent computing (e.g. in distributed shared memory or distributed transactions).
A system exhibits Processor Consistency if the order in which other processors see the writes from any individual processor is the same as the order in which they were issued. It is weaker than the Causal Consistency model because it does not require writes from all processors to be seen in the same order, but stronger than the PRAM Consistency model because it additionally requires Cache Coherence[1]. Another difference between Causal Consistency and Processor Consistency is that Processor Consistency removes the requirements that loads wait for stores to complete and that writes be atomic (Write Atomicity)[1]. Processor Consistency is also stronger than Cache Consistency because it requires all writes by a processor to be seen in order, not just writes to the same memory location[1].
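The per-processor ordering rule can be illustrated with a small message-passing sketch. The thread names and values below are hypothetical, and C11 relaxed atomics are used only to keep the example free of data races at the language level; the ordering claim rests on the hardware providing Processor Consistency, not on the C memory model.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

atomic_int data = 0;
atomic_int flag = 0;

/* Runs on one processor: two writes issued in program order. */
int writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&data, 42, memory_order_relaxed); /* first write  */
    atomic_store_explicit(&flag, 1,  memory_order_relaxed); /* second write */
    return 0;
}

/* Runs on another processor: observes the writer's stores. */
int reader(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ; /* wait until the writer's second store is visible */
    /* Under Processor Consistency the writer's first store must already be
     * visible, because writes from a single processor are seen in the order
     * they were issued, so this can only print 42. */
    printf("data = %d\n", atomic_load_explicit(&data, memory_order_relaxed));
    return 0;
}

int main(void)
{
    thrd_t t1, t2;
    thrd_create(&t1, writer, NULL);
    thrd_create(&t2, reader, NULL);
    thrd_join(t1, NULL);
    thrd_join(t2, NULL);
    return 0;
}
```

Under a model that guarantees only Cache Consistency, the reader could observe flag as 1 while data is still stale, since the two writes go to different memory locations.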
By Example
Example 1:

| P1 | W(x)1 |       |
|----|-------|-------|
| P2 | R(x)1 | W(x)2 |
| P3 | R(x)2 | R(x)1 |
| P4 | R(x)1 | R(x)2 |

Example 2:

| P1 | W(x)2 | W(y)4 | W(x)3 | W(y)1 |
|----|-------|-------|-------|-------|
| P2 | R(y)4 | R(x)1 | R(y)2 | R(x)2 |
In the first example above, the strongest applicable consistency model is Processor Consistency. This is trivial to determine because there is at most one write per processor. The example is not causally consistent, however: since R(x)1 occurs on P2 before W(x)2, the value written by W(x)2 may depend on the value read by R(x)1, making W(x)2 potentially causally dependent on W(x)1; Causal Consistency would then require every processor to see W(x)1 before W(x)2, which P3 does not.
The system in the second example is not processor consistent, because some writes by the same processor are seen out of order by other processors. More specifically, writes to a single location are seen in order, but P1's write to x is not seen by P2 before its write to y. Because the only writes seen in order are those to the same memory location, the strongest model this example satisfies is Cache Consistency.
Processor Consistency vs. Sequential Consistency
Processor Consistency (PC) relaxes the ordering between older stores and younger loads that is enforced in Sequential Consistency (SC)[2]. This allows loads to be issued to the cache and potentially complete before older stores, meaning that stores can be queued in a write buffer without requiring load speculation (the loads can continue freely)[3]. In this regard, PC performs better than SC because recovery techniques for failed speculation are unnecessary, which means fewer pipeline flushes[3]. The prefetching optimization that SC systems employ is also applicable to PC systems[3]. Prefetching is the act of fetching data for upcoming loads and stores before it is actually needed, in order to cut down on load/store latency. Since PC already reduces load latency by allowing loads to be reordered ahead of older stores, the need for prefetching is somewhat reduced, and the prefetched data ends up being used more for stores than for loads[3].
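The relaxed store-to-load ordering is commonly demonstrated with a store-buffering (Dekker-style) litmus test. The sketch below is illustrative, again using relaxed C11 atomics only to avoid language-level data races: under SC at least one of the two loads must return 1, while under PC the outcome r1 = 0 and r2 = 0 is allowed, because each processor's younger load may complete before its own older, still-buffered store.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

atomic_int x = 0, y = 0;
int r1, r2; /* read back safely after the joins */

int p1(void *arg) /* processor P1 */
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);  /* older store  */
    r1 = atomic_load_explicit(&y, memory_order_relaxed); /* younger load */
    return 0;
}

int p2(void *arg) /* processor P2 */
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);  /* older store  */
    r2 = atomic_load_explicit(&x, memory_order_relaxed); /* younger load */
    return 0;
}

int main(void)
{
    thrd_t t1, t2;
    thrd_create(&t1, p1, NULL);
    thrd_create(&t2, p2, NULL);
    thrd_join(t1, NULL);
    thrd_join(t2, NULL);
    /* SC forbids (0, 0); PC permits it because each load may bypass
     * the write buffer holding the other variable's older store. */
    printf("r1 = %d, r2 = %d\n", r1, r2);
    return 0;
}
```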
In terms of how well a PC system follows a programmer's intuition, it turns out that in properly synchronized programs the outcomes of PC and SC are the same[3]. This is a direct consequence of the fact that corresponding loads and stores in PC systems are still ordered with respect to each other[3]. In lock synchronization, the only operation whose behavior is not fully defined by PC is the lock-acquire store, since the loads that follow it lie in the critical section and their order relative to that store affects the outcome[3]. This operation, however, is usually implemented with a store-conditional or atomic read-modify-write instruction, so that if it fails it is repeated later and all the younger loads are repeated with it[3]. All loads occurring before this store are still ordered with respect to the loads in the critical section, and as such all older loads have to complete before loads in the critical section can run.
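As a concrete sketch of such a retried lock-acquire, here is a minimal C11 test-and-set spinlock; the names and the acquire/release orderings are illustrative assumptions rather than details taken from the cited sources:

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag held; /* clear = lock is free */
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock_t *l)
{
    /* The lock-acquire is an atomic read-modify-write: if it fails (the
     * flag was already set), it is simply retried, and any loads from the
     * critical section are retried along with it.  The acquire ordering
     * keeps critical-section loads from moving before a successful
     * acquisition. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the previous holder clears the flag */
}

static void spin_unlock(spinlock_t *l)
{
    /* Release ordering makes the critical-section accesses visible before
     * the lock is observed as free. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller would declare `static spinlock_t lock = SPINLOCK_INIT;` and wrap the critical section between `spin_lock(&lock)` and `spin_unlock(&lock)`.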
References
- ^ David Mosberger (1992). "Memory Consistency Models" (PDF). University of Arizona. Retrieved 2015-04-01.
- ^ Kourosh Gharachorloo; Daniel Lenoski; James Laudon; Phillip Gibbons; Anoop Gupta; John Hennessy (1 August 1998). "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors" (PDF). ACM. Retrieved 2015-04-01.
- ^ Solihin, Yan (2009). Fundamentals of Parallel Computer Architecture: Multichip and Multicore Systems. Solihin Pub. pp. 297–299. ISBN 978-0-9841630-0-7.