Parallel external memory
In computer science, a parallel external memory (PEM) model is a cache-aware, external-memory abstract machine.[1] It is the parallel-computing analogue of the single-processor external memory (EM) model and, likewise, the cache-aware analogue of the parallel random-access machine (PRAM). The PEM model consists of a number of processors, together with their respective private caches, and a shared main memory.
Model
Definition
The PEM model[1] is a combination of the EM model and the PRAM model. It is a computation model consisting of P processors and a two-level memory hierarchy: a large external memory (the main memory) of size N and P small internal memories (caches). The main memory is shared by all processors; each cache is exclusive to a single processor, and a processor cannot access another processor's cache. Each cache has size M and is partitioned into blocks of size B. The processors can only perform operations on data which are in their cache. Data is transferred between the main memory and a cache in blocks of size B.
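To make the parameters concrete, here is a minimal sketch in Python (an illustration only; the PEM model is an abstract machine, and the class and method names below are hypothetical, not part of the model):

```python
from dataclasses import dataclass

# Hypothetical container for the PEM machine parameters described above.
@dataclass
class PEMParameters:
    N: int  # size of the shared main memory (number of elements)
    P: int  # number of processors, each with a private cache
    M: int  # size of each private cache (number of elements)
    B: int  # block size for transfers between main memory and a cache

    def blocks_per_cache(self) -> int:
        # Each cache holds M/B blocks of size B.
        return self.M // self.B

# Example instance: 8 processors, caches of 1024 elements, blocks of 64 elements.
pem = PEMParameters(N=1 << 20, P=8, M=1024, B=64)
print(pem.blocks_per_cache())  # 16
```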
I/O complexity
The complexity measure of the PEM model is the I/O complexity,[1] which counts the number of parallel block transfers between the main memory and the caches. During a parallel block transfer, each processor can transfer one block. So if P processors each load a block of size B from the main memory into their caches in parallel, this counts as an I/O complexity of O(1), not O(P). A program in the PEM model should minimize the data transfer between main memory and caches and operate as much as possible on the data already in the caches.
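This counting rule can be illustrated with a small sketch (a hypothetical Python helper, not part of the model's formal definition):

```python
# Hypothetical helper illustrating how parallel block transfers are counted.
def parallel_io_complexity(transfers_per_processor):
    """transfers_per_processor[i] = number of block transfers issued by processor i.

    In one parallel block transfer every processor may move at most one block,
    so the cost is the maximum over all processors, not the sum of all transfers.
    """
    return max(transfers_per_processor, default=0)

# P = 4 processors each load one block in the same round: cost 1, not 4.
print(parallel_io_complexity([1, 1, 1, 1]))  # 1

# One processor needs 3 block transfers while the others need 1: cost 3.
print(parallel_io_complexity([3, 1, 1, 1]))  # 3
```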
Read / Write conflicts
In the PEM model, there is no direct communication network between the P processors. The processors have to communicate indirectly over the main memory. If multiple processors try to access the same block in main memory concurrently, read/write conflicts[1] occur. As in the PRAM model, three different variations of this problem are considered:
- Concurrent Read Concurrent Write (CRCW): The same block in main memory can be read and written by multiple processors concurrently.
- Concurrent Read Exclusive Write (CREW): The same block in main memory can be read by multiple processors concurrently. Only one processor can write to a block at a time.
- Exclusive Read Exclusive Write (EREW): The same block in main memory cannot be read or written by multiple processors concurrently. Only one processor can access a block at a time.
The following two algorithms[1] solve the CREW and EREW problem when P processors write to the same block simultaneously. A first approach is to serialize the write operations: only one processor after the other writes to the block, which results in a total of O(P) parallel block transfers. A second approach needs only O(log P) parallel block transfers and one additional block for each processor. The main idea is to schedule the write operations in a binary-tree fashion and gradually combine the data into a single block. In the first round the P processors combine their blocks into P/2 blocks. Then P/2 processors combine the P/2 blocks into P/4 blocks. This procedure is continued until all the data is combined in one block.
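The tree-combining approach can be sketched as follows (a hypothetical Python illustration that simulates the parallel rounds sequentially; the function name and list-based "blocks" are assumptions, not part of the original algorithm):

```python
# Sketch of the binary-tree write combining: P per-processor blocks are
# merged pairwise, halving the number of blocks in each parallel round,
# until a single block remains after about log2(P) rounds.
def tree_combine(blocks):
    rounds = 0
    while len(blocks) > 1:
        next_level = []
        # In one parallel round, each pair of neighbouring blocks is merged
        # (each merge corresponds to one parallel block transfer).
        for i in range(0, len(blocks), 2):
            if i + 1 < len(blocks):
                next_level.append(blocks[i] + blocks[i + 1])
            else:
                next_level.append(blocks[i])  # odd block is carried to the next round
        blocks = next_level
        rounds += 1
    return blocks[0], rounds

combined, rounds = tree_combine([[i] for i in range(8)])  # P = 8 processors
print(combined)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(rounds)    # 3 parallel rounds = log2(8)
```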
Examples
Multiway partitioning
Let m_1 < m_2 < ... < m_{d-1} be a vector of d-1 pivots sorted in increasing order. Let A be an unordered set of N elements. A d-way partition[1] of A is a set {A_1, ..., A_d}, where the union of all A_i is A and A_i ∩ A_j = ∅ for i ≠ j. A_i is called the i-th bucket: the elements in A_i are greater than m_{i-1} and smaller than m_i. In the following algorithm[1] the input is partitioned into N/P-sized contiguous segments S_1, ..., S_P in main memory, and processor i primarily works on the segment S_i. The multiway partitioning algorithm (PEM_DIST_SORT[1]) uses a PEM prefix sum algorithm.
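A single processor's local partitioning of its segment can be sketched like this (a hypothetical Python illustration; in the full algorithm each of the P processors would partition its own N/P-sized segment in parallel, and the function and variable names below are assumptions):

```python
from bisect import bisect_left

def d_way_partition(segment, pivots):
    """Split `segment` into d = len(pivots) + 1 buckets using sorted pivots.

    Elements equal to a pivot are placed in the lower bucket here; the
    formal definition above assumes the elements differ from the pivots.
    """
    buckets = [[] for _ in range(len(pivots) + 1)]
    for x in segment:
        # bisect_left returns the index of the first pivot >= x,
        # which is exactly the bucket whose range contains x.
        buckets[bisect_left(pivots, x)].append(x)
    return buckets

pivots = [10, 20, 30]                   # d - 1 = 3 pivots, so d = 4 buckets
segment = [25, 3, 17, 30, 8, 42, 11]    # one processor's contiguous segment
print(d_way_partition(segment, pivots))
# [[3, 8], [17, 11], [25, 30], [42]]
```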
Efficient PEM algorithms also exist for the following problems; their I/O complexities are expressed in terms of sort_P(N), the time it takes to sort N items with P processors in the PEM model:
- Euler tour[2]
- Expression tree evaluation[2]
- Finding a MST[2]
See also
- ...
References
- ^ a b c d e f g h Arge, Lars; Goodrich, Michael T.; Nelson, Michael; Sitchinava, Nodari (2008). "Fundamental parallel algorithms for private-cache chip multiprocessors". Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures - SPAA '08. New York, New York, USA: ACM Press. doi:10.1145/1378533.1378573. ISBN 9781595939739.
- ^ a b c Cite error: the named reference :1 was invoked but never defined.