Interleaved memory

In computing, interleaved memory is a design which compensates for the relatively slow speed of dynamic random-access memory (DRAM) or core memory, by spreading memory addresses evenly across memory banks. That way, contiguous memory reads and writes use each memory bank in turn, resulting in higher memory throughput due to reduced waiting for memory banks to become ready for the operations.

It differs from multi-channel memory architectures primarily in that interleaved memory does not add more channels between the main memory and the memory controller. However, channel interleaving is also possible, for example in Freescale i.MX6 processors, which allow interleaving to be done between two channels.^{[citation needed]}

Overview

With interleaved memory, memory addresses are allocated to each memory bank in turn. For example, in an interleaved system with two memory banks (assuming word-addressable memory), if logical address 32 belongs to bank 0, then logical address 33 would belong to bank 1, logical address 34 would belong to bank 0, and so on. An interleaved memory is said to be n-way interleaved when there are $n$ banks and memory location $i$ resides in bank $i \bmod n$.
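The n-way mapping above can be sketched in a few lines. This is a minimal illustration, not taken from any particular memory controller: the function name `locate` and the choice of `address // n_banks` as the offset within a bank are assumptions made for the example.

```python
def locate(address, n_banks):
    """Map a word address to (bank, offset within bank) under n-way interleaving.

    Hypothetical helper: the bank is address mod n, and the offset inside
    that bank is assumed to be the integer quotient address // n.
    """
    return address % n_banks, address // n_banks


# Two-bank example from the text: addresses 32, 33, 34 alternate between banks.
for addr in range(32, 36):
    bank, offset = locate(addr, 2)
    print(f"address {addr} -> bank {bank}, offset {offset}")
```

With two banks, addresses 32 and 34 land in bank 0 and address 33 lands in bank 1, matching the example in the paragraph above.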

Memory interleaving example with 4 banks. Red banks are refreshing and can't be used.

With interleaved memory, contiguous reads (which are common both in multimedia and in the execution of programs) and contiguous writes (which are used frequently when filling storage or communication buffers) use each memory bank in turn instead of using the same one repeatedly. This yields significantly higher memory throughput, because each bank has a minimum waiting time between reads and writes.

Interleaved DRAM

Main memory (random-access memory, RAM) is usually composed of a collection of DRAM memory chips, where a number of chips can be grouped together to form a memory bank. With a memory controller that supports interleaving, these memory banks can then be laid out so that consecutive addresses are interleaved across them.

Data in DRAM is stored in units of pages. Each DRAM bank has a row buffer that serves as a cache for accessing any page in the bank. Before a page in the DRAM bank is read, it is first loaded into the row buffer. If the requested page is already in the row buffer (a row-buffer hit), the access has the shortest latency, one memory cycle. Otherwise the access is a row-buffer miss, also called a row-buffer conflict, because the new page has to be loaded into the row buffer before it is read. Row-buffer misses happen when access requests to different memory pages in the same bank are serviced. A row-buffer conflict incurs a substantial delay for a memory access. In contrast, memory accesses to different banks can proceed in parallel with high throughput.
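The row-buffer behaviour described above can be illustrated with a toy model. It is a sketch under simplifying assumptions: each bank remembers only which page is currently open, timing is ignored, and the names `Bank` and `count_hits` are hypothetical.

```python
class Bank:
    """Toy DRAM bank that tracks only the page held in its row buffer."""

    def __init__(self):
        self.open_page = None  # no page loaded yet

    def access(self, page):
        """Return True on a row-buffer hit, then leave `page` open."""
        hit = self.open_page == page
        self.open_page = page
        return hit


def count_hits(requests, n_banks):
    """Count row-buffer hits for a sequence of (bank, page) requests."""
    banks = [Bank() for _ in range(n_banks)]
    return sum(banks[bank].access(page) for bank, page in requests)


# Two pages alternating in the SAME bank: every access replaces the open page.
same_bank = [(0, 0), (0, 1)] * 4
# The same two pages placed in DIFFERENT banks: each stays open after first use.
two_banks = [(0, 0), (1, 1)] * 4

print(count_hits(same_bank, 2))  # 0
print(count_hits(two_banks, 2))  # 6
```

In the second pattern only the first access to each bank misses, which is why spreading pages that are used together across banks avoids repeated reloads of the row buffer.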

In traditional (flat) layouts, each memory bank is allocated a contiguous block of memory addresses. This is very simple for the memory controller and, in completely random access scenarios, performs as well as interleaving. In practice, however, memory reads are rarely random because of locality of reference, and for accesses to nearby addresses interleaved layouts give far better performance.
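The difference between the two layouts for contiguous access can be sketched with a small timing model. All parameters here are assumptions chosen for illustration (four banks, 1024 words per bank in the flat layout, a bank staying busy for 4 cycles per access, and a controller that issues at most one request per cycle in order); real DRAM timing is considerably more detailed.

```python
N_BANKS = 4
WORDS_PER_BANK = 1024  # assumed bank capacity, used only by the flat layout
BUSY = 4               # assumed cycles a bank is occupied by one access


def flat_bank(addr):
    """Flat layout: each bank holds one contiguous block of addresses."""
    return addr // WORDS_PER_BANK


def interleaved_bank(addr):
    """Interleaved layout: consecutive addresses rotate through the banks."""
    return addr % N_BANKS


def total_cycles(addresses, bank_of):
    """Cycles until the last access completes, with in-order issue."""
    free_at = [0] * N_BANKS  # cycle at which each bank can start a new access
    now = 0
    for addr in addresses:
        bank = bank_of(addr)
        start = max(now, free_at[bank])  # wait if the bank is still busy
        free_at[bank] = start + BUSY
        now = start + 1                  # next request issues a cycle later
    return max(free_at)


sequential = range(16)
print("flat:", total_cycles(sequential, flat_bank))                # flat: 64
print("interleaved:", total_cycles(sequential, interleaved_bank))  # interleaved: 19
```

Under these assumptions the flat layout serializes all 16 accesses on one bank, while the interleaved layout overlaps the busy time of the four banks.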

Note that the way memory is addressed has no effect on the access time for memory locations which are already cached, having an impact only on memory locations which need to be retrieved from DRAM.

Issues with interleaved memory

DRAM row-buffer conflicts can cause significant memory access latency. Under the conventional interleaved memory mapping, DRAM row-buffer conflicts come from the following three sources. (1) Conflict misses in the last-level on-chip cache lead to DRAM row-buffer misses. (2) Write-back conflicts in the last-level on-chip cache also lead to DRAM row-buffer conflicts. (3) Certain sequential memory access patterns, in which the distance between consecutively accessed data elements is a multiple of the combined size of all row buffers of the memory banks, cause row-buffer conflicts. These findings are documented in a paper published in 2000.^[1]
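The third source can be illustrated with one possible reading of such a pattern: two data streams accessed alternately whose base addresses lie a multiple of the combined row-buffer size apart. The sketch assumes conventional page interleaving with 4 banks and 2048-byte pages; these values and the helper names are illustrative, not taken from [1].

```python
PAGE_SIZE = 2048                       # assumed row-buffer (page) size in bytes
N_BANKS = 4                            # assumed number of banks
ALL_ROW_BUFFERS = PAGE_SIZE * N_BANKS  # combined size of all row buffers


def bank_and_row(addr):
    """Conventional page interleaving: consecutive pages go to consecutive banks."""
    page = addr // PAGE_SIZE
    return page % N_BANKS, page // N_BANKS


def count_conflicts(addresses):
    """Count accesses that find a different row open in their bank."""
    open_row = {}
    conflicts = 0
    for addr in addresses:
        bank, row = bank_and_row(addr)
        if bank in open_row and open_row[bank] != row:
            conflicts += 1
        open_row[bank] = row
    return conflicts


def two_streams(base_a, base_b, count=16, step=64):
    """Alternate between two streams, e.g. reading a[i] and then b[i]."""
    addresses = []
    for i in range(count):
        addresses += [base_a + i * step, base_b + i * step]
    return addresses


aligned = two_streams(0, ALL_ROW_BUFFERS)              # same bank, different rows
shifted = two_streams(0, ALL_ROW_BUFFERS + PAGE_SIZE)  # neighbouring banks

print(count_conflicts(aligned))  # 31
print(count_conflicts(shifted))  # 0
```

When the streams are exactly one combined row-buffer size apart, every access after the first evicts the row opened by the previous one. Shifting the second stream by a single page places it in a different bank and removes the conflicts in this model.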

Permutation-based page interleaving

The two architecture-related sources of DRAM row-buffer conflicts (cache conflict misses and write-backs) indicate that the address mapping symmetry between cache and DRAM is a structural problem in the memory hierarchy under the conventional interleaved memory mapping. Breaking this symmetry requires an external force, and permutation-based page interleaving effectively serves this purpose. Zhao Zhang, Zhichun Zhu and Xiaodong Zhang invented the permutation-based interleaved memory,^[1] which has the following three properties. (1) Conflicting addresses of the last-level on-chip cache are distributed onto different DRAM banks. (2) All addresses in the same memory page remain in the same page, as in conventional interleaved memory. (3) Memory pages are uniformly mapped among memory banks. The cost of the permutation that generates each memory bank index is trivial; it is not on the critical path of the deep memory hierarchy and can be overlapped with operations at the cache level.
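The three properties can be checked on a small model. The sketch below assumes that the permutation is an XOR of the conventional bank index with the lowest bits of the cache tag, which is how the scheme in [1] is commonly summarized. The bit positions (2048-byte pages, 4 banks, a cache tag starting at bit 20) and the function names are assumptions made for the example.

```python
PAGE_BITS = 11   # assumed 2048-byte pages
BANK_BITS = 2    # assumed 4 banks
TAG_SHIFT = 20   # assumed lowest bit of the last-level cache tag
BANK_MASK = (1 << BANK_BITS) - 1


def conventional_bank(addr):
    """Bank index taken from the bits just above the page offset."""
    return (addr >> PAGE_BITS) & BANK_MASK


def permuted_bank(addr):
    """Assumed permutation: XOR the bank index with the low bits of the cache tag."""
    tag_low = (addr >> TAG_SHIFT) & BANK_MASK
    return conventional_bank(addr) ^ tag_low


# (1) Addresses that conflict in the cache (same index bits, different tags).
base = 0x1800
conflicting = [base + (tag << TAG_SHIFT) for tag in range(4)]
print([conventional_bank(a) for a in conflicting])  # [3, 3, 3, 3]
print([permuted_bank(a) for a in conflicting])      # [3, 2, 1, 0]

# (2) Every address inside one page still maps to a single bank.
page = range(base, base + (1 << PAGE_BITS), 64)
print(len({permuted_bank(a) for a in page}))        # 1

# (3) For a fixed tag, consecutive pages still cover every bank exactly once.
pages = [(1 << TAG_SHIFT) | (p << PAGE_BITS) for p in range(4)]
print(sorted(permuted_bank(a) for a in pages))      # [0, 1, 2, 3]
```

Because XOR with a fixed value is a bijection on the bank indices, the permutation keeps the even distribution of pages over banks while separating addresses that differ only in their tag bits.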

History

Early research into interleaved memory was performed at IBM in the 1960s and 1970s in relation to the IBM 7030 Stretch computer,^[2] and development continued for decades, improving design, flexibility and performance to produce modern implementations.

The permutation-based solution^[1] in interleaved memory was first used by Sun Microsystems in the UltraSPARC IIIi processor in 2001 for its entry level servers, workstations, and desktop products.^[3] Today, permutation-based page interleaving can be found in almost all commercial microprocessors, such as the AMD Geode,^[4]^[5] Intel Core processors,^[6]^[7] and others for embedded systems, laptops, desktops, and enterprise servers.

References

^ Z. Zhang; Z. Zhu; X. Zhang (2000). A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality (PDF). 33rd Annual International Symposium on Microarchitecture (Micro-33).
^ Mark Smotherman (July 2010). "IBM Stretch (7030) — Aggressive Uniprocessor Parallelism". clemson.edu. Retrieved 2013-12-07.
^ "Acknowledgment from Sun Microsystems Inc. of the use of Permutation-based Interleaved Memory" (PDF). July 15, 2005.
^ "AMD Geode LX Processors Data Book" (PDF).
^ "AMD Geode GX3 User's Manual" (PDF).
^ "Intel Core i7 Processor Family Datasheet" (PDF).
^ "Mobile Intel 945 Express Chipset Family" (PDF).
