Prefetch buffer
A prefetch buffer is a data buffer employed on modern DRAM chips that allows quick and easy access to data located on a common physical row in the memory.
DRAM read operations involve three phases. In the first phase the bitlines are precharged to a reference voltage level. In the second phase a row in the memory is activated, causing all the bits on the row to be read into sense amplifiers (the bits are connected to the bitlines, which connect to the sense amps at the edge of the memory array). As there are many bits on a row (1K or 2K bits are common row widths), a third phase is required to select a specific dataword out of the row (the column select phase). If the row has 2,048 bits and the IO width of the memory chip is 16 bits, there are 128 (2048/16) datawords to choose from. If the next memory access happens to occur on the same row, there is the opportunity to skip the first two phases and just perform phase three again, at a significant speed advantage over a full three-phase access. This is possible even with older DRAMs. The drawback of the older column access method was that a new column address had to be sent for each additional dataword on the row. A prefetch buffer automates this process.
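The row/column arithmetic above can be sketched in a few lines. This is an illustrative model only, using the example geometry from the text (a 2,048-bit row and a 16-bit IO width); real DRAM address decoding also involves banks and chip-specific mappings.

```python
# Hypothetical DRAM geometry from the article's example.
ROW_BITS = 2048
IO_WIDTH = 16
WORDS_PER_ROW = ROW_BITS // IO_WIDTH  # 128 column addresses per row

def decompose(word_address):
    """Split a flat dataword address into (row, column) indices."""
    return word_address // WORDS_PER_ROW, word_address % WORDS_PER_ROW

# Two accesses on the same row can skip precharge and activation:
row_a, col_a = decompose(130)
row_b, col_b = decompose(131)
print(row_a == row_b)  # same row, so only the column-select phase repeats
```

When consecutive accesses decompose to the same row, only the column-select phase must be repeated, which is exactly the case a prefetch buffer exploits.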
When a memory access occurs to a row, the prefetch buffer grabs a set of adjacent datawords on the row and reads them out ("bursts" them) in rapid-fire sequence on the IO pins, without the need for individual column address requests. This is useful only if the CPU wants adjacent datawords in memory, which in practice is very often the case. For instance, when a 64-bit CPU accesses a DRAM chip that is only 16 bits wide, it will need 4 adjacent datawords to make up the 64 bits. A 4n prefetch buffer would accomplish this exactly ("n" refers to the IO width of the memory chip; 4n means the total burst comprises 4 sets of data on the IOs). An 8n prefetch buffer on an 8-bit-wide DRAM would also allow a 64-bit transfer.
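The burst arithmetic can be checked directly: a prefetch of depth n on a chip with a given IO width delivers n times that width per burst. A minimal sketch, using the example figures from the text:

```python
def burst_bits(prefetch_depth, io_width):
    """Total bits delivered by one burst: depth * IO width."""
    return prefetch_depth * io_width

print(burst_bits(4, 16))  # 4n prefetch on a x16 DRAM -> 64 bits
print(burst_bits(8, 8))   # 8n prefetch on a x8 DRAM  -> 64 bits
```

Both configurations fill a 64-bit CPU word in a single burst, which is why prefetch depth and IO width tend to be chosen together.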
The prefetch buffer depth can also be thought of as the ratio between the core memory frequency and the IO frequency. In an 8n prefetch architecture (such as DDR3), the IOs will operate 8 times faster than the memory core. Thus a 200 MHz memory core is combined with IOs that each operate at 200 × 8 = 1600 megabits/second. If the memory has 16 IOs, the total read bandwidth is 1600 Mbps/IO × 16 IOs = 25.6 gigabits/second (Gbps), or 3.2 gigabytes/second (GBps). Modules with multiple DRAM chips will have correspondingly higher bandwidth.
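The bandwidth calculation above follows a simple chain of multiplications, sketched here with the article's example numbers (a 200 MHz core, 8n prefetch, and 16 IOs; these are illustrative values, not a specification):

```python
core_mhz = 200   # memory core clock, MHz (example value)
prefetch = 8     # 8n prefetch architecture, e.g. DDR3
ios = 16         # number of IO pins on the chip (x16 part)

io_rate_mbps = core_mhz * prefetch      # per-pin data rate: 1600 Mbps
total_gbps = io_rate_mbps * ios / 1000  # chip read bandwidth: 25.6 Gbps
total_gBps = total_gbps / 8             # in bytes: 3.2 GBps

print(io_rate_mbps, total_gbps, total_gBps)
```

Note that the core frequency stays fixed while the prefetch depth multiplies the external rate; this is the central trade-off of the prefetch architecture.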
Each generation of SDRAM has a different prefetch buffer depth:
- DDR SDRAM's prefetch buffer depth is 2n.
- DDR2 SDRAM's prefetch buffer depth is 4n.
- DDR3 SDRAM's prefetch buffer depth is 8n.
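Combined with the frequency-ratio view above, the generational depths imply that for a fixed core frequency the external data rate doubles each generation. A small illustration, using a 200 MHz core as an example frequency (not a spec value):

```python
# Per-pin data rate implied by each generation's prefetch depth,
# assuming an illustrative fixed 200 MHz memory core.
CORE_MHZ = 200
PREFETCH = {"DDR": 2, "DDR2": 4, "DDR3": 8}

for gen, depth in PREFETCH.items():
    print(f"{gen}: {CORE_MHZ * depth} Mbps per IO pin")
```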
Increased bandwidth
The speed of memory has not historically increased in line with CPU improvements. To increase the bandwidth of memory modules, the prefetch buffer reads data from multiple memory chips simultaneously. This is similar to a RAID array in the storage world, and also to the concept of dual-channel memory, except that the extra channels are internal to each module. Sequential access bandwidth is markedly improved by prefetch buffers, but random access is mostly unchanged.