Prefetch buffer
A prefetch buffer is a data buffer employed on modern DRAM chips that allows quick and easy access to multiple datawords located on a common physical row in the memory.
The prefetch buffer takes advantage of the specific characteristics of memory accesses to a DRAM. Typical DRAM memory operations involve three phases (bitline precharge, row access, column access). Row access is the heart of a read operation as it involves the careful sensing of the signals in the row of bits that are being accessed. This is the long and slow phase of memory operation. However once a row is read, subsequent column accesses to that same row can be very quick, as the sense amp latches act as a wide buffer for the data that was read out of the row (a typical row width is 2,048 bits, which are read into 2,048 sense amp latches during the row access).
Traditional DRAM architectures have long supported fast column access to the bits on an open row. For an 8 bit wide memory chip with a 2,048 bit wide row, accesses to any of the 256 datawords (2048/8) on the row can be very quick, provided no intervening access to another row occurs.
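The arithmetic above can be sketched briefly; the row and IO widths are the example values from the text, not properties of any particular chip:

```python
# Datawords available for fast column access on one open row,
# using the article's example figures.
row_width_bits = 2048  # bits latched by the sense amps per row access
chip_io_width = 8      # 8-bit-wide memory chip

datawords_per_row = row_width_bits // chip_io_width
print(datawords_per_row)  # 256
```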
The drawback of the older column access method was that a new column address had to be sent for each additional dataword on the row, so the address bus had to operate at the same frequency as the data bus. A prefetch buffer automates this process and allows a single address request to result in multiple datawords.
When a memory access occurs to a row, the prefetch buffer grabs a set of adjacent datawords on the row and reads them out ("bursts" them) in rapid-fire sequence on the IO pins, without the need for individual column address requests. This assumes the CPU wants adjacent datawords in memory, which in practice is very often the case. For instance, when a 64-bit CPU accesses a DRAM chip that is only 16 bits wide, it needs 4 adjacent datawords to make up the full 64 bits. A 4n prefetch buffer accomplishes this exactly ("n" refers to the IO width of the memory chip; 4n means the total burst comprises 4 sets of data on the IOs). An 8n prefetch buffer on an 8-bit-wide DRAM would also accomplish a 64-bit transfer.
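A minimal sketch of how a 4n burst from a 16-bit-wide chip assembles one 64-bit CPU word. The transfer values and the little-endian word order are illustrative assumptions, not a model of any real DRAM protocol:

```python
# Hypothetical 4n prefetch burst: four 16-bit transfers make one 64-bit word.
io_width = 16        # bits per transfer ("n" = chip IO width)
prefetch_depth = 4   # 4n prefetch: four transfers per column command
cpu_word_bits = 64

burst = [0x1111, 0x2222, 0x3333, 0x4444]  # four 16-bit transfers, low word first
assert io_width * prefetch_depth == cpu_word_bits

# Concatenate the transfers (assumed little-endian order) into one 64-bit value.
word = 0
for i, chunk in enumerate(burst):
    word |= chunk << (i * io_width)
print(hex(word))  # 0x4444333322221111
```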
The prefetch buffer depth can also be thought of as the ratio between the core memory frequency and the IO frequency. In an 8n prefetch architecture (such as DDR3), the IOs will operate 8 times faster than the memory core. Thus a 200 MHz memory core is combined with IOs that each operate at 200x8 = 1600 megabits/second. If the memory has 16 IOs, the total read bandwidth is 1600 Mbps/IO x 16 IOs = 25.6 gigabits/second (Gbps), or 3.2 gigabytes/second (GBps). Modules with multiple DRAM chips will have correspondingly higher bandwidth.
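The bandwidth calculation above can be written out directly; the figures are the article's own example for an 8n (DDR3-style) architecture:

```python
# Bandwidth arithmetic for an 8n prefetch architecture,
# using the example values from the text.
core_mhz = 200
prefetch_depth = 8                          # 8n: IOs run 8x the core rate
io_rate_mbps = core_mhz * prefetch_depth    # 1600 megabits/second per IO
num_ios = 16

total_gbps = io_rate_mbps * num_ios / 1000  # 25.6 gigabits/second
total_gbytes_ps = total_gbps / 8            # 3.2 gigabytes/second
print(io_rate_mbps, total_gbps, total_gbytes_ps)  # 1600 25.6 3.2
```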
Each generation of SDRAM has a different prefetch buffer depth:
- DDR SDRAM has a 2n prefetch buffer depth.
- DDR2 SDRAM has a 4n prefetch buffer depth.
- DDR3 SDRAM has an 8n prefetch buffer depth.
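The list above can be expressed as a lookup table, together with the IO-to-core frequency ratio it implies (equal to the prefetch depth, per the previous section). The 200 MHz core clock is an assumed figure for comparison:

```python
# Prefetch depths per SDRAM generation, as listed above.
prefetch_depth = {"DDR": 2, "DDR2": 4, "DDR3": 8}

core_mhz = 200  # assumed core clock, for comparison only
for gen, depth in prefetch_depth.items():
    # Per-pin IO rate = core frequency x prefetch depth.
    print(f"{gen}: {depth}n prefetch, {core_mhz * depth} Mbps per IO pin")
```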
Increased Bandwidth
The speed of memory has not historically increased in line with CPU improvements. To increase the bandwidth of memory modules, the prefetch buffer reads data from multiple memory chips simultaneously. This is similar to a RAID array in the storage world, and also to the concept of dual-channel memory, except that the extra channels are internal to each module. Sequential access bandwidth is markedly improved by prefetch buffers, but random access performance is mostly unchanged.
See also