Cache placement policies

A cache is a memory which holds the data recently used by the processor. A block of memory cannot be placed at an arbitrary location in the cache. Depending on the placement policy, it is restricted to a single cache line or to a set of cache lines. In other words, the placement policy determines where a particular memory block can be placed when it is brought into the cache.

There are three different policies available for placement of a block.

1. Direct Mapping Technique:

In a direct-mapped cache, the cache is organized into multiple sets with a single cache line per set. Based on its address, a memory block can occupy only one cache line. The cache can be framed as an (n × 1) column matrix.

To place a block in the cache:

The set is determined by the index bits derived from the address of the memory block.
The memory block is placed in the identified set, and its tag is stored in the tag field associated with that set.
If the cache line is already occupied, the new block replaces the block previously held in that line.

To search for a word in the cache:

The set is identified from the index bits of the address.
The tag bits derived from the memory block address are compared with the tag bits associated with the set. If the tags match, there is a cache hit and the requested word, selected by the offset bits, is returned to the processor. Otherwise, there is a cache miss and the memory block is fetched from the lower level of memory.
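The placement and search steps above can be modeled in a few lines. The following is a minimal Python sketch, not a hardware description: the class `DirectMappedCache` and its methods are hypothetical, only tags are tracked (no data), and the number of sets and the block size are assumed to be powers of two, so that division and modulo select the same fields as the offset, index and tag bits.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache: one cache line per set."""

    def __init__(self, num_sets, block_size):
        self.num_sets = num_sets
        self.block_size = block_size
        self.tags = [None] * num_sets  # None marks an empty (invalid) line

    def split(self, address):
        """Split a byte address into (tag, index, offset)."""
        offset = address % self.block_size  # byte within the block
        block = address // self.block_size  # memory block number
        index = block % self.num_sets       # the only set this block may use
        tag = block // self.num_sets        # remaining high-order bits
        return tag, index, offset

    def access(self, address):
        """Search for the address; on a miss, place its block. Returns True on a hit."""
        tag, index, _ = self.split(address)
        if self.tags[index] == tag:
            return True  # hit: only the single tag of this set is compared
        # Miss: the block is fetched from lower memory and placed in the only
        # line of its set, replacing whatever block was there before.
        self.tags[index] = tag
        return False


cache = DirectMappedCache(num_sets=64, block_size=4)
print(cache.access(0x0000))  # False: the first access to block 0 misses
print(cache.access(0x0002))  # True: 0x0002 lies in the same block as 0x0000
```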

Advantages:

This placement policy is power efficient, as it avoids searching through all the cache lines.
The placement and replacement policies are simple, as the index bits alone determine the set, and hence the line, in which a block is placed.
It requires cheap hardware, as only one tag needs to be checked per access.

Disadvantage:

It has a lower cache hit rate, as there is only one cache line available in each set. Every time a different memory block that maps to the same set is referenced, the cache line is replaced, even if other lines of the cache are empty.
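The effect of having one line per set can be seen with a short access trace. This Python sketch assumes the 64-set cache with 4-byte blocks used in the example that follows; the function name `direct_mapped_hits` is hypothetical. Addresses 0x0000 and 0x0100 both map to set 0, so alternating between them misses every time even though the other 63 lines stay empty.

```python
def direct_mapped_hits(trace, num_sets=64, block_size=4):
    """Count hits for a sequence of byte addresses in a direct-mapped cache."""
    lines = {}  # set index -> tag of the block held in that set's single line
    hits = 0
    for address in trace:
        block = address // block_size
        index, tag = block % num_sets, block // num_sets
        if lines.get(index) == tag:
            hits += 1
        else:
            lines[index] = tag  # the incoming block evicts the previous occupant
    return hits


# 0x0000 (block 0) and 0x0100 (block 64) compete for set 0.
print(direct_mapped_hits([0x0000, 0x0100] * 4))  # 0
# 0x0000 and 0x0004 map to different sets, so only the first access to each misses.
print(direct_mapped_hits([0x0000, 0x0004] * 4))  # 6
```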

Example:

Consider a main memory of 16 kilobytes, organized as 4-byte blocks, and a direct-mapped cache of 256 bytes with a block size of 4 bytes.

Since each cache block is 4 bytes, the total number of sets in the cache is 256/4 = 64, each holding one cache line.

The incoming address to the cache is divided into bits for the offset, index and tag.

The offset corresponds to the bits used to determine the byte to be accessed within the cache line.

In the example, there are 2 offset bits, which address the 4 bytes of the cache line.

The index corresponds to the bits used to determine the set of the cache.

In the example, there are 6 index bits, which address the 64 sets of the cache.

The tag corresponds to the remaining bits.

Because the main memory is 16 kilobytes (2^14 bytes), an address needs 14 bits, so in the example there are 6 tag bits (14 – (6 + 2)). These are stored in the tag field to be matched against the address on a cache request.

Address 0x0000 (Tag – 00_0000, Index – 00_0000, Offset – 00) maps to block 0 of the memory and occupies set 0 of the cache.

Address 0x0004 (Tag – 00_0000, Index – 00_0001, Offset – 00) maps to block 1 of the memory and occupies set 1 of the cache.

Similarly, address 0x00FF (Tag – 00_0000, Index – 11_1111, Offset – 11) maps to block 63 of the memory and occupies set 63 of the cache.

Address 0x0100 (Tag – 00_0001, Index – 00_0000, Offset – 00) maps to block 64 of the memory and occupies set 0 of the cache.

[1]
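The field extraction in this example can be reproduced with shifts and masks. A minimal Python sketch follows; `split_address` is a hypothetical helper, and the constants encode the 2 offset bits and 6 index bits from the example. The tag is printed as a number rather than a fixed-width bit string, since it is simply whatever lies above the index bits.

```python
OFFSET_BITS = 2  # 4-byte blocks
INDEX_BITS = 6   # 64 sets of one line each


def split_address(address):
    """Return (tag, index, offset) for a direct-mapped lookup."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


for address in (0x0000, 0x0004, 0x00FF, 0x0100):
    tag, index, offset = split_address(address)
    block = address >> OFFSET_BITS
    print(f"0x{address:04X}: tag={tag} index={index:06b} offset={offset:02b} "
          f"block={block} set={index}")
```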

2. Fully associative mapping:

In a fully associative cache, the cache is organized into a single cache set with multiple cache lines. A memory block can occupy any of the cache lines. The cache organization can be framed as a (1 × m) row matrix.

When a block is placed into the cache, the cache line it occupies is determined by the replacement policy of the cache.

To search for a block in the cache:

The tag field of the memory address is compared with the tag bits associated with all the cache lines. If it matches, the block is present in the cache and it is a cache hit. If it does not match, it is a cache miss and the block has to be fetched from the lower level of memory.

On a hit, a byte is selected based on the offset and returned to the processor.
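The search described above can be sketched as a tag-only Python model with hypothetical names. Because the text leaves the replacement policy open, least-recently-used (LRU) replacement is assumed here, implemented with `collections.OrderedDict`. The dictionary lookup stands in for comparing the incoming tag against the tag of every line.

```python
from collections import OrderedDict


class FullyAssociativeCache:
    """Tag-only model of a fully associative cache with (assumed) LRU replacement."""

    def __init__(self, num_lines, block_size):
        self.num_lines = num_lines
        self.block_size = block_size
        # Tags of resident blocks, ordered from least to most recently used.
        self.lines = OrderedDict()

    def access(self, address):
        """Return True on a hit; on a miss, place the block, evicting the LRU one if full."""
        tag = address // self.block_size  # no index field: the tag is the block number
        if tag in self.lines:             # stands in for comparing against every line's tag
            self.lines.move_to_end(tag)   # mark as most recently used
            return True
        if len(self.lines) == self.num_lines:
            self.lines.popitem(last=False)  # evict the least recently used block
        self.lines[tag] = None
        return False


cache = FullyAssociativeCache(num_lines=64, block_size=4)
# Blocks 0 and 64 no longer compete for one line: after two misses, every access hits.
print(sum(cache.access(a) for a in [0x0000, 0x0100] * 4))  # 6
```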

Advantages:

The fully associative structure allows a memory block to be placed in any line, so the whole cache can be utilized.
The placement policy provides a better cache hit rate, as blocks never compete for one particular line.
It offers the flexibility of using a wide variety of replacement algorithms when a cache miss occurs.

Disadvantages:

The placement policy is power hungry, as the tag has to be compared against every line in the cache to locate a block.
The placement policy is slow, as it takes time to search through all the lines.
It is the most expensive of the three methods, due to the high cost of associative-comparison hardware.

Example:

Consider a main memory of 16 kilobytes, organized as 4-byte blocks, and a fully associative cache of 256 bytes with a block size of 4 bytes.

Since each cache block is 4 bytes, the cache contains 256/4 = 64 cache lines, all belonging to a single set.

The incoming address to the cache is divided into bits for Offset and Tag.

Offset corresponds to the bits used to determine the byte to be accessed from the cache line.

In the example, there are 2 offset bits, which address the 4 bytes of the cache line; the remaining bits form the tag.

Because the 16-kilobyte main memory needs 14-bit addresses (2^14 bytes), there are 12 tag bits (14 – 2) in the example. They are stored in the tag field of the cache line to be matched against the address on a cache request.

Since any memory block can be mapped to any cache line, the block can reside in any free cache line; if no line is free, a cache line is replaced according to the replacement policy.
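With no index field, the tag of a fully associative cache is just the block number. A short Python sketch, using a hypothetical helper `split_fully_associative` and assuming the 2 offset bits of this example:

```python
OFFSET_BITS = 2  # 4-byte blocks; there are no index bits


def split_fully_associative(address):
    """Return (tag, offset); the tag is everything above the offset bits."""
    return address >> OFFSET_BITS, address & ((1 << OFFSET_BITS) - 1)


for address in (0x0000, 0x0004, 0x00FF, 0x0100):
    tag, offset = split_fully_associative(address)
    print(f"0x{address:04X}: tag={tag} offset={offset:02b}")
```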

3. Set Associative Cache:

A set-associative cache is a trade-off between a direct-mapped cache and a fully associative cache.

A set-associative cache can be imagined as an (n × m) matrix. The cache is divided into n sets, and each set contains m cache lines. A memory block is first mapped onto a set and then placed into any cache line of that set.

The range of caches from direct mapped to fully associative is a continuum of levels of set associativity: a direct-mapped cache is one-way set associative, and a fully associative cache with m lines is m-way set associative.

Many processor caches in today's designs are either direct mapped, two-way set associative, or four-way set associative.
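The continuum can be made concrete by computing the address fields for a given associativity. In this Python sketch the function `field_widths` is hypothetical, all sizes are assumed to be powers of two, and the address width is derived from the main-memory size. Applied to the 256-byte cache with 4-byte blocks and 16 KB of memory used in the examples, 1 way yields the direct-mapped layout and 64 ways the fully associative one.

```python
def log2_exact(n):
    """Return log2 of a power of two; raise ValueError otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def field_widths(memory_bytes, cache_bytes, block_bytes, ways):
    """Return (sets, offset_bits, index_bits, tag_bits) for an m-way cache."""
    lines = cache_bytes // block_bytes
    sets = lines // ways                     # n sets of m lines each
    address_bits = log2_exact(memory_bytes)
    offset_bits = log2_exact(block_bytes)
    index_bits = log2_exact(sets)            # 0 when there is a single set
    return sets, offset_bits, index_bits, address_bits - index_bits - offset_bits


for ways in (1, 2, 4, 64):
    print(ways, field_widths(16 * 1024, 256, 4, ways))
```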

To place a block in the cache:

The set is determined with the index bits derived from the address of the memory block.
The memory block is placed in one of the cache lines of the identified set, and its tag is stored in the tag field of that line.
If all the cache lines in the set are occupied, the new block replaces a block chosen by the replacement policy.

To locate a block in the cache:

The set is determined with the index bits derived from the address of the memory block.
The tag bits are compared with the tags of all cache lines in the selected set. If a tag matches, it is a cache hit, and the appropriate byte is fetched and delivered to the processor. If no tag matches, it is a cache miss and the block is fetched from the lower level of memory.
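The placement and lookup steps can be sketched as a tag-only Python model. The class name is hypothetical, and LRU replacement within each set is an assumption, since the text does not fix a replacement policy. Only the lines of the selected set are searched, which is what separates this from the fully associative case.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only model of an m-way set-associative cache with (assumed) per-set LRU."""

    def __init__(self, num_sets, ways, block_size):
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # One OrderedDict of tags per set, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit; on a miss, place the block in its set."""
        block = address // self.block_size
        index, tag = block % self.num_sets, block // self.num_sets
        lines = self.sets[index]       # only this set's m lines are searched
        if tag in lines:
            lines.move_to_end(tag)     # mark as most recently used
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)  # the replacement policy picks the victim
        lines[tag] = None
        return False


cache = SetAssociativeCache(num_sets=32, ways=2, block_size=4)
# Blocks 0 and 64 both map to set 0 but can share its two lines.
print(sum(cache.access(a) for a in [0x0000, 0x0100] * 4))  # 6
```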

Advantages:

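The placement policy, as previously mentioned, is a trade-off between direct-mapped and fully associative caches: it suffers fewer conflict misses than a direct-mapped cache, while each lookup compares only the m tags of one set rather than the tags of every line in the cache.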

Disadvantages:

Since a block can only be placed within its own set, the cache may not use all the available cache lines effectively, and it still suffers from conflict misses.

Example:

Consider a main memory of 16 kilobytes, organized as 4-byte blocks, and a 2-way set-associative cache of 256 bytes with a block size of 4 bytes.

Since each cache block is 4 bytes, the cache contains 256/4 = 64 cache lines; with 2 lines per set, the total number of sets is 64/2 = 32.

In the example, there are 2 offset bits, which address the 4 bytes of the cache line, and 5 index bits, which address the 32 sets of the cache. Because the 16-kilobyte main memory needs 14-bit addresses, the remaining 7 bits (14 – (5 + 2)) form the tag, which is stored to be matched against the address on a cache request.

Address 0x0000 (Tag – 000_0000, Index – 0_0000, Offset – 00) maps to block 0 of the memory and occupies set 0 of the cache. The block occupies one of the cache lines of the set, as determined by the replacement policy for the cache.

Address 0x0004 (Tag – 000_0000, Index – 0_0001, Offset – 00) maps to block 1 of the memory and occupies one of the cache lines of set 1 of the cache.

Similarly, address 0x00FF (Tag – 000_0001, Index – 1_1111, Offset – 11) maps to block 63 of the memory and occupies one of the cache lines of set 31 of the cache.

Address 0x0100 (Tag – 000_0010, Index – 0_0000, Offset – 00) maps to block 64 of the memory and occupies one of the cache lines of set 0 of the cache.
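The field split for this 2-way example differs from the direct-mapped one only in the index width. A minimal Python sketch with a hypothetical helper, assuming the 2 offset bits and 5 index bits given above:

```python
OFFSET_BITS = 2  # 4-byte blocks
INDEX_BITS = 5   # 32 sets of 2 lines each


def split_address(address):
    """Return (tag, index, offset) for the 2-way set-associative example."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


for address in (0x0000, 0x0004, 0x00FF, 0x0100):
    tag, index, offset = split_address(address)
    print(f"0x{address:04X}: tag={tag} index={index:05b} offset={offset:02b} set={index}")
```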


[1] Solihin, Yan. Fundamentals of Parallel Multi-core Architecture. Chapman & Hall/CRC Computational Science. ISBN 978-1482211184.
