4.2.2. Cache Memory

The purpose of cache memory is to act as a buffer between the very limited, very high-speed CPU registers and the relatively slower and much larger main system memory, usually referred to as RAM[11]. Cache memory operates at a speed close to that of the CPU itself, so when the CPU accesses data in cache, it is not kept waiting for the data.
Cache memory is configured such that, whenever data is to be read from RAM, the system hardware first checks whether the desired data is in cache. If the data is in cache, it is quickly retrieved and used by the CPU. However, if the data is not in cache, the data is read from RAM and, while being transferred to the CPU, is also placed in cache (in case it is needed again later). From the perspective of the CPU, all this is done transparently, so the only difference between accessing data in cache and accessing data in RAM is the amount of time it takes for the data to be returned.
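The read path described above can be sketched as a minimal Python model, assuming RAM is a dictionary of address-to-value pairs and using made-up unit access costs. The names `ReadThroughCache`, `CACHE_ACCESS_TIME`, and `RAM_ACCESS_TIME` are hypothetical, and the sketch deliberately ignores the cache's limited capacity to focus on hit and miss handling:

```python
# Hypothetical unit costs; real latencies depend entirely on the hardware.
CACHE_ACCESS_TIME = 1
RAM_ACCESS_TIME = 100


class ReadThroughCache:
    """Toy model of a cache in front of RAM (capacity limits ignored)."""

    def __init__(self, ram):
        self.ram = ram      # backing store: address -> value
        self.cache = {}     # copies of RAM contents held in cache
        self.elapsed = 0    # total simulated access time

    def read(self, address):
        if address in self.cache:
            # Hit: the data is returned quickly from cache.
            self.elapsed += CACHE_ACCESS_TIME
            return self.cache[address]
        # Miss: read from RAM and keep a copy in case it is needed again.
        value = self.ram[address]
        self.elapsed += RAM_ACCESS_TIME
        self.cache[address] = value
        return value


ram = {0x10: 42}
cache = ReadThroughCache(ram)
first = cache.read(0x10)   # miss: fetched from RAM and copied into cache
second = cache.read(0x10)  # hit: served from cache
```

Both reads return 42; only the simulated time differs (100 units for the first read, 1 unit for the second), which mirrors the transparency described above.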
In terms of storage capacity, cache is much smaller than RAM, so not every byte in RAM can have its own unique location in cache. Instead, cache must be split into sections that can each hold copies of different areas of RAM, together with a mechanism that lets each section cache different areas of RAM at different times. Despite the difference in size between cache and RAM, a small amount of cache can effectively speed access to a large amount of RAM, because storage access tends to be sequential and localized (a property often called locality of reference).
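One common way to implement the sections-and-mapping scheme described above is a direct-mapped cache, in which each block of RAM can be held in exactly one cache line, chosen from its address. The sketch below assumes that technique (many real CPUs use set-associative designs instead) and arbitrary sizes of 8 lines of 16 bytes each; `split_address` and `simulate` are hypothetical names. It counts hits and misses for a sequence of byte addresses:

```python
# Hypothetical geometry: 8 lines x 16 bytes = a 128-byte cache.
LINE_SIZE = 16   # bytes per cache line (one "section" of cache)
NUM_LINES = 8    # number of lines in the cache


def split_address(address):
    """Split a byte address into (tag, line index, offset within line)."""
    offset = address % LINE_SIZE
    block = address // LINE_SIZE   # which LINE_SIZE-byte block of RAM
    index = block % NUM_LINES      # the only cache line this block may use
    tag = block // NUM_LINES       # records which block currently occupies it
    return tag, index, offset


def simulate(addresses):
    """Return (hits, misses) for a sequence of byte addresses."""
    tags = [None] * NUM_LINES      # tag stored in each line (None = empty)
    hits = misses = 0
    for address in addresses:
        tag, index, _ = split_address(address)
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag      # this line now caches a different RAM area
    return hits, misses
```

For a sequential scan, `simulate(range(4096))` returns `(3840, 256)`: only the first byte of each 16-byte block misses, so a 128-byte cache serves about 94% of the accesses to 4 KB of RAM. A stride of exactly 128 bytes, by contrast, maps every access to the same line and misses every time.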
When writing data from the CPU, things get a bit more complicated. There are two different approaches that can be used. In both cases, the data is first written to cache. However, since the purpose of cache is to function as a very fast copy of the contents of selected portions of RAM, any time a piece of data changes its value, that new value must be written to both cache memory and RAM. Otherwise, the data in cache and the data in RAM would no longer match.
The two approaches differ in how this is done. One approach, known as write-through caching, immediately writes the modified data to RAM. Write-back caching, however, delays writing modified data back to RAM. Doing so reduces the number of times a frequently modified piece of data must be written back to RAM.
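To make the difference concrete, the following sketch counts RAM writes under each policy for a series of stores. It assumes, for simplicity, that every stored address stays in cache until the end and that write-back writes each modified address to RAM exactly once at that point; `ram_writes` is a hypothetical helper, not a real API:

```python
def ram_writes(policy, stores):
    """Count RAM writes for a list of (address, value) stores.

    Simplifying assumption: nothing is evicted early, and under write-back
    every modified address is written to RAM once, at the very end.
    """
    if policy == "write-through":
        # Every store is immediately written through to RAM as well.
        return len(stores)
    if policy == "write-back":
        modified = set()
        for address, _value in stores:
            modified.add(address)  # only the cached copy changes for now
        return len(modified)       # one deferred write per modified address
    raise ValueError(f"unknown policy: {policy}")


# A counter at one address updated 1,000 times.
counter_updates = [(0x40, i) for i in range(1000)]
```

For `counter_updates`, write-through performs 1,000 RAM writes while write-back performs a single one, which is the saving write-back caching is designed to capture.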
Write-through cache is a bit simpler to implement and, for this reason, is the most common. Write-back cache is a bit trickier to implement; in addition to storing the actual data, it must maintain some mechanism for flagging cached data as clean (the data in cache is the same as the data in RAM) or dirty (the data in cache has been modified, meaning that the data in RAM is no longer current). It also needs a way of periodically flushing dirty cache entries back to RAM.
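The clean/dirty bookkeeping can be sketched as follows. `WriteBackCache` is a hypothetical class that assumes an unbounded cache, a dictionary standing in for RAM, and write-allocate behavior (a write to an uncached address simply creates a dirty entry). Eviction and `flush()` write back only dirty entries; what triggers a flush (a timer, cache pressure, and so on) is left out:

```python
class WriteBackCache:
    """Sketch of write-back caching with a per-entry dirty flag."""

    def __init__(self, ram):
        self.ram = ram      # backing store: address -> value
        self.lines = {}     # address -> [value, dirty]

    def read(self, address):
        if address not in self.lines:
            # Fill from RAM; the copy matches RAM, so it starts out clean.
            self.lines[address] = [self.ram[address], False]
        return self.lines[address][0]

    def write(self, address, value):
        # Update only the cache (write-allocate assumed); RAM is now stale.
        self.lines[address] = [value, True]

    def evict(self, address):
        value, dirty = self.lines.pop(address)
        if dirty:
            self.ram[address] = value  # clean entries are simply discarded

    def flush(self):
        """Write every dirty entry back to RAM and mark it clean."""
        for address, entry in self.lines.items():
            if entry[1]:
                self.ram[address] = entry[0]
                entry[1] = False
```

Clean entries are dropped on eviction without touching RAM; that is what the dirty flag buys, since only data that actually changed is written back.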

4.2.2.1. Cache Levels

Cache subsystems in present-day computer designs may be multi-level; that is, there might be more than one set of cache between the CPU and main memory. The cache levels are often numbered, with lower numbers being closer to the CPU. Many systems have two cache levels:
  • L1 cache is often located directly on the CPU chip itself and runs at the same speed as the CPU
  • L2 cache is often part of the CPU module, runs at CPU speeds (or nearly so), and is usually a bit larger and slower than L1 cache
Some systems (normally high-performance servers) also have L3 cache, which is usually part of the system motherboard. As might be expected, L3 cache would be larger (and most likely slower) than L2 cache.
In every case, the goal of a cache subsystem, whether single- or multi-level, is to reduce the average time needed to access RAM.
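The effect of a hierarchy on average access time can be estimated with the usual average-memory-access-time formula; for two levels it is t_L1 + m_L1 × (t_L2 + m_L2 × t_RAM), where t is a level's lookup time and m is the fraction of requests reaching that level that miss there. The Python sketch below generalizes this to any number of levels; the function name and the timings in the example are hypothetical, not measurements:

```python
def average_access_time(levels, ram_time):
    """Average memory access time for a cache hierarchy.

    levels: list of (hit_time, hit_rate) pairs ordered from L1 outward,
    where hit_rate is the fraction of requests reaching that level that
    hit there. Requests that miss every level are served by RAM.
    """
    total = 0.0
    reach = 1.0                     # fraction of requests reaching this level
    for hit_time, hit_rate in levels:
        total += reach * hit_time   # each request reaching a level pays its lookup
        reach *= 1.0 - hit_rate     # misses fall through to the next level
    return total + reach * ram_time


# Hypothetical: L1 1 ns / 90% hits, L2 10 ns / 80% hits, RAM 100 ns.
estimate = average_access_time([(1.0, 0.90), (10.0, 0.80)], ram_time=100.0)
```

With these assumed figures the estimate comes out at roughly 4 ns, compared with 100 ns if every access went straight to RAM.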


[11] Although "RAM" is an acronym for "Random Access Memory," a term that could easily apply to any storage technology allowing non-sequential access to stored data, system administrators who talk about RAM invariably mean main system memory.