CAR: CLOCK with ADAPTIVE REPLACEMENT

Sorav Bansal, Stanford University
Dharmendra S. Modha, IBM Research
The Caching Problem

Cache: expensive, but fast

Auxiliary storage: cheap, but slow

How to manage the cache?

Assume demand paging: Which page to replace?

How to maximize the hit ratio?
<table>
<thead>
<tr>
<th></th>
<th>LRU</th>
<th>LFU</th>
<th>ARC</th>
</tr>
</thead>
<tbody>
<tr>
<td>Recency</td>
<td>☺</td>
<td>☹</td>
<td>☺</td>
</tr>
<tr>
<td>Constant Time</td>
<td>☺</td>
<td>☹</td>
<td>☺</td>
</tr>
<tr>
<td>Scan Resistance</td>
<td>☹</td>
<td>☺</td>
<td>☺</td>
</tr>
<tr>
<td>“Frequency”</td>
<td>☹</td>
<td>☺</td>
<td>☺</td>
</tr>
</tbody>
</table>
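The LRU column can be made concrete with a minimal Python sketch. The class `LRUCache` and the trace are illustrative assumptions built on the standard `collections.OrderedDict`, not code from any system named in this talk. It shows two things: every hit reorders the list (the per-hit MRU overhead), and a one-time scan of new pages pushes the hot pages out (no scan resistance).

```python
from collections import OrderedDict


class LRUCache:
    """Illustrative LRU page cache (hypothetical names): every hit moves
    the page to the MRU end of the list."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # first key = LRU page, last key = MRU page
        self.hits = 0
        self.misses = 0

    def access(self, page):
        if page in self.pages:
            # Hit: the list is modified on every hit (the "MRU overhead").
            self.pages.move_to_end(page)
            self.hits += 1
            return True
        self.misses += 1
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the LRU page
        self.pages[page] = None  # insert the new page at the MRU end
        return False


if __name__ == "__main__":
    cache = LRUCache(capacity=3)
    # Hot pages {1, 2}, then a one-time scan of pages 10..14, then {1, 2} again.
    trace = [1, 2, 1, 2] + list(range(10, 15)) + [1, 2]
    for p in trace:
        cache.access(p)
    print(cache.hits, cache.misses)
```

With three frames, pages 1 and 2 hit twice, the scan of 10..14 evicts them, and the final accesses to 1 and 2 miss: 2 hits and 9 misses in total.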
“In Multics a paging algorithm has been developed that has the implementation ease and low overhead of the FIFO and is an approximation to LRU.”

Fernando J. Corbato, 1990 Turing Award Winner
<table>
<thead>
<tr>
<th></th>
<th>LRU</th>
<th>LFU</th>
<th>ARC</th>
<th>CLOCK</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Recency</strong></td>
<td>☺</td>
<td>☹</td>
<td>☺</td>
<td>☺</td>
</tr>
<tr>
<td><strong>Constant Time</strong></td>
<td>☺</td>
<td>☹</td>
<td>☺</td>
<td>☺</td>
</tr>
<tr>
<td><strong>Scan Resistance</strong></td>
<td>☹</td>
<td>☺</td>
<td>☺</td>
<td>☹</td>
</tr>
<tr>
<td><strong>“Frequency”</strong></td>
<td>☹</td>
<td>☺</td>
<td>☺</td>
<td>☹</td>
</tr>
<tr>
<td><strong>Low Lock Contention/MRU Overhead</strong></td>
<td>☹</td>
<td>☹</td>
<td>☹</td>
<td>☺</td>
</tr>
</tbody>
</table>
CLOCK

**HIT:** Set the reference bit to “1”

**MISS:** Insert at the TAIL, initialize the reference bit to “0”

**REPLACEMENT POLICY:** Evict the first “0” page; reset each “1” page passed along the way to “0” (a “second chance”, or “delayed MRU”)
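A minimal Python sketch of the CLOCK rules above. The circular-buffer layout and the names `Clock`, `access`, and `hand` are illustrative assumptions, not taken from Multics or any OS listed here. A hit only sets a bit, which keeps the hit path cheap; all list work happens on the miss path.

```python
class Clock:
    """Illustrative CLOCK page cache: one hand, one reference bit per page."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []   # circular buffer of [page, ref_bit]
        self.index = {}   # page -> slot position
        self.hand = 0     # HEAD: the next slot the hand examines
        self.hits = 0
        self.misses = 0

    def access(self, page):
        if page in self.index:
            # HIT: only set the reference bit; no list manipulation.
            self.slots[self.index[page]][1] = 1
            self.hits += 1
            return True
        self.misses += 1
        if len(self.slots) < self.capacity:
            # Cache not yet full: append the page with bit "0".
            self.index[page] = len(self.slots)
            self.slots.append([page, 0])
            return False
        # REPLACEMENT: reset "1" pages to "0" (second chance) until a "0" page is found.
        while self.slots[self.hand][1] == 1:
            self.slots[self.hand][1] = 0
            self.hand = (self.hand + 1) % self.capacity
        victim = self.slots[self.hand][0]
        del self.index[victim]
        # The new page takes the victim's slot with bit "0"; advancing the hand
        # past it makes it the TAIL, i.e. the last slot the hand will revisit.
        self.slots[self.hand] = [page, 0]
        self.index[page] = self.hand
        self.hand = (self.hand + 1) % self.capacity
        return False
```

For the trace 1, 2, 3, 1, 4, 5 with three frames, page 1 survives both replacements because its bit was “1” when the hand first reached it; pages 2 and 3 are evicted instead.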
CLOCK Applications and Exposition

- Multics
- UNIX/AIX/LINUX/BSD
- VAX/VMS
- DB2
- Oracle? Windows? Solaris?

- Major OS Textbooks
  - Tanenbaum & Woodhull
  - Silberschatz & Galvin
Prior Work on LRU versus CLOCK

- **LRU**
  - (LFU)
  - FBR
  - LRU-2
  - 2Q
  - LRFU
  - LIRS
  - MQ
  - ARC

- **CLOCK**
  - GCLOCK
  - Two-handed CLOCK
<table>
<thead>
<tr>
<th>Feature</th>
<th>LRU</th>
<th>LFU</th>
<th>ARC</th>
<th>CLOCK</th>
<th>CAR</th>
</tr>
</thead>
<tbody>
<tr>
<td>Recency</td>
<td>☺</td>
<td>☹</td>
<td>☺</td>
<td>☺</td>
<td>☺</td>
</tr>
<tr>
<td>Constant Time</td>
<td>☺</td>
<td>☹</td>
<td>☺</td>
<td>☺</td>
<td>☺</td>
</tr>
<tr>
<td>Scan Resistance</td>
<td>☹</td>
<td>☺</td>
<td>☺</td>
<td>☹</td>
<td>☺</td>
</tr>
<tr>
<td>“Frequency”</td>
<td>☹</td>
<td>☺</td>
<td>☺</td>
<td>☹</td>
<td>☺</td>
</tr>
<tr>
<td>Low Lock Contention/MRU Overhead</td>
<td>☹</td>
<td>☹</td>
<td>☹</td>
<td>☺</td>
<td>☺</td>
</tr>
</tbody>
</table>

Figure: CLOCK T1 (“Recency”) with history list B1, and CLOCK T2 (“Frequency”) with history list B2
CLOCKs T1 and T2 contain cache pages
LRU lists B1 and B2 contain recently evicted history pages
Size of T1 roughly equals B2
Size of T2 roughly equals B1

T1(0) and B1: pages seen exactly once recently (“Recency”)

T1(1), T2, and B2: pages seen at least twice recently (“Frequency”)

Maintain a target size for CLOCK T1
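The claim that |T1| ≈ |B2| and |T2| ≈ |B1| follows from size bookkeeping alone. The sketch below assumes a full cache (|T1| + |T2| = c), a full directory (|T1| + |T2| + |B1| + |B2| = 2c), and the bound |T1| + |B1| ≤ c that the B1 directory replacement rule maintains. Under those assumptions both |B2| - |T1| and |T2| - |B1| equal c - (|T1| + |B1|), so the pairs match exactly when T1 and B1 together fill their c-page budget. The helper `history_balance` is hypothetical and works on list sizes only.

```python
def history_balance(c, t1, t2, b1, b2):
    """Return (|B2| - |T1|, |T2| - |B1|) given list sizes (hypothetical helper).

    Assumed invariants for a full cache and a full directory:
      t1 + t2 == c, t1 + b1 <= c, t1 + t2 + b1 + b2 == 2 * c
    """
    assert t1 + t2 == c
    assert t1 + b1 <= c
    assert t1 + t2 + b1 + b2 == 2 * c
    return b2 - t1, t2 - b1


# Enumerate every size combination the assumed invariants allow for c = 8.
c = 8
gaps = set()
for t1 in range(c + 1):
    t2 = c - t1
    for b1 in range(c - t1 + 1):
        b2 = 2 * c - t1 - t2 - b1
        d_recency, d_frequency = history_balance(c, t1, t2, b1, b2)
        # Both gaps are equal and non-negative: |T1| <= |B2| and |B1| <= |T2|.
        assert d_recency == d_frequency == c - (t1 + b1) >= 0
        gaps.add(d_recency)
print(sorted(gaps))
```

The gap is zero whenever |T1| + |B1| = c, which is exactly when the B1 directory replacement rule fires.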
HIT in T1 or T2: Set reference bit to “1”
MISS in B1: Set reference bit to “0”, move to TAIL of T2, and increase target size of T1
MISS in B2: Set reference bit to “0”, move to TAIL of T2, and decrease target size of T1
TOTAL MISS: Set reference bit to “0”, insert at TAIL of T1
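The slides say only that a hit in B1 increases, and a hit in B2 decreases, the target size of T1. A minimal sketch, assuming ARC-style step sizes in which the step grows when the opposite history list is larger; the function `adapt_target` and its parameters are hypothetical names.

```python
def adapt_target(p, c, b1_len, b2_len, hit_in):
    """Return the new target size p of T1 after a history hit.

    Assumed ARC-style rule (p stays within [0, c]):
      hit in B1: p = min(p + max(1, |B2| / |B1|), c)
      hit in B2: p = max(p - max(1, |B1| / |B2|), 0)
    The list holding the hit page is non-empty, so the division is safe.
    """
    if hit_in == "B1":
        return min(p + max(1, b2_len / b1_len), c)
    if hit_in == "B2":
        return max(p - max(1, b1_len / b2_len), 0)
    raise ValueError("hit_in must be 'B1' or 'B2'")
```

For example, with c = 100, p = 10, |B1| = 5, and |B2| = 20, a hit in B1 raises p by 4 to 14, while a hit in B2 lowers it by only 1 to 9.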
CACHE REPLACEMENT POLICY:
Replace from T1 if T1 is at least its target size; else from T2
During replacement in T1: a “1” page is reset to “0” and moved to the T2 TAIL; the first “0” page is evicted to B1
During replacement in T2: a “1” page is reset to “0” and the hand advances; the first “0” page is evicted to B2
DIRECTORY REPLACEMENT POLICY (on a total miss):
Replace from B1 if |T1| + |B1| = c (the cache size); else from B2 if |T1| + |T2| + |B1| + |B2| = 2c
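Putting the hit, miss, cache replacement, and directory replacement rules together gives the sketch below. It is an illustrative Python model, not the authors' implementation. Each clock is simulated with an `OrderedDict` whose first key is the HEAD, so advancing the hand is a pop from the front followed by a re-insert at the TAIL. Three details are assumptions modeled on the CAR/ARC papers: the replacement test `|T1| >= max(1, p)`, the ARC-style adaptation steps, and discarding from B2 only when the directory holds 2c entries. The class and method names are hypothetical.

```python
from collections import OrderedDict


class CAR:
    """Illustrative CAR model. T1/T2 map page -> reference bit (first key =
    clock HEAD, last key = TAIL); B1/B2 are LRU history lists (first = LRU)."""

    def __init__(self, c):
        self.c = c
        self.p = 0  # target size for T1
        self.t1, self.t2 = OrderedDict(), OrderedDict()
        self.b1, self.b2 = OrderedDict(), OrderedDict()

    def access(self, x):
        if x in self.t1 or x in self.t2:
            # HIT: only set the reference bit (order is unchanged).
            (self.t1 if x in self.t1 else self.t2)[x] = 1
            return True
        in_history = x in self.b1 or x in self.b2
        if len(self.t1) + len(self.t2) == self.c:
            self._replace()
            # DIRECTORY REPLACEMENT, only on a total miss.
            if not in_history:
                if len(self.t1) + len(self.b1) == self.c:
                    self.b1.popitem(last=False)  # discard LRU page of B1
                elif (len(self.t1) + len(self.t2) + len(self.b1)
                      + len(self.b2) == 2 * self.c):
                    # Assumption: discard from B2 only when the directory is full.
                    self.b2.popitem(last=False)
        if not in_history:
            self.t1[x] = 0  # TOTAL MISS: TAIL of T1, bit "0"
        elif x in self.b1:
            # Assumed ARC-style step: grow the target for T1.
            self.p = min(self.p + max(1, len(self.b2) / len(self.b1)), self.c)
            del self.b1[x]
            self.t2[x] = 0  # MISS in B1: TAIL of T2, bit "0"
        else:
            # Assumed ARC-style step: shrink the target for T1.
            self.p = max(self.p - max(1, len(self.b1) / len(self.b2)), 0)
            del self.b2[x]
            self.t2[x] = 0  # MISS in B2: TAIL of T2, bit "0"
        return False

    def _replace(self):
        while True:
            # Assumption: replace from T1 when |T1| >= max(1, p).
            if len(self.t1) >= max(1, self.p):
                page, bit = self.t1.popitem(last=False)  # T1 HEAD
                if bit == 0:
                    self.b1[page] = None  # evict to MRU end of B1
                    return
                self.t2[page] = 0  # "1" page: reset, move to T2 TAIL
            else:
                page, bit = self.t2.popitem(last=False)  # T2 HEAD
                if bit == 0:
                    self.b2[page] = None  # evict to MRU end of B2
                    return
                self.t2[page] = 0  # "1" page: reset, hand advances
```

For the trace 1, 2, 1, 3, 2, 4 with c = 2, page 2 is evicted to B1 and then requested again, so it returns to T2 and the target p rises from 0 to 1; the run ends with T1 = [4], T2 = [2], B1 = [3], B2 = [1].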
CART = CAR + Temporal Filtering

- CAR/ARC: two hits to a page is a criterion for promotion from T1 to T2
- CART: promotion from T1 to T2 happens only if two hits are “far”
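The difference between “two hits” and “two far hits” can be illustrated with a timestamp filter. This is a hypothetical simplification, not CART's actual mechanism: it only shows why back-to-back (correlated) references should not count as evidence of frequency. The names `should_promote`, `promotions`, and the `min_gap` knob are assumptions.

```python
def should_promote(prev_access, now, min_gap):
    """Hypothetical temporal filter: promote only if the two hits are "far".

    prev_access and now are logical timestamps (request counters);
    min_gap is an assumed tuning knob, not a CART parameter.
    """
    return prev_access is not None and now - prev_access >= min_gap


def promotions(trace, min_gap):
    """Return pages, in order, that the filter above would promote."""
    last_seen, promoted = {}, []
    for now, page in enumerate(trace):
        if should_promote(last_seen.get(page), now, min_gap) and page not in promoted:
            promoted.append(page)
        last_seen[page] = now
    return promoted
```

For the trace a, a, b, c, d, b, setting `min_gap=1` promotes both a and b (any second hit counts, as in CAR/ARC), while `min_gap=3` promotes only b, because the two hits to a are back to back.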
# SPC-1-like Workload: Hit Ratio (%)

<table>
<thead>
<tr>
<th>Cache Size (4K pages)</th>
<th>LRU</th>
<th>CLOCK</th>
<th>ARC</th>
<th>CAR</th>
<th>CART</th>
</tr>
</thead>
<tbody>
<tr>
<td>65536</td>
<td>0.37</td>
<td>0.37</td>
<td>0.82</td>
<td>0.84</td>
<td>0.90</td>
</tr>
<tr>
<td>131072</td>
<td>0.78</td>
<td>0.77</td>
<td>1.62</td>
<td>1.66</td>
<td>1.78</td>
</tr>
<tr>
<td>262144</td>
<td>1.63</td>
<td>1.63</td>
<td>3.23</td>
<td>3.29</td>
<td>3.56</td>
</tr>
<tr>
<td>524288</td>
<td>3.66</td>
<td>3.64</td>
<td>7.56</td>
<td>7.62</td>
<td>8.52</td>
</tr>
<tr>
<td>1048576</td>
<td>9.19</td>
<td>9.31</td>
<td>20.00</td>
<td>20.00</td>
<td>21.90</td>
</tr>
</tbody>
</table>
# Merge(S) Workload: Hit Ratio (%)

<table>
<thead>
<tr>
<th>Cache Size (4K pages)</th>
<th>LRU</th>
<th>CLOCK</th>
<th>ARC</th>
<th>CAR</th>
<th>CART</th>
</tr>
</thead>
<tbody>
<tr>
<td>16384</td>
<td>0.20</td>
<td>0.20</td>
<td>1.04</td>
<td>1.03</td>
<td>1.10</td>
</tr>
<tr>
<td>32768</td>
<td>0.40</td>
<td>0.40</td>
<td>2.08</td>
<td>2.07</td>
<td>2.20</td>
</tr>
<tr>
<td>65536</td>
<td>0.79</td>
<td>0.79</td>
<td>4.07</td>
<td>4.05</td>
<td>4.27</td>
</tr>
<tr>
<td>131072</td>
<td>1.59</td>
<td>1.58</td>
<td>7.78</td>
<td>7.76</td>
<td>8.20</td>
</tr>
<tr>
<td>262144</td>
<td>3.23</td>
<td>3.27</td>
<td>14.30</td>
<td>14.25</td>
<td>15.07</td>
</tr>
<tr>
<td>524288</td>
<td>8.06</td>
<td>8.66</td>
<td>24.34</td>
<td>24.47</td>
<td>26.12</td>
</tr>
<tr>
<td>1048576</td>
<td>27.62</td>
<td>29.04</td>
<td>40.44</td>
<td>41.00</td>
<td>41.83</td>
</tr>
<tr>
<td>1572864</td>
<td>50.86</td>
<td>52.24</td>
<td>57.19</td>
<td>57.92</td>
<td>57.64</td>
</tr>
<tr>
<td>2097152</td>
<td>68.68</td>
<td>69.50</td>
<td>71.41</td>
<td>71.71</td>
<td>71.77</td>
</tr>
<tr>
<td>4194304</td>
<td>87.30</td>
<td>87.26</td>
<td>87.26</td>
<td>87.26</td>
<td>87.26</td>
</tr>
</tbody>
</table>
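To read the two tables above as relative gains, a short script can divide CAR's hit ratio by CLOCK's at each cache size. The numbers are copied from the tables; the dictionary layout and the function name `car_over_clock` are just a convenient assumption.

```python
# Hit ratios (%) copied from the tables: cache size (4K pages) -> (CLOCK, CAR).
spc1 = {65536: (0.37, 0.84), 131072: (0.77, 1.66), 262144: (1.63, 3.29),
        524288: (3.64, 7.62), 1048576: (9.31, 20.00)}
merge_s = {16384: (0.20, 1.03), 32768: (0.40, 2.07), 65536: (0.79, 4.05),
           131072: (1.58, 7.76), 262144: (3.27, 14.25), 524288: (8.66, 24.47),
           1048576: (29.04, 41.00), 1572864: (52.24, 57.92),
           2097152: (69.50, 71.71), 4194304: (87.26, 87.26)}


def car_over_clock(table):
    """Ratio of CAR's hit ratio to CLOCK's at each cache size."""
    return {size: car / clock for size, (clock, car) in table.items()}


for name, table in [("SPC-1-like", spc1), ("Merge(S)", merge_s)]:
    for size, ratio in car_over_clock(table).items():
        print(f"{name:10s} {size:>8d}  CAR/CLOCK = {ratio:.2f}x")
```

On Merge(S), CAR's hit ratio is about 5x CLOCK's at 16384 pages, and the advantage vanishes at 4194304 pages, where both reach 87.26%. On the SPC-1-like workload, CAR's hit ratio is roughly twice CLOCK's at every size listed.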
Figure: Hit ratio (%) versus cache size (number of 512-byte pages, 1024 to 262144) for CAR and CLOCK; both improve as the cache grows, and CAR's hit ratio is consistently higher than CLOCK's.
CAR: CONCLUSIONS

- Simple and Low Overhead
- Self-tuning: captures “recency” and “frequency”
- Scan-Resistant
- Low Lock Contention and MRU Overhead
- Outperforms CLOCK on all workloads examined
- Comparable to (sometimes even better than) ARC!

Figure: CLOCK T1 (“Recency”) with history list B1, and CLOCK T2 (“Frequency”) with history list B2, with HEAD, TAIL, and MRU positions marked
CAR: Set-up

- Clocks T1 and T2 contain cache pages
- B1 and B2 contain recently evicted history pages
  - Alex Haley: “History is written by the winners”
  - CAR/ARC: “History is written by the losers”
- Size of T1 roughly equals B2
- Size of T2 roughly equals B1
- T1(0) and B1 contain pages that have been seen exactly once recently = “Recency”
- T1(1), T2, and B2 contain pages that have been seen at least twice recently = “Frequency”
CAR: Algorithm

- **HIT in T1 or T2:** Set reference bit to 1
- **MISS in B1:** Set reference bit to 0, move to tail of T2, and increase target size of T1
- **MISS in B2:** Set reference bit to 0, move to tail of T2, and decrease target size of T1
- **TOTAL MISS:** Set reference bit to 0, move to tail of T1
- **CACHE REPLACEMENT POLICY:** Replace from T1 if T1 is at least its target size; else from T2
  - During replacement in T1: a “1” page is reset to “0” and moved to the T2 tail; the first “0” page is evicted to B1
  - During replacement in T2: a “1” page is reset to “0” and the hand advances; the first “0” page is evicted to B2
- **DIRECTORY REPLACEMENT POLICY:** On a total miss, replace from B1 if |T1| + |B1| = c (the cache size); else from B2 if |T1| + |T2| + |B1| + |B2| = 2c
CLOCK

- **HIT:** Set the reference bit to “1”
- **MISS:** Insert at the TAIL, initialize the reference bit to “0”

**REPLACEMENT POLICY:**
- Evict the first “0” page
- Reset “1” pages to “0”, giving them a “second chance”

**KEY INSIGHT:**
- Resetting “1” to “0” is a “delayed MRU”: it moves the MRU work off the hit path and onto the miss path