You've successfully subscribed to The Daily Awesome
Great! Next, complete checkout for full access to The Daily Awesome
Welcome back! You've successfully signed in.
Success! Your account is fully activated, you now have access to all content.
Success! Your billing info is updated.
Billing info update failed.

Memory Performance and Dependable Memory Hierarchy

. 4 min read

Measuring Cache Performance

Components of CPU time

  • Program execution cycles 【execution】
  • Includes cache hit time
  • Memory stall cycles 【IO】
  • Mainly from cache misses

With simplifying assumptions:

$,,,,,,,Memory,stall,cycles\ = \frac{memory,access}{Program}\times Miss,Rate \times Miss , Penalty\= \frac{Instructions}{Program} \times \frac{Misses}{Instructions} \times Miss,Penalty$

!> 此处需要注意指令内存100%访问,而数据内存访问依指令比例而定

Average Access Time

  • Hit time is also important for performance
  • Average Memory Access Time (AMAT)
  • $AMAT = Hit,time + Miss,rate × Miss,penalty$

Example:

  • CPU with 1ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%

Performance Summary

  • When CPU performance increased
  • Miss penalty becomes more significant
  • CPI=2, Miss=3.44, % of memory stall: 3.44/5.44=63%
  • CPI=1, Miss=3.44, % of memory stall: 3.44/4.44=77%
  • Decreasing base CPI
  • Greater proportion of time spent on memory stalls
  • Increasing clock rate
  • Memory stalls account for more CPU cycles
  • Can’t neglect cache behavior when evaluating system performance

Associative Cache Example

Direct Mapped Cache

  • Location determined by address
  • Direct mapped: only one choice
  • Capacity of cache is not fully exploited
  • Miss rate is high

Associative Cache Example

Associative Caches

  • Fully associative
  • Allow a given block to go in any cache entry
  • Requires all entries to be searched at once
  • Comparator per entry (expensive)
  • n-way set associative
  • Each set contains n entries
  • Block number determines which set
  • (Block number) modulo (#Sets in cache)
  • Search all entries in a given set at once
  • n comparators (less expensive)
  • Trade off:
  • Increased associativity decreases miss rate
  • But with diminishing returns

Example:

Replacement Policy

  • Direct mapped: no choice
  • Set associative
  • Prefer non-valid entry, if there is one
  • Otherwise, choose among entries in the set
  • Least-recently used (LRU)
  • Choose the one unused for the longest time
  • Simple for 2-way, manageable for 4-way, too hard beyond that
  • Random:
  • Gives approximately the same performance as LRU for high associativity

Multilevel Caches

  • Primary cache attached to CPU
  • Small, but fast
  • Level-2 cache services misses from primary cache
  • Larger, slower, but still faster than main memory
  • Main memory services L-2 cache misses
  • Some high-end systems include L-3 cache

Example:

  • Now add L-2 cache
  • Access time = 5ns
  • Global miss rate to main memory = 0.5%
  • Primary miss with L-2 hit
  • Penalty = 5ns/0.25ns = 20 cycles
  • Primary miss with L-2 miss
  • Extra penalty = 400 cycles
  • CPI = 1 + 0.02 × 20 + 0.005 × 400 = 3.4
  • Performance ratio = 9/3.4 = 2.6

还有一种题目告诉 L1 hit time determines clock cycle, 此时使用 L1 hit time 作为 clock cycle time 即可。

Multilevel Cache Considerations:

  • Primary cache
  • Focus on minimal hit time 【!>决定了时钟频率的上限】
  • L-2 cache
  • Focus on low miss rate to avoid main memory access
  • Hit time has less overall impact
  • Results
  • L-1 cache usually smaller than a single cache
  • L-1 block size smaller than L-2 block size

Software Optimization via Blocking

  • Goal: maximize accesses to data before it is replaced
  • Consider inner loops of DGEMM:

Dependability Measures

Reliability: mean time to failure (MTTF)

Service interruption: mean time to repair (MTTR)

Mean time between failures

  • MTBF = MTTF + MTTR

Availability = MTTF / (MTTF + MTTR)

Improving Availability

  • Increase MTTF: fault avoidance, fault tolerance, fault forecasting
  • Reduce MTTR: fault detection, fault diagnosis and fault repair

The Hamming SEC Code

  • Hamming distance
  • Number of bits that are different between two bit patterns
  • Minimum distance = 2 provides
  • single bit error detection
  • parity code
  • Minimum distance = 3 provides
  • single error correction
  • 2 bit error detection

Encoding SEC (Single Error Correction)

To calculate Hamming code:

  • Number bits from 1 on the left 【索引从1开始】
  • All bit positions that are a power 2 are parity bits 【parity bit 的位置】
  • Each parity bit checks certain data bits 【能够纠错的原理】

!> 这张图片的规律很重要【如何编码】

Decoding SEC

  • Value of parity bits indicates which bits are in error
  • Use numbering from encoding procedure
  • E.g.
  • Parity bits = 0000 indicates no error
  • Parity bits = 1010 indicates bit 10 was flipped

SEC/DED Code

  • Add an additional parity bit for the whole word ($p_n$)
  • Make Hamming distance = 4
  • Decoding:
  • Let H = SEC parity bits
  • H even, $p_n$ even, no error
  • H odd, $p_n$ odd, correctable single bit error
  • H even, $p_n$ odd, error in $p_n$ bit
  • H odd, $p_n$ even, double error occurred

Note: ECC DRAM uses SEC/DED with 8 bits protecting each 64 bits

Summary

  • Cache Performance
  • Mainly depends on miss rate and miss penalty
  • To improve cache performance:
  • Fully associative cache
  • Set-associative cache
  • Replacement policy
  • Multilevel cache
  • Dependability
  • MTTF, MTTR, reliability, availability
  • Hamming code: SEC/DED code