Overview of Pipelining

Performance Issues

Longest delay determines clock period

  • Critical path: load instruction
  • Instruction memory → register file → ALU → data memory → register file
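A small sketch of why the load sets the clock period in a single-cycle design. The component delays below are illustrative assumptions, not measured values. The instruction classes (`lw`, `sw`, `add`, `beq`) and the components each one passes through are a simplified model.

```python
# Illustrative component delays in picoseconds (assumed values, not measurements).
DELAY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Components each instruction class passes through in a single-cycle datapath.
PATHS = {
    "lw":  ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw":  ["imem", "reg_read", "alu", "dmem"],
    "add": ["imem", "reg_read", "alu", "reg_write"],
    "beq": ["imem", "reg_read", "alu"],
}

def path_delay(instr: str) -> int:
    """Total delay of the components an instruction class uses."""
    return sum(DELAY_PS[c] for c in PATHS[instr])

def single_cycle_period() -> tuple[str, int]:
    """One fixed clock must fit the slowest instruction: return (instr, period)."""
    worst = max(PATHS, key=path_delay)
    return worst, path_delay(worst)

if __name__ == "__main__":
    for name in PATHS:
        print(f"{name}: {path_delay(name)} ps")
    print("critical path:", single_cycle_period())
```

Under these assumptions every instruction pays 800 ps even though `add` or `beq` would finish sooner, which is the "common case" problem described next.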

Not feasible to vary period for different instructions

Violates design principle

  • Making the common case fast

We will improve performance by pipelining

MIPS Pipeline

Pipeline: an implementation technique in which multiple instructions are overlapped in execution

Five stages, one step per stage

  • IF: Instruction fetch from memory
  • ID: Instruction decode & register read
  • EX: Execute operation or calculate address
  • MEM: Access memory operand
  • WB: Write result back to register
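To visualize how the five stages overlap, here is a minimal sketch that builds an ideal pipeline diagram, assuming no hazards and that instruction i enters IF in cycle i. The function name `pipeline_diagram` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions: int) -> list[list[str]]:
    """Row i lists the stage instruction i occupies in each clock cycle.

    Ideal case: no hazards, so instruction i is in stage (cycle - i).
    Empty strings mark cycles where the instruction is not in the pipeline.
    """
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        rows.append(row)
    return rows

if __name__ == "__main__":
    for i, row in enumerate(pipeline_diagram(3)):
        print(f"I{i}: " + " ".join(f"{s:>3}" for s in row))
```

Three instructions finish in 7 cycles instead of 15, because once the pipeline is full one instruction completes per cycle.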

Performance: the slowest stage determines the clock period

Speedup:

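  • If all stages are balanced (all take the same time):
  • $\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$
  • Otherwise:
  • The time between pipelined instructions equals the delay of the slowest stage, so the speedup is the nonpipelined time divided by that delay (less than the number of stages)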
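As a worked example, assume illustrative stage delays of 200, 100, 200, 200 and 100 ps for IF, ID, EX, MEM and WB (these numbers are assumptions, not measurements), so one nonpipelined instruction takes 800 ps:

$$
\begin{aligned}
\text{Balanced (ideal):}\quad & \text{Time between instructions}_{\text{pipelined}} = \frac{800\ \text{ps}}{5} = 160\ \text{ps}, && \text{Speedup} = 5\\
\text{Unbalanced:}\quad & \text{Time between instructions}_{\text{pipelined}} = \max(200, 100, 200, 200, 100) = 200\ \text{ps}, && \text{Speedup} = \frac{800\ \text{ps}}{200\ \text{ps}} = 4
\end{aligned}
$$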

The speedup comes from increased throughput; the latency of each individual instruction does not decrease.
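The throughput/latency distinction can be checked numerically. This sketch assumes the same illustrative stage delays (200, 100, 200, 200, 100 ps), a single-cycle clock equal to their sum, and a pipelined clock equal to the slowest stage; the function names are hypothetical.

```python
STAGE_DELAYS_PS = [200, 100, 200, 200, 100]  # assumed IF, ID, EX, MEM, WB delays

def nonpipelined_time(n: int) -> int:
    """Single-cycle: every instruction takes one long clock (sum of stage delays)."""
    return n * sum(STAGE_DELAYS_PS)

def pipelined_time(n: int) -> int:
    """Pipelined: clock = slowest stage; filling the pipeline takes one cycle
    per stage for the first instruction, then one instruction finishes per cycle."""
    clock = max(STAGE_DELAYS_PS)
    return (len(STAGE_DELAYS_PS) + n - 1) * clock

def pipelined_latency() -> int:
    """Time for a single instruction to pass through all stages."""
    return len(STAGE_DELAYS_PS) * max(STAGE_DELAYS_PS)

if __name__ == "__main__":
    n = 1_000_000
    print("speedup:", nonpipelined_time(n) / pipelined_time(n))
    print("latency:", sum(STAGE_DELAYS_PS), "ps ->", pipelined_latency(), "ps")
```

For many instructions the speedup approaches 800 / 200 = 4, while each instruction's latency actually grows from 800 ps to 1000 ps, because every stage is padded to the 200 ps clock.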

Pipelining and ISA Design

Compared with CISC instruction sets such as x86:

  • The MIPS ISA was designed for pipelining
  • All instructions are 32 bits: easier to fetch and decode in one cycle
  • Few and regular instruction formats: can decode and read registers in one step
  • Load/store addressing: can calculate the address in the 3rd stage and access memory in the 4th stage
  • Alignment of memory operands: a memory access takes only one cycle
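To make the "few and regular formats" point concrete, here is a sketch that splits a 32-bit MIPS R-format word into its fixed fields. Because the source register fields (rs, rt) always occupy the same bit positions, the register file can be read in ID while the opcode is still being decoded. The helper names are hypothetical; the field layout is the standard R-format one.

```python
def decode_r_type(word: int) -> dict[str, int]:
    """Split a 32-bit MIPS R-format word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs":     (word >> 21) & 0x1F,  # bits 25..21, first source register
        "rt":     (word >> 16) & 0x1F,  # bits 20..16, second source register
        "rd":     (word >> 11) & 0x1F,  # bits 15..11, destination register
        "shamt":  (word >> 6) & 0x1F,   # bits 10..6, shift amount
        "funct":  word & 0x3F,          # bits 5..0, selects the ALU operation
    }

def is_word_aligned(address: int) -> bool:
    """Word operands sit at addresses that are multiples of 4."""
    return address % 4 == 0

if __name__ == "__main__":
    # add $s0, $t0, $t1 : rs = $t0 (8), rt = $t1 (9), rd = $s0 (16), funct = 0x20
    print(decode_r_type(0x01098020))
```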

Hazards

Situations that prevent starting the next instruction in the next cycle

Structure hazards

  • A required resource is busy
  • e.g., a data read/write (RAM) and an instruction fetch (ROM) needing the same memory in the same cycle
  • Improved by separating instruction/data caches

Data hazards

  • Need to wait for previous instruction to complete its data read/write
  • e.g., reading a register immediately after the previous instruction writes it
  • Improved by forwarding and code scheduling

Control hazards

  • The control-flow decision depends on the previous instruction
  • e.g., the next instruction to fetch depends on the outcome a branch is still computing
  • Improved by prediction

Structure Hazards

  • Conflict for use of a resource
  • If the MIPS pipeline had only one memory (holding both data and instructions), then
  • A load/store requires a data access
  • Instruction fetch would have to stall for that cycle
  • Hence, pipelined datapaths require separate instruction/data memories
  • Or separate instruction/data caches
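A minimal sketch of the single-memory structural hazard, assuming one memory access per cycle, MEM occurring 3 cycles after IF, and that only `lw`/`sw` touch data memory. It reports the cycle in which each instruction is fetched; the function name `fetch_cycles` is hypothetical.

```python
MEM_STAGE_OFFSET = 3  # MEM is the 4th stage: 3 cycles after IF

def fetch_cycles(program: list[str], split_memory: bool) -> list[int]:
    """Cycle in which each instruction is fetched (IF).

    With one shared memory, an instruction cannot be fetched in a cycle where an
    earlier load/store is in MEM, so the fetch stalls until memory is free.
    With split instruction/data memories, fetches never wait.
    """
    busy: set[int] = set()  # cycles in which MEM uses the shared memory
    cycle = 0
    result = []
    for op in program:
        if not split_memory:
            while cycle in busy:
                cycle += 1  # structural stall: memory is serving a data access
        result.append(cycle)
        if op in ("lw", "sw"):
            busy.add(cycle + MEM_STAGE_OFFSET)
        cycle += 1
    return result

if __name__ == "__main__":
    prog = ["lw", "add", "add", "add", "add"]
    print("shared:", fetch_cycles(prog, split_memory=False))
    print("split: ", fetch_cycles(prog, split_memory=True))
```

The fourth instruction's fetch collides with the load's MEM access and slips by one cycle when memory is shared; with separate memories the fetches stay back to back.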

Data Hazards

An instruction depends on completion of data access by a previous instruction

add $s0, $t0, $t1
sub $t2, $s0, $t3

Reading $s0 immediately after it is written causes a data hazard; without forwarding, the pipeline must insert two bubbles.
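The two bubbles can be derived with a small model. This sketch assumes no forwarding, and that the register file is written in the first half of WB and read in the second half of ID, so a consumer's ID may share a cycle with the producer's WB. The instruction representation and function name are hypothetical.

```python
from typing import Optional

# Each instruction: (destination register or None, list of source registers).
Instr = tuple[Optional[str], list[str]]

def stalls_without_forwarding(program: list[Instr]) -> list[int]:
    """Stall cycles inserted before each instruction when there is no forwarding."""
    ready_at: dict[str, int] = {}  # register -> cycle of the WB that writes it
    next_fetch = 0                 # earliest cycle the next instruction can be fetched
    stalls = []
    for dest, srcs in program:
        fetch = next_fetch
        for reg in srcs:
            if reg in ready_at:
                # ID happens at fetch + 1 and must not precede the producer's WB.
                fetch = max(fetch, ready_at[reg] - 1)
        stalls.append(fetch - next_fetch)
        if dest is not None:
            ready_at[dest] = fetch + 4  # WB is 4 cycles after IF
        next_fetch = fetch + 1
    return stalls

if __name__ == "__main__":
    # add $s0, $t0, $t1 ; sub $t2, $s0, $t3
    print(stalls_without_forwarding([("$s0", ["$t0", "$t1"]), ("$t2", ["$s0", "$t3"])]))
```

If an independent instruction sits between the two, the same model needs only one bubble.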

Control Hazards

Branch determines flow of control

Fetching next instruction depends on branch outcome

Pipeline can’t always fetch correct instruction

  • Still working on ID stage of branch

In MIPS pipeline

  • Need to compare registers and compute target early in the pipeline
  • Add hardware to do it in ID stage
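A sketch of why resolving branches in ID matters, assuming the pipeline simply stalls fetch until the branch outcome is known: each stage before the resolving stage costs one wasted fetch slot. The branch fraction used below is a hypothetical input, not a measured value.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def branch_stall_cycles(resolve_stage: str) -> int:
    """Cycles lost per branch if fetch waits until the branch is resolved.

    The next instruction can be fetched in the cycle after the resolving stage,
    so every stage before that one is a lost fetch slot.
    """
    return STAGES.index(resolve_stage)

def effective_cpi(branch_fraction: float, resolve_stage: str, base_cpi: float = 1.0) -> float:
    """Average CPI when every branch stalls until it is resolved."""
    return base_cpi + branch_fraction * branch_stall_cycles(resolve_stage)

if __name__ == "__main__":
    for stage in ("ID", "EX", "MEM"):
        print(stage, branch_stall_cycles(stage), effective_cpi(0.2, stage))
```

Moving the comparison and target calculation into ID cuts the penalty to a single cycle per branch under this model.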

Mitigating Hazards

Forwarding (aka Bypassing) [Mitigates Data Hazards]

Forwarding helps resolve data hazards

Core idea: use a result as soon as it is computed

  • Don’t wait for it to be stored in a register
  • Requires extra connections in the data path
  • Add a bypass path from the output of EX back to the ALU input

Data is passed through these extra connections instead of going through the register file.
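The extra connections feed a multiplexer in front of each ALU input, and some logic must decide which source to select. This is a sketch loosely modeled on the common textbook forwarding condition; `PipeReg`, its fields, and the returned labels are hypothetical names, and only the register-write and destination checks are modeled.

```python
from dataclasses import dataclass

@dataclass
class PipeReg:
    """Fields of a pipeline register needed by the forwarding check (hypothetical)."""
    reg_write: bool  # does the instruction in this pipeline register write a register?
    rd: int          # its destination register number

def forward_select(src_reg: int, ex_mem: PipeReg, mem_wb: PipeReg) -> str:
    """Choose where the ALU input for register `src_reg` should come from.

    "EX/MEM": result of the instruction one ahead (it just left EX)
    "MEM/WB": result of the instruction two ahead
    "REG":    no hazard; use the value read from the register file
    The newer EX/MEM value takes priority when both match, and register 0
    ($zero) is never forwarded because it always reads as 0.
    """
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return "MEM/WB"
    return "REG"

if __name__ == "__main__":
    # add $s0, ... immediately followed by sub ..., $s0, ... ($s0 is register 16)
    print(forward_select(16, PipeReg(True, 16), PipeReg(False, 0)))
```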

Can’t always avoid stalls by forwarding

  • If value not computed when needed
  • Can’t forward backward in time

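Forwarding cannot completely avoid stalls caused by loads: a loaded value is not available at the end of the EX stage (only after MEM), so forwarding reduces the two bubbles to one.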
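A minimal model of the remaining load-use stall, assuming a classic five-stage pipeline with full forwarding: an ALU result can feed the very next instruction, but a loaded value can only be forwarded after MEM, so an immediate use costs one cycle. The `(op, dest, srcs)` tuple format is a hypothetical representation.

```python
def stalls_with_forwarding(program: list[tuple[str, str, list[str]]]) -> list[int]:
    """Stall cycles before each instruction, given (op, dest, srcs) tuples.

    Only a load immediately followed by a use of its result stalls (one cycle);
    after that single bubble, forwarding from MEM/WB supplies the value.
    """
    stalls = []
    prev = None  # (op, dest) of the previous instruction
    for op, dest, srcs in program:
        load_use = prev is not None and prev[0] == "lw" and prev[1] in srcs
        stalls.append(1 if load_use else 0)
        prev = (op, dest)
    return stalls

if __name__ == "__main__":
    alu_then_use = [("add", "$s0", ["$t0", "$t1"]), ("sub", "$t2", ["$s0", "$t3"])]
    load_then_use = [("lw", "$s0", ["$t1"]), ("sub", "$t2", ["$s0", "$t3"])]
    print(stalls_with_forwarding(alu_then_use), stalls_with_forwarding(load_then_use))
```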

Code Scheduling to Avoid Stalls [Mitigates Data Hazards]

  • Reorder code to avoid using a load's result in the very next instruction (avoid the “load, then immediately use” pattern)

Whenever possible, avoid using data immediately after it is loaded from memory.
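A sketch of the payoff, using a hypothetical sequence that loads three values and computes two sums. It assumes full forwarding, where the only stall is one cycle for a load immediately followed by a use of its result. Hoisting the third load above the first `add` (legal here because it reads a different address than the intervening store writes) removes both stalls.

```python
# Each instruction: (op, dest, sources); "sw" writes no register.
ORIGINAL = [
    ("lw",  "$t1", ["$t0"]),         # lw  $t1, 0($t0)
    ("lw",  "$t2", ["$t0"]),         # lw  $t2, 4($t0)
    ("add", "$t3", ["$t1", "$t2"]),  # add $t3, $t1, $t2  <- uses $t2 right after its load
    ("sw",  None,  ["$t3", "$t0"]),  # sw  $t3, 12($t0)
    ("lw",  "$t4", ["$t0"]),         # lw  $t4, 8($t0)
    ("add", "$t5", ["$t1", "$t4"]),  # add $t5, $t1, $t4  <- uses $t4 right after its load
    ("sw",  None,  ["$t5", "$t0"]),  # sw  $t5, 16($t0)
]

# Same instructions, with the third load moved up above the first add.
SCHEDULED = [ORIGINAL[i] for i in (0, 1, 4, 2, 3, 5, 6)]

def total_cycles(program, stages: int = 5) -> int:
    """Cycles to finish `program`: pipeline fill + one cycle per instruction
    + one stall for each load immediately followed by a use of its result."""
    stalls = sum(
        1
        for prev, cur in zip(program, program[1:])
        if prev[0] == "lw" and prev[1] in cur[2]
    )
    return stages - 1 + len(program) + stalls

if __name__ == "__main__":
    print("original: ", total_cycles(ORIGINAL))
    print("scheduled:", total_cycles(SCHEDULED))
```

Under this model the original order takes 13 cycles and the scheduled order 11.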

Branch Prediction

Prediction avoids stalls probabilistically; it cannot eliminate them completely.

  • Longer pipelines can’t readily determine branch outcome early
  • Stall penalty becomes unacceptable
  • Predict outcome of branch
  • Stall only if prediction is wrong
  • In the MIPS pipeline, can predict branches not taken
  • Fetch the instruction after the branch, with no delay
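A sketch comparing always stalling on a branch with predict-not-taken, assuming a fixed penalty per stall or misprediction (1 cycle when branches resolve in ID). The outcome sequence in the example is hypothetical.

```python
def branch_cycles_lost(outcomes: list[bool], penalty: int, predict_not_taken: bool) -> int:
    """Total cycles lost over a sequence of branch outcomes (True = taken).

    Stall-on-branch pays the penalty on every branch. Predict-not-taken keeps
    fetching sequentially and pays only when a branch turns out to be taken.
    """
    if predict_not_taken:
        return penalty * sum(outcomes)  # True counts as 1
    return penalty * len(outcomes)

if __name__ == "__main__":
    outcomes = [True, False, False, True]
    print(branch_cycles_lost(outcomes, 1, predict_not_taken=False))
    print(branch_cycles_lost(outcomes, 1, predict_not_taken=True))
```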

More-Realistic Branch Prediction

Static branch prediction

The compiler predicts based on program structure

  • Based on typical branch behavior
  • Example: loop and if-statement branches
  • Predict backward branches taken
  • Predict forward branches not taken
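The backward-taken / forward-not-taken rule in the bullets above fits in a single comparison. A minimal sketch, assuming the branch address and target address are known; the function name is hypothetical.

```python
def predict_static(branch_pc: int, target_pc: int) -> bool:
    """Backward-taken / forward-not-taken; True means "predict taken".

    A backward branch (target at a lower address) usually closes a loop, so it
    is predicted taken; a forward branch (e.g., skipping an if-body) is
    predicted not taken.
    """
    return target_pc < branch_pc

if __name__ == "__main__":
    print(predict_static(0x40, 0x20))  # loop back-edge
    print(predict_static(0x40, 0x60))  # forward skip
```

For a loop that runs 10 iterations, the back-edge branch is taken 9 times and falls through once, so this rule is right 9 out of 10 times.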

Dynamic branch prediction

Hardware-based prediction: track each branch's history and assume it will behave as it did before

  • Hardware measures actual branch behavior
  • Record recent history of each branch
  • Assume future behavior will continue the trend
  • When wrong, stall while re-fetching, and update history
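One of the simplest dynamic schemes is a 1-bit predictor that predicts each branch will do what it did last time. A minimal sketch, assuming a table keyed by branch address and an initial "not taken" guess for unseen branches; the class and function names are hypothetical.

```python
class OneBitPredictor:
    """Per-branch history table: predict whatever the branch did last time."""

    def __init__(self) -> None:
        self.last_taken: dict[int, bool] = {}  # branch PC -> last outcome

    def predict(self, pc: int) -> bool:
        return self.last_taken.get(pc, False)  # unseen branch: predict not taken

    def update(self, pc: int, taken: bool) -> None:
        self.last_taken[pc] = taken  # record the actual outcome as new history

def count_mispredictions(pc: int, outcomes: list[bool]) -> int:
    """Run one branch's outcome sequence through the predictor."""
    predictor = OneBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1  # real hardware would stall while re-fetching here
        predictor.update(pc, taken)
    return misses

if __name__ == "__main__":
    # A loop back-edge: taken three times, then falls through, executed twice.
    print(count_mispredictions(0x40, [True, True, True, False] * 2))
```

In this loop example the predictor misses on the loop exit and again on re-entry, 4 misses out of 8; 2-bit saturating counters are a common refinement that tolerates a single deviation before changing the prediction.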

Summary

  • Pipelining improves performance by increasing instruction throughput
  • Executes multiple instructions in parallel
  • Each instruction has the same latency
  • Subject to hazards
  • Structure, data, control
  • Instruction set design affects complexity of pipeline implementation