Instruction-Level Parallelism

Unlike pipelining, multiple-issue parallelism requires additional hardware resources.

Instruction-Level Parallelism (ILP)

  • Instruction-level parallelism: parallelism among instructions
  • Pipelining is one form of ILP, because a pipeline overlaps the execution of multiple instructions
  • To increase ILP
  • Deeper pipeline
  • Less work per stage => shorter clock cycle
  • Multiple issue (start multiple instructions in one clock)
  • Replicate pipeline stages => multiple pipelines
  • Start multiple instructions per clock cycle
  • CPI (cycles per instruction) < 1, so use Instructions Per Cycle (IPC) instead
  • E.g., for a 4 GHz 4-way multiple-issue processor, the peak rate is 16 BIPS (billion instructions per second), peak CPI = 0.25, and peak IPC = 4, but dependencies reduce this in practice.
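The peak figures in the example above follow directly from the clock rate and the issue width. A minimal sketch of that arithmetic (the function name `peak_rates` is made up for illustration):

```python
def peak_rates(clock_hz, issue_width):
    """Return (peak IPC, peak CPI, peak instructions per second)."""
    peak_ipc = issue_width              # at most one instruction per issue slot per cycle
    peak_cpi = 1 / issue_width          # CPI is the reciprocal of IPC
    peak_ips = clock_hz * issue_width   # cycles per second times instructions per cycle
    return peak_ipc, peak_cpi, peak_ips

ipc, cpi, ips = peak_rates(4e9, 4)      # 4 GHz, 4-way multiple issue
print(ipc, cpi, ips / 1e9)              # IPC, CPI, and rate in BIPS
```

These are upper bounds only; real programs fall short of them because of dependencies and stalls.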

Two key responsibilities of multiple issue

  • Packaging instructions into issue slots
  • How many instructions can be issued
  • Which instructions should be issued
  • Dealing with data and control hazards

Multiple Issue

  • Static multiple issue – decision made by compiler
  • Compiler groups instructions to be issued together
  • Packages them into “issue slots”
  • Compiler detects and avoids hazards
  • Dynamic multiple issue – decision made by processor
  • CPU examines instruction stream and chooses instructions to issue each cycle
  • Compiler can help by reordering instructions
  • CPU resolves hazards using advanced techniques at runtime

Static Multiple Issue

  • Compiler groups instructions into “issue packets”
  • Group of instructions that can be issued on a single cycle
  • Determined by pipeline resources required
  • Think of an issue packet as a very long instruction
  • Specifies multiple concurrent operations
  • => Very Long Instruction Word (VLIW)

Dynamic Multiple Issue

  • The issue decision is made by the processor during execution
  • Such processors are also called “superscalar” processors
  • CPU decides whether to issue 0, 1, 2, … instructions each cycle
  • Avoiding structural and data hazards
  • No need for compiler scheduling
  • Though it may still help
  • Code semantics ensured by the CPU

Why dynamic multiple issue is needed

  • Not all stalls are predictable
  • e.g., cache misses
  • Can’t always schedule around branches
  • Branch outcome is dynamically determined
  • Different implementations of an ISA have different latencies and hazards

MIPS with Static Dual Issue

Two-issue packets

  • Divide instructions into two types, according to whether they access memory
  • Type 1: ALU or branch instructions
  • Type 2: load or store instructions
  • In each cycle, issue one Type 1 and one Type 2 instruction together
  • The two instructions in a packet must not have a data hazard between them

More instructions executing in parallel

EX data hazard

Forwarding cannot avoid the stall between the two instructions of a packet.

  • Forwarding avoided stalls with single-issue
  • Now can’t use ALU result in load/store in same packet

Load-use hazard

  • Still a one-cycle use latency, but that cycle now holds two instructions

More aggressive scheduling required

Improvement of multiple issue

Scheduling Static Multiple Issue

Compiler must remove some/all hazards

Reorder instructions into issue packets

No dependencies within a packet

Possibly some dependencies between packets

  • Varies between ISAs; compiler must know!

Pad with nop if necessary
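The packing rules above can be sketched as a small greedy packer for the two-issue MIPS format. This is an illustrative simplification, not a real compiler pass: the `Ins` record and the `pack` function are hypothetical, instructions are kept in program order (no reordering), branches are not handled, and latencies between packets (such as the load-use delay) are not modeled.

```python
from collections import namedtuple

# Hypothetical instruction record: kind is "alu" (Type 1) or "mem" (Type 2).
Ins = namedtuple("Ins", "text kind dest srcs")

def depends(later, earlier):
    """True if `later` reads or rewrites the register that `earlier` writes."""
    if earlier.dest is None:
        return False
    return earlier.dest in later.srcs or earlier.dest == later.dest

def pack(instructions):
    """Greedily fill (ALU slot, load/store slot) packets in program order."""
    packets, current = [], {}
    for ins in instructions:
        slot_taken = ins.kind in current
        hazard = any(depends(ins, e) for e in current.values())
        if slot_taken or hazard:
            packets.append(current)   # close the packet and start a new one
            current = {}
        current[ins.kind] = ins
    if current:
        packets.append(current)
    # Pad empty slots with nop.
    return [(p["alu"].text if "alu" in p else "nop",
             p["mem"].text if "mem" in p else "nop") for p in packets]

program = [
    Ins("lw $t0, 0($s1)", "mem", "$t0", ("$s1",)),
    Ins("addu $t0, $t0, $s2", "alu", "$t0", ("$t0", "$s2")),
    Ins("sw $t0, 0($s1)", "mem", None, ("$t0", "$s1")),
    Ins("addi $s1, $s1, -4", "alu", "$s1", ("$s1",)),
]
packets = pack(program)
for alu_slot, mem_slot in packets:
    print(f"{alu_slot:22} | {mem_slot}")
```

Without reordering, only the last two instructions share a packet, and the other slots are padded with nop. A real scheduler would also move instructions to fill those slots.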

Loop Unrolling

  • Replicate loop body to expose more parallelism
  • Reduces loop-control overhead
  • Use different registers per replication
  • Called “register renaming”
  • Avoids loop-carried “anti-dependencies”
  • e.g., a store that reads a register, followed by a load that overwrites the same register
  • Aka “name dependence”: reuse of a register name, not a real flow of data
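The transformation can be sketched in Python, with local variables standing in for registers. This only illustrates the shape of the unrolled code (it will not run faster in Python); the function names are made up, and the unroll factor of 4 is an assumption.

```python
def add_scalar_rolled(a, s):
    """One element per iteration; the single temporary t0 is reused every time."""
    for i in range(len(a)):
        t0 = a[i]        # lw
        t0 = t0 + s      # addu
        a[i] = t0        # sw
    return a

def add_scalar_unrolled4(a, s):
    """Four elements per iteration, each with its own temporary (renaming)."""
    n, i = len(a), 0
    while i + 4 <= n:
        # Four independent loads: no temporary is reused, so there is no
        # name dependence between the replicated bodies.
        t0, t1, t2, t3 = a[i], a[i + 1], a[i + 2], a[i + 3]
        t0, t1, t2, t3 = t0 + s, t1 + s, t2 + s, t3 + s
        a[i], a[i + 1], a[i + 2], a[i + 3] = t0, t1, t2, t3
        i += 4           # one index update and one loop test per four elements
    while i < n:         # leftover elements when n is not a multiple of 4
        a[i] = a[i] + s
        i += 1
    return a

data = list(range(10))
print(add_scalar_unrolled4(list(data), 5) == add_scalar_rolled(list(data), 5))
```

The cleanup loop at the end is needed whenever the trip count is not a multiple of the unroll factor.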

Dynamic Pipeline Scheduling

Hardware support for reordering instruction execution

Allow the CPU to execute instructions out of order to avoid stalls

  • But commit result to registers in order

Example

lw $t0, 20($s2)
addu $t1, $t0, $t2
sub $s4, $s4, $t3
slti $t5, $s4, 20

Can start sub while addu is waiting for lw
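A toy model makes the benefit concrete. The sketch below computes the cycle in which each of the four instructions starts executing, first in program order and then out of order. It rests on assumptions not in the notes: at most one instruction starts per cycle, the `lw` result is available 3 cycles after it starts, the other results after 1 cycle, and only true (read-after-write) dependences are tracked.

```python
# Assumed latencies (cycles from start until the result is available).
LATENCY = {"lw": 3, "addu": 1, "sub": 1, "slti": 1}

# (opcode, destination register, source registers)
program = [
    ("lw",   "$t0", ["$s2"]),
    ("addu", "$t1", ["$t0", "$t2"]),
    ("sub",  "$s4", ["$s4", "$t3"]),
    ("slti", "$t5", ["$s4"]),
]

def schedule(program, in_order):
    """Return the start cycle of each instruction, listed in program order."""
    ready = {}       # register -> cycle at which its new value is available
    used = set()     # cycles in which some instruction already started
    starts, prev = [], -1
    for op, dest, srcs in program:
        start = max(ready.get(r, 0) for r in srcs)   # wait for operands
        if in_order:
            start = max(start, prev + 1)             # may not pass older instructions
        while start in used:                         # one start per cycle
            start += 1
        used.add(start)
        starts.append(start)
        ready[dest] = start + LATENCY[op]
        prev = start
    return starts

print("in order:    ", schedule(program, in_order=True))
print("out of order:", schedule(program, in_order=False))
```

Under these assumptions `sub` and `slti` start while `addu` is still waiting for `lw`, so the sequence finishes earlier than in the in-order case. Results would still be committed to registers in program order, which this sketch does not model.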

Speculation

  • “Guess” what to do with an instruction
  • Start operation as soon as possible
  • Check whether guess was right
  • If so, complete the operation
  • If not, roll-back and do the right thing
  • Common to static and dynamic multiple issue
  • Examples:
  • Speculate on branch outcome
  • Roll back if path taken is different
  • Speculate on load
  • Roll back if location is updated

Compiler/Hardware Speculation

  • Compiler can reorder instructions
  • e.g., move load before branch
  • Can include “fix-up” instructions to recover from incorrect guess
  • Hardware can look ahead for instructions to execute
  • Buffer results until it determines they are actually needed
  • Flush buffers on incorrect speculation
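The hardware side, buffering results and flushing them on a wrong guess, can be sketched with a tiny register-file model. The class `SpeculativeRegs` and its methods are hypothetical names for illustration; real processors use structures such as a reorder buffer, which this does not attempt to reproduce.

```python
class SpeculativeRegs:
    """Register file whose speculative writes are buffered until resolved."""

    def __init__(self, regs):
        self.regs = dict(regs)   # committed (architectural) state
        self.pending = {}        # results produced under speculation

    def write(self, reg, value):
        self.pending[reg] = value            # buffer, do not commit yet

    def read(self, reg):
        # Later speculative instructions see the buffered value first.
        return self.pending.get(reg, self.regs[reg])

    def resolve(self, guess_correct):
        if guess_correct:
            self.regs.update(self.pending)   # results were needed: commit
        self.pending.clear()                 # otherwise flush the buffer

rf = SpeculativeRegs({"$t0": 1, "$t1": 2})
rf.write("$t0", rf.read("$t1") + 40)   # executed past an unresolved branch
rf.resolve(guess_correct=False)        # misprediction: $t0 keeps its old value
print(rf.regs)
```

Because nothing reaches the committed state before the guess is checked, recovering from a wrong guess only requires discarding the buffer.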

Speculation and Exceptions

  • What if exception occurs on a speculatively executed instruction?
  • e.g., speculative load before null-pointer check
  • Static speculation
  • Can add ISA support for deferring exceptions (i.e., exception handling exposed at the assembly-language level)
  • Dynamic speculation
  • Can buffer exceptions until instruction completion (which may not occur)
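The idea of deferring or buffering an exception can be sketched as follows: a speculative load records a fault instead of raising it, and the fault is raised only if the instruction actually completes. The functions `speculative_load` and `commit` and the dictionary used as memory are illustrative assumptions, not a description of any specific ISA mechanism.

```python
def speculative_load(memory, address):
    """Perform a load early; record any fault instead of raising it."""
    try:
        return memory[address], None
    except KeyError as fault:
        return None, fault          # deferred: nothing is raised yet

def commit(result):
    """Complete the load; a deferred fault becomes a real exception here."""
    value, fault = result
    if fault is not None:
        raise fault
    return value

memory = {0x10: 7}                   # toy memory: address -> value

ok = speculative_load(memory, 0x10)
print(commit(ok))                    # load was needed and is valid

bad = speculative_load(memory, 0x99) # would fault, but the fault is only recorded
# If the guarding check (e.g., a null-pointer test) fails, `bad` is simply
# discarded and the program never sees the exception.
```

Only calling `commit(bad)` would raise the recorded fault, which corresponds to the instruction turning out to be on the real execution path.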

Does Multiple Issue Work?

  • Yes, but not as much as we’d like
  • Programs have real dependencies that limit ILP
  • Some dependencies are hard to eliminate
  • e.g., pointer aliasing
  • Some parallelism is hard to expose
  • Limited window size during instruction issue
  • Memory delays and limited bandwidth
  • Hard to keep pipelines full
  • Speculation can help if done well

Power Efficiency (Power Wall)

  • The complexity of dynamic scheduling and speculation requires power
  • Multiple simpler cores may be better

Fallacies

  • Pipelining is easy (!)
  • The basic idea is easy
  • The devil is in the details
  • e.g., detecting data hazards
  • Pipelining is independent of technology
  • So why haven’t we always done pipelining?
  • More transistors make more advanced techniques feasible
  • Pipeline-related ISA design needs to take account of technology trends
  • e.g., predicated instructions

Pitfalls

  • Poor ISA design can make pipelining harder
  • e.g., complex instruction sets (VAX, IA-32)
  • Significant overhead to make pipelining work
  • IA-32 micro-op approach
  • e.g., complex addressing modes
  • Register update side effects, memory indirection
  • e.g., delayed branches
  • Advanced pipelines have long delay slots

Concluding Remarks

ISA influences design of datapath and control

Datapath and control influence design of ISA

Pipelining improves instruction throughput using parallelism

  • More instructions completed per second
  • Latency for each instruction not reduced

Hazards: structural, data, control

Multiple issue and dynamic scheduling (ILP)

  • Dependencies limit achievable parallelism
  • Complexity leads to the power wall