Parallel Processors from Client to Cloud

Introduction

  • Goal: replace large, inefficient processors with many smaller, efficient processors to get better performance per joule
  • Multiprocessors, clusters
  • Scalability, availability, power efficiency
  • Task-level (process-level) parallelism: high throughput for independent jobs
  • Parallel processing program: a single program run on multiple processors
  • Multicore microprocessors: chips with multiple processors (cores)
  • Shared Memory Processors (SMP)

Hardware and Software

Challenge: design hardware and software so that parallel processing programs execute efficiently, in both performance and energy, as the number of cores scales.

Hardware

  • Serial: e.g., Pentium 4
  • Parallel: e.g., quad-core Xeon E5345

Software

  • Sequential: e.g., matrix multiplication
  • Concurrent: e.g., operating system

Sequential/concurrent software can run on serial/parallel hardware

We use “parallel processing program” to mean either sequential or concurrent software running on parallel hardware

Parallel Programming

  • It’s hard to create parallel software
  • Parallel programming needs to achieve significant performance improvement
  • Otherwise, just use a faster uniprocessor, since it’s easier!
  • Difficulties of parallel programming:
  • Partitioning
  • Coordination
  • Communications overhead
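
The three difficulties above can be seen in even a tiny program. The following is a minimal sketch, not a tuned implementation: the names `partition` and `parallel_sum` are hypothetical, and threads stand in for processors. In standard CPython, threads usually do not speed up CPU-bound work like this, so the sketch illustrates structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_workers):
    """Partitioning: split data into at most num_workers contiguous chunks."""
    # Ceiling division; max(1, ...) avoids a zero step when data is empty.
    chunk = max(1, (len(data) + num_workers - 1) // num_workers)
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]


def parallel_sum(data, num_workers=4):
    """Sum data by giving each worker one chunk and combining partial sums."""
    chunks = partition(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Communication: each chunk goes out to a worker and one partial
        # sum comes back. Coordination: list(...) blocks until every
        # worker has finished.
        partial_sums = list(pool.map(sum, chunks))
    # Final sequential step: reduce the partial sums to one result.
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))  # prints 5050
```

If the chunks are uneven, or handing out work costs more than the work itself, the parallel version can be slower than a plain `sum(data)`.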

Amdahl’s Law

The sequential part of a program limits the achievable speedup

Example: can 100 processors deliver a 90× speedup?

  • $T_{new} = \frac{T_{parallelizable}}{100} + T_{sequential}$
  • $Speedup = \frac{1}{(1-F_{parallelizable}) + \frac{F_{parallelizable}}{100}} = 90$
  • Solving: $F_{parallelizable} \approx 0.999$

The sequential part must be no more than about 0.1% of the original execution time
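
The arithmetic in this example can be checked with a short sketch. The function names `amdahl_speedup` and `required_parallel_fraction` are hypothetical; the second one comes from solving the speedup formula for the parallelizable fraction, giving F = (1 - 1/S) / (1 - 1/n) for target speedup S on n processors.

```python
def amdahl_speedup(f_parallel, n_processors):
    """Speedup when a fraction f_parallel of the work runs on n_processors."""
    return 1.0 / ((1.0 - f_parallel) + f_parallel / n_processors)


def required_parallel_fraction(target_speedup, n_processors):
    """Parallelizable fraction needed to reach target_speedup.

    Derived from 1 / ((1 - F) + F / n) = S, solved for F.
    """
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / n_processors)


if __name__ == "__main__":
    f = required_parallel_fraction(90, 100)
    print(round(f, 4))                            # prints 0.9989
    print(round(amdahl_speedup(0.99, 100), 2))    # prints 50.25
```

The second printed value shows how sensitive the result is: with a sequential part of 1% instead of roughly 0.1%, 100 processors give a speedup of only about 50.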

Parallel Processing

  • The following techniques can enable parallel processing:
  • SIMD, vector (section 6.3)
  • SIMD: single instruction stream, multiple data streams
  • For contrast, SISD: single instruction stream, single data stream (e.g., a uniprocessor such as the Intel Pentium 4)
  • Multithreading (section 6.4)
  • MIMD: multiple instruction streams, multiple data streams (e.g., a multicore processor such as the Intel Core i7)
  • SMPs and clusters (section 6.5)
  • SPMD: single program, multiple data
  • Typical way to write programs for a multicore processor
  • One program runs on multiple processors
  • Different processors execute different sections of the code
  • GPUs (section 6.6)
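
The single-program, multiple-data style listed above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: `spmd_program` and `run_spmd` are hypothetical names, and threads stand in for processors. Every worker runs the same function, and a rank argument decides which data it handles and which section of the code it executes.

```python
from concurrent.futures import ThreadPoolExecutor


def spmd_program(rank, num_workers, data):
    """The one program every worker runs; rank selects its data and code path."""
    # Each rank takes a strided share of the data: rank, rank + n, rank + 2n, ...
    my_share = data[rank::num_workers]
    partial = sum(x * x for x in my_share)
    if rank == 0:
        # Only rank 0 executes this section of the code.
        return ("root", partial)
    # All other ranks execute this section instead.
    return ("worker", partial)


def run_spmd(data, num_workers=4):
    """Launch the same program on num_workers workers and collect the results."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(spmd_program, rank, num_workers, data)
                   for rank in range(num_workers)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    results = run_spmd(list(range(8)))
    print(sum(partial for _, partial in results))  # prints 140
```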