Parallel Processors from Client to Cloud


  • Goal: replace large, inefficient processors with many smaller, efficient processors to get better performance per joule
  • Multiprocessors and clusters
    • Scalability, availability, power efficiency
  • Task-level (process-level) parallelism
    • High throughput for independent jobs
  • Parallel processing program
    • A single program run on multiple processors
  • Multicore microprocessors
    • Chips with multiple processors (cores)
  • Shared memory multiprocessors (SMPs)
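Task-level parallelism needs no cooperation between jobs: each job runs on its own and the gain is throughput. A minimal sketch, assuming Python's standard `concurrent.futures` module; `word_count` and `run_independent_jobs` are hypothetical names used only for illustration. Because CPython threads share a global interpreter lock, CPU-bound jobs would need `ProcessPoolExecutor` to actually run on several cores; the thread pool here only shows the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text: str) -> int:
    """Hypothetical stand-in for an independent job."""
    return len(text.split())


def run_independent_jobs(jobs, job_fn, n_workers=4):
    """Run unrelated jobs concurrently; no job reads another job's data."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in the same order as the input jobs
        return list(pool.map(job_fn, jobs))


if __name__ == "__main__":
    jobs = ["the quick brown fox", "hello world", "parallel processors from client to cloud"]
    print(run_independent_jobs(jobs, word_count))  # [4, 2, 6]
```

Since the jobs share nothing, adding workers raises throughput without adding any coordination code.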

Hardware and Software

Challenge: designing hardware and software that let parallel processing programs execute efficiently (in both performance and energy) as the number of cores scales.

  • Hardware
    • Serial: e.g., Pentium 4
    • Parallel: e.g., quad-core Xeon E5345
  • Software
    • Sequential: e.g., matrix multiplication
    • Concurrent: e.g., operating system

Both sequential and concurrent software can run on either serial or parallel hardware.

We use “parallel processing program” to mean either sequential or concurrent software running on parallel hardware.

Parallel Programming

  • It’s hard to create parallel software
  • Parallel programming must achieve a significant performance improvement
    • Otherwise, just use a faster uniprocessor, since it’s easier!
  • Difficulties of parallel programming:
    • Partitioning
    • Coordination
    • Communications overhead
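The three difficulties show up even in a simple parallel sum. A minimal sketch, assuming Python's standard `concurrent.futures`; `partition` and `parallel_sum` are hypothetical helpers. Splitting the list is the partitioning step, waiting for every worker and combining their partial sums is the coordination step, and handing each partial result back to the main thread is the communication. As with any CPython thread pool, this illustrates the structure rather than delivering a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Partitioning: split data into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # the first `extra` chunks get one additional element
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, n_workers=4):
    chunks = partition(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # each worker sums its own chunk; results are communicated back to this thread
        partials = list(pool.map(sum, chunks))
    # coordination: combine partial results only after every worker has finished
    return sum(partials)


if __name__ == "__main__":
    print(partition(list(range(10)), 3))   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(parallel_sum(list(range(1001))))  # 500500
```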

Amdahl’s Law

The sequential part of a program limits the achievable speedup.

Example: to get a 90× speedup on 100 processors, what fraction of the original execution time must be parallelizable?

  • $T_{new} = \frac{T_{parallelizable}}{100} + T_{sequential}$
  • $Speedup = \frac{1}{(1-F_{parallelizable}) + \frac{F_{parallelizable}}{100}} = 90$
  • Solving: $F_{parallelizable} \approx 0.999$
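The step from the speedup equation to the answer can be worked out as follows, normalizing the original execution time to 1 so that $T_{parallelizable} = F_{parallelizable}$ and $T_{sequential} = 1 - F_{parallelizable}$:

$$
\begin{aligned}
Speedup = \frac{1}{T_{new}} &= \frac{1}{(1-F_{parallelizable}) + \frac{F_{parallelizable}}{100}} = 90 \\
(1-F_{parallelizable}) + \frac{F_{parallelizable}}{100} &= \frac{1}{90} \\
1 - 0.99\,F_{parallelizable} &= \frac{1}{90} \\
F_{parallelizable} &= \frac{1 - 1/90}{0.99} \approx 0.9989
\end{aligned}
$$

So $1 - F_{parallelizable} \approx 0.0011$, i.e. roughly 0.1% of the original time.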

The sequential part must be no more than about 0.1% of the original execution time.
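The same arithmetic can be checked numerically. A minimal sketch; `amdahl_speedup` and `required_parallel_fraction` are hypothetical helper names, and the original execution time is normalized to 1. The last line shows why the bar is so high: even 99% parallelizable code reaches only about a 50× speedup on 100 processors.

```python
def amdahl_speedup(f_parallel: float, n_procs: int) -> float:
    """Amdahl's Law: speedup = 1 / ((1 - F) + F / n), original time normalized to 1."""
    return 1.0 / ((1.0 - f_parallel) + f_parallel / n_procs)


def required_parallel_fraction(target: float, n_procs: int) -> float:
    """Invert Amdahl's Law: the F needed to reach `target` speedup on n_procs processors."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / n_procs)


if __name__ == "__main__":
    f = required_parallel_fraction(90, 100)
    print(f"F = {f:.4f}")                                  # F = 0.9989
    print(f"speedup = {amdahl_speedup(f, 100):.1f}")       # speedup = 90.0
    print(f"F = 0.99: {amdahl_speedup(0.99, 100):.1f}x")   # F = 0.99: 50.3x
```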

Parallel Processing

  • The following techniques can enable parallel processing:
    • SIMD (single instruction, multiple data) and vector (section 6.3)
    • Multithreading (section 6.4)
    • SMPs and clusters (section 6.5)
    • GPUs (section 6.6)
  • Classifying processors by instruction and data streams:
    • SISD (single instruction stream, single data stream): uniprocessor, e.g., Intel Pentium 4
    • MIMD (multiple instruction streams, multiple data streams): multicore processor, e.g., Intel Core i7
  • SPMD (single program, multiple data)
    • The typical way to write programs for a multicore processor
    • One program runs on all processors
    • Different processors execute different sections of the code
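SPMD can be sketched with every worker running one and the same function, using its rank to pick its share of the data and to decide which code path it takes. A minimal sketch, assuming Python's standard `threading` module; `spmd_sum` and `rank` are hypothetical names. The barrier makes all partial sums visible before rank 0 combines them, and the `if rank == 0` branch is where different processors execute different sections of the code.

```python
import threading


def spmd_sum(data, n_workers=4):
    partials = [0] * n_workers
    barrier = threading.Barrier(n_workers)
    result = {}

    def program(rank):
        # same program on every worker; the rank selects a strided slice of the data
        partials[rank] = sum(data[rank::n_workers])
        # wait until every worker has stored its partial sum
        barrier.wait()
        if rank == 0:
            # only rank 0 runs the combining section
            result["total"] = sum(partials)

    threads = [threading.Thread(target=program, args=(rank,)) for rank in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


if __name__ == "__main__":
    print(spmd_sum(list(range(16))))  # 120
```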