## Monday, 19 October 2009

### Amdahl's law

A few months ago I made a post mentioning how I don't conform to the Amdahl's law way of thinking but never went into any details.

The law describes the speedup that can be obtained if you can parallelize a section of your problem. The speedup that can be obtained is described by the following equation:

\\${\sfrac{1}{(1-P)+\sfrac{P}{S}}}\\$

Where P is the proportion of the problem that can be parallelized / sped up and S is the speedup amount.

Assuming that S->infinity  then P/S -> 0  this leave us with \\${\sfrac{1}{(1-P)}}\\$

This implies that no matter how many processors / speed improvements we make to the P portion of the problem we can never do better than  \\${\sfrac{1}{(1-P)}}\\$   And the biggest % improvement from the baseline comes with low values of S (or relatively low numbers of parallel processors). This result is observed in the field time and again. Very seldom does throwing more than 4 or 8 processors at a problem speed it up any more than the large gains you get from the first 2 or 4 processors.

This equation does expand with multiple P and associated S terms in order to describe a more complex / lengthly problem: (P1+P2+P3 = 100%)

\\${\sfrac{1}{(1-P1)+\sfrac{P1}{S1}}}+{\sfrac{1}{(1-P2)+\sfrac{P2}{S2}}}+{\sfrac{1}{(1-P3)+\sfrac{P3}{S3}}}\\$

Certain problems where P is large do respond well to the increase in processors these are known as "embarrassingly parallel", ray tracing is rather a good example of this.

So why do I not agree with this if the equation makes sense?

The assumption that only P areas can be accelerated by S and strung together in a serial fashion is rather simplistic.

Why do we have to finish P1 before beginning P2?  Even if the P2 area has dependancies on P1 its rare to have the entire section of P2 to depend on a single result (of course there are cases - reduction kernels etc)

Maybe P3 can overlap P1 and P2, some may benefit by having more processors while others may reach an optimal at two. Why not overlap the sections and supply them with their optimal processing power? This is easy to achieve with Directed Acyclic Graphs (DAG's) and can even be computed on the "fly" although they do get rather large!

Quoting Amdahl's law as a reason why no further speed benefits are available in a system is really just showing that thinking is still stuck in serial mode with little bursts of parallelism thrown in.  Lets starting thinking parallel in all areas and make the most of all available compute resources.