Posts Tagged ‘Amdahl’s law’

Amdahl’s law

Monday, October 19th, 2009

A few months ago I made a post mentioning how I don’t conform to the Amdahl’s law way of thinking but never went into any details.

The law describes the speedup that can be obtained if you can parallelize a section of your problem. The speedup that can be obtained is described by the following equation:

{\frac{1}{(1-P)+\frac{P}{S}}}

Where P is the proportion of the problem that can be parallelized / sped up and S is the speedup amount.

Assuming that S->infinity  then P/S -> 0  this leave us with {\frac{1}{(1-P)}}

This implies that no matter how many processors / speed improvements we make to the P portion of the problem we can never do better than  {\frac{1}{(1-P)}}   And the biggest % improvement from the baseline comes with low values of S (or relatively low numbers of parallel processors). This result is observed in the field time and again. Very seldom does throwing more than 4 or 8 processors at a problem speed it up any more than the large gains you get from the first 2 or 4 processors.

This equation does expand with multiple P and associated S terms in order to describe a more complex / lengthly problem: (P1+P2+P3 = 100%)

{\frac{1}{(1-P1)+\frac{P1}{S1}}}+{\frac{1}{(1-P2)+\frac{P2}{S2}}}+{\frac{1}{(1-P3)+\frac{P3}{S3}}}

Certain problems where P is large do respond well to the increase in processors these are known as “embarrassingly parallel”, ray tracing is rather a good example of this.

 

So why do I not agree with this if the equation makes sense?

The assumption that only P areas can be accelerated by S and strung together in a serial fashion is rather simplistic.

Why do we have to finish P1 before beginning P2?  Even if the P2 area has dependancies on P1 its rare to have the entire section of P2 to depend on a single result (of course there are cases - reduction kernels etc)

Maybe P3 can overlap P1 and P2, some may benefit by having more processors while others may reach an optimal at two. Why not overlap the sections and supply them with their optimal processing power? This is easy to achieve with Directed Acyclic Graphs (DAG’s) and can even be computed on the “fly” although they do get rather large!

Quoting Amdahl’s law as a reason why no further speed benefits are available in a system is really just showing that thinking is still stuck in serial mode with little bursts of parallelism thrown in.  Lets starting thinking parallel in all areas and make the most of all available compute resources.

Glass half full or half empty?

Thursday, June 25th, 2009

Or as this is a hpc/cuda/parallel processing site:

Gustafson’s Law or Amdahl’s law?

Personally I prefer Gustafson’s Law …. it seems more logical to me or is this just because I’m inherently an optimist?

I would be quite interested on hearing your views on this - so comments /forum posts most welcome.

In other news:  The thermal monitor downloads have gone over 80! :)  The updated version (v0.2) is ready after mucking around with subclassing a control…. I will release it soon…

Otherwise I have been extremely busy on debugging a sparse matrix solver - bugs in huge datasets can be hard to find! Even with the aid of Gold / Sivler / Bronze kernels mentioned in the last post they have been proving remarkably tricky to isolate. Rather surprising to me is the fact that the long long data type doesn’t consume a lot more processing time than a normal unsigned int - so I have been using that wherever there is a risk of exceeding 2^32.

CFD code coming soon too - although I have unwound a lot of the optimizations in order to make it easier to understand and possibly be a good foundation for your own optimizations.

Right …. back to the grindstone!