Two weeks ago I attended a seminar at the Daresbury Laboratory near Warrington. Unfortunately I only had the day off, so I couldn't attend the workshops on offer and Buttercup, my trusty Landy, had to do the trip there and back on the same day. Sorry to everyone on the M4, M5 and M6 that day :p
Jack Dongarra presented a rather nice summary of supercomputing / parallel computing, or as he called it, "Throughput Computing". I had never heard this term before and now really like it. It implies that you should make the best use of the available resources in order to maximize throughput. It's very easy to get into a mindset where everything has to be parallel, but in doing so it's possible to overlook that some parts work perfectly well in serial.
Of course, serial parts can become a bottleneck: lots of fork-and-join sections leave cores idle and reduce your throughput. A better way of keeping your compute nodes busy while dealing with the inevitable serial work is to express the computation as a directed acyclic graph (DAG) of tasks, so each task can run as soon as the tasks it depends on have finished.
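Just to get the idea straight in my own head, here's a toy sketch in Python (nothing to do with any real HPC runtime like PaRSEC or Dask, and all the task names are made up): each task declares what it depends on and gets submitted to a worker pool as soon as its inputs are ready, so the serial bits only hold up the tasks that actually need their results.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

# name -> (dependencies, function of the dependencies' results); all made up
tasks = {
    "load":   (set(),             lambda d: list(range(8))),
    "left":   ({"load"},          lambda d: sum(d["load"][:4])),
    "right":  ({"load"},          lambda d: sum(d["load"][4:])),
    "reduce": ({"left", "right"}, lambda d: d["left"] + d["right"]),
}

def run_dag(tasks, workers=4):
    results, running = {}, {}          # finished results, future -> task name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(results) < len(tasks):
            # Submit every task whose dependencies have all finished.
            for name, (deps, fn) in tasks.items():
                if name not in results and name not in running.values() and deps <= results.keys():
                    running[pool.submit(fn, {d: results[d] for d in deps})] = name
            # Block until at least one running task completes, then record it.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results

print(run_dag(tasks))   # "reduce" ends up as 0+1+...+7 = 28
```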
He also mentioned that programming these large and complex machines effectively is becoming more and more difficult, and therefore costly, and that there is currently no language that can adequately describe the systems needed.
In the current and next generation of supercomputers there will be less memory per core than there is today. Memory is a big consumer of power, and in the drive for power efficiency it will have to be reduced, possibly with small chunks of memory layered onto the processing elements in a 3D fashion. This will of course mean a rethink of existing algorithms, as compute elements may not necessarily share a common memory area.
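I've no idea what the eventual programming model will look like, but the obvious consequence of not sharing memory is that data has to be moved explicitly rather than just read from a common array. A minimal sketch of that style using mpi4py (purely illustrative, the variable names are mine):

```python
# Distributed-memory style: each rank owns its own chunk of the data and
# combining results means communicating. Run with e.g. mpirun -n 4 python ...
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.full(4, rank, dtype=np.float64)   # each rank's private chunk
total = np.zeros(1)

# No shared memory to peek at: the partial sums are explicitly reduced to rank 0.
comm.Reduce(np.array([local.sum()]), total, op=MPI.SUM, root=0)
if rank == 0:
    print("global sum:", total[0])
```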
Another very interesting topic he mentioned, which I had not considered before, is fault tolerance. We are used to expecting hard drives / storage to fail and so have various systems to limit the impact of a failure (RAID, mirrors etc), but if a compute node in a large cluster fails in the middle of, say, a matrix operation, how do we firstly detect that there has been a problem, and secondly recover from it without having to restart the entire run?
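The simplest (and bluntest) answer I know of is checkpoint/restart: periodically dump enough state to disk that a failed run can pick up from the last checkpoint rather than from scratch. A crude sketch, with the file name, interval and "work" all invented for illustration; the fault-tolerant algorithms he was talking about are far cleverer than this:

```python
import os
import pickle

CHECKPOINT = "state.pkl"   # made-up file name

def run(steps=1000, checkpoint_every=100):
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            start, total = pickle.load(f)
    else:
        start, total = 0, 0.0

    for i in range(start, steps):
        total += i * 0.5                    # stand-in for a chunk of real work
        if (i + 1) % checkpoint_every == 0:
            tmp = CHECKPOINT + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump((i + 1, total), f)
            os.replace(tmp, CHECKPOINT)     # swap in so a crash mid-write is safe

    return total

print(run())
```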
Lastly he mentioned:
5 Important Features to Consider:
1) Many-core and hybrid machines will require block data layouts and dynamic, data-driven execution.
2) Mixed precision - you don't always need maximum precision (see the sketch just after this list)
3) Self-adapting / auto-tuning software
4) Fault-tolerant algorithms
5) Communication-avoiding algorithms
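The mixed precision point is the one that stuck with me. The classic trick (textbook iterative refinement, not anything lifted from his slides) is to do the expensive solve in single precision and then polish the answer using double-precision residuals. A rough numpy sketch:

```python
# Mixed-precision iterative refinement: solve in float32 (fast, less memory
# and bandwidth), then correct using float64 residuals. Test matrix invented.
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned example
b = rng.standard_normal(n)

# Cheap solve entirely in single precision.
x = np.linalg.solve(A.astype(np.float32), b.astype(np.float32)).astype(np.float64)

# Refine: double-precision residual, single-precision correction step.
# A real code would reuse the float32 factorisation instead of re-solving.
for _ in range(3):
    r = b - A @ x
    dx = np.linalg.solve(A.astype(np.float32), r.astype(np.float32))
    x += dx.astype(np.float64)

print("residual norm:", np.linalg.norm(b - A @ x))
```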
All in all, a very informative talk and well worth the 10 hours of driving time :)