Wednesday, 31 May 2017


NVIDIA announced CUDA 9 a few weeks ago. I've been using CUDA since v1.1 and compute capability 1, and things have matured significantly over the years.

The new CUDA adds support for the new Volta architecture, C++14, faster libraries and Tensor Core matrix multiply, which is clearly targeting deep learning applications. But for me there is one standout feature: Cooperative Groups.

The release says that it is a new programming model for managing groups of communicating threads. What does that really mean?

Previously you could only synchronize threads within a single thread block, using the __syncthreads() function. Cooperative Groups lets you define groups of threads at the sub-block and multi-block levels, and synchronize across the entire grid.

The grid sync means you no longer need multiple kernels running in successive launches to complete a complex task on a data set. A single kernel can now operate on the data, using something like:
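
To make the sub-block case concrete, here is a minimal sketch of my own (not from the release notes). It splits a block into 32-thread tiles and has each tile sum its values with shuffles, so no block-wide __syncthreads() is needed. The kernel name tile_sum and the all-ones test data are made up for illustration.

```cuda
#include <cooperative_groups.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical kernel: each 32-thread tile reduces its own values,
// then the tile's first thread adds the partial sum to *out.
__global__ void tile_sum(const int *in, int *out)
{
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);

    int v = in[blockIdx.x * blockDim.x + threadIdx.x];

    // Tree reduction inside the tile using tile-level shuffles.
    for (int offset = tile.size() / 2; offset > 0; offset /= 2)
        v += tile.shfl_down(v, offset);

    if (tile.thread_rank() == 0)
        atomicAdd(out, v);
}

int main()
{
    const int n = 256;              // one block of 256 threads = 8 tiles
    int *in, *out;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, sizeof(int));
    for (int i = 0; i < n; ++i) in[i] = 1;
    *out = 0;

    tile_sum<<<1, n>>>(in, out);
    cudaDeviceSynchronize();

    printf("sum = %d (expected %d)\n", *out, n);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The tile size is a compile-time template parameter here, which is what lets the compiler treat the tile as a warp-sized group.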

grid_group group_grid = this_grid();

//do something here

group_grid.sync(); //every thread in the grid waits here

//do something else here
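
For anyone who wants to try it, here is a fuller sketch under my own assumptions. As far as I understand it, a grid-wide sync only works if the kernel is started with cudaLaunchCooperativeKernel and the whole grid fits on the device at once, so the grid is sized from the occupancy calculator. The kernel name two_phase and the reverse-copy workload are hypothetical, and older toolkits may also need -rdc=true and a Pascal or newer target when compiling.

```cuda
#include <cooperative_groups.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical two-phase kernel: phase 2 reads values that other
// blocks wrote in phase 1, so a grid-wide barrier is needed between them.
__global__ void two_phase(int *a, int *b, int n)
{
    cg::grid_group grid = cg::this_grid();
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    for (int k = i; k < n; k += stride)
        a[k] = k;                   // phase 1: fill a

    grid.sync();                    // wait for every block in the grid

    for (int k = i; k < n; k += stride)
        b[k] = a[n - 1 - k];        // phase 2: reversed copy of a
}

int main()
{
    int dev = 0, supported = 0;
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, dev);
    if (!supported) {
        printf("cooperative launch not supported on this device\n");
        return 0;
    }

    int n = 1 << 20;
    int *a, *b;
    cudaMallocManaged(&a, n * sizeof(int));
    cudaMallocManaged(&b, n * sizeof(int));

    // Size the grid so that all blocks can be resident at the same time.
    int threads = 256, blocksPerSm = 0;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, two_phase, threads, 0);
    int blocks = blocksPerSm * prop.multiProcessorCount;

    void *args[] = { &a, &b, &n };
    cudaLaunchCooperativeKernel((void *)two_phase, dim3(blocks), dim3(threads), args, 0, 0);
    cudaDeviceSynchronize();

    printf("b[0] = %d (expected %d)\n", b[0], n - 1);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

The grid-stride loops matter: because the grid is capped at what fits on the device, each thread may have to handle more than one element.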


You also get a this_multi_grid() variant, which will synchronize the kernel across all the GPUs it's been launched on!

You no longer have to wait for the sync at the end of a kernel launch and then launch another kernel from the CPU code. Presumably you will still be restricted by the driver timeout on your primary device.

Coupled with pinned memory / zero copy, this means you can have long-running kernels performing all sorts of operations on memory that is streamed into the device.
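
For reference, the zero-copy side of that looks roughly like the sketch below. It is a minimal example of my own, not a long-running kernel: it only shows a kernel reading and writing mapped pinned host memory directly. The kernel name scale and the sizes are made up.

```cuda
#include <cstdio>

// Hypothetical kernel that works directly on mapped host memory.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1024;
    float *host = nullptr;  // pinned host allocation
    float *dev = nullptr;   // device-side alias of the same memory

    // Allow pinned allocations to be mapped into the device address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaHostAlloc((void **)&host, n * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&dev, host, 0);

    for (int i = 0; i < n; ++i)
        host[i] = 1.0f;

    // No cudaMemcpy: the kernel touches the host buffer over the bus.
    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);
    cudaDeviceSynchronize();

    printf("host[0] = %f\n", host[0]);
    cudaFreeHost(host);
    return 0;
}
```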

Exciting stuff!

Monday, 22 May 2017

The blog springs back to life.

This last weekend I was going through some old hard drives whilst cleaning up / getting rid of old hardware and discovered a backup of the old blog.

Not everything was there, but most of the images and some of the old code were. So I've been updating the old posts and changing the HTML so that Blogger renders the equations.

Nothing new yet, but I've been busy over the last few years and have quite a bit of stuff I'll release here.

Anyone still interested in the GPU Thermal Monitor, which unbelievably still works on Windows 10, can find it on GitHub here:

There is a slight issue with the background bitmap not always displaying on the new version of Windows, but the functionality is still there.