## Thursday 30 July 2009

### 3D Gaussian Convolution

There hasn't been much in the way of posts here lately as I've been really busy at work getting some new components built into the systems I work on. Not really hard but it's frustrating things like trying to get various components and libraries written in different languages to work together. So lately I've not had the energy to do much work on the computer once I get home...

There has been a bit of interest in my 3D Gaussian convolution kernels. Although I explained the technique mathematically in an earlier post I never actually posted the code. As it is rather quick and quite a novel way of calculating the convolution for a xy plane I decided to post it so everyone can benefit from / improve upon the technique.  As always comments / bug reports etc are always welcome :)

The code is a series of kernels that will do a Gaussian Convolution (smoothing) on a 3D data set of floats.   The Gaussian kernel that I used is a 5x5x5 one but it is relatively easy to extend this.  The data set is a volume 256x256x256 of floats. Again with some modifications the kernels will handle a greater or lesser volume - up to the max memory of your card.

In a trivial implementation you will read 5x5x5 data elements centred around your x,y,z point and then multiple them with their corresponding co-efficient. Finally summing them and dividing before saving into a seperate memory area. Of course this means 125 reads and associated calculations for every single data element in our 256x256x256 block resulting in over 2 billion reads - not very efficient!

Using the fact that the Gaussian filter we are using is actually a convolution we can do each direction in turn before summing the result and dividing. This gives us 256x256x256 x 5 x 3 = 251658240 reads - which is over 8 times lower than the trivial implementation!

As we are using the co-efficients many times it is beneficial to pre-calculate these and store them in constant memory. Constant memory on the device is cached and is really fast. Please note that this may not always be the case but for the Gaussian co-efficients that require more than one floating point operation to calculate this results in a large time saving.  This is implemented in "calcGaussianCoefficients" in the source code here:  [download id="5"] .