## Friday, 17 April 2009

### Large Data Sets and 3D Gaussian

Firstly a small fix:

In the post about the 2D gaussian there was a typo:

\$f(2,2) = Ae^{-(\sfrac{2^2+2^2}{1.4^2})}\$

\$f(2,2) = Ae^{-(\frac{2^2+2^2}{2(1.4^2)})}\$

this has now been fixed in the original article.

Last night I successfully made my 2048*2048*2048 * 2 bytes data set. The random data was generated using the mersenne twister algorithm with a fixed seed so I can recreate it later if needed. I also created a smaller set of 512x512x512 in order to interactively test my kernels.

As mentioned in my last post I was then going to use a 3D Gaussian kernel on this data and implemented a very basic non-optimized version based on my simple gold (cpu) kernel. The simple one takes a point and traverses from -2 to 2 in each direction. Very simple and extremely inefficient but easy to understand and I needed a baseline for testing.

It is worth pointing out that the standard way of indexing in a large 3D array will not work.  2048*2048*2048 exceeds a 32 bit integers maximum value. It's very easy to fall into a integer wrap around trap when working with lots of data and it can be quite hard to locate so be careful!

I ran this kernel on my 512^3 data and it produced a nice smoothing/blurring effect. But the volume is still rather uninteresting. To give it some layers I will apply a probability function that varies along the y axis with jumps in probability at certain levels to simulate rock strata.  After which I will be running the 3D Gaussian kernel.

Unfortunately no screenshots yet as it got too late to finish the code last night.

Now lets derive the 3D Gaussian function and then refactor it to make it better to parallelize and allow for better memory access patterns.

You will remember from the previous post that in 2D the function is:

\$f(x,y) = Ae^{- \frac{(x-b_x)^2+(y-b_y)^2} {2\sigma^2} } \$

extending this to 3D is trivial - simply add the z term into the 'distance' equation

\$f(x,y,z) = Ae^{- \frac{(x-b_x)^2+(y-b_y)^2+(z-b_z)^2} {2\sigma^2} } \$

now to make it more friendly to CUDA:

\$f(x,y,z) = Ae^{- \frac{(x-b_x)^2}{2\sigma^2} -\frac{(y-b_y)^2}{2\sigma^2} -\frac{(z-b_z)^2} {2\sigma^2} } \$

\$f(x,y,z) = A(e^{- \frac{(x-b_x)^2}{2\sigma^2} }e^{- \frac{(y-b_y)^2}{2\sigma^2} }e^{- \frac{(z-b_z)^2}{2\sigma^2} })\$

Which is easily recognizable as the product of the 1D Gaussians. If you'd like to keep each one entirely self contained at the expense of two extra multiplies place the cube root of A along with each 1D function. I chose not to do this but rather have the last kernel multiply by the A.

As we ultimately take the derivative of the Gaussian at a point (or an approximation in this case) we need to calculate the total sum of all the weights. (Remember that although the function is continuous we use discrete points along it) so:

\${{\sum_{x=-n}^n} {\sum_{y=-n}^n} {\sum_{z=-n}^n}{A(e^{- \frac{(x-b_x)^2}{2\sigma^2} }e^{- \frac{(y-b_y)^2}{2\sigma^2} }e^{- \frac{(z-b_z)^2}{2\sigma^2}})}}\$

where \${x,y,z \in Z}\$

Assuming n=2 then we have a 5x5x5 matrix as our kernel (if we are not parallelizing) which after we mutiply with our data points we divide by the sum calculated above.