During the course of developing my raytracer I've investigated various acceleration structures and made comments on them in previous posts.
My primary focus has been ray tracing of dynamic scenes where you would need to re-calculate the acceleration structure for each frame. As a result of this I ruled out KD-Trees (even SAH partitioned ones) early on and concentrated on methods which could be implemented well in parallel. The voxel method with fixed maximum object counts served me very well in early implementations. It is very quick to compute in parallel and its memory accesses on traversal are very kind to a CUDA implementation. However as the number of objects in my scenes have increased, it has caused problems with this fixed max element voxel structure. It is possible for the number of element to exceed the maximum in the cell thereby resulting in visual artifacts / lost objects. Increasing the maximum elements per cell slows down the raytracing stage too much even though the objects are no longer lost. To overcome this limitation I increased the number of cells in the grid. Again this works, but slows down the raytracing due to the higher number of voxels needing to be traversed. I was using a mesh of slightly overlapping spheres as they are much quicker to run intersection tests against rather than the standard cube approach.
In the latest incarnation of cuda-sp (yet to be uploaded to the download section) it uses dynamic maximum object count per voxel element. This makes memory management more tricky but does solve the artifacts / lost objects in high density regions. The downside is that these cells now take longer to run intersection tests on. To speed this up I have been playing with sorting algorithms (as mentioned in previous posts) and while successful, still only works efficiently with a smallish numbers of objects.
A KD-tree would solve the maximum elements per cell and ray traversals over the grid problems very well. The problem is that even the best implementations of a SAH KD-tree take a while to calculate. To get any further speed increases for complex scenes I am going to have to move away from my grid of voxel spheres approach as I have pretty much hit the performance limit with it. I will be investigating ways of speeding up the tree construction in order to be useful for realtime rendering. As always watch this space. Screenshots of current implementation will be online soon.