To prevent confusion about my timing figures, here is how I calculate them. As my target frame rate is 24 fps, I base my timings on creating a full frame from scratch every cycle and repeating that for a full second's worth of frames; the total elapsed time is my timing figure. This is best demonstrated with pseudocode:
setup stuff and create scene objects
start timer
for framecount 1 to 25 (25 instead of 24 to allow for some overhead later)
    create a full frame from scratch
stop timer (the elapsed time is the timing figure)
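A minimal Python sketch of that timing loop, assuming a hypothetical `render_frame` callable that builds one complete frame (the names `time_frames`, `render_frame` and `FRAME_COUNT` are illustrative, not taken from the renderer itself):

```python
import time
from typing import Callable

FRAME_COUNT = 25  # 25 rather than 24, leaving headroom for overhead added later


def time_frames(render_frame: Callable[[], None], frame_count: int = FRAME_COUNT) -> float:
    """Call render_frame frame_count times and return the total wall-clock time in ms.

    render_frame is a hypothetical stand-in for the real renderer; each call is
    expected to build one complete frame from scratch.
    """
    start = time.perf_counter()
    for _ in range(frame_count):
        render_frame()
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    frames_rendered = []
    # Placeholder workload standing in for rendering one full frame.
    total_ms = time_frames(lambda: frames_rendered.append(sum(range(10_000))))
    print(f"{len(frames_rendered)} frames in {total_ms:.2f} ms")
```

If the frame is rendered by CUDA kernels, kernel launches return to the host before the GPU has finished, so the host would need to synchronize (for example with `cudaDeviceSynchronize()`, or by using CUDA events) before reading the clock; otherwise the figure under-reports the real render time.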
I've now set up my scene with 100 objects (the goal amount) and one light source (the goal is two). Lighting and shadows are enabled, but colour and reflections are not implemented yet (screenshot below).
My initial timing for this configuration was roughly 41000 ms for 25 frames -> 0.61 fps.... quite pathetic really.
After a few tweaks I got that down to 20000 ms -> 1.25 fps, and eventually, with a lot of optimization, down to 800 ms -> 31 fps, which is above my goal rate! :)
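The fps figures follow from dividing the 25 frames by the total time in seconds. A small sketch of that arithmetic (`fps_from_total_ms` and `ms_per_frame` are hypothetical helper names):

```python
FRAME_COUNT = 25  # frames rendered per timing run


def fps_from_total_ms(total_ms: float, frame_count: int = FRAME_COUNT) -> float:
    """Hypothetical helper: frames per second implied by frame_count frames in total_ms."""
    return frame_count * 1000.0 / total_ms


def ms_per_frame(total_ms: float, frame_count: int = FRAME_COUNT) -> float:
    """Hypothetical helper: average time spent on each frame, in ms."""
    return total_ms / frame_count


for total in (41000, 20000, 800):
    print(f"{total:>6} ms -> {fps_from_total_ms(total):.2f} fps, "
          f"{ms_per_frame(total):.1f} ms per frame")
```

At 800 ms per run that is 32 ms per frame, comfortably under the roughly 41.7 ms per frame (1000 / 24) that 24 fps allows.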
Unfortunately, once reflections and the second light source are added, I think I will really struggle to make 24 fps. Although CUDA devices have a lot of memory bandwidth, the latency of a global memory read is huge (around 600 cycles), so the more objects there are, the harder it is to get a decent frame rate. However, with my recent changes the timing now increases nearly linearly with the number of objects, and once the acceleration structures are actually implemented, each ray will only need to be tested against a small number of objects.
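To illustrate why an acceleration structure should cut the per-ray work, here is a sketch of a two-level bounding-sphere hierarchy in Python. This structure is an assumption chosen for brevity; the post does not say which acceleration structure is planned, and the scene, the cluster layout and the names (`build_cluster`, `nearest_brute_force`, `nearest_clustered`) are hypothetical. Each ray tests the cluster bounds first and only tests the objects inside clusters whose bound it hits:

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Two-level bounding-sphere hierarchy: an illustrative assumption, not the planned structure.
Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass
class Sphere:
    center: Vec
    radius: float


@dataclass
class Cluster:
    bound: Sphere          # sphere enclosing every member
    members: List[Sphere]


def hit_distance(origin: Vec, direction: Vec, s: Sphere) -> Optional[float]:
    """Nearest t > 0 with origin + t*direction on sphere s; direction must be unit length."""
    oc = sub(origin, s.center)
    b = dot(oc, direction)
    disc = b * b - (dot(oc, oc) - s.radius * s.radius)
    if disc < 0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:
            return t
    return None


def build_cluster(members: List[Sphere]) -> Cluster:
    """Bound = centroid of member centres, radius reaching the far side of every member."""
    n = len(members)
    center = tuple(sum(s.center[i] for s in members) / n for i in range(3))
    radius = max(math.sqrt(dot(sub(s.center, center), sub(s.center, center))) + s.radius
                 for s in members)
    return Cluster(Sphere(center, radius), members)


def nearest_brute_force(origin: Vec, direction: Vec, spheres: List[Sphere]):
    """Test every object: the cost grows linearly with the object count."""
    best, tests = None, 0
    for s in spheres:
        tests += 1
        t = hit_distance(origin, direction, s)
        if t is not None and (best is None or t < best):
            best = t
    return best, tests


def nearest_clustered(origin: Vec, direction: Vec, clusters: List[Cluster]):
    """Test cluster bounds first; members are only tested when their bound is hit."""
    best, tests = None, 0
    for cl in clusters:
        tests += 1
        if hit_distance(origin, direction, cl.bound) is None:
            continue  # members lie inside the bound, so the ray misses them all
        hit, n = nearest_brute_force(origin, direction, cl.members)
        tests += n
        if hit is not None and (best is None or hit < best):
            best = hit
    return best, tests


# Hypothetical scene: a 10 x 10 grid of unit spheres, one cluster per row of 10.
spheres = [Sphere((3.0 * x, 3.0 * y, 20.0), 1.0) for y in range(10) for x in range(10)]
clusters = [build_cluster(spheres[i:i + 10]) for i in range(0, 100, 10)]
origin, direction = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
print(nearest_brute_force(origin, direction, spheres))  # (19.0, 100)
print(nearest_clustered(origin, direction, clusters))   # (19.0, 30)
```

Both searches find the same hit, but the clustered version needs 30 intersection tests instead of 100 for this ray. Fewer tests per ray also means fewer reads of object data, and on a GPU, where each global memory read carries a long latency, that is where much of the saving would be expected to come from.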
[Screenshot: one light and 100 objects (most hidden) at 31 fps]
Next on the list: colours and reflections...