My goal for cudart is to run at 24 frames per second at 800×600 with a decent object and light count. "Decent" is a bit hard to define, but for now 100 objects and 2 light sources would make me pretty happy.
Most of the screenshots until now have been only partially computed on the GPU, as I have been trying to develop my algorithms. I've found it is very easy to fall into the trap of making a photorealistic renderer, and as a result the earlier screenshots you see were rendering at about one second per frame - i.e. 1 fps... So although they might look impressive, they are hardly running in realtime. Over the last few days I have been converting everything to run on the GPU, and here is the first screenshot.
[caption id="attachment_173" align="aligncenter" width="300" caption="First purely GPU output"][/caption]
There is actually no lighting, colour or reflection in it at the moment, but as a result it can run at about 90 fps. The problem is that the scene contains only 14 objects and 2 light sources; as soon as this increases to a decent count, the fps drops off dramatically. The bottleneck is the huge latency of global memory accesses on the device. A small number of objects fits into shared memory, but as soon as you go over the shared memory limit you hit a massive performance penalty. As always I do have a plan, and a quick prototype of it last night ran 14000 objects at about 20 fps. I will try to build it into the main system tonight and see if it still performs as well. This will of course require a lot more objects to be added to my test scene in order to generate proper timing information.
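For anyone curious what "going over the shared memory limit" means in practice, the usual workaround is to sweep the object list through shared memory in block-sized tiles, so each object is fetched from slow global memory once per block instead of once per thread. This is just a minimal sketch of that pattern, not cudart's actual code - the `Sphere` struct, `TILE` size, and `intersect` helper are all hypothetical, and it assumes the block size is at least `TILE` threads:

```cuda
// Hypothetical sketch: nearest-hit search over many objects, staged
// through shared memory one tile at a time. Assumes blockDim.x >= TILE
// so the cooperative load covers a full tile.
#include <cuda_runtime.h>

#define TILE 64

struct Sphere { float x, y, z, r; };

__device__ float intersect(const Sphere& s, float3 orig, float3 dir)
{
    // Standard ray-sphere test (dir assumed normalised);
    // returns the hit distance, or -1.0f on a miss.
    float3 oc = make_float3(orig.x - s.x, orig.y - s.y, orig.z - s.z);
    float b = oc.x * dir.x + oc.y * dir.y + oc.z * dir.z;
    float c = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - s.r * s.r;
    float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;
    return -b - sqrtf(disc);
}

__global__ void trace(const Sphere* objects, int numObjects,
                      float3 orig, const float3* dirs, float* nearest)
{
    __shared__ Sphere tile[TILE];

    int pixel = blockIdx.x * blockDim.x + threadIdx.x;
    float3 dir = dirs[pixel];
    float best = 1e30f;

    // Sweep the whole object list one shared-memory tile at a time.
    for (int base = 0; base < numObjects; base += TILE) {
        int n = min(TILE, numObjects - base);
        if (threadIdx.x < n)                 // cooperative load from global
            tile[threadIdx.x] = objects[base + threadIdx.x];
        __syncthreads();                     // tile fully loaded

        for (int i = 0; i < n; ++i) {        // every thread tests the
            float t = intersect(tile[i], orig, dir);  // cached tile
            if (t > 0.0f && t < best) best = t;
        }
        __syncthreads();                     // before tile is overwritten
    }
    nearest[pixel] = best;
}
```

The key point is that the object count is no longer capped by the size of shared memory; only the tile has to fit, and the loop pays the global-memory latency once per tile per block rather than once per object per thread.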