Tuesday 19 August 2008

Optimizations

My goal for cudart is to run at 24 frames per second at 800x600 with a decent object and light count. "Decent" is a bit hard to define, but for now 100 objects and 2 light sources would make me pretty happy.
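
A rough per-frame budget makes that target concrete. This is a sketch under assumptions that are not from cudart itself: one primary ray per pixel, one shadow ray per light for every pixel (an upper bound, since shadow rays are only needed where a primary ray hits something), and a brute-force test of every ray against every object with no acceleration structure.

```python
# Back-of-envelope budget for the stated target.
# Assumptions (not from the post): one primary ray per pixel, one shadow ray
# per light per pixel (upper bound), and brute-force ray-object testing.
WIDTH, HEIGHT = 800, 600
TARGET_FPS = 24
OBJECTS, LIGHTS = 100, 2

def frame_budget(width, height, fps, objects, lights):
    pixels = width * height
    frame_ms = 1000.0 / fps              # time available to render one frame
    rays = pixels * (1 + lights)         # primary rays + shadow rays
    tests = rays * objects               # ray-object intersection tests
    return {
        "pixels": pixels,
        "frame_ms": frame_ms,
        "rays_per_frame": rays,
        "tests_per_frame": tests,
        "tests_per_second": tests * fps,
    }

budget = frame_budget(WIDTH, HEIGHT, TARGET_FPS, OBJECTS, LIGHTS)
for name, value in budget.items():
    # Floats get two decimals; integers get thousands separators.
    print(f"{name:>16}: {value:,.2f}" if isinstance(value, float)
          else f"{name:>16}: {value:,}")
```

Under these assumptions each frame has about 41.7 ms and needs roughly 144 million ray-object tests, which shows how quickly the work grows with the object count.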

Most of the screenshots so far have been only partially computed on the GPU while I have been developing my algorithms. I've found it is very easy to fall into the trap of building a photorealistic renderer, and as a result the earlier screenshots were rendered at about 1 second per frame, i.e. 1 fps. So although they might look impressive, they are hardly running in real time. Over the last few days I have been converting everything to run on the GPU, and here is the first screenshot.

[Image: First purely GPU output]

There is no lighting, colour or reflection in it yet, but as a result it runs at about 90 fps. The problem is that the scene contains only 14 objects and 2 light sources; as soon as that increases to a decent amount, the frame rate drops dramatically. The bottleneck is the high latency of global memory accesses on the device: a small number of objects fits into shared memory, but as soon as the scene exceeds the shared memory limit you take a massive performance penalty. As always I do have a plan, and a quick prototype of it last night ran 14000 objects at about 20 fps. I will try to build it into the main system tonight and see whether it still performs as well. This will of course require adding many more objects to my test scene in order to generate proper timing figures.
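
The shared memory limit can be made concrete with a little arithmetic. The sketch below assumes the 16 KB of shared memory per multiprocessor on the compute capability 1.x GPUs of the time and a hypothetical sphere record of four 32-bit floats (centre x, y, z and radius). The `tiles_needed` helper illustrates one common workaround, streaming objects through shared memory in fixed-size tiles so that each thread block reads an object from global memory once and shares it across all its threads. That is a generic technique, not necessarily the plan mentioned above.

```python
# Shared memory capacity for scene objects.
# Assumptions: 16 KB shared memory per multiprocessor (compute capability 1.x)
# and a hypothetical packed sphere of four 32-bit floats (x, y, z, radius).
SHARED_MEM_BYTES = 16 * 1024
FLOAT_BYTES = 4
SPHERE_BYTES = 4 * FLOAT_BYTES

def max_resident_objects(shared_bytes, object_bytes, reserved_bytes=0):
    """Objects that fit after setting aside reserved_bytes for other data."""
    return (shared_bytes - reserved_bytes) // object_bytes

def tiles_needed(num_objects, tile_size):
    """Number of tile loads needed to stream every object through shared memory."""
    return -(-num_objects // tile_size)   # ceiling division

capacity = max_resident_objects(SHARED_MEM_BYTES, SPHERE_BYTES)
for n in (14, 100, 14000):
    fits = n <= capacity
    print(f"{n:>6} objects: fits in shared memory = {fits}, "
          f"tiles of {capacity} = {tiles_needed(n, capacity)}")
```

With this layout at most 1024 spheres fit at once (fewer in practice, since some shared memory is normally taken by other data), so 14 or 100 objects fit comfortably while 14000 objects would need 14 tile loads per block.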
