Wednesday, 17 December 2008

NVIDIA Tesla Personal Supercomputing

I attended the NVIDIA Tesla personal supercomputing launch on 4 December in London. It was really interesting to meet other people using CUDA and GPU-based devices to solve incredibly compute-intensive problems.

What some groups have managed to do in the fields of tomography, CFD, molecular modelling and seismology is truly awe-inspiring, and it's only a matter of time before they have a real impact on the average person's life. The advances in medical imaging, combined with the power of Tesla, are a prime example in this regard.

Tuesday, 16 December 2008

Bingo and CUDA

When I worked at a bingo software supplier I used to design and implement the server-side systems. To most developers, checking a bingo card against a random draw seems trivial, and so it is. But factor in the following: tens of thousands of cards and a very short time span in which to calculate the results (who has won, who is 3, 2 or 1 to go, bonus shapes), and it soon becomes quite a challenging real-time problem. After that you need to produce output which the (typically Flash) client is able to read and process quickly, which is a really big string handling problem (very often 10 million string pieces).
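A minimal sketch of one way to keep the per-card check cheap, assuming 90-ball bingo and a hypothetical representation in which each card and the draw so far are bitmasks over the numbers 1 to 90 (this is not the original server code). Checking a card then becomes an AND-NOT and a population count, so "how many to go" can be recomputed for every card after each ball.

```c
#include <stdint.h>

/* Hypothetical bitmask representation: bit (n - 1) set means number n (1..90). */
typedef struct {
    uint64_t lo; /* numbers 1..64  */
    uint64_t hi; /* numbers 65..90 */
} NumberSet;

/* Add number n (assumed to be 1..90) to the set. */
void numberset_add(NumberSet *s, int n)
{
    if (n <= 64)
        s->lo |= (uint64_t)1 << (n - 1);
    else
        s->hi |= (uint64_t)1 << (n - 65);
}

/* Portable population count: clears the lowest set bit until none remain. */
static int popcount64(uint64_t x)
{
    int count = 0;
    while (x) {
        x &= x - 1;
        ++count;
    }
    return count;
}

/* Numbers on the card not drawn yet: 0 means the card has won,
 * 1, 2 or 3 means "1, 2 or 3 to go". */
int numbers_to_go(const NumberSet *card, const NumberSet *drawn)
{
    return popcount64(card->lo & ~drawn->lo) + popcount64(card->hi & ~drawn->hi);
}
```

A bonus shape could presumably be handled the same way, as a sub-mask of the card that is complete when the same count reaches zero.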

Slot Machines

Up until two years ago I worked on gaming/casino related projects, mostly stuck on the server side making sure all the calculations added up and the random numbers were properly random. But early on in my casino days I did come up with an entire end-to-end solution (I have to admit I had some help from a graphics artist at the final stages - I should NOT be allowed to do art!). This involved writing the client-side game, the server-side systems and the reporting/banking systems. I thought I had lost most of the code, but while looking through some backups last night I came across the original source of two of them - written entirely in assembler, of course :)

Friday, 28 November 2008

Tracelight and Another 8800 GT

Last week I purchased another 8800 GT. Although I would really like a GTX 280, they are just too expensive, and by buying another 8800 I can play with multi-GPU code. After some initial teething problems with video drivers and my machine blue-screening all the time, I have the 2 cards working happily. Unfortunately each card's PCIe link dropped from x16 to x8. This is not too serious for me, as I don't move much data over the bus between cycles (apart from the rendered image).

Using two cards gives a lot of flexibility, as you can essentially run two kernels at once (the same or different ones). Currently I'm playing with each card rendering half the screen and then stitching the halves back together on the primary card.
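A small sketch of the screen split, using a hypothetical helper that hands each GPU a contiguous band of rows (for two cards, the top and bottom halves). In CUDA 2.x a host thread can only drive one device, so each band would typically be rendered from its own host thread that calls cudaSetDevice() before any other CUDA work, with the bands then copied back and stitched on the primary card.

```c
/* Hypothetical helper: device `device` (0..num_devices-1) renders rows
 * [*first, *first + *count) of an image `height` rows tall. Rows are shared
 * as evenly as possible; the first (height % num_devices) devices get one
 * extra row each. */
void device_row_range(int height, int num_devices, int device,
                      int *first, int *count)
{
    int base = height / num_devices;
    int extra = height % num_devices;
    *count = base + (device < extra ? 1 : 0);
    *first = device * base + (device < extra ? device : extra);
}
```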

The API I am developing for molecular visualization will now get support for multiple GPUs, thus fitting in rather nicely with the launch of the Tesla personal supercomputers. Somehow I got invited to the launch of these beasts next week. I'll be sure to post about that.

As mentioned in the title, "Tracelight" will be the new name for "cudart", largely because cudart.dll is the CUDA runtime DLL and a lot of confusion has resulted. When I originally thought of the name I didn't realize there was a naming conflict. As a result, a large number of the search hits on this site are from people looking for help with installing the cudart DLL rather than for anything related to ray tracing.

Expect some more Tracelight demos soon, from both the molecular API branch and the triangle mesh branch. It has its own domain ( but for now just links back to


Although I don't usually post personal stuff on my site or anywhere on the internet for that matter I think this is worth making an exception for (and kinda explains the lack of updates in the last few weeks). An added benefit is that it becomes searchable and I cannot forget the date...

On the 18th October I got engaged :) Yes, I did the one-knee thing and everything, and she said "yes" :) Quite brilliant. We are getting married early next year and it has been quite a few mad weeks making arrangements. It's just incredible how far ahead venues and wedding-related stuff get booked up.

In unrelated news: BT managed to cut off my entire street for 4 or 5 days. When I reported the fault they first said "there is nothing wrong with the line". After I insisted there was a problem I got "well, you had better contact your ISP as it could be their problem". This was rather amusing, as they ARE my ISP... and not only was the internet dead, the phone line was completely dead too. Eventually I got hold of a technician who immediately told me (literally in the first sentence) that it was not his fault. Finally, after many calls and lots of wasted time, they admitted "a fault" and then took 4 days to fix it. Still no speed increase though. We live in the M4 corridor and still only get 768 kbps, even though they sell the product as 8 Mbps. No competition, so what do they care... /rant

Expect more rendering related posts shortly.

Thursday, 23 October 2008


It had to happen sooner or later... Yes, that's right: a raytraced Stanford Bunny (thanks to Stanford for doing the 3D scan).

To be a little different, my bunny is facing away from the camera :p I'm currently getting 1 fps at 800x800, which isn't too bad for just under 280 000 triangles and 2 light sources (reflections are currently turned off). I think I can increase the speed of this by quite a bit - the current problem is that my acceleration structure ran out of memory, hence the artifacts. Rendering a more complex scene like this has definitely shown up shortcomings in my current algorithm. I have already started work on version 2 :)

For now just a screenshot...

[Image: Stanford Bunny at 1 fps]


The nendril project took a big step forward yesterday with it becoming self-aware... just kidding! But it is now correctly firing its neurons. More updates soon.

Thursday, 9 October 2008

Finally an update!

What with work, projects, email and Spore, updating the website has definitely taken a back seat recently. The good news is that I have answered every email I've been sent, so if you haven't received a reply the chances are my spam filter ate it - just resend, or post a comment here if this is the case.

On the project front - have a look at this screen shot...

[Image: 721 spheres and 2 lights with 1 level of reflection at about 14 fps]

I am actually really happy with this, as having 8000 objects on the screen only drops the frame rate to about 9 fps. I can still make a lot of optimizations, but the basic algorithm seems to be working well. Soon it will be time to switch back to triangle meshes and render something useful :) I've recorded a short video of it which is available here; unfortunately the screen capture application I used to make the recording dropped the frame rate by about 6 fps... At the start of the recording I turn reflections on, and you will see the (already diminished) frame rate drop accordingly. Note that the camera isn't moving - all the objects are moving.

Tuesday, 23 September 2008

Various updates and a challenge

The last week and weekend have been really busy with guests staying at our house, so unfortunately not much has been done in the way of coding. Still, it's always good to catch up with old friends. You know you are getting old when you realize that you have had some of your workshop tools and friends for longer than you were without them....

If you have sent an email in the last week and have not received a reply, please be patient; I do try to answer each one.

Tuesday, 16 September 2008

COCR, markov-chains and QED*

Although ray tracing is fun :) it is not my only CUDA project at the moment. The first one I did was called "cocr" - I really have to come up with a better naming convention... It was my CUDA competition entry and was a bit of a rushed job, as I had only obtained a CUDA-capable graphics card a few days before the deadline.

I've held off posting many details of it here until judging was finished. But as there doesn't seem to be any news, I think I can safely assume it didn't place anywhere (a pity, as the prizes were rather yummy for a developer). Cocr is an implementation of a very simple template-matching OCR system.
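A minimal CPU sketch of template-matching OCR, assuming glyphs and templates are same-sized grayscale images and using the sum of absolute differences as the match score (the post doesn't say which metric cocr uses, and the function names are hypothetical). On the GPU, a natural split would be one thread per template score.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Sum of absolute differences between a glyph and one template, both
 * w*h grayscale images stored row by row. Lower means a better match. */
long sad_score(const unsigned char *glyph, const unsigned char *tmpl, int w, int h)
{
    long sum = 0;
    for (int i = 0; i < w * h; ++i)
        sum += labs((long)glyph[i] - (long)tmpl[i]);
    return sum;
}

/* Index of the best-matching template among `count` templates stored
 * back to back (count * w * h bytes in total). */
int best_template(const unsigned char *glyph, const unsigned char *templates,
                  int count, int w, int h)
{
    int best = -1;
    long best_score = LONG_MAX;
    for (int t = 0; t < count; ++t) {
        long s = sad_score(glyph, templates + (size_t)t * w * h, w, h);
        if (s < best_score) {
            best_score = s;
            best = t;
        }
    }
    return best;
}
```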

Friday, 12 September 2008

New Downloads

Please visit the cudart downloads link to get the *hopefully* bug-fixed versions of cudart for CUDA v1.1 and v2.0. The v2.0 one also includes the viewport bug fix.

Although the previous versions worked fine on the majority of systems, there were issues with any CUDA 2.0 toolkit install or new ForceWare drivers. Hopefully these are now sorted out. Please make sure you extract all the files from the zips before you run the .exe files.

Thanks again to Audun for his help on this one. It's nice to see that there are still people out there who know how to use a debugger in native mode.

Thursday, 11 September 2008

Bug Fixes

While working on my much-talked-about acceleration structure, I kept noticing artifacts and distortions as soon as it was applied to the scene. This had me rather stumped for two days, as the code and the raw output seemed correct. Finally it occurred to me to check my scene code, and in particular my viewport code.

As it turns out, my viewport code has been flawed since I extended cudart from triangles only to include spheres. Here is a new screenshot showing the improvement in image quality now that it's been fixed. You will also notice that you can clearly see the spiral reflected in the sphere, as well as the red light that is behind the viewer (you have to look closely for this one). The image is scaled down from the original size in order to save a bit of bandwidth - lots of traffic from the NVIDIA site :)


[Image: After bug fix of viewport code]

I have yet to fix the downloadable version (downloads section) but will do so tonight.

Wednesday, 10 September 2008

More links working again...

I seem to have broken all the "more" links on the posts by enabling the permalink feature yesterday. I've set it back to the way it was until I can work out why this happened. Or rather until Rich tells me what I broke :p

Apologies if you couldn't read some posts or download yesterday.

Acceleration of ray tracers on CUDA 1.1 devices

As those of you who follow the blog will know, I have been working on an acceleration mechanism for my raytracer over the past week.

As it turns out, converting my large stack of A4 sheets of diagrams and equations into proper code has not been as easy as I had hoped. With about a third of it implemented, I am getting no speed-up whatsoever on cudart. In fact I have lost about 2 fps. This is largely to be expected, as I am pretty much hitting the limits of what my card can do.

Monday, 8 September 2008

Cudart on Nvidia's CUDA page

After noticing a bit of a spike in traffic to the web server, I see that NVIDIA has added cudart to the CUDA page on their site.

Although the application is fairly small, the videos are quite big, so I'm hoping my allocated bandwidth will be enough.

Almost inevitably, a problem with the application has been reported; thanks to Audun, who posted a comment. My development PC (AMD 4400 with Windows XP) is currently running an 8800 GT with 512 MB and a 7800 GTX with a monitor attached. All calculations are done on the 8800 GT and the display is done on the 7800 GTX. Initially I thought that this was the problem, as most people run their displays on the same card. A quick fiddle around, swapping the monitor over, has ruled this out - it's working perfectly.

Friday, 5 September 2008

Ray Tracing Acceleration

There are a number of ways to accelerate ray tracing, which mostly fall into two categories: (a) reduce the number of rays we trace; (b) reduce the number of intersection tests.

Of course you can also increase the efficiency of your ray/object intersection tests, but as that problem is largely solved and well optimized, it can be set aside here.

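For (a), a subsampling method is often used, which operates on the primary rays: only some pixels are traced and the rest are interpolated. The problem is that it can produce artifacts, especially with lots of small objects, which can fall between the traced pixels.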

For (b), a variety of space partitioning methods exist that reduce the number of objects each ray is tested against, the most popular being the kd-tree.
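As an illustration of how these structures save work, here is the ray/axis-aligned-box slab test that kd-tree and bounding volume hierarchy traversals are typically built on (a swapped-in example with hypothetical names, not cudart's code): if a ray misses a node's box, every object inside it can be skipped with one test.

```c
/* Slab test: does the ray origin + t*dir, with t in [t_min, t_max], hit the
 * axis-aligned box [lo, hi]? inv_dir holds 1/dir per axis; IEEE infinities
 * cover zero direction components, though a ray lying exactly on a slab
 * plane is an edge case this sketch does not treat specially. */
int ray_hits_box(const float origin[3], const float inv_dir[3],
                 const float lo[3], const float hi[3],
                 float t_min, float t_max)
{
    for (int axis = 0; axis < 3; ++axis) {
        float t0 = (lo[axis] - origin[axis]) * inv_dir[axis];
        float t1 = (hi[axis] - origin[axis]) * inv_dir[axis];
        if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
        /* Narrow the interval to where the ray is inside every slab so far. */
        if (t0 > t_min) t_min = t0;
        if (t1 < t_max) t_max = t1;
        if (t_max < t_min)
            return 0;
    }
    return 1;
}
```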

Thursday, 4 September 2008

Realtime "interactive" ray tracing

While implementing my acceleration structures for the cudart ray tracer (more on this in another post), I have been reading other papers and research in the field. Some of the results and implementations are very impressive, with decent frame rates at a high resolution and a lot of primitives in the scene.

What I did notice is that in all these demos / screen shots they tend to move the camera around while nothing in the scene moves. So although they are all using various types of kd-tree type acceleration implementations they never actually need to recalculate the tree between frames. In a scene with a lot of objects moving at once (games for example) this kd-tree restructuring becomes non-trivial.

I think the real-time ray tracing community really needs to come up with a decent reference scene with movement, in order to compare improvements in future. A static bunny or Cornell box no longer provides enough information about the real-time performance of ray tracing and acceleration algorithms.

Sunday, 31 August 2008

CUDA v2.0

I have put off downloading the version 2.0 documentation for a while now, largely as I cannot afford a board with compute capability 2 just yet. This weekend I eventually succumbed to temptation and downloaded the pdf.

For those of you playing with ray tracers, figure 5.4 on page 58 should be of particular interest :) It's nice to see how much the nice NVIDIA engineers are improving their products all the time. Now to scrape together my pennies for a 2.0 card... Still, developing on an 8800 GT isn't a bad idea, as a lot of people have them - I'll just keep telling myself that for now :)

Friday, 29 August 2008

Acceptable performance guidelines for gaming

According to the PC Gaming Alliance Performance Definition these are the acceptable performance guidelines for gaming:

Pass/Fail Game Metrics (fps):
First Person Shooters: 24+ frames per second
Real Time Strategy: 20+ frames per second
Role Playing Games: 20+ frames per second
Racing Simulation: 50+ frames per second
Flight Simulation: 20+ frames per second
Playability = FPS + Responsiveness + no Graphical Problems

Assuming this is correct, it could mean that raytracing will soon be a viable alternative to rasterising for RTS and role-playing games.
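For a sense of scale, a quick back-of-the-envelope sketch (hypothetical helper names): at 20 fps each frame has a 50 ms budget, and at 800x800 that is 640,000 primary rays per frame, or 12.8 million primary rays per second, before any shadow or reflection rays are counted.

```c
/* Time available per frame, in milliseconds, at a target frame rate. */
double frame_budget_ms(double fps)
{
    return 1000.0 / fps;
}

/* Primary rays per second for one ray per pixel at the target rate.
 * Shadow rays (one per light per hit) and reflection rays multiply this. */
double primary_rays_per_second(int width, int height, double fps)
{
    return (double)width * height * fps;
}
```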

4 more fps

4 more fps! ... well, on average anyway :) So it's now running at 25-30 fps continually. Last night I did a small update to optimize the number of threads. I also added more colours to the spheres, and you can now press 'R' to get some very primitive physics (bouncing).
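Tuning the number of threads mostly comes down to choosing a block size and covering the image with enough blocks. A minimal sketch with hypothetical helper names, assuming one thread per pixel: the block count is rounded up, so any surplus threads at the image edge must return early inside the kernel.

```c
/* Number of blocks of size `block` needed to cover `n` items (rounded up). */
int blocks_needed(int n, int block)
{
    return (n + block - 1) / block;
}

/* Threads actually launched for a width x height image with bx x by thread
 * blocks; the surplus over width * height needs a bounds check in the kernel. */
long threads_launched(int width, int height, int bx, int by)
{
    return (long)blocks_needed(width, bx) * bx * (long)blocks_needed(height, by) * by;
}
```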

The download link is the same as in the previous post, but inside the zip is version 0.02.

This is still without my new acceleration structures. They are taking a while to implement, because I am laying out the data to suit the way the GPU wants to read memory. I'm hoping for at least a 2x speed-up.
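One common way to make data suit the GPU's memory reads, sketched here with hypothetical types (the post doesn't say which layout cudart uses): store each sphere field in its own array, so that when consecutive threads read one field they touch consecutive addresses and the reads can be coalesced. A 16-byte-aligned float4 per sphere, read in a single load, is the other usual option.

```c
#include <stddef.h>

/* Array-of-structures: natural on the CPU, but when thread i reads only
 * spheres[i].cx, consecutive threads read addresses 16 bytes apart. */
typedef struct { float cx, cy, cz, radius; } SphereAoS;

/* Structure-of-arrays: thread i reads cx[i], so consecutive threads read
 * consecutive floats, the pattern compute 1.x hardware can coalesce. */
typedef struct {
    float *cx, *cy, *cz, *radius;
} SpheresSoA;

/* Copy `n` spheres from AoS into preallocated SoA arrays. */
void spheres_to_soa(const SphereAoS *in, size_t n, SpheresSoA *out)
{
    for (size_t i = 0; i < n; ++i) {
        out->cx[i] = in[i].cx;
        out->cy[i] = in[i].cy;
        out->cz[i] = in[i].cz;
        out->radius[i] = in[i].radius;
    }
}
```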

Wednesday, 27 August 2008

Cudart spheres: movie and download

Here is a sample movie of the spheres in cudart. The average fps counter is not quite correct, but it gives a good idea of the current speed of the system: well over 25 times faster than the CPU implementation now.

Cudart Spheres movie

Here is the promised download - note it has only been tested on an 8800 GT with 512 MB of memory. Any problems, comments or suggestions, please let me know.


Again this one is only doing untextured spheres with 2 light sources.

Tuesday, 26 August 2008


For whatever reason spheres in an image seem to appeal to people in general a lot more than a collection of big triangles, no matter how well shaded / textured and lit they are... So to get a bit of "ooooo" and "aaaaah" factor going I decided to add spheres to cudart over the weekend.

Of course, as well as looking better than triangles, spheres are also much quicker to intersect. So even with 111 objects in the scene and 2 levels of reflection, it is still possible to get close to 20 fps at 800x800. I have just finished an interactive demo version, which I will make available for download here as soon as I've sorted out the license agreement. In the meantime here are some screenshots.


Friday, 22 August 2008


As mentioned in earlier posts, I have been calculating u,v coordinates for every triangle intersection. I have avoided actually texturing my triangles, as I wanted to get most of the other things working well first - I find textures tend to disguise shading and intersection glitches.
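The post doesn't say which triangle test cudart uses; the Moller-Trumbore algorithm is one common choice that produces u,v (barycentric coordinates) as a by-product, sketched here with hypothetical names. Per-vertex texture coordinates can then be interpolated with the weights (1 - u - v, u, v) before the texture lookup.

```c
#include <math.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 sub3(Vec3 a, Vec3 b) { Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }
static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross3(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}

/* Moller-Trumbore ray/triangle test. On a hit, returns 1 and writes the
 * distance *t and barycentric (*u, *v): the hit point is
 * (1 - u - v) * v0 + u * v1 + v * v2. */
int ray_triangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2,
                 float *t, float *u, float *v)
{
    const float eps = 1e-7f;
    Vec3 e1 = sub3(v1, v0), e2 = sub3(v2, v0);
    Vec3 p = cross3(dir, e2);
    float det = dot3(e1, p);
    if (fabsf(det) < eps)
        return 0;                          /* ray parallel to the triangle */
    float inv_det = 1.0f / det;
    Vec3 s = sub3(orig, v0);
    *u = dot3(s, p) * inv_det;
    if (*u < 0.0f || *u > 1.0f)
        return 0;
    Vec3 q = cross3(s, e1);
    *v = dot3(dir, q) * inv_det;
    if (*v < 0.0f || *u + *v > 1.0f)
        return 0;
    *t = dot3(e2, q) * inv_det;
    return *t > eps;                       /* only hits in front of the origin */
}
```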

Last night I decided to implement some basic texturing and see how much longer the scene took to render. So I added the Lena image to my modified Cornell box - if you are going to be a cliché, you may as well go all the way :) The screenshot with texturing is below.

Thursday, 21 August 2008

Modern art (or when raytracing goes wrong) and some timings...

I wanted to get some proper timing figures before I implemented reflections in order to tell how badly (or well) my reflection code is performing. So I removed one or two small optimizations which would affect the timings based on the scene layout and got this:

[Image: When raytracing goes wrong]

If I need something to fall back onto - modern art might just be my thing :p

Wednesday, 20 August 2008

Speed update

To prevent confusion over my timing figures, here is how I calculate them: as my target frame rate is 24 fps, I base my timings on creating a full frame from scratch every cycle and doing that 24 times (25 in practice, to allow for some overhead). This is best demonstrated with pseudocode:

setup stuff and create scene objects
start timing
for framecount = 1 to 25    (25 instead of 24 to allow for some overhead later)
     generate frame
stop timing
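The timing loop above, sketched in C (all names are hypothetical). One caveat when timing GPU work this way: CUDA kernel launches return before the kernel finishes, so each frame must wait for the device (in CUDA 2.x, cudaThreadSynchronize()) before the timer is stopped, or the figures will be optimistic.

```c
#include <time.h>

/* Wall-clock time in seconds via C11 timespec_get (not guaranteed to be
 * monotonic, which is acceptable for a short benchmark run). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for "generate frame": a real renderer would launch
 * its kernels here and then wait for them to complete. */
void generate_frame(void)
{
}

/* Render `frames` complete frames and return the average ms per frame. */
double average_frame_ms(int frames, void (*render)(void))
{
    double start = now_seconds();
    for (int i = 0; i < frames; ++i)
        render();
    double elapsed = now_seconds() - start;
    return elapsed * 1000.0 / frames;
}

/* Frames per second implied by an average frame time. */
double fps_from_ms(double ms_per_frame)
{
    return 1000.0 / ms_per_frame;
}
```

With 25 frames, an average of 40 ms or less per frame means the 25-frame run fits inside one second.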

I've now set up my scene with 100 objects (the goal amount) and one light source (the goal is 2). I've enabled lighting and shadows, but no colour or reflections yet. (screenshot below)

Tuesday, 19 August 2008


My goal for cudart is to run at 24 frames per second at 800x600 with a decent object and light count. Decent is a bit hard to define, but for now 100 objects and 2 light sources would make me pretty happy.

Most of the screenshots until now have been only partially computed on the GPU, as I have been trying to develop my algorithms. I've found it is very easy to fall into the trap of making a photorealistic renderer, and as a result the earlier screenshots you see are running at about 1 second a frame, i.e. 1 fps... So although they might look impressive, they are hardly running in real time. Over the last few days I have been converting everything to run on the GPU, and here is the first screenshot.

[Image: First purely GPU output]


Friday, 15 August 2008

Managing scene objects

After a small change to the lighting (again...) I'm fairly happy with the basic ray/light model now. I still need to implement refraction, but that can wait a bit. The next task is to work out an efficient way of manipulating the many triangles in the scene. I need to group them into objects, remembering that OO is not supported in CUDA, and write functions to translate, rotate and scale these objects.
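One possible non-OO layout, as a minimal sketch with hypothetical types: keep every vertex in one flat array and describe an object as a range of vertex indices plus a 3x4 affine transform. Translating, rotating or scaling an object then just means changing its matrix, and applying it is a plain loop (or one GPU thread per vertex).

```c
typedef struct { float x, y, z; } Vec3;

/* Hypothetical object record: a range into the shared vertex array plus
 * an affine transform (3x3 rotation/scale part, translation in column 3). */
typedef struct {
    int first_vertex;
    int vertex_count;
    float m[3][4];
} SceneObject;

/* Write world-space positions of the object's vertices into `out`
 * (same indexing as `in`); vertices of other objects are left alone. */
void transform_object(const SceneObject *obj, const Vec3 *in, Vec3 *out)
{
    for (int i = obj->first_vertex; i < obj->first_vertex + obj->vertex_count; ++i) {
        Vec3 p = in[i];
        out[i].x = obj->m[0][0] * p.x + obj->m[0][1] * p.y + obj->m[0][2] * p.z + obj->m[0][3];
        out[i].y = obj->m[1][0] * p.x + obj->m[1][1] * p.y + obj->m[1][2] * p.z + obj->m[1][3];
        out[i].z = obj->m[2][0] * p.x + obj->m[2][1] * p.y + obj->m[2][2] * p.z + obj->m[2][3];
    }
}

/* Fill m with an identity rotation/scale and the given translation. */
void make_translation(float m[3][4], float tx, float ty, float tz)
{
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 4; ++c)
            m[r][c] = (r == c) ? 1.0f : 0.0f;
    m[0][3] = tx;
    m[1][3] = ty;
    m[2][3] = tz;
}
```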

[Image: Yet another lighting change]

Thursday, 14 August 2008

Colour bleeding

Just a very quick update today, as I'm driving all over the place before work this morning. Even the tail end of rush hour on the M4 will be fun. For those of you who take the M4 in the mornings - don't panic! I'm not taking Buttercup, so there won't be any massive delays :p

As you can see from the images below, I fixed some of the artifacts and coloured in my Cornell box. It has a white light source very nearly in the centre. In the screenshot with reflections you can clearly see the colour bleeding (which is the desired effect). The next steps are to get the dimensions of the box correct and add some objects to it.


Wednesday, 13 August 2008


Busy night last night - a variety of bug fixes in cudart. Amazing how your lighting improves when you are not taking an extra square root :p For some reason, probably a really good one at the time, I was getting the magnitude of a vector and then taking the square root of that scalar. This explains rather nicely why my reflections were so weak. Secondly, I fixed an error in some intersection calculations which was having an impact on light reflections. As you can see from the screenshots, these fixes have caused a massive improvement in quality.

Archive Section

You will notice a new page link on the right today - "archives". This is going to be a collection of things from the old site, along with some old code and graphics taken from my old archives, which I recently discovered. It covers stuff like assembler, OpenGL, DirectX, casinos, video, neural networks, I/O completion ports etc - hopefully some of it is still useful to someone. Archives

Tuesday, 12 August 2008

Lighting and texturing

It's been a hectic few days, what with house guests and trying to keep Buttercup (my Landy) running. Actually she features in most of the example image processing pictures, but I digress. With limited time I only had a chance to fix the coordinate systems in cudart: the y axis had been inverted right from the first version. That hadn't really been a problem until surface normals that weren't in the usual direction caused a bit of a debugging headache.

[Image: Colours and reflections]

Thursday, 7 August 2008

Finding a point on a ray closest to a point

This sounds like a rather trivial calculation, but rather than actually thinking about it I just reached for a linear algebra book - no luck... Then for a calculus book... still no luck, although I had some fun with the Taylor series.

A quick Google search later and I had a few results; most were for the closest points between two rays, which is not exactly what I wanted. Others failed (or deliberately chose not) to note that a ray is defined as r = P0 + td where, and here's the important bit, t is a real number AND t >= 0. Ignoring that constraint means the "closest point" found can lie at a negative t, which is not on the ray at all (invalid).
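The calculation itself, sketched with hypothetical names: the unconstrained minimizer of |P0 + td - Q|^2 is t = dot(Q - P0, d) / dot(d, d), and for a ray that value has to be clamped to t >= 0, in which case the closest point is P0 itself.

```c
typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Point on the ray P0 + t*d (t >= 0) closest to Q. Assumes d is non-zero.
 * A negative unconstrained t means Q lies "behind" the origin, so t is
 * clamped to 0 (for an infinite line the clamp would be dropped). */
Vec3 closest_point_on_ray(Vec3 p0, Vec3 d, Vec3 q)
{
    Vec3 w = { q.x - p0.x, q.y - p0.y, q.z - p0.z };
    float t = dot3(w, d) / dot3(d, d);
    if (t < 0.0f)
        t = 0.0f;
    Vec3 r = { p0.x + t * d.x, p0.y + t * d.y, p0.z + t * d.z };
    return r;
}
```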

Tuesday, 5 August 2008

image rotation

As I have yet to post any meaningful code, I thought I'd start with a CUDA image rotation tutorial.

To keep things simple, we are going to use the standard sin/cos (or matrix) rotation method and not the shear method.

Monday, 4 August 2008


Here are some more screenshots of cudart after some changes to lighting code and some other minor changes to the intersection code (It now produces u,v values so texturing will be coming soon)

Notice the artifacts along the front edge of the prism (mostly visible in the second image), which seem to be caused by the self-intersection tests and my epsilon value possibly being too high.
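One common way to handle self-intersection, sketched with hypothetical names (the post doesn't say how cudart does it): offset the origin of each secondary ray by a small epsilon along the surface normal, on the side the new ray travels into. The size of epsilon is exactly the trade-off described above.

```c
typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Start point for a secondary (shadow or reflection) ray leaving a surface.
 * Too small an eps lets rounding error re-hit the surface just left; too
 * large an eps can skip geometry that is genuinely close, e.g. at edges. */
Vec3 offset_ray_origin(Vec3 hit, Vec3 normal, Vec3 new_dir, float eps)
{
    float side = dot3(new_dir, normal) >= 0.0f ? eps : -eps;
    Vec3 o = { hit.x + side * normal.x, hit.y + side * normal.y, hit.z + side * normal.z };
    return o;
}
```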


Canny Edge Detection

As part of my entry to the CUDA challenge I implemented a Gaussian point spread function and a Sobel filter.
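For reference, a minimal sketch of the Sobel step at one pixel with the standard 3x3 kernels (hypothetical names; the competition code isn't shown). In a Canny pipeline this runs after Gaussian smoothing, and the gradient direction feeds non-maximum suppression.

```c
#include <stdlib.h>

/* Sobel responses at interior pixel (x, y) of a w-wide grayscale image.
 * Caller must keep 1 <= x < w-1 and 1 <= y < h-1. */
void sobel_at(const unsigned char *img, int w, int x, int y, int *gx, int *gy)
{
#define P(dx, dy) ((int)img[(y + (dy)) * w + (x + (dx))])
    *gx = -P(-1, -1) + P(1, -1)
          - 2 * P(-1, 0) + 2 * P(1, 0)
          - P(-1, 1) + P(1, 1);
    *gy = -P(-1, -1) - 2 * P(0, -1) - P(1, -1)
          + P(-1, 1) + 2 * P(0, 1) + P(1, 1);
#undef P
}

/* Cheap gradient magnitude |gx| + |gy| (sqrt(gx^2 + gy^2) is the exact one). */
int sobel_magnitude_l1(const unsigned char *img, int w, int x, int y)
{
    int gx, gy;
    sobel_at(img, w, x, y, &gx, &gy);
    return abs(gx) + abs(gy);
}
```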

I'm not sure how my project went - badly, I suppose - as I had only obtained an 8800 GT a week before the deadline :( I'll post the full project here once judging has finished. Actually, I will probably keep working on improving it, as I believe there is a lot of scope for a fully implemented version.

Friday, 1 August 2008

Compiler optimizations

I've noticed when comparing assembly outputs that, despite advances in compiler optimization, compilers still don't handle dependencies well. Register dependencies (an instruction waiting on the result of the one before it) can really stall a multi-stage pipeline, so it is well worth hand-optimizing these in speed-critical sections (a certain ray tracer springs to mind :p).
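A small C illustration of the idea (hypothetical functions, not the ray tracer's code): splitting one long chain of dependent additions into two independent accumulators lets consecutive operations overlap in the pipeline. The catch is that floating-point addition is not associative, so the result can differ in the last bits, which is one reason a compiler won't make this change on its own unless told it may relax IEEE semantics.

```c
/* One long dependency chain: every add waits for the previous one. */
float dot_single(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Two independent chains that can be in flight at the same time. */
float dot_two_chains(const float *a, const float *b, int n)
{
    float s0 = 0.0f, s1 = 0.0f;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)
        s0 += a[i] * b[i];   /* odd leftover element */
    return s0 + s1;
}
```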

Thursday, 31 July 2008


cudart is my very primitive implementation of a ray tracer on a CUDA-enabled device. At the moment crudrt would be a more appropriate name :p

It still needs stacks of work and optimizing. There are a few flaws in the way I've converted the traditionally recursive ray tracing algorithms, but at least it more or less works.
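A minimal sketch of one way to remove the recursion, since CUDA devices of this generation don't support recursive calls (hypothetical names; one colour channel, with the per-bounce shading passed in as arrays to keep it self-contained, where a real tracer would intersect, shade and reflect the ray inside the loop): the recursive form C_k = local_k + r_k * C_(k+1) unrolls into a loop that carries a running weight.

```c
/* local_colour[k]: directly lit colour seen at bounce k.
 * reflectivity[k]: fraction of light the surface at bounce k reflects.
 * Equivalent to the recursion C_k = local_k + r_k * C_(k+1). */
float accumulate_reflections(const float *local_colour, const float *reflectivity,
                             int bounces)
{
    float colour = 0.0f;
    float weight = 1.0f;   /* product of reflectivities along the path so far */
    for (int k = 0; k < bounces; ++k) {
        colour += weight * local_colour[k];
        weight *= reflectivity[k];
        if (weight < 0.001f)
            break;         /* remaining contribution is negligible */
    }
    return colour;
}
```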

Here is the very first output from it (scaled down in size from original) showing a ground plane, 3 triangles and some lights.  No shading or reflections / refractions yet - but coming soon.

[Image: First Output from cudart]

CUDA occupancy calculator

D'oh! After all my mucking around with performance calculations related to register, shared mem and global mem usage, I discovered "CUDA_Occupancy_calculator.xls" lurking in the tools directory, which does it all for you. It's even mentioned in the docs... another D'oh!

I don't feel it was a complete waste of time as I now understand the inner workings of the multiprocessors a lot better.

If you do want to use the spreadsheet, don't forget to compile with the -cubin option, which will tell you the register usage, shared memory usage etc.
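The core of the calculation, as a simplified sketch for compute capability 1.0/1.1 devices such as the 8800 GT (hypothetical names): each per-multiprocessor limit (768 threads, 8 blocks, 8192 registers, 16 KB of shared memory) caps the number of resident blocks, and occupancy is the resulting warp count over the maximum of 24. The real spreadsheet also rounds allocations up to hardware granularity, which this ignores, so treat the result as an upper bound.

```c
/* Per-multiprocessor limits for compute capability 1.0/1.1. */
#define MAX_THREADS_PER_SM 768
#define MAX_BLOCKS_PER_SM 8
#define REGISTERS_PER_SM 8192
#define SHARED_BYTES_PER_SM 16384
#define MAX_WARPS_PER_SM (MAX_THREADS_PER_SM / 32)

static int min_int(int a, int b) { return a < b ? a : b; }

/* Fraction of the maximum resident warps that are active per multiprocessor.
 * Inputs are the figures the .cubin reports; threads_per_block must be > 0. */
double estimate_occupancy(int threads_per_block, int regs_per_thread, int smem_per_block)
{
    int warps_per_block = (threads_per_block + 31) / 32;
    int blocks = MAX_BLOCKS_PER_SM;
    blocks = min_int(blocks, MAX_WARPS_PER_SM / warps_per_block);
    if (regs_per_thread > 0)
        blocks = min_int(blocks, REGISTERS_PER_SM / (regs_per_thread * threads_per_block));
    if (smem_per_block > 0)
        blocks = min_int(blocks, SHARED_BYTES_PER_SM / smem_per_block);
    return (double)(blocks * warps_per_block) / MAX_WARPS_PER_SM;
}
```

For example, 256 threads per block at 10 registers each allows 3 resident blocks (24 warps, full occupancy), while 16 registers each drops that to 2 blocks.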

BV2 has a new home

After our previous hosting provider effectively cut our contract to 1/10 of the previous bandwidth, and then last month changed its billing procedures in a way that made it effectively impossible to pay them, I decided to switch to a new host.

The new providers had the server up and running exactly when they said they would, and, completely contrary to form, all the backups from the old server worked. MySQL is now much easier to back up and restore using their management tools than it used to be. Admittedly, I last moved servers years ago.

The new host is here in the UK, so the latency will be about a quarter of what it was to Canada. I will post a review of them here after a few weeks.

Friday, 25 July 2008

WARP power 9 not the quickest

I've been playing with GPU coding, and especially CUDA, for quite a while now and thought I'd start posting my findings here. NVIDIA have done a great job with CUDA and their GPU families, and I thoroughly recommend their products. I'm currently running an 8800 GT as my coding board and a 7900 GTX as my display card.

I was optimizing my grayscale kernel last night. It's quite an interesting one to optimize, as the calculation it performs is trivial. There is no warp divergence at all, as there are no conditionals in the kernel, but it depends heavily on global memory reads and writes.
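A minimal sketch of the per-pixel work, assuming packed 32-bit RGBA input with red in the low byte (hypothetical names; the actual kernel isn't shown): one 32-bit load per pixel rather than three byte loads, no branches, one byte written. The weights are parameters, so either set of coefficients from the 23 July post below can be used.

```c
#include <stdint.h>

/* Weighted luminance of one packed RGBA pixel, rounded to nearest.
 * Assumes the weights are non-negative and sum to about 1. */
uint8_t gray_from_rgba(uint32_t rgba, float wr, float wg, float wb)
{
    float r = (float)(rgba & 0xFFu);
    float g = (float)((rgba >> 8) & 0xFFu);
    float b = (float)((rgba >> 16) & 0xFFu);
    float y = wr * r + wg * g + wb * b;
    return (uint8_t)(y + 0.5f);
}

/* CPU version of the kernel: on the GPU, i would be
 * blockIdx.x * blockDim.x + threadIdx.x, one thread per pixel. */
void grayscale_image(const uint32_t *src, uint8_t *dst, int n,
                     float wr, float wg, float wb)
{
    for (int i = 0; i < n; ++i)
        dst[i] = gray_from_rgba(src[i], wr, wg, wb);
}
```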

Xbox 360 Game Demos

Recently I've taken to downloading the demo for any upcoming Xbox game that I am even vaguely considering purchasing. I would now go so far as to say that I won't buy any game unless I've played the demo.

The try-before-you-buy approach is a good one, but I can't help feeling that the demos I have tried recently are letting the game studios and the gamers down.

Wednesday, 23 July 2008


I have recently been implementing a highly parallel bitmap grayscale kernel as part of another project (more on this later).

Interestingly, almost all sources on the web, including Wikipedia ( still quote:

Y = (r * 0.3) + (g * 0.59) + (b * 0.11)

These are the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), rounded, and they are the correct technique for older displays. Improvements in display technology (LCD, plasma etc.) mean this is no longer the case.

The more correct form for modern displays, close to the ITU-R BT.709 weights (0.2126, 0.7152, 0.0722), is:

Y =  (r * 0.2125) + (g * 0.7154) + (b * 0.0721)

As soon as I find the link to the paper / author that conducted this research I will post it here as it is not immediately findable on

The two equations produce markedly different results, with the second one producing a much fuller / smoother luminance range on my monitors.

I wonder how many legacy (or new) video and image editing software suites still use the older version? With the proliferation of LCD displays, it's well worth a quick update.

First Post

With so many ideas and projects on the go, I thought it's about time I started a blog or website. The site as you see it now is based on WordPress ( - thanks for a great piece of software! In the next few days (more likely weeks...) I'll be adding more stuff.

In the meantime, have a look at the projects and code sections to see what's on the go. (The careful observer will note that neither of these pages exists yet either....)