Posts Tagged ‘ray tracing’

SPH Screenshot

Friday, March 19th, 2010

Finally the promised screenshot :)

SPH with symmetry

SPH with symmetry

It’s not all that impressive to look at as I’ve restricted all the particles to 2d although it does use 3d calculations. I do this to help look for any issues in the code as I find it hard to spot errors in a 3d particle rendering.

This particular screenshot has 64000 particles that have been dropped into the box in a column formation and are now starting to slosh around at the bottom.

The unusual thing with regards to a CUDA implementation is that it is using symmetry in the interactions thereby decreasing the memory/processing load. I’ve still got more work to do but its showing a lot of promise in running superfast particle interaction simulations.

I’ve aso been doing a bit of work on my second version of my raytracer. I’ve once again stepped away from KD-trees and Octrees and am using a type of BVH, ray marching system. Screenshots once I have a decent scene rendered :)

In other news I’m now compiling all my new C++/CUDA code in 64bit with the CUDA 3.0 beta. Although I think putting in c++ object support into CUDA was a mistake the new version does produce decent code.

Amdahl’s law

Monday, October 19th, 2009

A few months ago I made a post mentioning how I don’t conform to the Amdahl’s law way of thinking but never went into any details.

The law describes the speedup that can be obtained if you can parallelize a section of your problem. The speedup that can be obtained is described by the following equation:

{\frac{1}{(1-P)+\frac{P}{S}}}

Where P is the proportion of the problem that can be parallelized / sped up and S is the speedup amount.

Assuming that S->infinity  then P/S -> 0  this leave us with {\frac{1}{(1-P)}}

This implies that no matter how many processors / speed improvements we make to the P portion of the problem we can never do better than  {\frac{1}{(1-P)}}   And the biggest % improvement from the baseline comes with low values of S (or relatively low numbers of parallel processors). This result is observed in the field time and again. Very seldom does throwing more than 4 or 8 processors at a problem speed it up any more than the large gains you get from the first 2 or 4 processors.

This equation does expand with multiple P and associated S terms in order to describe a more complex / lengthly problem: (P1+P2+P3 = 100%)

{\frac{1}{(1-P1)+\frac{P1}{S1}}}+{\frac{1}{(1-P2)+\frac{P2}{S2}}}+{\frac{1}{(1-P3)+\frac{P3}{S3}}}

Certain problems where P is large do respond well to the increase in processors these are known as “embarrassingly parallel”, ray tracing is rather a good example of this.

 

So why do I not agree with this if the equation makes sense?

The assumption that only P areas can be accelerated by S and strung together in a serial fashion is rather simplistic.

Why do we have to finish P1 before beginning P2?  Even if the P2 area has dependancies on P1 its rare to have the entire section of P2 to depend on a single result (of course there are cases - reduction kernels etc)

Maybe P3 can overlap P1 and P2, some may benefit by having more processors while others may reach an optimal at two. Why not overlap the sections and supply them with their optimal processing power? This is easy to achieve with Directed Acyclic Graphs (DAG’s) and can even be computed on the “fly” although they do get rather large!

Quoting Amdahl’s law as a reason why no further speed benefits are available in a system is really just showing that thinking is still stuck in serial mode with little bursts of parallelism thrown in.  Lets starting thinking parallel in all areas and make the most of all available compute resources.

Work and Spam…

Friday, September 25th, 2009

Yes I am still alive… :)    I’ve been exceptionally busy at work lately, not complaining though as work pays the bills and gpu / compute things are just a hobby.

What I do find rather frustrating is in the limited time I have to maintain the blog, it all gets eaten up by wading through mountains of spam. Akismet does a fantastic job, but on the forums side the bot detection captcha is pathetic and I have around 100 signups / 100’s of messages to check every day. So until a later date when I can fix / upgrade it I will be suspending the forums from today.

On a rather more positive note:  Nvidia has announced OptiX ray tracing engine. Some examples are out already so worth having a look at.  Nvidia has also announced Nexus which looks incredible. I have heard that they may charge for it but with the features it apparently has I would be willing to pay.

For some of the latest conferences papers and pre-prints have a look at Ke-Sen Huang’s page which he keeps updated with some really nice stuff.

As for me - no new cuda / compute development for a while. When I get a few mins free I’m tending to work on the maths for the stuff I want to do but never get enough time to implement anymore. Hopefully this will change in 2 or 3 months once work settles down a bit. On the subject of work there have been one or two questions over email and on the blog of the steps taken after an edge detector has been applied - unfortunately I cannot answer this as it gets a bit close to what I do for work (OCR and document processing). Rather interesting one of my ray tracing acceleration algorithms which turned out pathetically for its purpose has been rather good at identifying features in 2d/3d images but more on this at a later date once its a bit more complete.

Happy blog day!

Thursday, July 23rd, 2009

My little blog is officially one year old today! :)

Over the last year I’ve made many more posts than I was originally planning to make and judging by the number of subscribers and variety of institutions and people who visit it’s not entirely all rubbish, or so I like to tell myself.

Ongoing projects that will hopefully get completely or improved in the coming year are:

Thermal monitor - networking coming soon

More raytracing (of course) - linking PhysX to it (thanks to Timothy’s post here for inspiration) to improve on my rather primitive bouncing balls.

SPH and Lattice methods for fluid simulation

Sorting algorithms - I haven’t posted much on this lately but some decent results so far

Massive Data set processing.

Anything else that grabs my interest

Voxels, Dynamic Grids, KD Trees

Thursday, February 26th, 2009

During the course of developing my raytracer I’ve investigated various acceleration structures and made comments on them in previous posts.

My primary focus has been ray tracing of dynamic scenes where you would need to re-calculate the acceleration structure for each frame. As a result of this I ruled out KD-Trees (even SAH partitioned ones) early on and concentrated on methods which could be implemented well in parallel.  The voxel method with fixed maximum object counts served me very well in early implementations.  It is very quick to compute in parallel and its memory accesses on traversal are very kind to a CUDA implementation.  However as the number of objects in my scenes have increased, it has caused problems with this fixed max element voxel structure.  It is possible for the number of element to exceed the maximum in the cell thereby resulting in visual artifacts / lost objects.  Increasing the maximum elements per cell slows down the raytracing stage too much even though the objects are no longer lost.  To overcome this limitation I increased the number of cells in the grid.  Again this works, but slows down the raytracing due to the higher number of voxels needing to be traversed. I was using a mesh of slightly overlapping spheres as they are much quicker to run intersection tests against rather than the standard cube approach.

(more…)

Sorting Algorithms

Friday, January 23rd, 2009

In most programs there comes a time when you need to sort your data into some sort of order. Most recently I have needed to do this in both the “collapsing universe” simulation and my real time ray tracing acceleration. Ideally if you can get away without sorting data you should, but often you need to in order to accelerate a later stage of the algorithm thereby making a nett saving in time.

The popular sorting algorithms such as bubble sort, insertion sort, quicksort, radix sort, heapsort etc  range in worst case order times from  O(n log n) to O(n^2) .  The order here is based on the number of element comparisons. And for memory requirements they range from O(1) to O(log n)   (quicksort)  although some not quite so popular algorithms use higher memory requirements.

(more…)

Bunny!

Thursday, October 23rd, 2008

It had to happen sooner or later… Yes thats right, a raytraced Stanford Bunny  (Thanks to Standford for doing the 3d scan).

To be a little different my bunny is facing away from the camera :p  I’m currently getting 1fps at 800 by 800 - which isn’t too bad for just under 280 000 triangles and 2 light sources (reflections are currently turned off).  I think I can increase the speed of this by quite a bit - the current problem is my acceleration structure ran out of memory hence the artifacts. Rendering a more complex scene like this has definately shown up shortcomings in my current algorithm. I have already started work on version 2 :)

For now just a screenshot…

Stanford Bunny at 1 fps

Stanford Bunny at 1 fps

 

The nendril project took a big step forward yesterday with it becoming self aware…. just kidding! But it is now correctly firing its neurons. More updates soon.

Finally an update!

Thursday, October 9th, 2008

What with work / projects / email / spore, updating the web site has definately taken a back seat recently. The good news is that I have answered every email I’ve been sent so if you haven’t received a reply the chances are my spam filter ate it - just resend or post a comment here if this is the case.

On the project front - have a look at this screen shot…

721 Spheres and 2 lights with 1 level of reflection at +- 14fps

721 Spheres and 2 lights with 1 level of reflection at +- 14fps

I am actually really happy with this as having 8000 objects on the screen only drops the frame rate to +-9. I can still make a lot of optimizations to it but the basic algorithm seems to be working well. Soon it will be time to switch back to triangle meshes and render something useful :) I’ve recorded a short video of it which is available here , unfortunately the screen capture application I use to generate the recording dropped the frame rate by +-6fps…  At the start of the recording I turn reflections on and you will see the (already diminished) framerate drop accordingly. Note that the camera isn’t moving - all the objects are moving.

(more…)

Bug Fixes

Thursday, September 11th, 2008

While working on my much talked about acceleration structure I have kept noticing artifacts and distortions as soon as it is applied to the scene. This has had me rather stumped for two days as the code / raw output seemed correct. Finally it occured to me to check my scene code and in particular my viewport code.

As it turns out my viewport code has been flawed since I converted from purely triangles to include spheres. Here is a new screenshot showing the improvement in the image quality now that its been fixed. You will also notice that you can clearly see the spiral reflected in the sphere as well as the red light that is behind the viewer (have to look closely for this one). The image is scaled down from the original size in order to save a bit of bandwith - lots of traffic from nvidia site :) 

 

After bug fix of viewport code

After bug fix of viewport code

I am yet to fix the downloadable version (downloads section) but will do so tonight.

Acceleration of ray tracers on CUDA 1.1 devices

Wednesday, September 10th, 2008

As those of you who follow the blog will know, I have been working on an acceleration mechanism for my raytracer over the past week.

As it turns out converting my large stack of A4 sheets of diagrams and equations into proper code has not been as easy as I had hoped. With about a third of it implemented I am getting no speed up what so ever on cudart. In fact I have lost about 2fps. This is largely to be expected as I am pretty much hitting the limits of what my card can do.

(more…)