February 26th, 2010 by Barrett
Nearly 3 months since my last post
Work has been exceptionally busy: in the last two months, on top of my normal product maintenance and improvement duties, I have prepared and filed a patent application, architected and largely completed a distributed, resilient document processing framework, and found a bit of time to eat and sleep!
I’ve noticed other blogs in the raytracing / graphics / visualization space have been very quiet lately - maybe everyone else is also working like crazy?
Not a huge amount has happened in my raytracer and SPH projects, although I got some interesting effects running with a non-uniform mass particle system when I had time over Christmas. Screenshots soon.
I do have the beta release of Nexus (the NVIDIA Visual Studio plugin) but sadly it only runs on Windows Vista or Windows 7, which leads nicely on to my next point:
I am a bit irritated with Microsoft for two reasons. Firstly, even though I purchased a 64-bit copy of Windows XP Professional about 6 or 8 months ago, there is no upgrade path to Windows 7… Secondly, even though Visual Studio 2008 Standard has a switch for OpenMP, it doesn’t contain the OpenMP headers. Only the more expensive Professional version does. Not something that was immediately obvious from the documentation before I purchased…
Although I also run Linux (CentOS), I prefer to develop on a Windows GUI - less buggy and more responsive than GNOME / KDE in my opinion. For running code, Linux does usually win though! I would really like to run Nexus so am a bit stuck about what to do… Succumb, buy Windows 7 and get Nexus on Visual Studio? Or just forget entirely about the Windows development environment and use Linux / gcc / Intel compilers instead? While the Intel compilers are great (if a bit expensive), for an IDE I really do like Visual Studio. Most of my code is cross-platform and for graphics I mostly use OpenGL, so I could switch without too much trouble… But DirectCompute is so tempting…
Arrrrgh what to do!
Tags: Centos, CUDA, Linux, Nexus, openMP, Visual Studio, Windows 7
Posted in Development, Uncategorized | 2 Comments »
December 3rd, 2009 by Barrett
I noticed this linked from both the Atom and Real-time Rendering blogs. Cyril Crassin, the guy behind the amazing GigaVoxels raytracing, has got a 3D Mandelbrot fractal rendering in real time.
As we know, there isn’t actually a third dimension to the complex plane, so some manipulation is required. The chap who discovered a good way of transforming it to 3D (the Mandelbulb) has a website which you can find here. Well worth reading, and some pretty amazing images!
As a side note: I’ve owned the book “Real-time Rendering” for a number of years now and it is an invaluable resource. The Real-time Rendering blog mentioned above is written by the authors of the book.
Tags: gigavoxels, Mandelbulb
Posted in Maths, Uncategorized | No Comments »
October 28th, 2009 by Barrett
Before everyone gets really upset with the rest of this post, as is the trend in the OO community… I thought I’d start, rather than end, with a disclaimer: I use C++ and the STL on a daily basis in my job. Although I don’t use all of what the STL has to offer, it does make coding in C++ much easier. C++ itself allows fairly elegant code (if constructed carefully) whilst providing a decent level of performance. So I do actually like C++ and the STL, and they make my life at work much better.
But this blog isn’t about my day job…. It’s about my tinkering with the wonderful world of parallel algorithms and CUDA code.
What a lot of people don’t realize is that you *can* use the STL, C++ classes and templates in a .cu file. As long as it’s client-side (host) code you should be fine. I’ve had a few compiler crashes when using the STL, especially the sort. To get around this, overload the < operator in your class; don’t try to define a custom < method, as it will crash the compiler.
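A minimal sketch of that workaround, assuming it means giving the class a member operator< and calling std::sort without a comparator argument. The Hit struct and sortedByDistance are hypothetical names made up for illustration, and this is plain host-side C++ rather than the code from the full post:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record type, used only for illustration.
struct Hit {
    float distance;
    int   primitiveId;

    // Member operator< supplies the default ordering used by std::sort.
    bool operator<(const Hit& other) const {
        return distance < other.distance;
    }
};

// Host-side helper: returns the hits ordered nearest-first.
// No comparator is passed, so std::sort falls back to Hit::operator<.
inline std::vector<Hit> sortedByDistance(std::vector<Hit> hits) {
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

The sort call itself stays on the host; only the resulting data would be copied to the device.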
Tags: C++, C++ and STL, DAG, STL
Posted in Development, cudart (ray tracing) | No Comments »
October 27th, 2009 by Barrett
As of writing, the combined download count of the GPU Thermal Monitor has hit 520.
So far I have yet to receive any major feedback on bugs etc, which leads me to believe it either a) works perfectly or b) no-one is bothering to report issues. As I’m an optimist I’m going with option a.
I’ve had more requests for remote monitoring of the GPU temperature via a simple HTTP request. This is something I need myself in order to keep track of temperatures in remote machines. It is now built in and going through testing and bug fixing, hopefully to be released soon. I’ve not used completion ports as they seemed like overkill for what should be a light-traffic application, but as the source is included and under a Creative Commons license please feel free to add them if needed. Secondly, having it open source allows for some code review, which is important for security reasons as it now allows remote connections.
If you have found a bug or would like another feature added please drop me a comment or email.
Tags: GPU Heat Monitor, GPU Temperature, GPU Temperature Monitor, GPU Thermal, GPU Thermal Monitor, nvidia gpu temperature, Temperature Monitor, Thermal Monitor
Posted in BV2 Thermal Monitor | 3 Comments »
October 19th, 2009 by Barrett
A few months ago I made a post mentioning how I don’t conform to the Amdahl’s law way of thinking but never went into any details.
The law describes the speedup that can be obtained if you can parallelize a section of your problem. It is described by the following equation:

$$\text{Speedup} = \frac{1}{(1 - P) + \frac{P}{S}}$$

Where P is the proportion of the problem that can be parallelized / sped up and S is the speedup amount.
Assuming that S -> infinity, then P/S -> 0, which leaves us with $\frac{1}{1 - P}$
This implies that no matter how many processors / speed improvements we make to the P portion of the problem we can never do better than $\frac{1}{1 - P}$
And the biggest % improvement from the baseline comes with low values of S (or relatively low numbers of parallel processors). This result is observed in the field time and again. Very seldom does throwing more than 4 or 8 processors at a problem speed it up any more than the large gains you get from the first 2 or 4 processors.
This equation does expand with multiple P and associated S terms in order to describe a more complex / lengthy problem (P1 + P2 + P3 = 100%):

$$\text{Speedup} = \frac{1}{\frac{P_1}{S_1} + \frac{P_2}{S_2} + \frac{P_3}{S_3}}$$
Certain problems where P is large do respond well to the increase in processors. These are known as “embarrassingly parallel”; ray tracing is rather a good example of this.
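To put some numbers on this, here is a small sketch of the standard form of Amdahl’s law, 1 / ((1 - P) + P/S), together with the multi-section form 1 / (P1/S1 + P2/S2 + …). The function names and the Section struct are my own, chosen purely for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Single-section form: fraction p (0..1) of the work is sped up by factor s.
inline double amdahlSpeedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

// Ceiling on the speedup as s grows without bound (requires p < 1).
inline double amdahlLimit(double p) {
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}

// Hypothetical helper type for the multi-section form.
struct Section {
    double fraction;  // share of the original run time; all shares sum to 1
    double speedup;   // factor by which this section is accelerated
};

// Multi-section form: 1 / sum(fraction_i / speedup_i).
inline double amdahlSpeedup(const std::vector<Section>& sections) {
    double fractionSum = 0.0;
    double newTime = 0.0;
    for (const Section& sec : sections) {
        assert(sec.fraction >= 0.0 && sec.speedup > 0.0);
        fractionSum += sec.fraction;
        newTime += sec.fraction / sec.speedup;
    }
    assert(std::fabs(fractionSum - 1.0) < 1e-9);  // shares must cover the whole run
    return 1.0 / newTime;
}
```

With P = 0.8, going from S = 2 to 4 to 8 gives speedups of roughly 1.67, 2.5 and 3.33 against a ceiling of 5, which is the diminishing return described above. With P = 0.99 and S = 8 the speedup is about 7.5, which is why embarrassingly parallel problems keep scaling.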
So why do I not agree with this if the equation makes sense?
The assumption that only P areas can be accelerated by S and strung together in a serial fashion is rather simplistic.
Why do we have to finish P1 before beginning P2? Even if the P2 area has dependencies on P1, it’s rare for the entire section of P2 to depend on a single result (of course there are cases - reduction kernels etc).
Maybe P3 can overlap P1 and P2; some sections may benefit from having more processors while others may reach an optimum at two. Why not overlap the sections and supply them with their optimal processing power? This is easy to achieve with Directed Acyclic Graphs (DAGs) and can even be computed on the fly, although they do get rather large!
Quoting Amdahl’s law as a reason why no further speed benefits are available in a system is really just showing that thinking is still stuck in serial mode with little bursts of parallelism thrown in. Let’s start thinking parallel in all areas and make the most of all available compute resources.
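As a rough illustration of the overlap idea, the sketch below compares running sections strictly one after another with starting each section as soon as its dependencies have finished. It is only a toy model: the Task struct and both function names are hypothetical, it assumes enough free processors for every ready task, and it expects the tasks to be listed with dependencies first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical DAG node: a run time plus the indices of the tasks it depends on.
// Tasks must be listed in topological order (dependencies come first).
struct Task {
    double duration;
    std::vector<std::size_t> deps;
};

// Total time if the tasks run strictly one after another.
inline double serialTime(const std::vector<Task>& tasks) {
    double total = 0.0;
    for (const Task& t : tasks) total += t.duration;
    return total;
}

// Total time if every task starts as soon as its dependencies finish,
// i.e. the length of the longest (critical) path through the DAG.
inline double overlappedTime(const std::vector<Task>& tasks) {
    std::vector<double> finish(tasks.size(), 0.0);
    double makespan = 0.0;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        double start = 0.0;
        for (std::size_t d : tasks[i].deps) {
            assert(d < i);  // relies on the topological ordering assumption
            start = std::max(start, finish[d]);
        }
        finish[i] = start + tasks[i].duration;
        makespan = std::max(makespan, finish[i]);
    }
    return makespan;
}
```

For example, with P1 taking 4 units, P2 taking 3 units and depending on P1, and an independent P3 taking 5 units, the serial total is 12 while the overlapped total is 7. A real scheduler would also have to cap the processors given to each section, which this sketch ignores.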
Tags: Amdahl's law, parallel processing, ray tracing
Posted in Development, Uncategorized | No Comments »
October 1st, 2009 by Barrett
I’ve just completed reading the white paper released by nvidia which you can find here.
Rather interestingly no mention of graphics performance has been made which, in a way, is really exciting. This has clearly been aimed at the high performance or throughput computing markets with the notable inclusion of ECC memory and increased double precision throughput along with the updated IEEE 754-2008 floating point support.
Concurrent kernel execution and faster context switching will allow, with the use of DAGs, the optimization of execution on the devices rather than just working out the most efficient order of kernels to execute sequentially.
Also tucked away in the white paper is the mention of predication at the instruction level, which should give greater control of divergent paths in your kernels.
The inclusion of C++ support will appeal to a lot of people, but I am rather unconvinced this is the correct way to go for throughput computing as it will encourage the use of all the old patterns that may work well in serial cases but are often rather poor for enabling maximum throughput.
There is a lot more in the paper and already an announcement by Oak Ridge that they will be using it in a new supercomputer.
All in all it’s a wonderful development and I can’t help feeling that computing took a substantial leap forward today.
Tags: CUDA, Fermi, NVidia, NVidia Fermi
Posted in CUDA, Uncategorized | No Comments »
September 25th, 2009 by Barrett
Yes I am still alive… :) I’ve been exceptionally busy at work lately, not complaining though as work pays the bills and gpu / compute things are just a hobby.
What I do find rather frustrating is that the limited time I have to maintain the blog all gets eaten up by wading through mountains of spam. Akismet does a fantastic job, but on the forums side the bot detection captcha is pathetic and I have around 100 signups and hundreds of messages to check every day. So until a later date when I can fix / upgrade it I will be suspending the forums from today.
On a rather more positive note: Nvidia has announced the OptiX ray tracing engine. Some examples are out already so it is worth having a look. Nvidia has also announced Nexus, which looks incredible. I have heard that they may charge for it, but with the features it apparently has I would be willing to pay.
For some of the latest conference papers and pre-prints have a look at Ke-Sen Huang’s page, which he keeps updated with some really nice stuff.
As for me - no new CUDA / compute development for a while. When I get a few mins free I’m tending to work on the maths for the stuff I want to do but never get enough time to implement anymore. Hopefully this will change in 2 or 3 months once work settles down a bit. On the subject of work, there have been one or two questions over email and on the blog about the steps taken after an edge detector has been applied - unfortunately I cannot answer this as it gets a bit close to what I do for work (OCR and document processing). Rather interestingly, one of my ray tracing acceleration algorithms, which turned out pathetically for its purpose, has been rather good at identifying features in 2D/3D images, but more on this at a later date once it’s a bit more complete.
Tags: OptiX, ray tracing
Posted in Uncategorized | 2 Comments »
July 30th, 2009 by Barrett
There hasn’t been much in the way of posts here lately as I’ve been really busy at work getting some new components built into the systems I work on. It’s not really hard, but there are frustrating things like trying to get various components and libraries written in different languages to work together. So lately I’ve not had the energy to do much work on the computer once I get home…
There has been a bit of interest in my 3D Gaussian convolution kernels. Although I explained the technique mathematically in an earlier post I never actually posted the code. As it is rather quick and quite a novel way of calculating the convolution for an xy plane I decided to post it so everyone can benefit from / improve upon the technique. As always, comments / bug reports etc are welcome.
Tags: Gaussian Convolution, xy plane
Posted in CUDA, Maths | No Comments »
July 23rd, 2009 by Barrett
My little blog is officially one year old today!
Over the last year I’ve made many more posts than I was originally planning to make and, judging by the number of subscribers and the variety of institutions and people who visit, it’s not entirely all rubbish, or so I like to tell myself.
Ongoing projects that will hopefully get completed or improved in the coming year are:
Thermal monitor - networking coming soon
More raytracing (of course) - linking PhysX to it (thanks to Timothy’s post here for inspiration) to improve on my rather primitive bouncing balls.
SPH and Lattice methods for fluid simulation
Sorting algorithms - I haven’t posted much on this lately but some decent results so far
Massive Data set processing.
Anything else that grabs my interest
Tags: Lattice Boltzman, PhysX, ray tracing, sorting algorithms, SPH, Thermal Monitor
Posted in Uncategorized | No Comments »
July 21st, 2009 by Barrett
Two weeks ago I attended a seminar at the Daresbury Laboratory near Warrington. Unfortunately I only had the day off so couldn’t attend the workshops on offer, and Buttercup, my trusty Landy, had to drive there and back on the same day. Sorry to everyone on the M4, M5 and M6 that day :p
Jack Dongarra presented a rather nice summary of supercomputing / parallel computing or as he called it “Throughput Computing”. I had never heard this term before and now really like it. It implies that you should make the best of available resources in order to maximize throughput. It’s very easy to get into a mindset where everything has to be in parallel but it’s possible by doing so you neglect to notice that some parts work very well in serial.
Tags: directed acyclic graphs, Jack Dongarra, Throughput Computing
Posted in Uncategorized | 2 Comments »