ComputeCube: C / C++ and STL

Before everyone gets really upset with the rest of this post, as is the trend in the OO community... I thought I'd start, rather than end, with a disclaimer: I use C++ and STL on a daily basis in my job, although I don't use all of what stl has to offer it does make coding in c++ much easier. C++ in itself does allow fairly elegant code (if constructed carefully) whilst providing a decent level of code performance. So I do actually like C++ and stl and they make my life at work much better :)

But this blog isn't about my day job.... It's about my tinkering with the wonderful world of parallel algorithms and CUDA code.

What a lot of people don't realize is that you *can* use stl, c++ classes and templates in a .cu file. As long as its client side code you should be fine. I've had a few compiler crashes when using stl especially the sort. To sort this out I used the overloaded < operator in your class, don't try and define a custom < method it will crash the compiler.

I was lucky enough to have Monday off so managed to find a bit of time over my extended weekend to do a bit of coding on the GPU Thermal Monitor and my ray-tracer. I've had drawings and code snippets written for my Proof of Concept ray tracer for a while now and just not had the time to implement them. For simplicity of debugging (come on, release Nexus!! ) I decided to implement my idea on the CPU and for speed of coding decided to use c++ and stl - after all it has served me well in the past.

My idea is highly parallel, after all ray tracing is a trivially parallel problem, and will eventually use persistent kernels like my cross-bridging thing did.

I got stuck into coding and made my first in long string of poor coding decisions. I decided to make my Rays into a class along with a vector and point class. After all there is a fairly limited set of operations on a ray and I could overload an operator to move t units along the ray thereby keeping the code nice and simple.

I read in all the triangles from my Stanford bunny model into a triangle class and assigned them into my modified grid data structure (also a class) using push_back on stl vectors (I prefer to use a .resize and index into them rather than using a 3d vector). From there a stl::sort got them in the order I wanted within the grid cells.

I now generated all the rays (rayclass) in the traditional manner from the eye through the viewport and ... yes... assigned them to a stl vector.

After a bit of care making sure the threads behaved in accessing the data structures I was done.

Good programming practice so far? In a purely OO / readability sense then yes. Nicely overloaded operators and hierarchy of classes including a few more support classes I have not mentioned. And it worked first try - apart from having to adjust the bunny position.

Success? Proof of concept working?

Er.. no :( In deciding to make the rays and other things a class I had inadvertently scuppered my whole idea. What I was wanting to do is group and process rays in packets based on position and direction in the grid. But by using a class for the rays I'd started down a serial path. Read in a ray, assign ray to optimal grid traversal thread based on pos/dir, intersect ray with grid, intersect ray with grid cells contents (if any), move ray onwards in grid.

What I'd originally wanted to was: optimal grid thread(s) fetches chunk of rays from pool based on pos/dir, the whole chunk gets intersected with grid then with objects (if any) and moves on

This seems like a trivial change and in fact its perfectly possible to do it with stl / c++. I could define a method for each ray class that would return a boolean to the calling grid thread indicating if it should be assigned to it. This again is inefficient as each ray object would have to be queried in turn - exposing the ray pos/dir as a public would be slightly more efficient (although bad, coding practice) and does not solve the problem of looking up a pointer to each ray class. That said, it's still possible to work out a quick way to traverse the pool of ray classes in the stl vector to determine which ones the grid thread should process.

The point here is not that c++ / stl worked perfectly BUT the way OO tends to force you into a particular way of thinking / implementation path.

Although OO can and does work well in a multithreading environment it does come from an era in software development where things were largely serial in nature and most of the design patterns etc tend to steer you away from a optimal solution in a "throughput computing" environment. OO has the added disadvantage of encouraging you to code for the single case and not for the group.

From now on I'm going to be very careful to avoid using "multithreaded" or "massively multithreaded" and "throughput computing" interchangeably as they are not the same thing at all. Although not mutually exlusive multithreaded implies lots of things running together in serial doing their own job and sometimes talking to each other via a variety of synchronization / sharing methods. Throughput computing is more about getting the job done efficiently, in general the higher the degree of sustained parallelism the higher the throughput.

So, how would I change my implementation?

Beware design patterns! Yes, great to use to get your work done - but efficient? Think carefully.

Rays would be generated and stored in a pool with only origin, direction, last grid intersection and tri intersection. The grid threads can then easily operate on chunks of this data and store results quickly and efficiently for the next kernel/grid (if traversing) to pick up.

The triangles could still be stored as classes but as we are only interested in the colour (I'm not using textures), apex, 2 sides and a normal it is much more efficient to store them in a flat structure and have the grid blocks store the index to the triangles that are within its bounds.

Arranging the data in blocks also allows us to re-arrange it to be in a format that is more friendly to the memory access patterns of the device / cpu.

Ultimately I see Objects starting to take more of a back seat in development especially in server side and throughput computing code. They still have an important role to play in many things - UI design is a good example. I can see some sort of DAG entity being the new "object" probably stored in pools of similar ones all needing the same sort of processing or dependencies. We will probably get a whole new bunch of design patterns to go along with them too - exciting stuff! Now who wants to write the new language / compiler??

So think carefully, make sure your implementation hasn't changed your way of thinking. The code is meant to describe your algorithm not dictate its direction!

Now just to find some time to re-do my ray tracing code.... :)

ComputeCube

Wednesday, 28 October 2009

C / C++ and STL

No comments:

Post a Comment