ComputeCube: May 2009

Friday, 29 May 2009

Thermal Monitor Installation

Last night I received an email asking how to install the BV2 Thermal Monitor application. The simple answer is: you don't. I can see that this may not have been clear from the original post. As the application is entirely self-contained and does not depend on any C runtime, you only need to:

extract the zip file to a directory of your choosing

double click the .exe file

You do need to have the NVIDIA drivers installed. You can create desktop or Start menu shortcuts to it in the usual way. Please post any further queries to the forums.

Thursday, 28 May 2009

BV2 Forums - take 2

The forums are back online :) I switched from bbPress to phpBB. There is nothing wrong with bbPress, but I am more familiar with phpBB and its various setup options. phpBB is a bit overkill for such a little site, but all the features and options are rather fun.

The three users who registered on the old forums will need to re-register, but there were no posts apart from mine, so no loss there :)

I've had the odd email from the forums go missing, but I think I've now configured the mail server properly. Please let me know if your registration email doesn't arrive.

BV2 Forums

For some reason most visitors to the site tend to email me rather than leave a comment. I really don't mind and try to answer as quickly as possible. If you don't get a reply within a day, chances are my spam filter has eaten your email...

To encourage more open discussion of the topics on this site, I have installed the bbPress forum software (from the makers of WordPress). You will find the forums here, or click on the link at the top right of every page. bbPress integrates with WordPress, so your blog user accounts should work in the forums too.

I am moving the employment / contract pages to the forum so people can post things themselves. The forums are protected by spam filters etc., but if something does get through and I don't notice, please let me know. Although at the moment there is very little chance of me not noticing, as there are almost no topics!

Wednesday, 27 May 2009

BV2 Thermal Monitor

Finally, v0.1 of the BV2 GPU Thermal Monitor is ready for download. It is available in the downloads section of the site, and the source code is included! Please note that I have only partially translated nvapi.h into an .inc file.

Please note that it is licensed differently from the other content here: it is released under the Creative Commons "Attribution-Share Alike 2.0 UK: England & Wales" License. Basically this means you are free to copy, use and modify the code/application in any way you like, as long as you give the original author credit and, if you alter, transform or build upon this work, you distribute the resulting work only under a licence identical to this one.

By downloading you agree to be bound by the license.

OK, legalities out of the way :)

BV2 Thermal Monitor:

[Image: BV2 Thermal Monitor v0.1 screenshot]

It is currently at version 0.1 and does basic temperature monitoring of up to two NVIDIA GPUs.

It updates approximately every 500ms.

It can be minimized to the system tray where hovering the mouse over the icon will display a tooltip with the temperatures detected.

Tuesday, 26 May 2009

GPU Temperature Monitor

I was hoping to release the GPU temperature monitor to the downloads section over this last bank holiday weekend. I had also planned to sort out my somewhat ailing, overheating computer and perform some upgrades. Unfortunately the repairs and upgrades took almost two full days. My PC now runs a lot cooler and a bit quieter, and I am now something of an expert at cleaning heat spreaders and heat sinks and applying thermal grease. By the way, Arctic Silver is really good! Even before the 200-hour break-in period mentioned on their site, I'm already seeing CPU temperatures more than 10 degrees lower.

As for the upgrades: more posts on those later :) They did, however, include buying Windows XP Professional 64-bit.

The longer-than-expected PC maintenance has delayed the GPU Thermal Monitor application, and it won't be ready for a few more days. One bit of good news about it, though: it works perfectly on Windows XP 64-bit without any modifications. Don't you just love asm :) Some of my other C/C++ applications were not quite so happy on the new OS.

Keep watching this space for the release date :)

Bletchley Park

As part of our "honeymoon", Catriona and I did a series of day trips around the South and the Midlands of our not-so-little island. We are both members of the National Trust but don't often have time to visit all the locations on offer. We saw some lovely homes and parks, and as Catriona studied history she made a very capable tour guide :)

One place that is worth a special mention is Bletchley Park. Sadly, it isn't part of the National Trust and receives no government funding (well done to Milton Keynes Council for giving £600,000 for urgent restoration).

Friday, 22 May 2009

GPU Thermal Monitor

(Update 27/5/09: I've now released v0.1, see the BV2 Thermal Monitor post.)

It's been a while since my last post as I've been rather busy with work and some other projects, so time for a quick update.

While working on some long-running kernels I wanted to keep track of the GPU temperature, so I ran the monitoring app that came on the card's installation disks. The problem with the monitoring apps I have is that they are quite large, take up a lot of screen space and, for some unknown reason, crash or stop working when viewed over VNC or a similar remote-control app. As I mostly use my CUDA machines over the network, this is not a good situation.

So I decided to write my own :) Here is a screenshot of the initial version. It still needs configuration options and some cleaning up (can you spot the glitch by the minimize button?), and possibly some functionality for reporting over the network. Currently it can display the temperatures of two GPUs and updates every 500 ms. When minimized it sits in the system tray and updates its tooltip with the reported temperatures. I will be releasing it to the downloads section of the website sometime over the weekend. The current .exe is 146 KB (mostly the skin) and doesn't require an installer.

[Image: GPU Thermal Monitoring Application]

Nothing wrong with a bit of self promotion sometimes :)

Saturday, 16 May 2009

Tesla C1060 memory performance #2

In my post a few days ago I mentioned that some of the numbers reported by the Visual Profiler needed further investigation.

In particular, the device-to-device memory bandwidth reported by the profiler differed from the value reported by the bandwidthTest sample. This was easily tracked down: according to the documentation, the profiler divides the bytes read/written by 10^9 (decimal gigabytes per second), whereas the bandwidthTest sample divides by (time in milliseconds * 2^20) (binary megabytes per second). Allowing for the factor of 1000 between milliseconds and seconds, the two figures differ by 1000 * 2^20 / 10^9 = 1.048576, i.e. the bandwidth test sample reports a number about 4.6% lower.

Tuesday, 12 May 2009

D3Q19 Lattice

I managed to find some time last night to convert my LBM implementation to CUDA. It's far from optimal at the moment. Here is a screenshot showing the lid-driven cavity with two obstacles in the box, one horizontal and one vertical. The visualization is of the middle plane. The obstacles are not shaded, but they are easily seen in the picture as black areas. This image was taken after 16,000 time steps, by which point some nice stable vortices have developed.

[Image: Lid-driven cavity with obstacles]

Once again, the importance of writing a gold kernel cannot be overstated, as I had quite a number of bugs in my initial CUDA implementation. I use a "multi-tap" approach when debugging, where the kernels write intermediate results to device memory as they go along. These can easily be compared with the data from the corresponding stages of the gold kernel, which makes it much easier to identify the source of an error. Keep in mind that CUDA floats will never exactly match the CPU's floats (the x87 FPU computes intermediate results in 80-bit extended precision).

Tesla C1060 memory performance

According to the CUDA programming guide, the memory coalescing rules have been relaxed on devices with compute capability 1.2 or greater. Chapter 5 has a subsection entitled "Coalescing on Devices with Compute Capability 1.2 and Higher" which gives more information.

As I now have a C1060, which has compute capability 1.3, I thought I'd run my old coalescing tests on it to see how much things have improved. I changed the launch configuration to 32760 blocks in order to keep all the multiprocessors busy and increased the block size to 256 threads. With these changes the profiler reports 100% utilization for the kernels. I expected the Tesla to handle the uncoalesced parts of the old tests well, as their access patterns fall within the parameters given in the programming guide, and indeed the memory transfer rates were similar to a pure device-to-device copy. I then modified the uncoalesced kernels so that the accesses within a half-warp could no longer be coalesced automatically.

Thursday, 7 May 2009

D3Q19 Lattice

After playing with the D2Q9 lattice mentioned in a previous post, I felt I'd learned enough to progress to the wonderful 3D world :) The arrival of my Tesla has also given me the extra processing power needed to move into three dimensions.

So far I have written a CPU gold kernel to produce data I can compare the CUDA kernels against. For the first test I have constructed a box with five sides and a constantly moving flow in the top layer of the box. I am only simulating incompressible flow at the moment.

I have made a very simple OpenGL viewer that can show either the entire box or a plane through it. The direction of each velocity vector is indicated both by its colour and by the direction of its line, while the magnitude of the velocity is indicated by the length of the line (that's why some lines spill over the edges of the box). The image below shows a section through the middle of the box.

[Image: Visualization from the gold kernel]

The next step is to construct the CUDA kernels and compare their outputs. I'm hoping for a massive increase in lattice update speed whilst maintaining numerical stability. The gold kernel uses doubles for reference purposes.

Tesla C1060

On Tuesday I received my brand-new Tesla C1060 :) It arrived on exactly the day Armari had promised.

A quick word about Tesla suppliers... Although Armari was efficient to deal with, some of the others were absolutely terrible, possibly because I am a private individual buying only one unit, or maybe just because their overall service is terrible. Their names are not mentioned here as I have already complained, and it's only fair to give them a chance...

For those not familiar with the specs, it has 30 multiprocessors giving 240 stream cores, with 4GB of GDDR3 memory on a 512-bit bus. It is a rather large, double-slot, 10.5-inch-long PCI Express card.

ComputeCube

Friday, 29 May 2009

Thermal Monitor Installation

Thursday, 28 May 2009

BV2 Forums - take 2

BV2 Forums

Wednesday, 27 May 2009

BV2 Thermal Monitor

Tuesday, 26 May 2009

GPU Temperature Monitor

Bletchley Park

Friday, 22 May 2009

GPU Thermal Monitor

Saturday, 16 May 2009

Tesla C1060 memory performance #2

Tuesday, 12 May 2009

D3Q19 Lattice

Tesla C1060 memory performance

Thursday, 7 May 2009

D3Q19 Lattice

Tesla C1060