<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-73331669704224959</id><updated>2012-02-16T15:51:48.911-08:00</updated><category term='SFF'/><category term='Visual Studio'/><category term='C1060'/><category term='Sobel'/><category term='Radix Sort'/><category term='SPIE Medical Imaging Conference'/><category term='FPU precision'/><category term='asus'/><category term='buttercup'/><category term='NVidia Tesla c1060'/><category term='Stanford Bunny'/><category term='8800 GT'/><category term='QMicro2'/><category term='GPU Heat Monitor'/><category term='Syntax Highlighting'/><category term='Armari'/><category term='Numerical Stability'/><category term='xy plane'/><category term='Visual Studio 2008'/><category term='Rendering'/><category term='PC Design Lab'/><category term='NVidia Drivers'/><category term='Visual Profiler'/><category term='memory performance'/><category term='Lattice Boltzmann Method'/><category term='Asus Rampage II Gene'/><category term='Flex'/><category term='NVidia Tesla'/><category term='Compute Capability 1.3'/><category term='Colossus Computer'/><category term='office 2010'/><category term='C++ and STL'/><category term='Wedding'/><category term='SPIE Medical Imaging'/><category term='FPU'/><category term='KD-Tree'/><category term='C1060 memory performance'/><category term='Loch Ness Lodge'/><category term='tomographic image reconstruction'/><category term='GPU Temperature'/><category term='parallel processing'/><category term='Comparing CUDA results'/><category term='DirectCompute'/><category term='Lattice Boltzman'/><category term='Snow'/><category term='Gustafson&apos;s Law'/><category 
term='Intellisense'/><category term='gold kernel'/><category term='560Ti'/><category term='SAH Partitions'/><category term='SPIE'/><category term='Intel'/><category term='Sorting'/><category term='Gaussian Kernel'/><category term='n Dimensional'/><category term='Volumetric Rendering'/><category term='Scientific Computing'/><category term='CUDA jobs'/><category term='ARM'/><category term='Tutorial'/><category term='GPU Sort'/><category term='Computing Conference'/><category term='half-warp'/><category term='BV2 Thermal Monitor'/><category term='Edge Detection'/><category term='Turing Complete'/><category term='Eurographics'/><category term='CUDA'/><category term='CUDA Positions'/><category term='Silver Kernel'/><category term='Seismic Data'/><category term='windows XP x64'/><category term='openMP'/><category term='Holographic Display'/><category term='ray tracer'/><category term='Landrover'/><category term='D3Q19'/><category term='Gaussian Function'/><category term='realtime ray tracing'/><category term='Tesla'/><category term='3rd UK GPU computing conference'/><category term='Linux'/><category term='adsense'/><category term='Fermi'/><category term='Kernel Optimization'/><category term='Bletchley Park'/><category term='Memory Access Patterns'/><category term='GPU Heat'/><category term='Canny'/><category term='modern art'/><category term='GPU'/><category term='Dark Matter'/><category term='Tomography'/><category term='Tesla C1060'/><category term='Thermal Monitor'/><category term='CUDA Intellisense'/><category term='Qt Creator'/><category term='Large Data Sets'/><category term='Gaussian'/><category term='directed acyclic graphs'/><category term='CUDA contracts'/><category term='Artic Silver'/><category term='overall memory throughput'/><category term='sorting algorithms'/><category term='Oil and Gas'/><category term='Coalescing'/><category term='Medical Imaging'/><category term='CUDA emulator'/><category term='Voxels'/><category term='Projects'/><category term='Lid 
Driven Cavity'/><category term='realtime'/><category term='Uncoalesced Reads'/><category term='Sparse Matrix'/><category term='performance'/><category term='ocr'/><category term='Process Control'/><category term='masm'/><category term='OpenGL'/><category term='ray tracing acceleration'/><category term='GPU Thermal Monitor'/><category term='NVidia Fermi'/><category term='Bronze Kernel'/><category term='AIR'/><category term='GPU Thermal'/><category term='CUDA Numerical Stability'/><category term='Gaussian Convolution'/><category term='Jack Dongarra'/><category term='VS2008'/><category term='D2Q9'/><category term='Floating point precision'/><category term='Mandelbulb'/><category term='Development'/><category term='Nexus'/><category term='CUDA employment'/><category term='STL'/><category term='Cosmology'/><category term='Lattice Boltzmann'/><category term='coalesced memory'/><category term='Qt'/><category term='Temperature Monitor'/><category term='nvapi'/><category term='Watchdog Timer'/><category term='Throughput Computing'/><category term='ComputeCube'/><category term='cudart (ray tracing)'/><category term='Turing'/><category term='masm32'/><category term='Married'/><category term='floats'/><category term='Uncoalesced Writes'/><category term='Personal Supercomputer'/><category term='texturing'/><category term='doubles'/><category term='3D Gaussian Kernel'/><category term='ray tracing'/><category term='multiple GPU'/><category term='Gaussian Point Spread Function'/><category term='CFD'/><category term='EVOPAR'/><category term='NVidia'/><category term='gigavoxels'/><category term='C++'/><category term='GPU Temperature Monitor'/><category term='Nvidia PSC'/><category term='Colossus'/><category term='PC Design Lab QMICRA2'/><category term='Compute Cube'/><category term='OptiX'/><category term='Geometic Process Control'/><category term='National Trust'/><category term='SPH'/><category term='OpenCL'/><category term='Personal Supercomputing'/><category term='nvidia 
parallel nsight'/><category term='Volumetric Data'/><category term='Loch Ness'/><category term='coalesced'/><category term='Zebra Imaging'/><category term='bv2 forums'/><category term='Asus A8N-SLIR'/><category term='Centos'/><category term='Windows 7'/><category term='LBM'/><category term='CUDA large data sets'/><category term='nd Visualization'/><category term='PhysX'/><category term='DAG'/><category term='Numerical precision'/><category term='Compute Capability'/><category term='nd to 2d'/><category term='Sun VirtualBox'/><category term='TraceLight'/><category term='Amdahl&apos;s law'/><category term='Tesla Personal Supercomputer'/><category term='3D Gaussian'/><category term='Gravitational Lensing'/><category term='Maths'/><category term='Windows XP 64bit'/><category term='OpenCL contracts'/><category term='Curvaceous'/><category term='CUDA debugging'/><category term='Great08 Challenge'/><category term='nvidia gpu temperature'/><category term='Gold Silver Bronze kernels'/><category term='Land Rover'/><category term='Server Crash'/><category term='uncoalesced'/><category term='FPU rounding'/><title type='text'>ComputeCube</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default?start-index=26&amp;max-results=25'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>115</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-4059908047887270890</id><published>2011-12-13T08:21:00.000-08:00</published><updated>2011-12-13T08:21:48.464-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='3rd UK GPU computing conference'/><title type='text'>3rd UK GPU Computing Conference</title><content type='html'>Quick reminder that the 3rd UK GPU computing conference is tomorrow (14th Dec 2011). From their web site it looks like there are a few spots available in case you haven't already booked.&lt;br /&gt;&lt;br /&gt;I'm looking forward to seeing what everyone is up to in the GPU space here in the UK. Feel free to come say "Hi" tomorrow; it's always fun to chat and network.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-4059908047887270890?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/4059908047887270890/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2011/12/3rd-uk-gpu-computing-conference.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4059908047887270890'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4059908047887270890'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2011/12/3rd-uk-gpu-computing-conference.html' title='3rd UK GPU Computing Conference'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-4353787508490016744</id><published>2011-10-28T06:10:00.000-07:00</published><updated>2011-10-28T06:10:08.379-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Computing Conference'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><title type='text'>3rd UK GPU Computing Conference</title><content type='html'>I keep forgetting to post this one:&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.doc.ic.ac.uk/UKGPU2011/"&gt;3rd UK GPU Computing Conference&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I went to the 2009 one in Oxford but&amp;nbsp;missed the 2010 in Cambridge as I just couldn't face the journey in my Landy.&lt;br /&gt;&lt;br /&gt;Well worth attending as you get a glimpse of what everyone else is working on&amp;nbsp;in the GPU field. 
A great source of inspiration for new projects.&lt;br /&gt;&lt;br /&gt;The deadline for abstract submission is 18th November.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-4353787508490016744?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/4353787508490016744/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2011/10/3rd-uk-gpu-computing-conference.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4353787508490016744'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4353787508490016744'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2011/10/3rd-uk-gpu-computing-conference.html' title='3rd UK GPU Computing Conference'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-3996628679963470354</id><published>2011-10-28T01:23:00.000-07:00</published><updated>2011-10-28T01:23:44.506-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ARM'/><category scheme='http://www.blogger.com/atom/ns#' term='NVidia'/><category scheme='http://www.blogger.com/atom/ns#' term='Intel'/><title type='text'>64-bit ARM server processor</title><content type='html'>This is very exciting news.&amp;nbsp;A 64bit quad core chip that supports out-of-order execution will, in my opinion,&amp;nbsp;turn the server market on its head.&lt;br /&gt;&lt;br /&gt;When these get 
released I wouldn't even consider using anything else for my server needs. Couple them with the new stuff NVIDIA is doing with ARM, or even ARM's own Mali GPU stuff, and you will have a very powerful, low-power server.&lt;br /&gt;&lt;br /&gt;From what I recall&amp;nbsp;from an interview with&amp;nbsp;the UK's Intel boss on BBC Breakfast TV&amp;nbsp;a&amp;nbsp;month or two&amp;nbsp;ago (I can't seem to find the link), he all but admitted they had already lost the mobile market and were going to concentrate on Ultrabooks, but said that they still pretty much controlled the server market.&lt;br /&gt;&lt;br /&gt;This news should really shake them up a lot. Intel in many respects has been a victim of its own success and has got locked into an ancient architecture (x86). Still, don't count them out yet: in the past they did come up with some good RISC chips but the market just wasn't there at the time. We may see them popping out some new architectures in the coming months.&amp;nbsp;They certainly have the manufacturing processes in place and lots of very bright people on board. 
And as I've&amp;nbsp;written before their compilers are superb.&lt;br /&gt;&lt;br /&gt;Read more here at: &lt;a href="http://www.eetimes.com/electronics-news/4230166/AMCC-demos-64-bit-ARM-server-chip"&gt;eetimes.com&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-3996628679963470354?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/3996628679963470354/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2011/10/64-bit-arm-server-processor.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3996628679963470354'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3996628679963470354'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2011/10/64-bit-arm-server-processor.html' title='64-bit ARM server processor'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-4370682718202479915</id><published>2011-10-25T02:15:00.000-07:00</published><updated>2011-10-25T02:15:35.517-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='EVOPAR'/><title type='text'>EvoPar 2012 Call for papers</title><content type='html'>I received an email this morning which may be of interest to some of you in the GPU and GP space:&lt;br /&gt;Just posting an extract here, you can see more on their site.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Part of Evo* 2012, the main 
European events on Evolutionary Computation:&lt;br /&gt;EuroGP, EvoCop, EvoBio, EvoMusArt and EvoApplications -&lt;br /&gt;&lt;br /&gt;11-13 April 2012 - Malaga, Spain&lt;br /&gt;&lt;a href="http://www.evostar.org/"&gt;http://www.evostar.org/&lt;/a&gt;&lt;br /&gt;EVOPAR: Track on Parallel and Distributed Infrastructures&lt;br /&gt;&lt;br /&gt;Submissions are invited on (but not limited to) the following topics:&lt;br /&gt;&lt;br /&gt;- Optimization of parallel architectures by means of Evolutionary Algorithms.&lt;br /&gt;- Hardware implementation of EAs, including Field Programmable Gate Arrays&lt;br /&gt;(FPGA), GPU, games consols, mobile devices.&lt;br /&gt;- GPGPU optimisation (CUDA, AMD, ARM, OpenCL, etc., etc.).&lt;br /&gt;- Improving scheduling techniques for peer-to-peer (P2P) and&lt;br /&gt;grid systems or for running distributed EAs and GAs.&lt;br /&gt;- Improving fault tolerance techniques for distributed systems and&lt;br /&gt;distributed EAs capabilities for coping with failures.&lt;br /&gt;- Analytical modelling and performance evaluation of parallel and&lt;br /&gt;distributed infrastructures when running EAs.&lt;br /&gt;- Improvement in system performance through optimisation and tuning.&lt;br /&gt;- Case studies showing the role of parallel and distributed&lt;br /&gt;infrastructures in conjunction with distributed EAs when solving&lt;br /&gt;hard real-life problems.&lt;br /&gt;- Parallel and distributed implementation of genetic algorithms.&lt;br /&gt;&lt;br /&gt;IMPORTANT DATES&lt;br /&gt;&lt;br /&gt;Submission deadline: 30 November 2011&lt;br /&gt;Notification of authors: 14 January 2012&lt;br /&gt;Camera-ready deadline: 5 February 2012&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Track Organisers&lt;br /&gt;&lt;br /&gt;F. Fernandez de Vega, University of Extremadura, Spain&lt;br /&gt;W. B. 
Langdon, University College London, UK&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-4370682718202479915?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/4370682718202479915/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2011/10/evopar-2012-call-for-papers.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4370682718202479915'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4370682718202479915'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2011/10/evopar-2012-call-for-papers.html' title='EvoPar 2012 Call for papers'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-1104485778791900834</id><published>2011-10-25T01:57:00.000-07:00</published><updated>2011-10-25T01:57:15.153-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='560Ti'/><title type='text'>New blog home</title><content type='html'>BV2 has a new home on blogger (google) as I just couldn't be bothered to run my own backups, email hosting etc etc anymore and the expense of hosting my own server was getting a bit excessive. 
&lt;br /&gt;&lt;br /&gt;The blog has been renamed to ComputeCube as the idea is to merge the two separate blogs into one; both links should point here.&lt;br /&gt;&lt;br /&gt;The automated wordpress-&amp;gt;blogger import tool did a good job but some of the image links have broken. I will be working on fixing these. Just to be clear, my move to blogger has nothing to do with wordpress. I still think it's a really good blogging system; it's just that at the moment my needs are better served here.&lt;br /&gt;&lt;br /&gt;Google now pretty much hosts everything of mine: email, blog and google+ for the social networking thingies. In my push to move with the times I even have a twitter account now (ComputeCube) which contains personal, blog and GPU related stuff.&lt;br /&gt;&lt;br /&gt;On the GPU front: I'm still running 2x 8800GTs and a Tesla C1060 but am considering getting&amp;nbsp;a 560Ti as they seem to offer the best price/performance ratio and would allow me to use the new CUDA features.&lt;br /&gt;&lt;br /&gt;The server crash did knock the wind out of my blogging sails somewhat but since my last post&amp;nbsp;I've not been idle and have implemented a lot of my ideas in CUDA. Watch this space for more details.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-1104485778791900834?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/1104485778791900834/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2011/10/new-blog-home.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/1104485778791900834'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/1104485778791900834'/><link rel='alternate' type='text/html' 
href='http://www.bv2.co.uk/2011/10/new-blog-home.html' title='New blog home'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-6348257496164829458</id><published>2010-07-24T17:31:00.000-07:00</published><updated>2011-10-17T14:59:32.658-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Server Crash'/><title type='text'>Server Crash</title><content type='html'>Unfortunately, about 36 hours ago the hard drive in my web server decided to end its little spinning life. Although the site has been restored from a couple of backups, some of the file download links are not working.&lt;br/&gt;&lt;br/&gt;The files are still available but I need to fix the download links - please be patient :)&lt;br/&gt;&lt;br/&gt;For anyone running download monitor out there, the newer update changed the DB table names. Don't do what I did or rather didn't....  
and forget to change the SQL backup scripts to reflect these changes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-6348257496164829458?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/6348257496164829458/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2010/07/server-crash.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/6348257496164829458'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/6348257496164829458'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2010/07/server-crash.html' title='Server Crash'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-1864745368322606189</id><published>2010-07-07T03:35:00.000-07:00</published><updated>2011-10-17T14:59:32.659-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SFF'/><category scheme='http://www.blogger.com/atom/ns#' term='QMicro2'/><category scheme='http://www.blogger.com/atom/ns#' term='Asus Rampage II Gene'/><category scheme='http://www.blogger.com/atom/ns#' term='ComputeCube'/><category scheme='http://www.blogger.com/atom/ns#' term='Tesla C1060'/><category scheme='http://www.blogger.com/atom/ns#' term='PC Design Lab'/><title type='text'>PC Design Lab - new case</title><content type='html'>I just got an email update from PC Design Lab regarding their new case. 
Those of you who follow the blog will know my &lt;a href="http://www.bv2.co.uk/?p=945" target="_blank"&gt;ComputeCube&lt;/a&gt; machine is built into their QMicro2 case, which has been really good with only one or two tiny niggles. In fairness, though, those are caused by the number of power cables and the heat emitted by the Asus Rampage II Gene northbridge arrangement and the Tesla C1060.&lt;br/&gt;&lt;br/&gt;The new case looks good and they have adopted the suggestions from their clients. You may have a look at the new pre-order case &lt;a href="http://hivelogix.com/cases/case-qmicra" target="_blank"&gt;here&lt;/a&gt;.  Even though they have raised the cage I would have liked it to be slightly taller to help with airflow over the GPUs and power cable routing.  Strangely they mention it can now support 750W power supplies, but I have been running a 1250W one in the older case for a while now with no problems.&lt;br/&gt;&lt;br/&gt;The radiator bracket is a really good idea and would have helped sort out the Rampage II Gene's overheating northbridge rather nicely.&lt;br/&gt;&lt;br/&gt;If you are looking for a SFF case, in my opinion there is nothing better out there and this new model raises the bar even further.  
Now if I could only get my hands on the new one and a watercooling kit :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-1864745368322606189?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/1864745368322606189/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2010/07/pc-design-lab-new-case.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/1864745368322606189'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/1864745368322606189'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2010/07/pc-design-lab-new-case.html' title='PC Design Lab - new case'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-694713520120409519</id><published>2010-07-06T08:29:00.000-07:00</published><updated>2011-10-17T14:59:32.659-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='DirectCompute'/><category scheme='http://www.blogger.com/atom/ns#' term='Compute Cube'/><category scheme='http://www.blogger.com/atom/ns#' term='windows XP x64'/><category scheme='http://www.blogger.com/atom/ns#' term='nvidia parallel nsight'/><category scheme='http://www.blogger.com/atom/ns#' term='Windows 7'/><category scheme='http://www.blogger.com/atom/ns#' term='Tesla C1060'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Sun VirtualBox'/><category scheme='http://www.blogger.com/atom/ns#' term='office 2010'/><title type='text'>Windows 7</title><content type='html'>While ordering a replacement power supply for one of my machines I decided to add a copy of Windows 7 to the shopping cart.&lt;br/&gt;&lt;br/&gt;This was in no small part prompted by the NVidia Parallel Nsight add-in for Visual Studio, which would not work on my XP machines. An additional contributing factor was the lack of SP3 for Windows XP Pro x64. Of course the new Office 2010 will only run on XP SP3, Vista and Windows 7.&lt;br/&gt;&lt;br/&gt;Having had a brief and scary encounter with Windows Vista I am pleasantly surprised with the new version of Windows.  Microsoft have done a good job with this one.&lt;br/&gt;&lt;br/&gt;The installation process had to be a "clean" one as there is no upgrade path from WinXP x64. I took a full backup of my "&lt;a href="http://www.bv2.co.uk/?p=945" target="_blank"&gt;ComputeCube&lt;/a&gt;" machine and without bothering to format the C drive just popped the DVD in and rebooted.&lt;br/&gt;&lt;br/&gt;&lt;a name='more'&gt;&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;The installation process was completely painless and apart from one reboot occurring too quickly for me to remove the DVD from the tray it was all installed within 15 to 20 mins. The Windows 7 installer actually makes a backup of your old Windows and Program Files directories so there was no need to go and fetch stuff back from my backups.&lt;br/&gt;&lt;br/&gt;I did lose the dual boot option for my Linux installs but lately I've been running them all inside VMs so I'm not concerned by that at all.  If you are looking for good VM software: Sun VirtualBox seems to tick all the boxes and even has an API.&lt;br/&gt;&lt;br/&gt;After the first login I was happy to see it had picked up all my hardware including the Tesla C1060. The only thing it had got rather wrong was the IP address of the gateway - it had missed by one.... 
weird.&lt;br/&gt;&lt;br/&gt;I've been using it since Saturday now with Visual Studio and Office 2010 and apart from one frozen file copy dialog (which rather surprisingly could be end-tasked without crashing Explorer...) it has performed flawlessly.&lt;br/&gt;&lt;br/&gt;Windows 7 also allows you to use DirectCompute on your GPUs. I've not quite got to grips with it yet but it seems quite functional. I'll probably stick to CUDA and OpenCL for now, much like I prefer OpenGL over DirectX - I just don't have the time to learn all these technologies and think it makes a bit more sense to stick to the cross-platform ones for now.&lt;br/&gt;&lt;br/&gt;Tip: When registering COM components from the command prompt, make sure you started the prompt with "Run as administrator", even if you are logged in with admin rights.&lt;br/&gt;&lt;br/&gt;In summary, Windows 7 installation is easy and thereafter it does what it says on the tin. What more can you ask from an OS?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-694713520120409519?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/694713520120409519/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2010/07/windows-7.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/694713520120409519'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/694713520120409519'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2010/07/windows-7.html' title='Windows 7'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-4841996020725533366</id><published>2010-06-30T04:09:00.000-07:00</published><updated>2011-10-17T14:59:59.581-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='adsense'/><title type='text'>Site Changes</title><content type='html'>Yes..... I sold out.   There is now a Google AdSense space at the top right of the sidebar and at the bottom of individual posts.  I went for what I think is the most unobtrusive design they offered.&lt;br/&gt;&lt;br/&gt;Google have emailed me AdWords-related stuff for ages and I have resolutely resisted as this site is in no way a marketing site. However, in the last few months the site and related equipment failures (power supply, hard drive, mail server etc.) have cost me quite a bit and hopefully this will help me recoup some of that.&lt;br/&gt;&lt;br/&gt;Looking at what they seem to be placing as ads on the pages I have been starting to wonder how good their contextual advertising is... still it's early days.&lt;br/&gt;&lt;br/&gt;In related site news the forums are now permanently removed, although there are still some links left over as I write this.  
With the amount of spam / hacking etc. it's just not worth maintaining a forum on a small site like this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-4841996020725533366?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/4841996020725533366/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2010/06/site-changes.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4841996020725533366'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4841996020725533366'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2010/06/site-changes.html' title='Site Changes'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-2489109976915209861</id><published>2010-06-22T04:02:00.000-07:00</published><updated>2011-10-17T14:59:59.581-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='Turing'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><category scheme='http://www.blogger.com/atom/ns#' term='Turing Complete'/><title type='text'>Catch up</title><content type='html'>No apologies for the long delays between posts, or even checking the blog. 
It just has to fit in with life at the moment and there is so much going on!&lt;br/&gt;&lt;br/&gt;Those of you who have left comments on the blog and emails for me should have got an answer last night or this morning. A bit of a delay for some of you.  I do apologise to those of you who sent an email and haven't got a reply back. My computer that used to handle all my email died and I haven't had a chance to fix it yet. I think it's just the power supply.... hopefully! So I will get the emails back in a few weeks when I eventually get around to fixing it.&lt;br/&gt;&lt;br/&gt;I've subsequently upgraded my email server - so any forthcoming mails will get to me. &lt;br/&gt;&lt;br/&gt;In development work I've been working on my SPH simulations, and some GP stuff whenever I get a chance. GP stuff is traditionally recursive - well, the equation trees are anyway - and they have needed a substantial amount of reworking to run efficiently on the GPU.&lt;br/&gt;&lt;br/&gt;Speaking of recursive.... in order to be Turing Complete (assuming infinite memory for now) do you need to support / include recursion? Some posters on certain forums seem to think it is needed, but personally I can't see why.  Most recursion can, with a bit of effort, be made iterative - although possibly not very pretty or efficient.&lt;br/&gt;&lt;br/&gt;For example, a GPU doesn't really support recursion*, but I would consider the CUDA / GPU combination to be Turing complete. Admittedly not very efficient in certain cases - a single thread for example. And again ignoring the infinite memory issue.  *You can if you implement your own stack type system in global memory...&lt;br/&gt;&lt;br/&gt;I'd be interested in knowing others' views on this - email the usual place or comment here :)&lt;br/&gt;&lt;br/&gt;To all the regular readers of the blog - is anyone else amazed by the absolute explosion of GPU / CUDA related code / products / hardware?  
Very exciting indeed!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-2489109976915209861?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/2489109976915209861/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2010/06/catch-up.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/2489109976915209861'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/2489109976915209861'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2010/06/catch-up.html' title='Catch up'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-3513584636558551531</id><published>2010-03-19T05:45:00.000-07:00</published><updated>2011-10-17T14:59:59.581-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='Development'/><category scheme='http://www.blogger.com/atom/ns#' term='SPH'/><category scheme='http://www.blogger.com/atom/ns#' term='ray tracing'/><category scheme='http://www.blogger.com/atom/ns#' term='CFD'/><title type='text'>SPH Screenshot</title><content type='html'>Finally the promised screenshot :)&lt;br/&gt;&lt;br/&gt;[caption id="attachment_1109" align="alignnone" width="299" caption="SPH with symmetry"]&lt;a href="http://www.bv2.co.uk/wp-content/uploads/2010/03/sph_s2.jpg"&gt;&lt;img class="size-medium 
wp-image-1109" title="sph_s2" src="http://www.bv2.co.uk/wp-content/uploads/2010/03/sph_s2.jpg" alt="SPH with symmetry" width="299" height="300" /&gt;&lt;/a&gt;[/caption]&lt;br/&gt;&lt;br/&gt;It's not all that impressive to look at as I've restricted all the particles to 2d although it does use 3d calculations. I do this to help look for any issues in the code as I find it hard to spot errors in a 3d particle rendering.&lt;br/&gt;&lt;br/&gt;This particular screenshot has 64000 particles that have been dropped into the box in a column formation and are now starting to slosh around at the bottom.&lt;br/&gt;&lt;br/&gt;The unusual thing with regard to a CUDA implementation is that it is using symmetry in the interactions, thereby decreasing the memory/processing load. I've still got more work to do but it's showing a lot of promise in running superfast particle interaction simulations.&lt;br/&gt;&lt;br/&gt;I've also been doing a bit of work on the second version of my raytracer. I've once again stepped away from KD-trees and Octrees and am using a type of BVH, ray marching system. Screenshots once I have a decent scene rendered :)&lt;br/&gt;&lt;br/&gt;In other news I'm now compiling all my new C++/CUDA code in 64-bit with the CUDA 3.0 beta. 
Although I think putting in c++ object support into CUDA was a mistake the new version does produce decent code.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-3513584636558551531?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/3513584636558551531/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2010/03/sph-screenshot.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3513584636558551531'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3513584636558551531'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2010/03/sph-screenshot.html' title='SPH Screenshot'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-8857492228731102789</id><published>2010-02-26T09:39:00.000-08:00</published><updated>2011-10-17T14:59:59.581-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='Development'/><category scheme='http://www.blogger.com/atom/ns#' term='Nexus'/><category scheme='http://www.blogger.com/atom/ns#' term='openMP'/><category scheme='http://www.blogger.com/atom/ns#' term='Linux'/><category scheme='http://www.blogger.com/atom/ns#' term='Windows 7'/><category scheme='http://www.blogger.com/atom/ns#' term='Visual Studio'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Centos'/><title type='text'>Poor neglected blog...</title><content type='html'>Nearly 3 months since my last post :(&lt;br/&gt;&lt;br/&gt;Work has been exceptionally busy: In the last two months on top of my normal product maintenance and improvement duties I have prepared and filed a patent application, architected and largely completed a distributed, resilient document processing framework and found a bit of time to eat and sleep!&lt;br/&gt;&lt;br/&gt;I've noticed other blogs in the raytracing / graphics / visualization space have been very quiet lately - maybe everyone else is also working like crazy?&lt;br/&gt;&lt;br/&gt;Not a huge amount has happened in my raytracer and SPH projects although I got some interesting effects running with a non-uniform mass particle system when I had time over Christmas. Screenshots soon.&lt;br/&gt;&lt;br/&gt;I do have the beta release of Nexus (the NVidia Visual Studio plugin) but sadly it only runs on Windows Vista or Windows 7, which leads nicely on to my next point:&lt;br/&gt;&lt;br/&gt;I am a bit irritated with Microsoft for two reasons:  Firstly, even though I purchased 64-bit Windows XP Professional about 6 or 8 months ago there is no upgrade path to Windows 7...  Secondly, even though Visual Studio 2008 Standard has a switch for OpenMP it doesn't contain the OpenMP headers. Only the more expensive Professional version does. Not something that was immediately obvious from the documentation before I purchased...&lt;br/&gt;&lt;br/&gt;Although I also run Linux (CentOS) I prefer to develop on a Windows GUI - less buggy and more responsive than GNOME / KDE in my opinion. For running code the Linux OS does usually win though! I would really like to run Nexus so am a bit stuck about what to do....  Succumb and buy Windows 7 and get Nexus on Visual Studio? Or just forget entirely about Windows development / environment and use Linux / gcc / Intel compilers instead?  
While the Intel compilers are great (if a bit expensive) for an IDE I really do like Visual Studio.  Most of my code is cross platform and for graphics I mostly use openGL so could switch without too much trouble...    But direct compute is so tempting.....&lt;br/&gt;&lt;br/&gt;Arrrrgh what to do!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-8857492228731102789?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/8857492228731102789/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2010/02/poor-neglected-blog.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/8857492228731102789'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/8857492228731102789'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2010/02/poor-neglected-blog.html' title='Poor neglected blog...'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-2156130638677187640</id><published>2009-12-03T02:37:00.000-08:00</published><updated>2011-10-17T14:59:59.581-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Mandelbulb'/><category scheme='http://www.blogger.com/atom/ns#' term='gigavoxels'/><category scheme='http://www.blogger.com/atom/ns#' term='Maths'/><title type='text'>Mandelbulb</title><content type='html'>I noticed &lt;a 
href="http://www.icare3d.org/blog_techno/gpu/gigabroccoli_the_mandelbulb_into_gigavoxels.html" target="_blank"&gt;this&lt;/a&gt; linked from both &lt;a href="http://farrarfocus.blogspot.com/2009/12/gigavoxels-mandelbulb.html" target="_self"&gt;Atom&lt;/a&gt; and &lt;a href="http://www.realtimerendering.com/blog/real-time-mandelbulb-visualization-with-gigavoxels/" target="_self"&gt;Real-time Rendering&lt;/a&gt; blogs. Cyril Crassin, the guy behind the amazing gigavoxels raytracing, has got a 3d Mandelbrot fractal rendering in real time.&lt;br/&gt;&lt;br/&gt;As we know there isn't actually a 3rd dimension to the imaginary plane so some manipulation is required. The chap who discovered a good way of transforming it to 3d ( the Mandelbulb ) has a website which you can find &lt;a href="http://www.skytopia.com/project/fractal/mandelbulb.html" target="_self"&gt;here&lt;/a&gt;. Well worth reading and some pretty amazing images!&lt;br/&gt;&lt;br/&gt;As a side note: I've owned the book: "Real-time Rendering" for a number of years now and it is an invaluable resource. 
The Real-time Rendering blog mentioned above is the blog by the authors of the book.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-2156130638677187640?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/2156130638677187640/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/12/mandelbulb.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/2156130638677187640'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/2156130638677187640'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/12/mandelbulb.html' title='Mandelbulb'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-3863115522590812373</id><published>2009-10-28T06:09:00.000-07:00</published><updated>2011-10-17T14:59:59.582-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Development'/><category scheme='http://www.blogger.com/atom/ns#' term='STL'/><category scheme='http://www.blogger.com/atom/ns#' term='cudart (ray tracing)'/><category scheme='http://www.blogger.com/atom/ns#' term='DAG'/><category scheme='http://www.blogger.com/atom/ns#' term='C++'/><category scheme='http://www.blogger.com/atom/ns#' term='C++ and STL'/><title type='text'>C / C++ and STL</title><content type='html'>Before everyone gets really upset with the rest of this post, as is the trend in the OO community...  
I thought I'd start, rather than end, with a disclaimer:  I use C++ and the STL on a daily basis in my job. Although I don't use all of what the STL has to offer, it does make coding in C++ much easier. C++ in itself does allow fairly elegant code (if constructed carefully) whilst providing a decent level of code performance. So I do actually like C++ and the STL and they make my life at work much better :)&lt;br/&gt;&lt;br/&gt;But this blog isn't about my day job....  It's about my tinkering with the wonderful world of parallel algorithms and CUDA code.&lt;br/&gt;&lt;br/&gt;What a lot of people don't realize is that you *can* use the STL, C++ classes and templates in a .cu file. As long as it's host-side code (the part that runs on the CPU) you should be fine. I've had a few compiler crashes when using the STL, especially the sort. To get around this I overloaded the &amp;lt; operator in the class being sorted; don't try to define a separate custom &amp;lt; comparison method, as it will crash the compiler.&lt;br/&gt;&lt;br/&gt;&lt;a name='more'&gt;&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;I was lucky enough to have Monday off so managed to find a bit of time over my extended weekend to do a bit of coding on the GPU Thermal Monitor and my ray-tracer. I've had drawings and code snippets written for my Proof of Concept ray tracer for a while now and just not had the time to implement them. For simplicity of debugging (come on, release &lt;a href="http://developer.nvidia.com/object/nexus.html" target="_blank"&gt;Nexus&lt;/a&gt;!! ) I decided to implement my idea on the CPU and for speed of coding decided to use C++ and the STL - after all it has served me well in the past.&lt;br/&gt;&lt;br/&gt;My idea is highly parallel, after all ray tracing is a trivially parallel problem, and will eventually use persistent kernels like my cross-bridging thing did.&lt;br/&gt;&lt;br/&gt;I got stuck into coding and made the first in a long string of poor coding decisions. I decided to make my Rays into a class along with a vector and point class. 
After all there is a fairly limited set of operations on a ray and I could overload an operator to move t units along the ray thereby keeping the code nice and simple.&lt;br/&gt;&lt;br/&gt;I read all the triangles from my Stanford bunny model into a triangle class and assigned them into my modified grid data structure (also a class) using push_back on STL vectors (I prefer to use a .resize and index into them rather than using a 3d vector). From there a std::sort got them in the order I wanted within the grid cells.&lt;br/&gt;&lt;br/&gt;I now generated all the rays (rayclass) in the traditional manner from the eye through the viewport and ... yes... assigned them to an STL vector.&lt;br/&gt;&lt;br/&gt;After a bit of care making sure the threads behaved in accessing the data structures I was done.&lt;br/&gt;&lt;br/&gt;Good programming practice so far?  In a purely OO / readability sense then yes. Nicely overloaded operators and a hierarchy of classes including a few more support classes I have not mentioned. And it worked first try - apart from having to adjust the bunny position.&lt;br/&gt;&lt;br/&gt;Success? Proof of concept working?&lt;br/&gt;&lt;br/&gt;Er.. no :(   In deciding to make the rays and other things a class I had inadvertently scuppered my whole idea. What I had wanted to do was group and process rays in packets based on position and direction in the grid. But by using a class for the rays I'd started down a serial path: read in a ray, assign ray to optimal grid traversal thread based on pos/dir, intersect ray with grid, intersect ray with the grid cell's contents (if any), move ray onwards in grid.&lt;br/&gt;&lt;br/&gt;What I'd originally wanted to do was:  optimal grid thread(s) fetches chunk of rays from pool based on pos/dir, the whole chunk gets intersected with grid then with objects (if any) and moves on.&lt;br/&gt;&lt;br/&gt;This seems like a trivial change and in fact it's perfectly possible to do it with STL / C++. 
I could define a method on the ray class that would return a boolean to the calling grid thread indicating if the ray should be assigned to it. This again is inefficient as each ray object would have to be queried in turn - exposing the ray pos/dir as public members would be slightly more efficient (although bad coding practice) but does not solve the problem of looking up a pointer to each ray object. That said, it's still possible to work out a quick way to traverse the pool of ray objects in the STL vector to determine which ones the grid thread should process.&lt;br/&gt;&lt;br/&gt;The point here is not whether C++ / STL worked perfectly BUT the way OO tends to force you into a particular way of thinking / implementation path.&lt;br/&gt;&lt;br/&gt;Although OO can and does work well in a multithreading environment it does come from an era in software development where things were largely serial in nature and most of the design patterns etc. tend to steer you away from an optimal solution in a "throughput computing" environment. OO has the added disadvantage of encouraging you to code for the single case and not for the group.&lt;br/&gt;&lt;br/&gt;From now on I'm going to be very careful to avoid using "multithreaded" or "massively multithreaded" and "throughput computing" interchangeably as they are not the same thing at all. Although the two are not mutually exclusive, multithreaded implies lots of things running side by side, each working serially at its own job and sometimes talking to the others via a variety of synchronization / sharing methods. Throughput computing is more about getting the job done efficiently; in general the higher the degree of sustained parallelism the higher the throughput.&lt;br/&gt;&lt;br/&gt;So, how would I change my implementation?&lt;br/&gt;&lt;br/&gt;Beware design patterns! Yes, great to use to get your work done - but efficient? 
Think carefully.&lt;br/&gt;&lt;br/&gt;Rays would be generated and stored in a pool with only origin, direction, last grid intersection and tri intersection. The grid threads can then easily operate on chunks of this data and store results quickly and efficiently for the next kernel/grid (if traversing) to pick up.&lt;br/&gt;&lt;br/&gt;The triangles could still be stored as classes but as we are only interested in the colour (I'm not using textures), apex, 2 sides and a normal it is much more efficient to store them in a flat structure and have the grid blocks store the index to the triangles that are within its bounds.&lt;br/&gt;&lt;br/&gt;Arranging the data in blocks also allows us to re-arrange it to be in a format that is more friendly to the memory access patterns of the device / cpu.&lt;br/&gt;&lt;br/&gt;Ultimately I see Objects starting to take more of a back seat in development especially in server side and throughput computing code. They still have an important role to play in many things - UI design is a good example.  I can see some sort of DAG entity being the new "object" probably stored in pools of similar ones all needing the same sort of processing or dependencies. We will probably get a whole new bunch of design patterns to go along with them too - exciting stuff!  Now who wants to write the new language / compiler??&lt;br/&gt;&lt;br/&gt;So think carefully, make sure your implementation hasn't changed your way of thinking. The code is meant to describe your algorithm not dictate its direction!&lt;br/&gt;&lt;br/&gt;Now just to find some time to re-do my ray tracing code....  
:)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-3863115522590812373?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/3863115522590812373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/10/c-c-and-stl.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3863115522590812373'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3863115522590812373'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/10/c-c-and-stl.html' title='C / C++ and STL'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-1607425040893436716</id><published>2009-10-27T03:28:00.000-07:00</published><updated>2011-10-17T15:00:14.864-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='GPU Temperature Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='Temperature Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU Thermal Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU Heat Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='BV2 Thermal Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU Temperature'/><category scheme='http://www.blogger.com/atom/ns#' term='Thermal Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='nvidia gpu temperature'/><category 
scheme='http://www.blogger.com/atom/ns#' term='GPU Thermal'/><title type='text'>GPU Temperature Monitor</title><content type='html'>As of writing the combined download count of the &lt;a href="http://www.bv2.co.uk/?page_id=863" target="_blank"&gt;GPU Thermal Monitor&lt;/a&gt; has hit 520 :)&lt;br/&gt;&lt;br/&gt;So far I'm yet to receive any major feedback on bugs etc which leads me to believe it: a) works perfectly or b) no-one is bothering to report issues. As I'm an optimist I'm going with option a  :)&lt;br/&gt;&lt;br/&gt;I've had more requests for remote monitoring of the GPU temperature via a simple http request. This is something I need myself in order to keep track of temperatures in remote machines. This is now built in and in testing and bug fixing, hopefully to be released soon. I've not used completion ports as they seemed like overkill for what should be a light traffic application but as the source is included and under creative commons license please feel free to add them if needed. 
Secondly having it open source allows for some code review, which is important for security reasons as it now allows remote connections.&lt;br/&gt;&lt;br/&gt;If you have found a bug or would like another feature added please drop me a comment or email.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-1607425040893436716?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/1607425040893436716/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/10/gpu-temperature-monitor.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/1607425040893436716'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/1607425040893436716'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/10/gpu-temperature-monitor.html' title='GPU Temperature Monitor'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-3430388433403494756</id><published>2009-10-19T03:50:00.000-07:00</published><updated>2011-10-17T15:00:14.864-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Development'/><category scheme='http://www.blogger.com/atom/ns#' term='Amdahl&apos;s law'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel processing'/><category scheme='http://www.blogger.com/atom/ns#' term='ray tracing'/><title type='text'>Amdahl's law</title><content type='html'>A few months ago 
I made a post mentioning how I don't conform to the Amdahl's law way of thinking but never went into any details.&lt;br/&gt;&lt;br/&gt;The law describes the speedup that can be obtained if you can parallelize a section of your problem. The overall speedup is given by the following equation:&lt;br/&gt;&lt;br/&gt;[math] {\frac{1}{(1-P)+\frac{P}{S}}}[/math]&lt;br/&gt;&lt;br/&gt;Where P is the proportion of the problem that can be parallelized / sped up and S is the speedup of that portion.&lt;br/&gt;&lt;br/&gt;Assuming that S-&amp;gt;infinity then P/S -&amp;gt; 0, which leaves us with [math] {\frac{1}{(1-P)}}[/math]&lt;br/&gt;&lt;br/&gt;This implies that no matter how many processors we throw at, or speed improvements we make to, the P portion of the problem we can never do better than [math] {\frac{1}{(1-P)}}[/math]. For example, if 90% of the problem can be parallelized (P = 0.9) the limit is a 10x speedup. The biggest % improvement from the baseline comes with low values of S (or relatively low numbers of parallel processors). This result is observed in the field time and again. Very seldom does throwing more than 4 or 8 processors at a problem yield anything like the large gains you get from the first 2 or 4 processors.&lt;br/&gt;&lt;br/&gt;This equation does expand with multiple P and associated S terms in order to describe a more complex / lengthy problem: (P1+P2+P3 = 100%)&lt;br/&gt;&lt;br/&gt;[math] {\frac{1}{\frac{P1}{S1}+\frac{P2}{S2}+\frac{P3}{S3}}}[/math]&lt;br/&gt;&lt;br/&gt;Certain problems where P is large do respond well to an increase in processors. These are known as "embarrassingly parallel"; ray tracing is rather a good example of this.&lt;br/&gt;&lt;br/&gt;So why do I not agree with this if the equation makes sense?&lt;br/&gt;&lt;br/&gt;The assumption that only the P areas can be accelerated by S, and that they are strung together in a serial fashion, is rather simplistic.&lt;br/&gt;&lt;br/&gt;Why do we have to finish P1 before beginning P2?  
Even if the P2 area has dependencies on P1, it's rare for the entire section of P2 to depend on a single result (of course there are cases - reduction kernels etc.)&lt;br/&gt;&lt;br/&gt;Maybe P3 can overlap P1 and P2; some sections may benefit from having more processors while others may reach their optimum at two. Why not overlap the sections and supply them with their optimal processing power? This is easy to achieve with Directed Acyclic Graphs (DAGs), which can even be computed on the fly, although they do get rather large!&lt;br/&gt;&lt;br/&gt;Quoting Amdahl's law as a reason why no further speed benefits are available in a system is really just showing that thinking is still stuck in serial mode with little bursts of parallelism thrown in.  Let's start thinking parallel in all areas and make the most of all available compute resources.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-3430388433403494756?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/3430388433403494756/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/10/amdahl-law.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3430388433403494756'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3430388433403494756'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/10/amdahl-law.html' title='Amdahl&amp;#39;s law'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-1893206354594423907</id><published>2009-10-01T03:31:00.000-07:00</published><updated>2011-10-17T15:00:14.864-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='NVidia Fermi'/><category scheme='http://www.blogger.com/atom/ns#' term='NVidia'/><category scheme='http://www.blogger.com/atom/ns#' term='Fermi'/><title type='text'>Fermi</title><content type='html'>I've just finished reading the white paper released by Nvidia which you can find &lt;a href="http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiArchitectureWhitepaper.pdf" target="_blank"&gt;here&lt;/a&gt;.&lt;br/&gt;&lt;br/&gt;Rather interestingly no mention of graphics performance has been made which, in a way, is really exciting. This has clearly been aimed at the high performance or throughput computing markets with the notable inclusion of ECC memory and increased double precision throughput along with the updated IEEE 754-2008 floating point support.&lt;br/&gt;&lt;br/&gt;Concurrent kernel execution and faster context switching will allow, with the use of DAGs, the optimization of execution on the devices rather than just working out the most efficient order of kernels to execute sequentially.&lt;br/&gt;&lt;br/&gt;Also tucked away in the white paper is a mention of predication at the instruction level which should give greater control of divergent paths in your kernels.&lt;br/&gt;&lt;br/&gt;The inclusion of C++ support will appeal to a lot of people but I am rather unconvinced this is the correct way to go for throughput computing as it will encourage the use of all the old patterns that may work well in serial cases but are often rather poor for enabling maximum throughput.&lt;br/&gt;&lt;br/&gt;There is a lot more in the paper and already an 
announcement by Oak Ridge that they will be using it in a new supercomputer.&lt;br/&gt;&lt;br/&gt;All in all it's a wonderful development and I can't help feeling that computing took a substantial leap forward today.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-1893206354594423907?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/1893206354594423907/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/10/fermi.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/1893206354594423907'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/1893206354594423907'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/10/fermi.html' title='Fermi'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-4404867447277097151</id><published>2009-09-25T02:08:00.000-07:00</published><updated>2011-10-17T15:00:14.865-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='OptiX'/><category scheme='http://www.blogger.com/atom/ns#' term='ray tracing'/><title type='text'>Work and Spam...</title><content type='html'>Yes I am still alive... 
:)    I've been exceptionally busy at work lately. Not complaining though, as work pays the bills and gpu / compute things are just a hobby.&lt;br/&gt;&lt;br/&gt;What I do find rather frustrating is that the limited time I have to maintain the blog all gets eaten up by wading through mountains of spam. Akismet does a fantastic job, but on the forums side the bot detection captcha is pathetic and I have around 100 signups / hundreds of messages to check every day. So until a later date when I can fix / upgrade it I will be suspending the forums from today.&lt;br/&gt;&lt;br/&gt;On a rather more positive note:  Nvidia has announced the &lt;a href="http://www.nvidia.com/object/optix.html" target="_blank"&gt;OptiX&lt;/a&gt; ray tracing engine. Some examples are out already so it is worth having a look.  Nvidia has also announced &lt;a href="http://developer.nvidia.com/object/nexus.html" target="_self"&gt;Nexus&lt;/a&gt; which looks incredible. I have heard that they may charge for it but with the features it apparently has I would be willing to pay.&lt;br/&gt;&lt;br/&gt;For some of the latest conference papers and pre-prints have a look at &lt;a href="http://kesen.huang.googlepages.com/" target="_blank"&gt;Ke-Sen Huang's&lt;/a&gt; page which he keeps updated with some really nice stuff.&lt;br/&gt;&lt;br/&gt;As for me - no new cuda / compute development for a while. When I get a few mins free I tend to work on the maths for the stuff I want to do but never get enough time to implement anymore. Hopefully this will change in 2 or 3 months once work settles down a bit. On the subject of work there have been one or two questions over email and on the blog about the steps taken after an edge detector has been applied - unfortunately I cannot answer this as it gets a bit close to what I do for work (OCR and document processing). 
Rather interestingly, one of my ray tracing acceleration algorithms, which turned out pathetically for its purpose, has been rather good at identifying features in 2D/3D images, but more on this at a later date once it's a bit more complete.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-4404867447277097151?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/4404867447277097151/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/09/work-and-spam.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4404867447277097151'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4404867447277097151'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/09/work-and-spam.html' title='Work and Spam...'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-3675716353906988859</id><published>2009-07-30T08:53:00.000-07:00</published><updated>2011-10-17T15:00:14.865-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='xy plane'/><category scheme='http://www.blogger.com/atom/ns#' term='Maths'/><category scheme='http://www.blogger.com/atom/ns#' term='Gaussian Convolution'/><title type='text'>3D Gaussian Convolution</title><content type='html'>There hasn't been much in the way of posts here lately as I've been 
really busy at work getting some new components built into the systems I work on. It's not really hard, just frustrating things like trying to get various components and libraries written in different languages to work together. So lately I've not had the energy to do much work on the computer once I get home...&lt;br/&gt;&lt;br/&gt;There has been a bit of interest in my 3D Gaussian convolution kernels. Although I explained the technique mathematically in an earlier post I never actually posted the code. As it is rather quick and quite a novel way of calculating the convolution for an xy plane I decided to post it so everyone can benefit from / improve upon the technique.  As always comments / bug reports etc are welcome :)&lt;br/&gt;&lt;br/&gt;&lt;a name='more'&gt;&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;The code is a series of kernels that will do a Gaussian Convolution (smoothing) on a 3D data set of floats.   The Gaussian kernel that I used is a 5x5x5 one but it is relatively easy to extend this.  The data set is a 256x256x256 volume of floats. Again with some modifications the kernels will handle a greater or lesser volume - up to the max memory of your card.&lt;br/&gt;&lt;br/&gt;In a trivial implementation you will read 5x5x5 data elements centred around your x,y,z point and then multiply them with their corresponding co-efficients, finally summing them and dividing before saving into a separate memory area. Of course this means 125 reads and associated calculations for every single data element in our 256x256x256 block resulting in over 2 billion reads - not very efficient!&lt;br/&gt;&lt;br/&gt;Using the fact that the Gaussian filter we are using is separable, we can convolve each direction in turn, with only 5 reads per element per direction, before dividing at the end. 
This gives us 256x256x256 x 5 x 3 = 251658240 reads - which is over 8 times lower than the trivial implementation!&lt;br/&gt;&lt;br/&gt;As we are using the co-efficients many times it is beneficial to pre-calculate these and store them in constant memory. Constant memory on the device is cached and is really fast. Please note that pre-calculating may not always be a win, but for the Gaussian co-efficients, which require more than one floating point operation to calculate, it results in a large time saving.  This is implemented in "calcGaussianCoefficients" in the source code here:  [download id="5"] .&lt;br/&gt;&lt;br/&gt;Now the general method would be to process the x, y and z directions in turn. However it is possible to calculate a plane at a time while only having to store one column at a time in shared memory.  If all threads in a block are calculating a row each they will finish an element of the row at the same time (make sure to call __syncthreads() to ensure they are all done). As we then have a completed calculation for the same element in each row, it is possible to convolve in the y direction without having to read any additional data from global memory.   Once the y convolution is completed you save the result back to a temporary area in global memory.   This technique ensures you get your memory reads and writes fully coalesced while simultaneously minimizing the number of memory reads required.  The limited shared memory available can mean there may be bank conflicts between the threads but a bank conflict is still faster than a global memory read.  This is implemented in "calcGaussianXYPlaneConvolution256" in the source code:  [download id="5"] .&lt;br/&gt;&lt;br/&gt;The code for the xy convolution is very quick and uses a high percentage of the available bandwidth. As of writing it is the fastest Gaussian convolution I have seen for a plane. 
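&lt;br/&gt;&lt;br/&gt;To make the read-count argument concrete, here is a minimal host-side sketch in Python. It is not the CUDA code from the download: the 8x8x8 test volume, the sigma of 1.0, the clamp-to-edge boundary rule and all of the function names are assumptions made purely for illustration. It checks that three 1D passes give the same answer as the trivial 5x5x5 loop and prints the read counts for the full volume:

```python
import math

def gaussian_coefficients(radius=2, sigma=1.0):
    """Normalised 1D Gaussian weights for offsets -radius..radius (sigma is an assumed value)."""
    raw = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def clamp(i, n):
    """Clamp an index into 0..n-1 (an assumed boundary rule for this sketch)."""
    return max(0, min(n - 1, i))

def convolve_axis(vol, n, weights, axis):
    """One 1D pass along one axis of an n*n*n volume stored as a flat list."""
    radius = len(weights) // 2
    out = [0.0] * (n * n * n)
    for z in range(n):
        for y in range(n):
            for x in range(n):
                acc = 0.0
                for k, w in enumerate(weights):
                    p = [x, y, z]
                    p[axis] = clamp(p[axis] + k - radius, n)
                    acc += w * vol[(p[2] * n + p[1]) * n + p[0]]
                out[(z * n + y) * n + x] = acc
    return out

def convolve_separable(vol, n, weights):
    """x, then y, then z: 5 reads per element per direction."""
    for axis in range(3):
        vol = convolve_axis(vol, n, weights, axis)
    return vol

def convolve_naive(vol, n, weights):
    """The trivial version: 5x5x5 = 125 reads per element."""
    radius = len(weights) // 2
    out = [0.0] * (n * n * n)
    for z in range(n):
        for y in range(n):
            for x in range(n):
                acc = 0.0
                for kz, wz in enumerate(weights):
                    zz = clamp(z + kz - radius, n)
                    for ky, wy in enumerate(weights):
                        yy = clamp(y + ky - radius, n)
                        for kx, wx in enumerate(weights):
                            xx = clamp(x + kx - radius, n)
                            acc += wx * wy * wz * vol[(zz * n + yy) * n + xx]
                out[(z * n + y) * n + x] = acc
    return out

n = 8
volume = [float((i * 37) % 11) for i in range(n * n * n)]
weights = gaussian_coefficients()
fast = convolve_separable(volume, n, weights)
slow = convolve_naive(volume, n, weights)
max_diff = max(abs(a - b) for a, b in zip(fast, slow))

reads_naive = 256 ** 3 * 5 ** 3
reads_separable = 256 ** 3 * 5 * 3
print(max_diff, reads_naive, reads_separable)
```

The two results should agree to within floating point rounding. The weights here are normalised up front, so this sketch has no separate divide step at the end.&lt;br/&gt;&lt;br/&gt;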
Certain assumptions have been made about the boundary conditions which are perfectly acceptable for my application but may not be ideal for your own. It is a good idea to check the output against what you expect using a Gold or Silver kernel.&lt;br/&gt;&lt;br/&gt;Please note the code supplied is not generic and is specific to a 5x5x5 Gaussian on a set of  256x256 planes although it is very easy to modify to other sizes. The z convolution is not included in the source. In the z convolution you would need to divide to get the correct value.&lt;br/&gt;&lt;br/&gt;As usual the code is licensed under the:  Creative Commons - Attribution-Share Alike 2.0 UK: England &amp;amp; Wales  License - ie feel free to use it in anything but some acknowledgement would be nice :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-3675716353906988859?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/3675716353906988859/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/07/3d-gaussian-convolution.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3675716353906988859'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3675716353906988859'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/07/3d-gaussian-convolution.html' title='3D Gaussian Convolution'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-7614200941128151044</id><published>2009-07-23T09:22:00.000-07:00</published><updated>2011-10-17T15:00:14.865-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Lattice Boltzman'/><category scheme='http://www.blogger.com/atom/ns#' term='SPH'/><category scheme='http://www.blogger.com/atom/ns#' term='sorting algorithms'/><category scheme='http://www.blogger.com/atom/ns#' term='ray tracing'/><category scheme='http://www.blogger.com/atom/ns#' term='Thermal Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='PhysX'/><title type='text'>Happy blog day!</title><content type='html'>My little blog is officially one year old today! :)&lt;br/&gt;&lt;br/&gt;Over the last year I've made many more posts than I was originally planning to make and judging by the number of subscribers and variety of institutions and people who visit it's not entirely all rubbish, or so I like to tell myself.&lt;br/&gt;&lt;br/&gt;Ongoing projects that will hopefully get completed or improved in the coming year are:&lt;br/&gt;&lt;br/&gt;Thermal monitor - networking coming soon&lt;br/&gt;&lt;br/&gt;More raytracing (of course) - linking PhysX to it (thanks to Timothy's post &lt;a href="http://farrarfocus.blogspot.com/2009/07/where-is-my-raytraced-physics-toy.html" target="_blank"&gt;here&lt;/a&gt; for inspiration) to improve on my rather primitive bouncing balls.&lt;br/&gt;&lt;br/&gt;SPH and Lattice methods for fluid simulation&lt;br/&gt;&lt;br/&gt;Sorting algorithms - I haven't posted much on this lately but some decent results so far&lt;br/&gt;&lt;br/&gt;Massive Data set processing.&lt;br/&gt;&lt;br/&gt;Anything else that grabs my interest&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-7614200941128151044?l=www.bv2.co.uk' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/7614200941128151044/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/07/happy-blog-day.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/7614200941128151044'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/7614200941128151044'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/07/happy-blog-day.html' title='Happy blog day!'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-7908953976465061626</id><published>2009-07-21T03:28:00.000-07:00</published><updated>2011-10-17T15:00:14.865-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Jack Dongarra'/><category scheme='http://www.blogger.com/atom/ns#' term='directed acyclic graphs'/><category scheme='http://www.blogger.com/atom/ns#' term='Throughput Computing'/><title type='text'>Throughput Computing</title><content type='html'>Two weeks ago I attended a seminar at the Daresbury laboratory near Warrington. Unfortunately I only had the day off so couldn't attend the workshops on offer and buttercup, my trusty landy, had to drive there and back on the same day. Sorry to everyone on the m4, m5 and m6 that day :p&lt;br/&gt;&lt;br/&gt;&lt;a href="http://www.netlib.org/utk/people/JackDongarra/" target="_blank"&gt;Jack Dongarra&lt;/a&gt; presented a rather nice summary of supercomputing / parallel computing or as he called it "Throughput Computing".  
I had never heard this term before and now really like it. It implies that you should make the best of available resources in order to maximize throughput. It's very easy to get into a mindset where everything has to be in parallel, but by doing so you may neglect to notice that some parts work very well in serial.&lt;br/&gt;&lt;br/&gt;&lt;a name='more'&gt;&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;Of course serial parts can be a bottleneck, with lots of fork and join bits which reduce your throughput. A better way of keeping your compute nodes busy while dealing with the inevitable serial sections is to use directed acyclic graphs.&lt;br/&gt;&lt;br/&gt;He also mentioned that effectively programming these large and complex machines is becoming more and more complex, and therefore costly, and there is currently no language that can adequately describe the systems needed.&lt;br/&gt;&lt;br/&gt;In the latest and next generation of supercomputers there will be less memory per core than we are used to. Memory is a big consumer of power and in the drive for power efficiency this will have to be reduced, possibly with small chunks of memory layered onto the processing elements in a 3D fashion. This will of course mean a rethink of existing algorithms as compute elements may not necessarily share a common memory area.&lt;br/&gt;&lt;br/&gt;Another very interesting topic he mentioned, which I had not considered before, is the area of fault tolerance. 
We are used to expecting hard drive / storage to fail and so have various systems to negate the impact a failure has (raid / mirrors etc) but if a compute node fails in a large cluster in the middle of a matrix operation for example how do we firstly detect there has been a problem and secondly how to recover from it without having to restart the entire run.&lt;br/&gt;&lt;br/&gt;Lastly he mentioned:&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;5 Important Features to Consider:&lt;/strong&gt;&lt;br/&gt;&lt;br/&gt;1) Many Core and hybrid machines will require block data layout and dynamic data driven execution.&lt;br/&gt;&lt;br/&gt;2) Mixed precision - you don't always need maximum precision&lt;br/&gt;&lt;br/&gt;3) Self adapting / auto tuning Software&lt;br/&gt;&lt;br/&gt;4) Fault Tolerant Algorithms&lt;br/&gt;&lt;br/&gt;5) Communication avoiding Algorithms&lt;br/&gt;&lt;br/&gt; &lt;br/&gt;&lt;br/&gt;All in all a very informative talk and well worth the 10 hours of driving time :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-7908953976465061626?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/7908953976465061626/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/07/throughput-computing.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/7908953976465061626'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/7908953976465061626'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/07/throughput-computing.html' title='Throughput Computing'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-6213232760694106839</id><published>2009-07-03T02:59:00.000-07:00</published><updated>2011-10-17T15:00:14.865-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Numerical precision'/><category scheme='http://www.blogger.com/atom/ns#' term='Maths'/><category scheme='http://www.blogger.com/atom/ns#' term='Floating point precision'/><category scheme='http://www.blogger.com/atom/ns#' term='Scientific Computing'/><title type='text'>Numerical Precision</title><content type='html'>Numerical precision is an ongoing concern of mine especially in big / long running simulations and solvers.&lt;br/&gt;&lt;br/&gt;I came across an article by Rob Farber on the &lt;a href="http://www.scientificcomputing.com/article-hpc-Numerical-Precision-How-Much-is-Enough-063009.aspx" target="_blank"&gt;scientificcomputing.com&lt;/a&gt; site this morning that asks the question "How much is Enough?".  Although no definitive answers are presented the author summarizes the current and future concerns over accuracy.&lt;br/&gt;&lt;br/&gt;Personally I don't believe floating point is the way forward. Floating point is fast to calculate in hardware but is not always an ideal way of representing numbers. Although the various branches of mathematics are largely base independent humans are most comfortable with base 10 while computers are of course most comfortable with base 2. 
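&lt;br/&gt;&lt;br/&gt;A small Python sketch of that mismatch, using only the standard library decimal and fractions modules. The values 0.1, 0.2 and 0.3 are just an illustrative choice, not taken from the article:

```python
from decimal import Decimal
from fractions import Fraction

# Base 2 floating point: 0.1 and 0.2 have no finite binary expansion,
# so the stored values are only close approximations.
binary_sum = 0.1 + 0.2
binary_exact = (binary_sum == 0.3)

# Base 10 arithmetic: the same sum needs one decimal digit and is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_exact = (decimal_sum == Decimal("0.3"))

# Compare the rational value the float 0.1 actually holds with one tenth.
stored_tenth = Fraction(0.1)
tenth_is_exact = (stored_tenth == Fraction(1, 10))

print(binary_exact)    # False
print(decimal_exact)   # True
print(tenth_is_exact)  # False
```
&lt;br/&gt;&lt;br/&gt;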
This does result in some situations when a calculation in base 10 with only a few decimals of precision gives precise results whereas a calculation in base 2 is incapable of giving a precise result even given N bits of precision although the result is probably acceptable after n bits.&lt;br/&gt;&lt;br/&gt;I'm not presenting any solution to the precision problem, but merely pointing out that sometimes the issue is caused by:   using base 2 for calculations  and/or  the floating point representation of these numbers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-6213232760694106839?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/6213232760694106839/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/07/numerical-precision.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/6213232760694106839'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/6213232760694106839'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/07/numerical-precision.html' title='Numerical Precision'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-7113762221532827934</id><published>2009-07-01T03:33:00.000-07:00</published><updated>2011-10-17T15:00:14.866-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='GPU Temperature Monitor'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Temperature Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='nvapi'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU Thermal Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='masm32'/><category scheme='http://www.blogger.com/atom/ns#' term='BV2 Thermal Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='masm'/><category scheme='http://www.blogger.com/atom/ns#' term='Thermal Monitor'/><title type='text'>BV2 Thermal monitor v0.11</title><content type='html'>Here is a small update to the Thermal Monitor which implements the "always on top" feature. This can be enabled and disabled by right clicking on its little system tray icon.&lt;br/&gt;&lt;br/&gt;This is not the point release I had planned, as some of the additional features appeared a bit unstable on my Win XP pro 64 machine. Until I find time to sort them out I thought I would release the most requested feature.&lt;br/&gt;&lt;br/&gt;The source code is included. It is still in masm and does include routines for non-standard window shapes and transparent blts etc - could be worth a look if you are interested in masm32.&lt;br/&gt;&lt;br/&gt;It is now available in the &lt;a href="http://www.bv2.co.uk/?page_id=863" target="_self"&gt;downloads&lt;/a&gt; section or here:&lt;br/&gt;&lt;br/&gt;[download id="4"] - Please see &lt;a href="http://www.bv2.co.uk/?page_id=849" target="_blank"&gt;Licenses Section &lt;/a&gt;for license details. 
By downloading you agree to be bound by the terms of the Creative Commons Attribution Share Alike license.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-7113762221532827934?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/7113762221532827934/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/07/bv2-thermal-monitor-v011.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/7113762221532827934'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/7113762221532827934'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/07/bv2-thermal-monitor-v011.html' title='BV2 Thermal monitor v0.11'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-3252019209541703505</id><published>2009-06-25T08:02:00.000-07:00</published><updated>2011-10-17T15:00:33.148-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Gustafson&apos;s Law'/><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='Silver Kernel'/><category scheme='http://www.blogger.com/atom/ns#' term='Development'/><category scheme='http://www.blogger.com/atom/ns#' term='Sparse Matrix'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU Thermal Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='Amdahl&apos;s 
law'/><category scheme='http://www.blogger.com/atom/ns#' term='gold kernel'/><category scheme='http://www.blogger.com/atom/ns#' term='BV2 Thermal Monitor'/><category scheme='http://www.blogger.com/atom/ns#' term='CFD'/><category scheme='http://www.blogger.com/atom/ns#' term='Bronze Kernel'/><title type='text'>Glass half full or half empty?</title><content type='html'>Or as this is an hpc/cuda/parallel processing site:&lt;br/&gt;&lt;br/&gt;Gustafson's Law or Amdahl's law?&lt;br/&gt;&lt;br/&gt;Personally I prefer Gustafson's Law .... it seems more logical to me or is this just because I'm inherently an optimist?&lt;br/&gt;&lt;br/&gt;I would be quite interested in hearing your views on this - so comments / &lt;a href="http://forums.bv2.co.uk/viewtopic.php?f=3&amp;amp;t=11" target="_blank"&gt;forum&lt;/a&gt; posts most welcome.&lt;br/&gt;&lt;br/&gt;In other news:  The thermal monitor downloads have gone over 80! :)  The updated version (v0.2) is ready after mucking around with subclassing a control.... I will release it soon...&lt;br/&gt;&lt;br/&gt;Otherwise I have been extremely busy debugging a sparse matrix solver - bugs in huge datasets can be hard to find! Even with the aid of the Gold / Silver / Bronze kernels mentioned in the last post they have been proving remarkably tricky to isolate. Rather surprising to me is the fact that the long long data type doesn't consume a lot more processing time than a normal unsigned int - so I have been using that wherever there is a risk of exceeding 2^32 - 1.&lt;br/&gt;&lt;br/&gt;CFD code coming soon too - although I have unwound a lot of the optimizations in order to make it easier to understand and possibly be a good foundation for your own optimizations.&lt;br/&gt;&lt;br/&gt;Right .... 
back to the grindstone!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-3252019209541703505?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/3252019209541703505/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/06/glass-half-full-or-half-empty.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3252019209541703505'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/3252019209541703505'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/06/glass-half-full-or-half-empty.html' title='Glass half full or half empty?'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-73331669704224959.post-4217284360485225828</id><published>2009-06-22T06:58:00.000-07:00</published><updated>2011-10-17T15:00:33.148-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='Gold Silver Bronze kernels'/><category scheme='http://www.blogger.com/atom/ns#' term='CUDA debugging'/><category scheme='http://www.blogger.com/atom/ns#' term='CUDA large data sets'/><category scheme='http://www.blogger.com/atom/ns#' term='gold kernel'/><title type='text'>Gold, Silver and Bronze</title><content type='html'>Kernels of course! 
:)&lt;br/&gt;&lt;br/&gt;Most of the readers of this blog should be familiar with a "Gold" kernel in which your data is processed on the CPU (usually) and the output is carefully checked. This kernel and its associated outputs form the basis of the regression testing of subsequent implementations on the GPU including algorithmic optimizations.&lt;br/&gt;&lt;br/&gt;Personally I like most of my Gold kernels to be naive implementations of an algorithm. This makes them easily verifiable and usually easy to debug if there is a problem.&lt;br/&gt;&lt;br/&gt;If you currently don't implement a Gold kernel before writing your CUDA implementations and/or adapting your algorithm I strongly suggest you do.&lt;br/&gt;&lt;br/&gt;The purpose of this post is to suggest two other debugging techniques I use when needed and where possible. I call them my Silver and Bronze kernels.&lt;br/&gt;&lt;br/&gt;A Silver kernel is implemented on the GPU without any optimizations or algorithmic enhancements. The grid / block structure is as simple as possible, making sure we don't vary from the Gold kernel's implementation too much - only unwinding the loops into grid/blocks is allowed where possible. I use this type of kernel when I am writing something that depends on numerical precision. Once written and verified within acceptable numerical limits against the Gold kernel it becomes the new baseline kernel before later optimizations. This allows exact matching of later kernel outputs rather than using an "acceptable deviation" approach.&lt;br/&gt;&lt;br/&gt;&lt;a name='more'&gt;&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;My Bronze kernel is extremely useful for detecting errors that occur in long chains of different kernel invocations, usually involving large datasets. The CUDA emulator can be used for this but often the performance hit makes it take an unfeasibly long time to get to the area where your bug is occurring. 
I usually use my Bronze kernel for diagnosing and fixing the "unspecified launch failure" error.&lt;br/&gt;&lt;br/&gt;&lt;strong&gt;To implement a Bronze kernel:&lt;/strong&gt;&lt;br/&gt;&lt;br/&gt;Allocate host memory for all the data structures the original kernel depends on.&lt;br/&gt;&lt;br/&gt;Make sure any constants used on the device are also available on the host.&lt;br/&gt;&lt;br/&gt;Copy the data needed from the device into the host memory allocated in the first step.&lt;br/&gt;&lt;br/&gt;Re-wind the loops and re-code the device kernel back into a "loopy" form. Keep in mind that any __device__ function calls would actually be inlined on the device, so try to do the same on the host. Not implementing them inline can hide bugs due to stack preservation on the host side. Unthreading a kernel is sometimes rather tricky, but as far as possible get it back into a single-threaded form, unless of course you are trying to debug a thread overlap / sync issue.&lt;br/&gt;&lt;br/&gt;Execute the kernel, setting any breakpoints / watches you may need.&lt;br/&gt;&lt;br/&gt;Copy the data back to the correct structures on the GPU.&lt;br/&gt;&lt;br/&gt;Deallocate the memory allocated in the first step.&lt;br/&gt;&lt;br/&gt;The benefit of this approach arises from the host OS / CPU generating exceptions on memory accesses that have gone awry, loops that have exceeded their bounds, and so on. 
It also allows for easy examination of variables and control flows without having to run the entire program under the emulator.&lt;br/&gt;&lt;br/&gt;Recently, using the Bronze kernel technique, I detected an unsigned int overflow on an index that had been all but impossible to find by other means and did not occur on the smaller test sets.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/73331669704224959-4217284360485225828?l=www.bv2.co.uk' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://www.bv2.co.uk/feeds/4217284360485225828/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.bv2.co.uk/2009/06/gold-silver-and-bronze.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4217284360485225828'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/73331669704224959/posts/default/4217284360485225828'/><link rel='alternate' type='text/html' href='http://www.bv2.co.uk/2009/06/gold-silver-and-bronze.html' title='Gold, Silver and Bronze'/><author><name>Barrett</name><uri>http://www.blogger.com/profile/13549719980585276668</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
