tag:blogger.com,1999:blog-73331669704224959.post6213232760694106839..comments2023-08-20T04:45:55.158-07:00Comments on ComputeCube: Numerical PrecisionBarretthttp://www.blogger.com/profile/13549719980585276668noreply@blogger.comBlogger2125tag:blogger.com,1999:blog-73331669704224959.post-8700261790448549022009-07-03T18:57:02.000-07:002009-07-03T18:57:02.000-07:00Could always attempt to go with 64-bit integer mat...Could always attempt to go with 64-bit integer math, however you run the risk of integer multiply not being fully pipelined on some archs. Perhaps what you really want is a fused integer multiply add with intermediate shifts...Timothy Farrarhttp://farrarfocus.blogspot.comnoreply@blogger.comtag:blogger.com,1999:blog-73331669704224959.post-72832973852769106442009-07-04T01:44:20.000-07:002009-07-04T01:44:20.000-07:00Back in 'ye olde days' integer maths was a...Back in 'ye olde days' integer maths was almost always faster so using custom fixed point integer maths routines was a good idea both for accuracy and speed. Of course if you implemented your own fixed point you could relatively easily extend the precision as required. You still find quite a few big number libraries out on the internet these days.<br><br>If you want performance though, floating point is built into the hardware so is usually faster. And in cuda land 32bit mul's are very slow - hence the 24bit operations. When you get to double precision the current generation Nvidia GPU's only have 1 DP unit per multiprocessor so it *may* actually be faster to use 64 or 128 fixed point integer maths if you structure the code correctly. Something I'd love to test at some stage.<br><br>The IEEE 754r specification is also worth looking at as it does address some of the issues in the earlier spec.Barretthttp://www.bv2.co.uknoreply@blogger.com