Posts Tagged ‘FPU rounding’

CUDA Emulator Output

Wednesday, June 10th, 2009

As many people have noticed the same code executed in Emulator mode gives different floating point results from the kernels run in Debug or Release mode.

Although I know what causes this I have never bothered to investigate the actual differences as most of the stuff I write runs entirely on the GPU. Recently I have had to compare results on the CPU<->GPU and wrote some code to change the FPU settings. Firstly a quick explanation:

By default the CPU (FPU) is set to use 80 bit floating point internally. This means that when you load in an integer (fild) or a single / double float (fld) it gets converted to a 80 bit number inside the FPU stack. All operations are performed internally at 80 bits and when storing the result it converts back to the correct floating point width (single / double)  (fst / fstp). 

This method of operation is desirable as it reduces the effect of rounding / truncating on the intermediate results.  Of course while very useful for computing on the CPU this is not how the CUDA devices operate.

(more…)