Exact Floating Point

Alan A. Jorgensen, Andrew C. Masters

2021

Standard IEEE floating point, which defines the representation of and calculations on real numbers using a binary format similar to scientific notation, does not define an exact floating-point result. In contrast, we use a patented bounded floating-point (BFP) device and method that calculates and retains the precision of the represented floating-point number and thereby provides an indication of exactness. An “exact” floating-point result is defined as one whose error is within ±½ unit in the last place (ulp). Analysis and notification of exactness are important because subtracting “similar” but inexact floating-point numbers can introduce error, even catastrophic error, into a calculation. We also define “similar” and use bounded floating point to give examples of subtracting exact and inexact similar numbers, comparing the results from 64-bit and 128-bit standard floating-point calculations with those from 80-bit bounded floating-point calculations.
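The cancellation effect described above can be illustrated with ordinary 64-bit floating point. The following is a minimal sketch and does not implement BFP: it uses Python's built-in float (IEEE binary64) and takes exact rational arithmetic from the standard fractions module as the reference. The function names and the two operand values are assumptions chosen for illustration and do not come from the paper.

```python
from fractions import Fraction


def relative_error(approx, exact):
    """Relative error of a float against an exact rational reference."""
    return abs(Fraction(approx) - exact) / abs(exact)


def cancellation_report(a_text, b_text):
    """Compare float subtraction with exact rational subtraction.

    a_text and b_text are decimal strings (illustrative inputs), so the
    reference values are the exact decimal numbers, not rounded floats.
    """
    a_exact, b_exact = Fraction(a_text), Fraction(b_text)
    a_float, b_float = float(a_text), float(b_text)  # each rounded to binary64
    diff_float = a_float - b_float
    return {
        "operand_a": relative_error(a_float, a_exact),
        "operand_b": relative_error(b_float, b_exact),
        "difference": relative_error(diff_float, a_exact - b_exact),
    }


# Two "similar" numbers that agree in their first 15 significant digits.
report = cancellation_report("1.0000000000000003", "1.0000000000000001")
for name, err in report.items():
    print(f"{name}: relative error {float(err):.3e}")
```

Each operand is stored with a relative error below 2^-53, yet the computed difference is off by about 11 percent, because the subtraction cancels the leading digits the operands share and leaves mostly rounding error. Standard floating point returns this result with no indication that precision was lost.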
