Quadruple-precision floating-point format

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by ShashClp (talk | contribs) at 13:46, 16 March 2009 (Added extended precision). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computing, quadruple precision (also commonly shortened to quad precision) is a computer number format that occupies four consecutive storage locations in computer memory, at address, address+1, address+2, and address+3. A quad-precision number, sometimes simply called a quad, may be defined as an integer, fixed-point, or floating-point value.

In IEEE 754-2008 the 128-bit floating-point format is officially referred to as binary128. It is one of the standard's binary floating-point interchange formats, together with the 64-bit double precision, the 32-bit single precision, and the 16-bit half precision formats.

Quadruple precision memory format

 Sign bit: 1
 Exponent width: 15  
 Significand precision: 113 bits (112 explicitly stored)

The format is written with an implicit integer bit with value 1 unless the written exponent is all zeros. Thus only 112 bits of the fraction appear in the memory format, but the total precision is 113 bits (approximately 34 decimal digits, since log10(2^113) ≈ 34.016). From most significant to least significant, the bits are laid out as 1 sign bit, 15 exponent bits, and 112 fraction bits.
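As a rough illustration of the field widths given above, the following Python sketch splits a 128-bit pattern into its three fields. The function name `split_binary128` is made up for this example, and the pattern is assumed to be held as a plain Python integer whose most significant bit is the sign bit.

```python
def split_binary128(bits: int) -> tuple:
    """Split a 128-bit pattern into (sign, written exponent, fraction)."""
    if not 0 <= bits < (1 << 128):
        raise ValueError("expected an unsigned 128-bit integer")
    sign = bits >> 127                    # 1 sign bit (most significant)
    exponent = (bits >> 112) & 0x7FFF     # next 15 bits: written exponent
    fraction = bits & ((1 << 112) - 1)    # low 112 bits: stored fraction
    return sign, exponent, fraction


# Pattern c000 0000 ... 0000 (28 trailing zero hex digits)
sign, exponent, fraction = split_binary128(0xC000 << 112)
print(sign, hex(exponent), fraction)  # 1 0x4000 0
```

The implicit integer bit is not part of any field; it has to be added back when the value is reconstructed.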

Exponent encodings

 Emin (0x0001) = -16382
 Emax (0x7ffe) = 16383
 Exponent bias (0x3fff) = 16383

The true exponent = written exponent - exponent bias
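The bias relation can be checked against the Emin and Emax values listed above with a few lines of Python. This is only a sketch; `true_exponent` and `EXPONENT_BIAS` are names chosen for the example.

```python
EXPONENT_BIAS = 0x3FFF  # 16383


def true_exponent(written: int) -> int:
    """Return the true exponent for a written exponent in 0x0001..0x7ffe."""
    if not 0x0001 <= written <= 0x7FFE:
        raise ValueError("0x0000 and 0x7fff are reserved exponents")
    return written - EXPONENT_BIAS


print(true_exponent(0x0001))  # -16382 (Emin)
print(true_exponent(0x7FFE))  # 16383 (Emax)
print(true_exponent(0x3FFF))  # 0
```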

 0x0000 and 0x7fff are reserved exponents
 0x0000 is used to represent zero and denormals
 0x7fff is used to represent infinity and NaNs

All bit patterns are valid encodings.
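To make the encoding rules above concrete, here is a minimal Python sketch that turns any 128-bit pattern into a value. The function name `decode_binary128` is hypothetical. Finite values are returned as exact `Fraction` objects, while infinities and NaNs are returned as ordinary Python floats for convenience; that return convention is a choice made for this example, not part of the format.

```python
import math
from fractions import Fraction


def decode_binary128(bits: int):
    """Return the value of a 128-bit pattern as an exact Fraction,
    or as a float for infinities and NaN."""
    sign = -1 if (bits >> 127) & 1 else 1
    written = (bits >> 112) & 0x7FFF
    fraction = Fraction(bits & ((1 << 112) - 1), 1 << 112)
    if written == 0x7FFF:
        # Reserved exponent: infinity if the fraction is zero, NaN otherwise.
        return sign * math.inf if fraction == 0 else math.nan
    if written == 0x0000:
        # Zero and denormals: no implicit 1, true exponent fixed at Emin.
        # A Fraction cannot carry the sign of -0, so -0 decodes to 0 here.
        return sign * fraction * Fraction(2) ** -16382
    # Normal numbers: implicit integer bit 1, biased exponent.
    return sign * (1 + fraction) * Fraction(2) ** (written - 0x3FFF)


print(decode_binary128(0x3FFF << 112))  # 1
print(decode_binary128(0xC000 << 112))  # -2
print(decode_binary128(0x7FFF << 112))  # inf
```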

Quadruple precision examples in hexadecimal

 3fff 0000 0000 0000 0000 0000 0000 0000   = 1
 c000 0000 0000 0000 0000 0000 0000 0000   = -2
 7ffe ffff ffff ffff ffff ffff ffff ffff   ~  1.189731495357231765085759326628007 × 10^4932 (Max Quad)
 3ffd 5555 5555 5555 5555 5555 5555 5555   ~  1/3

By default, 1/3 rounds down, as it does in double precision, because the significand has an odd number of bits: the bits beyond the rounding point are 0101..., which is less than 1/2 of a unit in the last place.
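The rounding claim can be checked with exact rational arithmetic. The sketch below rebuilds the value stored by the pattern 3ffd 5555 ... 5555 and measures how far it falls short of 1/3, in units in the last place (ulp). The variable names are illustrative only.

```python
from fractions import Fraction

fraction_field = int("5" * 28, 16)   # 112 fraction bits: 0101...0101
exponent = 0x3FFD - 0x3FFF           # true exponent = -2

# Value of the stored pattern, including the implicit integer bit.
stored = (1 + Fraction(fraction_field, 1 << 112)) * Fraction(2) ** exponent

# One unit in the last place for this exponent.
ulp = Fraction(2) ** (exponent - 112)

error = Fraction(1, 3) - stored
print(error > 0)      # True: the stored value is below 1/3
print(error / ulp)    # 1/3
```

The discarded tail is one third of an ulp, which is below the one-half threshold, so round-to-nearest keeps the truncated pattern.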

 0000 0000 0000 0000 0000 0000 0000 0000   = 0
 8000 0000 0000 0000 0000 0000 0000 0000   = -0
 7fff 0000 0000 0000 0000 0000 0000 0000   = Infinity
 ffff 0000 0000 0000 0000 0000 0000 0000   = -Infinity
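The decimal approximation given for the Max Quad entry in the examples above can be reproduced with integer arithmetic, since the largest finite value is an exact integer. The following Python sketch is one way to do that; the variable names are made up for the example, and the decimal exponent 4932 is taken from the example and then verified rather than derived.

```python
# Pattern 7ffe ffff ... ffff: written exponent 0x7ffe, all 112 fraction bits set.
written = 0x7FFE
fraction_field = (1 << 112) - 1

# (1 + fraction / 2^112) * 2^(written - bias) is an integer here, so integer
# arithmetic is exact: (2^113 - 1) * 2^(16383 - 112).
max_quad = ((1 << 112) + fraction_field) << (written - 0x3FFF - 112)

# Check the decimal exponent without converting the whole number to a string
# (recent Python versions cap the length of int-to-str conversions).
decimal_exponent = 4932
assert 10 ** decimal_exponent <= max_quad < 10 ** (decimal_exponent + 1)

# Keep the first 34 significant decimal digits, truncated rather than rounded.
leading_34_digits = max_quad // 10 ** (decimal_exponent - 33)
print(leading_34_digits)
```

The printed integer can be compared digit by digit with the significand shown in the Max Quad line.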

See also