Double-precision floating-point format

In computing, double precision is a computer number format that occupies two adjacent storage locations in computer memory. A double precision number, sometimes simply called a double, may be defined to be an integer, fixed point, or floating point.

Modern computers with 32-bit storage locations (single precision) provide 64-bit double precision. Double precision floating point is a format defined by the IEEE 754 standard for encoding binary or decimal floating-point numbers in 64 bits (8 bytes).

Double precision binary floating-point format

Double precision binary floating-point is a commonly used format on PCs, due to its wider range over single precision floating point, despite its performance and bandwidth cost. As with the single precision floating-point format, it cannot represent every integer exactly, unlike integer formats. It is commonly known simply as "double". The IEEE 754 standard defines a double as:

Sign bit: 1 bit
Exponent width: 11 bits
Significand precision: 53 bits (52 explicitly stored)

The format is written with an implicit integer bit with value 1 unless the written exponent is all zeros. With the 52 bits of the fraction (mantissa) appearing in the memory format, the total precision is therefore 53 bits (approximately 16 decimal digits, $\log_{10}(2^{53})\approx 15.955$). The bits are laid out as follows, from most significant to least significant:

Sign: bit 63
Exponent: bits 62 to 52
Fraction: bits 51 to 0
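
The field widths can be checked by splitting a double into its three parts. The following is a minimal sketch in Python, assuming the sign occupies the most significant bit, followed by the 11 exponent bits and the 52 fraction bits; the helper name double_fields is hypothetical and only used for illustration.

```python
import struct

def double_fields(x: float) -> tuple:
    """Return (sign, written_exponent, fraction) of x's 64-bit encoding."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                      # 1 bit
    written_exponent = (bits >> 52) & 0x7FF  # 11 bits, still biased
    fraction = bits & ((1 << 52) - 1)      # 52 explicitly stored bits
    return sign, written_exponent, fraction

print(double_fields(1.0))   # (0, 1023, 0)
print(double_fields(-2.0))  # (1, 1024, 0)
```

The exponent returned here is the written (biased) one; it is not yet the true power of two.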

Exponent encoding

The double precision binary floating-point exponent is encoded using an Excess-N representation, more precisely Excess-1023, which the IEEE 754 standard calls the exponent bias. Examples of such representations would be:

E_min (1) = -1022
E (50) = -973
E_max (2046) = 1023

Thus, as defined by Excess-N representation, in order to get the true exponent, the exponent bias (1023) has to be subtracted from the written exponent.
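
As a small illustration of the subtraction, the sketch below converts a written exponent to the true exponent. The names BIAS and true_exponent are hypothetical, and the function only accepts the written exponents 1 to 2046, since 0 and 2047 are reserved.

```python
BIAS = 1023  # exponent bias of the double precision format

def true_exponent(written: int) -> int:
    """Convert a written (biased) exponent of a normal number to the true one."""
    if not 1 <= written <= 2046:
        raise ValueError("written exponents 0 and 2047 have special meanings")
    return written - BIAS

print(true_exponent(1))     # -1022
print(true_exponent(50))    # -973
print(true_exponent(2046))  # 1023
```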

The exponents 0x000 and 0x7ff have a special meaning:

0x000 is used to represent zero and denormals.
0x7ff is used to represent infinity and NaNs.

All bit patterns are valid encodings.
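
The two reserved exponents can be told apart from ordinary ones by inspecting the exponent and fraction fields. This is a minimal sketch, assuming the 64-bit pattern is given as an integer; the function name classify and the returned labels are hypothetical.

```python
def classify(bits: int) -> str:
    """Classify a 64-bit double precision bit pattern by its exponent field."""
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0x000:
        # All-zero exponent: zero if the fraction is zero, otherwise a denormal.
        return "zero" if fraction == 0 else "denormal"
    if exponent == 0x7FF:
        # All-one exponent: infinity if the fraction is zero, otherwise a NaN.
        return "infinity" if fraction == 0 else "NaN"
    return "normal"

print(classify(0x8000000000000000))  # zero (negative zero)
print(classify(0x7FF0000000000000))  # infinity
print(classify(0x3FF0000000000000))  # normal
```

Every input falls into exactly one branch, which matches the statement that all bit patterns are valid.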

The entire double precision number is described by:

$(-1)^{\text{sign}}\times 2^{{\text{exponent}}-{\text{exponent bias}}}\times 1.{\text{mantissa}}$
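
The formula can be evaluated exactly with rational arithmetic. The sketch below is one possible way to do so, not a reference implementation; it assumes a normal number (written exponent neither 0x000 nor 0x7ff), and the name decode_normal is hypothetical.

```python
from fractions import Fraction

def decode_normal(bits: int) -> Fraction:
    """Evaluate (-1)^sign * 2^(exponent - bias) * 1.mantissa exactly."""
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent in (0x000, 0x7FF):
        raise ValueError("formula applies to normal numbers only")
    # 1.mantissa: prepend the implicit 1 and scale by 2^-52.
    significand = Fraction((1 << 52) | mantissa, 1 << 52)
    return (-1) ** sign * Fraction(2) ** (exponent - 1023) * significand

print(decode_normal(0x4000000000000000))  # 2
print(decode_normal(0xC000000000000000))  # -2
```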

Double precision examples

0x3ff0 0000 0000 0000   = 1
0x3ff0 0000 0000 0001   = 1.0000000000000002220446049250313080847263336181640625, the smallest number > 1
0x3ff0 0000 0000 0002   = 1.000000000000000444089209850062616169452667236328125
0x4000 0000 0000 0000   = 2
0xc000 0000 0000 0000   = –2

0x7fef ffff ffff ffff   ≈ 1.7976931348623157 × 10³⁰⁸ (Max Double)

0x0000 0000 0000 0000   = 0
0x8000 0000 0000 0000   = –0

0x7ff0 0000 0000 0000   = Infinity
0xfff0 0000 0000 0000   = -Infinity

0x3fd5 5555 5555 5555   ≈ 1/3

(1/3 rounds down in double precision, whereas it rounds up in single precision, because of the odd number of bits in the significand.)
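
The examples above can be checked by reinterpreting each 64-bit pattern as a double. This is a minimal sketch using Python's standard struct module; the helper name from_bits is hypothetical.

```python
import math
import struct
import sys

def from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer pattern as a double precision value."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

assert from_bits(0x3FF0000000000000) == 1.0
assert from_bits(0x3FF0000000000001) == 1.0 + sys.float_info.epsilon
assert from_bits(0x4000000000000000) == 2.0
assert from_bits(0x7FEFFFFFFFFFFFFF) == sys.float_info.max
assert from_bits(0x7FF0000000000000) == math.inf
assert from_bits(0xFFF0000000000000) == -math.inf
# Negative zero compares equal to zero, so check its sign explicitly.
assert math.copysign(1.0, from_bits(0x8000000000000000)) == -1.0
assert from_bits(0x3FD5555555555555) == 1 / 3
```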

In more detail:

Given the binary representation 0x3fd5 5555 5555 5555,
  Sign = 0x0
  Exponent = 0x3fd = 1021
  Exponent Bias = 1023 (above)
  Mantissa = 0x5 5555 5555 5555
  Value = 2^{(Exponent − Exponent Bias)} × 1.Mantissa – Note the Mantissa must not be converted to decimal here
        = 2^–2 × (0x15 5555 5555 5555 × 2^–52)
        = 2^–54 × 0x15 5555 5555 5555
        = 0.333333333333333314829616256247390992939472198486328125
        ≈ 1/3
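
The same arithmetic can be reproduced with exact rationals, which also shows how far the stored value falls below 1/3. This is a sketch only; the variable names are chosen for illustration.

```python
from fractions import Fraction

# Implicit leading 1 followed by the 52 mantissa bits 0x5 5555 5555 5555.
significand = 0x15555555555555
value = Fraction(significand, 2**54)  # 2^-54 × significand

# Fraction(float) is exact, so this compares against the stored double.
assert value == Fraction(1 / 3)
# The stored value is below 1/3 by exactly 1 / (3 × 2^54).
assert value == Fraction(1, 3) - Fraction(1, 3 * 2**54)
```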
