decimal128 floating-point format

In computing, decimal128 is a decimal floating-point number format that occupies 16 bytes (128 bits) in memory.

Purpose and use

Like the binary128 format, decimal128 is used where extreme precision or range is required.

In contrast to the binaryxxx data formats, the decimalxxx formats provide exact representation of decimal fractions, exact calculations with them, and the 'ties away from zero' rounding that people commonly use^[1] (in some range, to some precision, to some degree). The trade-off is reduced performance, which especially hurts decimal128 computations on common 64- or 32-bit hardware. The formats are intended for applications that need to come close to schoolhouse math, such as financial and tax computations. (In short, they avoid many problems like 0.2 + 0.1 -> 0.30000000000000000000000000000000004, which occur with the binary128 datatype.)
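The contrast can be illustrated with a minimal C sketch. It does not use a real decimal128 type: `dec_pair` and `dec_add` are hypothetical names for a toy decimal value stored as an integer coefficient times a power of ten, and the operands are assumed small enough that the scaling cannot overflow.

```c
#include <stdint.h>

/* Toy decimal value: coefficient * 10^exponent (hypothetical, not decimal128 itself). */
typedef struct {
    int64_t coefficient;
    int     exponent;
} dec_pair;

/* Add two toy decimals exactly by aligning them to the smaller exponent.
   Assumption: operands are small, so multiplying by 10 does not overflow. */
dec_pair dec_add(dec_pair a, dec_pair b)
{
    while (a.exponent > b.exponent) { a.coefficient *= 10; a.exponent--; }
    while (b.exponent > a.exponent) { b.coefficient *= 10; b.exponent--; }
    dec_pair sum = { a.coefficient + b.coefficient, a.exponent };
    return sum;
}

/* Returns 1 if binary64 arithmetic yields exactly 0.3 for 0.1 + 0.2, else 0. */
int binary64_sum_is_exact(void)
{
    volatile double a = 0.1, b = 0.2, c = 0.3;  /* volatile forces run-time evaluation */
    return (a + b) == c;
}
```

With these helpers, `dec_add` applied to 1 × 10⁻¹ and 2 × 10⁻¹ gives 3 × 10⁻¹ exactly, while the binary64 comparison fails because none of 0.1, 0.2 and 0.3 is exactly representable in a binary format.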

Range and precision

decimal128 supports 'normal' values with 34-digit precision from ±1.000000000000000000000000000000000×10⁻⁶¹⁴³ to ±9.999999999999999999999999999999999×10⁺⁶¹⁴⁴, plus 'denormal' values with gradually decreasing relative precision down to ±1 × 10⁻⁶¹⁷⁶ (only one digit left), signed zeros, signed infinities and NaN (Not a Number).
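These limits follow from the storage width. The sketch below computes the format parameters for a decimal interchange format of k bits, using the formulas from the parameter table of IEEE 754 as recalled here (an assumption, since the article does not quote them). The names `dec_params` and `decimal_format_params` are hypothetical.

```c
/* Parameters of an IEEE 754 decimal interchange format of width k bits
   (k a multiple of 32). Formulas assumed from the standard's parameter table. */
typedef struct {
    int precision;         /* p: number of significand digits */
    int emax;              /* largest exponent (radix point after first digit) */
    int emin;              /* smallest normal exponent, 1 - emax */
    int bias;              /* exponent bias for the integer-significand view */
    int combination_bits;  /* width of the combination field */
    int trailing_bits;     /* width of the trailing significand field */
} dec_params;

dec_params decimal_format_params(int k)
{
    dec_params r;
    r.precision        = 9 * k / 32 - 2;
    r.emax             = 3 * (1 << (k / 16 + 3));
    r.emin             = 1 - r.emax;
    r.bias             = r.emax + r.precision - 2;
    r.combination_bits = k / 16 + 9;
    r.trailing_bits    = 15 * k / 16 - 10;
    return r;
}
```

For k = 128 this gives 34 digits, emax = 6144, emin = −6143 and bias = 6176; the smallest denormal exponent is emin − (p − 1) = −6176, which matches the range quoted above.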

The binary format of the same bit size (binary128) supports a range from the smallest denormal, about ±6×10⁻⁴⁹⁶⁶, through the smallest normal value with full 113-bit precision, ±3.3621031431120935062626778173217526×10⁻⁴⁹³², to a maximum of ±1.189731495357231765085759326628007×10⁺⁴⁹³².

Performance

Performance comparisons are difficult, not very accurate and hard to reproduce on modern IT systems for various reasons. As a rough guide, in a current 64-bit Intel(r) / Linux / gcc / libdfp / BID implementation, basic arithmetic operations on decimal128 values are between 2.5 and 5 times slower than on binary128 values, while 'higher' functions such as powers (~4500) and trigonometric functions such as tangent (~550) suffer larger penalties. Considering that the binary128 basic operations are already a little slower, and the 'higher' functions significantly slower, than those on the common binary64 values, this is a massive impact that should be carefully considered and tested. To get an idea of the performance on a specific system, the code in 'Addendum - code' can be used. The GNU gcc project and the 'libdfp' project on GitHub might welcome help with improvements.

Representation / encoding of decimal128 values

decimal128 values are represented in a non-normalized format close to scientific notation, with some bits of the exponent combined with the leading bits of the significand in a 'combination field'.

Generic encoding
Sign	Combination	Trailing significand bits
1 bit	17 bits	110 bits
s	mmmmmmmmmmmmmmmmm	tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

Besides the special cases of infinities and NaNs, four points are relevant to understanding the encoding of decimal128.

BID vs. DPD encoding: Binary Integer Decimal (BID) uses a binary-coded positive integer for the significand; it is software-centric and was designed by Intel(r). Densely Packed Decimal (DPD) uses densely packed decimal encoding for all but the first digit of the significand; it is hardware-centric and promoted by IBM(r). The differences are described below. Both alternatives provide exactly the same range of representable numbers: 34 digits of significand and $3 \times 2^{12} = 12288$ possible exponent values. IEEE 754 allows these two different encodings without a concept to denote which one is used, for instance when decimal128 values are communicated between systems. CAUTION: transferring binary data between systems that use different encodings will mostly produce valid decimal128 numbers, but with different values. Prefer data exchange in integral or ASCII 'triplets' for sign, exponent and significand.
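The caution about mixing encodings can be made concrete with a small C sketch. It assumes the 128-bit value is held as two 64-bit words, with `hi` the more significant half, and it only covers finite values whose two bits after the sign are not "11". The function names are hypothetical.

```c
#include <stdint.h>

/* Biased exponent when the bit pattern is read as BID:
   the 14 bits directly after the sign bit. */
unsigned bid128_biased_exponent(uint64_t hi)
{
    return (unsigned)((hi >> 49) & 0x3FFFu);
}

/* Biased exponent when the same kind of pattern is read as DPD:
   2 leading bits from the combination field plus 12 continuation bits. */
unsigned dpd128_biased_exponent(uint64_t hi)
{
    unsigned top2 = (unsigned)((hi >> 61) & 0x3u);    /* two exponent MSBs */
    unsigned cont = (unsigned)((hi >> 46) & 0xFFFu);  /* exponent continuation */
    return (top2 << 12) | cont;
}
```

The number 1 (significand 1, exponent 0) has the high word 0x3040000000000000 in BID and 0x2208000000000000 in DPD, and each decodes to the biased exponent 6176 in its own encoding. Reading the DPD pattern as BID instead yields the biased exponent 4356, that is 1 × 10⁻¹⁸²⁰: a valid decimal128 number with a completely different value.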

Because the significands in the IEEE 754 decimal formats are not normalized (in contrast to the binary formats), most values with fewer than 34 significant digits have multiple possible representations; 1000000 × 10⁻² = 100000 × 10⁻¹ = 10000 × 10⁰ = 1000 × 10¹ all have the value 10000. These sets of representations of the same value are called cohorts; the different members can be used to denote how many digits of the value are known precisely.
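A minimal C sketch of the cohort idea follows. It models a value as a (coefficient, exponent) pair with a 64-bit coefficient, which is an assumption made for brevity, since a real decimal128 coefficient needs up to 113 bits. The names `dec_value`, `cohort_shortest` and `same_value` are hypothetical.

```c
#include <stdint.h>

/* Toy decimal value: coefficient * 10^exponent. */
typedef struct {
    uint64_t coefficient;
    int      exponent;
} dec_value;

/* Return the cohort member with the shortest coefficient
   by moving trailing zeros of the coefficient into the exponent. */
dec_value cohort_shortest(dec_value v)
{
    while (v.coefficient != 0 && v.coefficient % 10 == 0) {
        v.coefficient /= 10;
        v.exponent++;
    }
    return v;
}

/* Two representations denote the same value if their shortest members match.
   All zeros are treated as equal regardless of exponent. */
int same_value(dec_value a, dec_value b)
{
    dec_value x = cohort_shortest(a), y = cohort_shortest(b);
    if (x.coefficient == 0 || y.coefficient == 0)
        return x.coefficient == y.coefficient;
    return x.coefficient == y.coefficient && x.exponent == y.exponent;
}
```

All four representations from the example above reduce to 1 × 10⁴, so `same_value` reports them as equal even though their bit patterns differ.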

The encodings combine two bits of the exponent with the leading 3 to 4 bits of the significand in a 'combination field', encoded differently for 'big' vs. 'small' significands. This enables greater precision and range, at the cost that some simple operations such as sorting and comparing, which are used very frequently in code, do not work on the bit pattern directly but require computations to extract exponent and significand and then to obtain an exponent-aligned representation. This effort is partly offset by not having to normalize, but it contributes to the slower performance of the decimal datatypes. Beware: BID and DPD use different bits of the combination field for this, see below.

Different interpretations of the significand as integer or fraction, and accordingly different biases to apply to the exponent. For decimal128, what is stored in the bits can be decoded as base to the power of 'stored value for the exponent minus bias of 6143' times the significand understood as $d_0 . d_{-1} d_{-2} d_{-3} \ldots d_{-31} d_{-32} d_{-33}$ (note: radix point after the first digit, significand fractional), or as base to the power of 'stored value for the exponent minus bias of 6176' times the significand understood as $d_{33} d_{32} d_{31} \ldots d_3 d_2 d_1 d_0$ (note: no radix point, significand integral). Both produce the same result [2019 version^[2] of IEEE 754, clause 3.3, page 18]. For decimal datatypes the second view is more common, while for binary datatypes the first is; the biases are different for each datatype.
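The two views differ only in where the radix point is placed, so their exponents differ by a constant 33 (the number of digits after the first). A small C sketch, using the two biases from the text and hypothetical function names:

```c
/* decimal128 biases as given in the text */
enum {
    DEC128_BIAS_INTEGER_VIEW  = 6176,  /* significand d33 ... d0, no radix point */
    DEC128_BIAS_FRACTION_VIEW = 6143   /* significand d0.d-1 ... d-33 */
};

/* Exponent to use with the integral significand. */
int exponent_integer_view(int stored)
{
    return stored - DEC128_BIAS_INTEGER_VIEW;
}

/* Exponent to use with the fractional (scientific) significand. */
int exponent_fraction_view(int stored)
{
    return stored - DEC128_BIAS_FRACTION_VIEW;
}
```

For the smallest stored exponent 0 the views give −6176 and −6143, and for the largest stored exponent 12287 they give 6111 and 6144. These are the exponents of the smallest denormal, the smallest normal and the largest finite value.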

In the case of Infinity and NaN, all other bits of the encoding are ignored. Thus, it is possible to initialize an array to Infinities or NaNs by filling it with a single byte value.
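As a sketch of this trick in C: because every byte of the array gets the same value, the byte order of the platform does not matter, and only the top byte's pattern decides the class. The raw storage type `dec128_bits` and both function names are hypothetical, and the byte values are derived from the encoding tables in this article.

```c
#include <stdint.h>
#include <string.h>

typedef struct { unsigned char bytes[16]; } dec128_bits;  /* raw storage only */

enum dec128_special { DEC128_FINITE, DEC128_INFINITY, DEC128_QNAN, DEC128_SNAN };

/* Classify a value from its most significant byte:
   s11110xx = infinity, s111110x = quiet NaN, s111111x = signaling NaN. */
enum dec128_special dec128_classify_top_byte(unsigned char top)
{
    unsigned five = (top >> 2) & 0x1Fu;   /* the five bits after the sign */
    if (five == 0x1Eu) return DEC128_INFINITY;
    if (five == 0x1Fu) return (top & 0x02u) ? DEC128_SNAN : DEC128_QNAN;
    return DEC128_FINITE;
}

/* Fill an array of raw decimal128 slots with one repeated byte value. */
void dec128_fill(dec128_bits *array, size_t count, unsigned char byte_value)
{
    memset(array, byte_value, count * sizeof *array);
}
```

Filling with 0x78 yields +Infinity in every element, 0xF8 yields −Infinity, 0x7C a quiet NaN and 0x7E a signaling NaN.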

Binary integer significand field

This format uses a binary significand from 0 to $10^{34} - 1$ = 9999999999999999999999999999999999 = 1ED09BEAD87C0378D8E63FFFFFFFF₁₆ = 011110110100001001101111101010110110000111110000000011011110001101100011100110001111111111111111111111111111111111₂. The encoding can represent binary significands up to $10 \times 2^{110} - 1$ = 12980742146337069071326240823050239, but values larger than $10^{34} - 1$ are illegal (and the standard requires implementations to treat them as 0, if encountered on input).

If the 2 bits after the sign bit are "00", "01", or "10", then the exponent field consists of the 14 bits following the sign bit, and the significand is the remaining 113 bits, with an implicit leading 0 bit.

This includes subnormal numbers where the leading significand digit is 0.

If the 2 bits after the sign bit are "11", then the 14-bit exponent field is shifted 2 bits to the right (after both the sign bit and the "11" bits thereafter), and the represented significand is in the remaining 111 bits. In this case there is an implicit (that is, not stored) leading 3-bit sequence "100" in the true significand. Compare having an implicit 1 in the significand of normal values for the binary formats. The "00", "01", or "10" bits are part of the exponent field.

For the decimal128 format, all of these significands are out of the valid range (they begin with $2^{113} > 1.038 \times 10^{34}$), and are thus decoded as zero, but the pattern is the same as for decimal32 and decimal64.

Be aware that the bit numbering used in the tables, e.g. m₁₆ … m₀, runs in the opposite direction to that used in the IEEE 754 standard document, G₀ … G₁₆.

BID Encoding
Combination Field																	Exponent	Significand / Description
m₁₆	m₁₅	m₁₄	m₁₃	m₁₂	m₁₁	m₁₀	m₉	m₈	m₇	m₆	m₅	m₄	m₃	m₂	m₁	m₀	Exponent	Significand / Description
combination field not starting with '11', bits ab = 00, 01 or 10
a	b	c	d	m	m	m	m	m	m	m	m	m	m	e	f	g	abcdmmmmmmmmmm	(0)efgtttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt Finite number, all 'legal' significands 0 .. 9999999999999999999999999999999999 fit here.
combination field starting with '11', but not 1111, bits ab = 11, bits cd = 00, 01 or 10
1	1	c	d	m	m	m	m	m	m	m	m	m	m	e	f	g	cdmmmmmmmmmmef	100gtttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt Theoretical case: all these significands are > 1.0384593717069655257060992658440191 × 10^34, thus > 10^34 - 1, 'illegal' and to be treated as zero.
combination field starting with '1111', bits abcd = 1111
1	1	1	1	0														±Infinity
1	1	1	1	1	0													quiet NaN
1	1	1	1	1	1													signaling NaN (with payload in significand)

In the above cases, the value represented is

(−1)^sign × 10^{exponent−6176} × significand
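The BID rules above can be sketched as a field extractor in C. The sketch assumes the encoding is held as two 64-bit words with `hi` the more significant half; the type and function names are hypothetical, and NaN payloads are not extracted.

```c
#include <stdint.h>

enum bid128_kind { BID128_FINITE, BID128_INFINITY, BID128_QNAN, BID128_SNAN };

typedef struct {
    enum bid128_kind kind;
    int      sign;             /* 0 = positive, 1 = negative */
    unsigned biased_exponent;  /* subtract 6176 to get the exponent */
    uint64_t significand_hi;   /* upper 49 bits of the 113-bit significand */
    uint64_t significand_lo;   /* lower 64 bits of the significand */
} bid128_fields;

bid128_fields bid128_decode(uint64_t hi, uint64_t lo)
{
    /* 10^34 - 1 = 0x1ED09BEAD87C0378D8E63FFFFFFFF, split into two words */
    const uint64_t max_hi = 0x0001ED09BEAD87C0ULL;
    const uint64_t max_lo = 0x378D8E63FFFFFFFFULL;
    bid128_fields f = { BID128_FINITE, (int)(hi >> 63), 0u, 0u, 0u };

    if (((hi >> 61) & 0x3u) != 0x3u) {
        /* "00", "01" or "10": 14 exponent bits, then 113 significand bits */
        f.biased_exponent = (unsigned)((hi >> 49) & 0x3FFFu);
        f.significand_hi  = hi & 0x0001FFFFFFFFFFFFULL;   /* low 49 bits */
        f.significand_lo  = lo;
        if (f.significand_hi > max_hi ||
            (f.significand_hi == max_hi && f.significand_lo > max_lo)) {
            f.significand_hi = 0;   /* illegal significand: treated as zero */
            f.significand_lo = 0;
        }
    } else if (((hi >> 59) & 0x3u) != 0x3u) {
        /* "11" prefix: the implicit leading "100" makes the significand
           at least 2^113, always out of range, so it decodes as zero */
        f.biased_exponent = (unsigned)((hi >> 47) & 0x3FFFu);
    } else if (((hi >> 58) & 0x1u) == 0) {
        f.kind = BID128_INFINITY;
    } else {
        f.kind = ((hi >> 57) & 0x1u) ? BID128_SNAN : BID128_QNAN;
    }
    return f;
}
```

For example, the pair (0x3040000000000000, 1) decodes to sign 0, biased exponent 6176 and significand 1, that is the value 1 × 10⁰.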

Densely packed decimal significand field

In this version, the significand is stored as a series of decimal digits. The leading digit is between 0 and 9 (3 or 4 binary bits), and the rest of the significand uses the densely packed decimal (DPD) encoding.

The encoding varies depending on whether the most significant 4 bits of the significand are in the range 0 to 7 (0000₂ to 0111₂), or higher (1000₂ or 1001₂).

2 bits of the exponent and the leading digit (3 or 4 bits) of the significand are combined into the five bits that follow the sign bit.

The twelve bits after that are the exponent continuation field, providing the less-significant bits of the exponent.

The last 110 bits are the significand continuation field, consisting of eleven 10-bit declets.^[3] Each declet encodes three decimal digits^[3] using the DPD encoding.

If the first two bits after the sign bit are "00", "01", or "10", then those are the leading bits of the exponent, and the three bits after that are interpreted as the leading decimal digit (0 to 7).

If the first two bits after the sign bit are "11", then the next two bits are the leading bits of the exponent, and the fifth bit is prefixed with "100" to form the leading decimal digit of the significand (8 or 9).

DPD Encoding
Combination Field																	Exponent	Significand / Description
m₁₆	m₁₅	m₁₄	m₁₃	m₁₂	m₁₁	m₁₀	m₉	m₈	m₇	m₆	m₅	m₄	m₃	m₂	m₁	m₀	Exponent	Significand / Description
combination field not starting with '11', bits ab = 00, 01 or 10
a	b	c	d	e	m	m	m	m	m	m	m	m	m	m	m	m	abmmmmmmmmmmmm	(0)cde tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt Finite number with small first digit of significand (0 … 7).
combination field starting with '11', but not 1111, bits ab = 11, bits cd = 00, 01 or 10
1	1	c	d	e	m	m	m	m	m	m	m	m	m	m	m	m	cdmmmmmmmmmmmm	100e tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt tttttttttt Finite number with big first digit of significand (8 or 9).
combination field starting with '1111', bits abcd = 1111
1	1	1	1	0														±Infinity
1	1	1	1	1	0													quiet NaN
1	1	1	1	1	1													signaling NaN (with payload in significand)

The remaining two combinations (11110 and 11111) of the 5-bit field are used to represent ±infinity and NaNs, respectively.
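The DPD combination field rules can be sketched in C as follows. The sketch decodes only the sign, the exponent and the leading digit from the more significant 64-bit word `hi`; the type and function names are hypothetical, and the distinction between quiet and signaling NaN is left out.

```c
#include <stdint.h>

enum dpd128_kind { DPD128_FINITE, DPD128_INFINITY, DPD128_NAN };

typedef struct {
    enum dpd128_kind kind;
    int      sign;             /* 0 = positive, 1 = negative */
    unsigned biased_exponent;  /* 2 leading bits plus 12 continuation bits */
    unsigned leading_digit;    /* first significand digit, 0..9 */
} dpd128_head;

dpd128_head dpd128_decode_head(uint64_t hi)
{
    unsigned five = (unsigned)((hi >> 58) & 0x1Fu);   /* five bits after the sign */
    unsigned cont = (unsigned)((hi >> 46) & 0xFFFu);  /* exponent continuation */
    dpd128_head h = { DPD128_FINITE, (int)(hi >> 63), 0u, 0u };

    if ((five >> 3) != 0x3u) {
        /* "00", "01" or "10": exponent MSBs, then a small digit 0..7 */
        h.biased_exponent = ((five >> 3) << 12) | cont;
        h.leading_digit   = five & 0x7u;
    } else if (((five >> 1) & 0x3u) != 0x3u) {
        /* "11", then exponent MSBs, then one bit selecting digit 8 or 9 */
        h.biased_exponent = (((five >> 1) & 0x3u) << 12) | cont;
        h.leading_digit   = 8u + (five & 0x1u);
    } else {
        /* 11110 = infinity, 11111 = NaN */
        h.kind = (five & 0x1u) ? DPD128_NAN : DPD128_INFINITY;
    }
    return h;
}
```

The high word 0x2208000000000000, which is the DPD encoding of the number 1, decodes to leading digit 0 and biased exponent 6176.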

The 10-bit DPD to 3-digit BCD transcoding for the declets is given by the following table. b₉ … b₀ are the bits of the DPD, and d₂ … d₀ are the three BCD digits. Be aware that the bit numbering used here (b₉ … b₀) runs in the opposite direction to that used in the IEEE 754 standard (b₀ … b₉); additionally, the decimal digits are numbered 0-based here, while the standard numbers them 1-based and in the opposite direction. The bits shown as fixed 0 or 1 in the table do not contribute to the value, but signal how to interpret or shift the other bits. The concept is to denote which digits are small (0 … 7) and encoded in three bits, and which are large (8 or 9) and formed from a prefix of '100' plus one bit specifying whether the digit is 8 or 9.

Densely packed decimal encoding rules^[4]
Code space (1024 states)	b9	b8	b7	b6	b5	b4	b3	b2	b1	b0	d2	d1	d0	Values encoded	Description	Occurrences (1000 states)
DPD encoded value											Decimal digits
50.0% (512 states)	a	b	c	d	e	f	0	g	h	i	0abc	0def	0ghi	(0–7) (0–7) (0–7)	3 small digits	51.2% (512 states)
37.5% (384 states)	a	b	c	d	e	f	1	0	0	i	0abc	0def	100i	(0–7) (0–7) (8–9)	2 small digits, 1 large digit	38.4% (384 states)
	a	b	c	g	h	f	1	0	1	i	0abc	100f	0ghi	(0–7) (8–9) (0–7)
	g	h	c	d	e	f	1	1	0	i	100c	0def	0ghi	(8–9) (0–7) (0–7)
9.375% (96 states)	g	h	c	0	0	f	1	1	1	i	100c	100f	0ghi	(8–9) (8–9) (0–7)	1 small digit, 2 large digits	9.6% (96 states)
	d	e	c	0	1	f	1	1	1	i	100c	0def	100i	(8–9) (0–7) (8–9)
	a	b	c	1	0	f	1	1	1	i	0abc	100f	100i	(0–7) (8–9) (8–9)
3.125% (32 states, 8 used)	x	x	c	1	1	f	1	1	1	i	100c	100f	100i	(8–9) (8–9) (8–9)	3 large digits, b9, b8: don't care	0.8% (8 states)

The 8 decimal values whose digits are all 8s or 9s have four codings each. The bits marked x in the table above are ignored on input, but will always be 0 in computed results. (The 8 × 3 = 24 non-standard encodings fill the unused range from 10³ = 1000 to 2¹⁰ - 1 = 1023.)
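The table can be transcribed row by row into a small C decoder. This is a straightforward sketch for illustration, not an optimized implementation; the function name is hypothetical and the bit names follow the table (b9 most significant).

```c
/* Decode one 10-bit DPD declet (bits b9..b0 as in the table) to a value 0..999. */
unsigned dpd_declet_decode(unsigned declet)
{
    unsigned b[10];
    for (int i = 0; i < 10; ++i)
        b[i] = (declet >> i) & 1u;                     /* b[9] is the top bit */

    unsigned hi3  = (b[9] << 2) | (b[8] << 1) | b[7];  /* bits b9 b8 b7 */
    unsigned mid3 = (b[6] << 2) | (b[5] << 1) | b[4];  /* bits b6 b5 b4 */
    unsigned d2, d1, d0;

    if (b[3] == 0) {                                   /* three small digits */
        d2 = hi3; d1 = mid3; d0 = (b[2] << 2) | (b[1] << 1) | b[0];
    } else {
        switch ((b[2] << 1) | b[1]) {
        case 0:  /* small, small, large */
            d2 = hi3; d1 = mid3; d0 = 8 + b[0]; break;
        case 1:  /* small, large, small */
            d2 = hi3; d1 = 8 + b[4]; d0 = (b[6] << 2) | (b[5] << 1) | b[0]; break;
        case 2:  /* large, small, small */
            d2 = 8 + b[7]; d1 = mid3; d0 = (b[9] << 2) | (b[8] << 1) | b[0]; break;
        default: /* at least two large digits, selected by b6 b5 */
            switch ((b[6] << 1) | b[5]) {
            case 0:  d2 = 8 + b[7]; d1 = 8 + b[4];
                     d0 = (b[9] << 2) | (b[8] << 1) | b[0]; break;
            case 1:  d2 = 8 + b[7]; d1 = (b[9] << 2) | (b[8] << 1) | b[4];
                     d0 = 8 + b[0]; break;
            case 2:  d2 = hi3; d1 = 8 + b[4]; d0 = 8 + b[0]; break;
            default: d2 = 8 + b[7]; d1 = 8 + b[4]; d0 = 8 + b[0]; break; /* b9, b8 ignored */
            }
        }
    }
    return 100u * d2 + 10u * d1 + d0;
}
```

For instance, the declet 0x0A3 decodes to 123, and both 0x0FF and 0x3FF decode to 999, which shows the redundant codings for values whose digits are all 8 or 9.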

In the above cases, with the true significand as the sequence of decimal digits decoded, the value represented is

$(-1)^{\text{signbit}} \times 10^{\text{exponentbits}_{2} - 6176_{10}} \times \text{truesignificand}_{10}$

History

decimal128 was formally introduced in the 2008 revision of the IEEE 754 standard,^[5] which was taken over into the ISO/IEC/IEEE 60559:2011 standard.^[6]

Less important information, side effects of the encoding

DPD encoding is quite efficient, wasting no more than about 2.4 percent of space compared with BID, because the 2¹⁰ = 1024 possible values of 10 bits are only slightly more than the 1000 needed to encode all numbers from 0 to 999.

Zero has 12288 possible representations (24576 when both signed zeros are counted), and many more if the 'illegal' significands, which have to be treated as zero, are included.

The gain in range and precision from the 'combination encoding' arises because the 2 bits taken from the exponent use only three states, and the 4 MSBs of the significand stay within 0000 … 1001 (10 states). In total that is 3 × 10 = 30 possible values when combined in one encoding, which is representable in 5 bits ($2^{5}=32$).

The decimalxxx formats include denormal values, for a graceful degradation of precision near zero, but in contrast to the binaryxxx formats they are not marked by, and do not need, a special exponent; in decimal128 they are just values too small to have full 34-digit precision even with the smallest exponent.

Addendum - code

The following program measures the performance of addition and tangent for the bin64, bin128, dec64 and dec128 datatypes; see the header comment for compile and run instructions.

The output consists of: clock ticks taken (as returned by clock()); iterations; result; expression tested.

// program to compare the performance of binary vs. decimal datatypes, 
// WIP, covering elementary operations, 
// requires 'libdfp' installed, 
// compile with:          'gcc     -I /usr/local/include/dfp -o dec128_perf_sample decxxx_perf_sample.c -ldfp -lm -lquadmath' 
// or - optimized - with: 'gcc -O2 -I /usr/local/include/dfp -o dec128_perf_sample decxxx_perf_sample.c -ldfp -lm -lquadmath' 
// run with: './dec128_perf_sample value_1 value_2 (count)' 
// e.g. '     ./dec128_perf_sample 8.0 5.0E-16 1000' 
 
#define __STDC_WANT_IEC_60559_TYPES_EXT__
#define __STDC_WANT_DEC_FP__
#define __STDC_WANT_IEC_60559_DFP_EXT__
 
#include <fenv.h> 

#include <stdio.h> 									// reg. e.g. printf, 
#include <float.h>
#include <limits.h>
#include <math.h> 									// reg. e.g. pow, 
#include <stdlib.h> 									// reg. e.g. atof, 

#include <time.h> 									// reg. e.g. clock(), 
#include <locale.h> 									// reg. formatted print of integers, not yet sufficient, 
// #include <decimal.h> 									// reg. ??? 
#include <string.h> 									// reg. e.g. strcat, 
#include <quadmath.h> 									// reg. e.g. quadmath_snprintf, 
 
clock_t start1, end1; 

#define TIMEITcd( expr, N ) \
	start1 = clock(); \
	for( int i = 1; i < N; ++i ) \
	{ \
		expr; \
	} \
	end1 = clock(); \
	printf( "%07ld; %d; %.18E; %s \n", (long)( end1 - start1 ), N, expr, #expr ) 
 
#define TIMEITcq( expr, N ) \
	start1 = clock(); \
	for( int i = 1; i < N; ++i ) \
	{ \
		expr; \
	} \
	end1 = clock(); \
	quadmath_snprintf( str, sizeof(str), "%.36QE", expr ); \
	printf( "%07ld; %d; %s; %s \n", (long)( end1 - start1 ), N, str, #expr ) 
 
#define TIMEITcDD( expr, N ) \
	start1 = clock(); \
	for(int i = 1; i < N; ++i) \
	{ \
		expr; \
	} \
	end1 = clock(); \
	printf( "%07ld; %d; %.17DE; %s \n", (long)( end1 - start1 ), N, expr, #expr ) 
 
#define TIMEITcDL( expr, N ) \
	start1 = clock(); \
	for(int i = 1; i < N; ++i) \
	{ \
		expr; \
	} \
	end1 = clock(); \
	printf( "%07ld; %d; %.35DDE; %s \n", (long)( end1 - start1 ), N, expr, #expr ) 
 
int main( int argc, char *argv[] ) 
{ 
	fe_dec_setround( FE_DEC_TONEARESTFROMZERO ); 					// round ties away from zero for decimal datatypes, 
 
// 	setlocale(LC_ALL, "en_US"); 							// or any other locale that supports thousands separators
	volatile double x1d = 0.0, x2d = 0.0, x3d = 0.0; 
	volatile __float128 x1q = 0.0, x2q = 0.0, x3q = 0.0; 
	volatile _Decimal64 x1DD = 0.0DD, x2DD = 0.0DD, x3DD = 0.0DD; 
	volatile _Decimal128 x1DL = 0.0DL, x2DL = 0.0DL, x3DL = 0.0DL; 
	volatile int count = 1000000; 							// how many times to run the loop, 
	char str[ 45 ]; 
 
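	if( argc < 3 ) 									// both values are required, 
	{ 
		fprintf( stderr, "usage: %s value_1 value_2 (count) \n", argv[ 0 ] ); 
		return 1; 
	} 
	if( argc > 3 ) count = atoi( argv[ 3 ] ); 					// optional loop count, 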
 
	printf("add two values from command line arguments \n" ); 
	printf("no benefit from '-O2'? \n" ); 
	x1d = strtod( argv[ 1 ], NULL ); 
	x2d = strtod( argv[ 2 ], NULL ); 
	TIMEITcd( x1d = x1d + x2d, count ); 
	x1q = strtoflt128( argv[ 1 ], NULL ); 						// binary128 parser from libquadmath, 
	x2q = strtoflt128( argv[ 2 ], NULL ); 
	TIMEITcq( x1q = x1q + x2q, count ); 
	x1DD = strtod64( argv[ 1 ], NULL ); 
	x2DD = strtod64( argv[ 2 ], NULL ); 
	TIMEITcDD( x1DD = x1DD + x2DD, count ); 
	x1DL = strtod128( argv[ 1 ], NULL ); 
	x2DL = strtod128( argv[ 2 ], NULL ); 
	TIMEITcDL( x1DL = x1DL + x2DL, count ); 
	printf(" \n" ); 
 
	printf("tangent of value from command line argument \n" ); 
	printf("no benefit from '-O2'? \n" ); 
	x1d = strtod( argv[ 1 ], NULL ); 
	TIMEITcd( x3d = tan( x1d ), count ); 
	x1q = strtoflt128( argv[ 1 ], NULL ); 						// binary128 parser from libquadmath, 
	TIMEITcq( x3q = tanq( x1q ), count ); 
	x1DD = strtod64( argv[ 1 ], NULL ); 
	TIMEITcDD( x3DD = tand64( x1DD ), count ); 
	x1DL = strtod128( argv[ 1 ], NULL ); 
	TIMEITcDL( x3DL = tand128( x1DL ), count ); 
	printf(" \n" ); 
 
	return 0; 
}

References

[1] Cowlishaw, Mike (2007). "Decimal Arithmetic FAQ – Part 1 – General Questions". speleotrove.com. IBM Corporation. Retrieved 2022-07-29.
[2] 754-2019 - IEEE Standard for Floating-Point Arithmetic (caution: paywall). 2019. doi:10.1109/IEEESTD.2019.8766229. ISBN 978-1-5044-5924-2. Archived from the original on 2019-11-01. Retrieved 2019-10-24.
[3] Muller, Jean-Michel; Brisebarre, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Lefèvre, Vincent; Melquiond, Guillaume; Revol, Nathalie; Stehlé, Damien; Torres, Serge (2010). Handbook of Floating-Point Arithmetic (1 ed.). Birkhäuser. doi:10.1007/978-0-8176-4705-6. ISBN 978-0-8176-4704-9. LCCN 2009939668.
[4] Cowlishaw, Michael Frederic (2007-02-13) [2000-10-03]. "A Summary of Densely Packed Decimal encoding". IBM. Archived from the original on 2015-09-24. Retrieved 2016-02-07.
[5] IEEE Computer Society (2008-08-29). IEEE Standard for Floating-Point Arithmetic. IEEE. doi:10.1109/IEEESTD.2008.4610935. ISBN 978-0-7381-5753-5. IEEE Std 754-2008.
[6] "ISO/IEC/IEEE 60559:2011". 2011. Archived from the original on 2016-03-04. Retrieved 2016-02-08.
