Integer (computer science)
The integral data types (so called because they are most frequently used to represent integers) of computing generally consist of some number of bits (usually a power of two) treated as a unit of storage or manipulation. The word "bit" is a contraction of "binary digit"; a bit is the fundamental unit of computer storage and holds one of two values: 0 or 1, on or off. Every larger data type is built from groups of bits.
The table below lists data types recognized by common processors. Additional data types found in high-level programming languages, such as bit-fields and extended-precision integers, are not discussed here. Following the table are additional usage notes, then details on number representation.
See also: real data type
bits | name | comments
1 | bit | status, Boolean flag
4 | nibble, nybble | half a byte (the name is a humorous play on "byte"); usually holds a single BCD digit
8 | byte, octet | small integers, characters
16 | word | larger integers, pointers
32 | longword | usually shortened to "long"; larger integers, pointers
64 | quadword, long long | larger integers, pointers; VMS internal date/time format
80 | tenbyte | Intel-specific: used by the x87 floating-point unit, mainly for extended-precision floating-point values
128 | octword, octaword | very large integers
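The range of values each width can hold follows directly from its bit count. The following is a minimal Python sketch; because Python integers are unbounded, the width is imposed with shifts. It assumes plain binary for unsigned values and the two's-complement convention (described later in this article) for signed values. The function names are illustrative only.

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    """Smallest and largest unsigned value that fits in `bits` bits."""
    return 0, (1 << bits) - 1          # all bits set is the maximum

def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest two's-complement value for `bits` bits."""
    half = 1 << (bits - 1)             # weight of the most significant bit
    return -half, half - 1

for width in (8, 16, 32, 64):
    print(width, unsigned_range(width), signed_range(width))
```

The asymmetry of the signed range (one more negative value than positive) is a property of two's complement, not of the sketch.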
In addition to their interpretation as sizes of numerical values, three terms (bit, byte, and word) have other common usages. In particular, word was originally used to indicate the "most efficient size" of data for a processor, typically the size of its internal registers. Thus different processor families, and even different models within a family, have had different-sized words: 8-, 12-, 16-, 32-, 36-, 60- and 64-bit words have all been used. Some machines have also used 9-bit units and called them bytes.
The term "octet" avoids this ambiguity: it always refers to exactly eight bits.
Popular usage has narrowed the usual meaning of word to 16 bits, unless the context indicates otherwise. The remaining terms (nibble, longword, quadword, and so on) are typically used only when the content is to be interpreted numerically.
Telecommunications and network transfer rates are usually described in bits per second. For example, a 56 kbit/s modem is capable of transferring data at up to 56 kilobits/second; Ethernet transfers data at speeds ranging from 10 megabits/second to 1000 megabits/second.
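Converting between a line rate in bits per second and an amount of data in bytes is simple arithmetic. A rough Python sketch, assuming 8 bits per byte, "kilo" meaning 1000 as is usual for line rates, and no protocol overhead (real links add framing and control bits, so actual transfers are slower). The helper names are hypothetical.

```python
def bytes_per_second(bits_per_second: float) -> float:
    """Raw payload rate in bytes/s, ignoring any protocol overhead."""
    return bits_per_second / 8

def transfer_seconds(size_bytes: int, bits_per_second: float) -> float:
    """Idealized time to move `size_bytes` bytes over the link."""
    return size_bytes * 8 / bits_per_second

print(bytes_per_second(56_000))           # 56 kbit/s modem
print(transfer_seconds(100_000, 56_000))  # a 100,000-byte file at 56 kbit/s
```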
A byte, usually called an octet in a networking context, is used to specify the size or amount of computer memory or storage, regardless of the type of data represented. Examples include a 50-byte text string, a 100 KB (kilobyte) file, 128 MB (megabytes) of RAM, or 30 GB (gigabytes) of disk storage.
Pointer is a generic term used to indicate an integral value (or a structure thereof) that is used to specify ("point to") a location (address) in memory.
complement, one's-complement, two's-complement
Complementing a binary number simply means changing all the 0s to 1s and all the 1s to 0s,
nothing more.
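For a fixed-width value, complementing amounts to an XOR with an all-ones mask of that width. A small Python sketch (the helper name is hypothetical); Python's own `~` operator works on unbounded integers and returns a negative number (`~43` is `-44`), so the mask is what keeps the result inside the field.

```python
def complement(value: int, bits: int = 8) -> int:
    """Flip every bit of `value` within a `bits`-wide field."""
    mask = (1 << bits) - 1            # e.g. 0b11111111 for 8 bits
    return (value & mask) ^ mask      # XOR with all ones flips each bit

print(format(complement(0b00101011), "08b"))
```

Applying the function twice returns the original value, since each bit is flipped and then flipped back.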
A byte, holding 8 bits, can represent the values 00000000 (0) to 11111111 (255) if all bits are used to represent the magnitude of the number. This is called an unsigned integer.
To represent both positive and negative (signed) integers, the convention is that the most significant bit (MSB) of the binary representation of the number will be used to indicate the sign of the number, rather than contributing to its magnitude. With only seven bits left, the magnitude can range from 0000000 (0) to 1111111 (127). The MSB is set to 0 for a positive number and 1 for a negative number. Thus this sign-magnitude representation can hold numbers from -127 to +127.
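A minimal Python sketch of the 8-bit sign-magnitude convention, with the sign in bit 7 and the magnitude in the low seven bits. The function names and the choice to raise `ValueError` for out-of-range input are assumptions for illustration.

```python
def to_sign_magnitude(n: int) -> int:
    """Encode -127..+127 as an 8-bit sign-magnitude pattern."""
    if not -127 <= n <= 127:
        raise ValueError("out of range for 8-bit sign-magnitude")
    sign = 0x80 if n < 0 else 0x00    # MSB carries the sign
    return sign | abs(n)              # low seven bits carry the magnitude

def from_sign_magnitude(pattern: int) -> int:
    """Decode an 8-bit sign-magnitude pattern back to an integer."""
    magnitude = pattern & 0x7F
    return -magnitude if pattern & 0x80 else magnitude

print(format(to_sign_magnitude(-43), "08b"))
```

Note that both 00000000 and 10000000 decode to zero under this convention.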
In practice, however, most machines do not store a negative integer as a sign bit plus an independent magnitude. Two other conventions are used to convert a positive integer to its negative counterpart; in both, the MSB still indicates the sign.
The one's-complement (OC) representation of a negative number is created by taking the complement of its positive representation. For example, negating 00101011 (43) gives 11010100 (-43). (Notice that the lower seven bits could be interpreted as a magnitude of 84, but that is not the convention.)
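One's complement can be sketched in Python for 8-bit values as follows; negation is simply an XOR with 0xFF. The function names are hypothetical, and the range check reflects the -127 to +127 limit of 8-bit OC.

```python
def to_ones_complement(n: int) -> int:
    """Encode -127..+127 as an 8-bit one's-complement pattern."""
    if not -127 <= n <= 127:
        raise ValueError("out of range for 8-bit one's complement")
    return n if n >= 0 else (-n) ^ 0xFF   # negate by flipping all 8 bits

def from_ones_complement(pattern: int) -> int:
    """Decode an 8-bit one's-complement pattern."""
    if pattern & 0x80:                    # MSB set: negative
        return -(pattern ^ 0xFF)          # flip back to recover the magnitude
    return pattern

print(format(to_ones_complement(-43), "08b"))
```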
In one's complement there are two ways to represent zero: 00000000 (+0) and 11111111 (-0). To avoid this, and also to make integer addition simpler, the two's-complement (TC) representation is the one generally used. It is created by first complementing the positive number, then adding 1 to it. Thus 00101011 (43) becomes 11010101 (-43).
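The two's-complement encoding step (complement, then add 1, keeping only 8 bits) can be sketched in Python as follows; the function name is illustrative.

```python
def to_twos_complement(n: int) -> int:
    """Encode -128..+127 as an 8-bit two's-complement pattern."""
    if not -128 <= n <= 127:
        raise ValueError("out of range for 8-bit two's complement")
    if n >= 0:
        return n
    return (((-n) ^ 0xFF) + 1) & 0xFF   # complement, add 1, keep 8 bits

print(format(to_twos_complement(-43), "08b"))
```

In Python the same bit pattern can also be obtained as `n & 0xFF`, because Python's bitwise operators treat negative integers as if they had infinitely many leading 1 bits.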
In TC, there is only one zero (00000000). Negating a negative number involves the same operation: complementing, then adding 1. The pattern 11111111 now represents -1 and 10000000 represents -128; that is, the range of 8-bit TC integers is -128 to +127.
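Decoding and negating 8-bit TC patterns can be sketched in Python as below: the MSB effectively carries a weight of -128, and negation is the same complement-then-add-1 step applied modulo 256. The function names are hypothetical.

```python
def from_twos_complement(pattern: int) -> int:
    """Decode an 8-bit two's-complement pattern."""
    pattern &= 0xFF
    return pattern - 256 if pattern & 0x80 else pattern  # MSB weighs -128

def negate(pattern: int) -> int:
    """Negate an 8-bit TC pattern: complement, then add 1 (mod 256)."""
    return ((pattern ^ 0xFF) + 1) & 0xFF

print(from_twos_complement(0b11111111), from_twos_complement(0b10000000))
```

One edge case is worth noting: negating 10000000 (-128) yields 10000000 again, because +128 lies outside the 8-bit TC range.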
To add two TC integers, treat them as unsigned numbers, add them, and ignore any carry out of the most significant bit. The result will be the correct TC number, unless both summands were positive and the result is negative, or both summands were negative and the result is non-negative. These cases are referred to as "overflow" or "wrap around": the true sum cannot be represented in 8-bit TC. For example:
  00101011 (+43)     11010101 (-43)     00101011 (+43)     10011010 (-102)
+ 11010101 (-43)   + 11100011 (-29)   + 11100011 (-29)   + 10110001 (-79)
  --------------     --------------     --------------     ---------------
  00000000 (0)       10111000 (-72)     00001110 (+14)     01001011 (overflow)
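The addition rule can be checked mechanically. A minimal Python sketch (the function name `add8` is hypothetical): it adds the patterns as unsigned numbers, discards the carry, and flags overflow when both operands have the same sign but the result's sign differs.

```python
def add8(a: int, b: int) -> tuple[int, bool]:
    """Add two 8-bit TC patterns; return (result pattern, overflow flag)."""
    result = (a + b) & 0xFF                 # unsigned add, drop the carry out
    sign_a, sign_b, sign_r = a & 0x80, b & 0x80, result & 0x80
    # Overflow iff the operands share a sign and the result's sign differs.
    overflow = sign_a == sign_b and sign_r != sign_a
    return result, overflow

for a, b in [(0b00101011, 0b11010101), (0b11010101, 0b11100011),
             (0b00101011, 0b11100011), (0b10011010, 0b10110001)]:
    result, overflow = add8(a, b)
    print(format(result, "08b"), "overflow" if overflow else "")
```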
endian, big-endian, little-endian, network byte order
When an integer is represented with multiple bytes, the actual ordering of those bytes in memory,
or the sequence in which they are transmitted over some medium, is subject to convention.
This is similar to the situation in written languages, where some are written left-to-right, while
others are written right-to-left.
Using a 4-byte integer, written as "ABCD", where A is the most significant byte and D is the least significant byte, big-endian convention would store the number in successive memory locations as A (lowest address), then B, then C, finally D, while little-endian convention would store the bytes in D-C-B-A order.
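Python's built-in `int.to_bytes` can lay out an integer in either order, which makes the A-B-C-D versus D-C-B-A difference easy to see. In this sketch the value 0x0A0B0C0D is an arbitrary illustration (A = 0x0A through D = 0x0D), and `layout` is a hypothetical helper name; `sys.byteorder` reports the running host's own convention.

```python
import sys

def layout(value: int, order: str) -> bytes:
    """The 4 bytes of `value` as they would sit at increasing addresses."""
    return value.to_bytes(4, byteorder=order)

value = 0x0A0B0C0D  # illustrative: A=0x0A, B=0x0B, C=0x0C, D=0x0D
print("big-endian:   ", layout(value, "big").hex())     # A, B, C, D
print("little-endian:", layout(value, "little").hex())  # D, C, B, A
print("this host is", sys.byteorder + "-endian")
```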
Network byte order is big-endian: by convention, the bytes are sent onto the medium in the order A, then B, then C, then D. It is the responsibility of the transmitting and receiving systems to convert, if necessary, to and from their internal endian format.
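In Python, the standard `struct` module performs this conversion: the `"!"` prefix selects network (big-endian) byte order with standard sizes, and `"I"` is a 4-byte unsigned integer. The wrapper names below are hypothetical; `socket.htonl` and `socket.ntohl` are the C-style equivalents for converting 32-bit values between host and network order.

```python
import struct

def to_network_order(value: int) -> bytes:
    """Serialize a 32-bit unsigned integer in network byte order."""
    return struct.pack("!I", value)   # "!" = network order, "I" = 4-byte unsigned

def from_network_order(data: bytes) -> int:
    """Parse 4 bytes received in network byte order."""
    (value,) = struct.unpack("!I", data)
    return value

print(to_network_order(0x0A0B0C0D).hex())
```

Because the byte order is fixed by the format string, this code produces the same bytes on big-endian and little-endian hosts.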
Processor families that use big-endian storage: Motorola, IBM 370
Processor families that use little-endian format: Intel, VAX
Processor families that use either (determined by software): MIPS, Alpha
The PDP-11, a 16-bit machine, stored 32-bit values as two 16-bit words with the more significant word first but with each word's bytes in little-endian order, giving the unusual pattern B-A-D-C (that is, byte-swap within words).
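The B-A-D-C pattern can be reproduced by splitting the value into its high and low 16-bit words, emitting the high word first, and storing each word little-endian. A short Python sketch under that assumption; the function name is hypothetical.

```python
def pdp_endian_bytes(value: int) -> bytes:
    """Lay out a 32-bit value as B-A-D-C: high word first, each word byte-swapped."""
    high, low = value >> 16, value & 0xFFFF          # words AB and CD
    return high.to_bytes(2, "little") + low.to_bytes(2, "little")

print(pdp_endian_bytes(0x0A0B0C0D).hex())
```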
The terms big-endian and little-endian are derived from the Big-Endians and Little-Endians of Jonathan Swift's Gulliver's Travels.
See also: Kilobyte, Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte, Yottabyte