Computer Architecture: The Essentials -- Data & Program Representation (part 2)




13. Signed Binary Integers

The positional representation described in Section 3.5 has no provision for negative numbers. To accommodate negative numbers, we need an alternative. Three interpretations have been used:

Sign Magnitude. With sign-magnitude representation, bits of an integer are divided into a sign bit (1 if the number is negative and 0 otherwise) and a set of bits that gives the absolute value (i.e., the magnitude) of the integer. The magnitude field follows the positional representation illustrated in FIG. 2.

One's Complement. The set of bits is interpreted as a single field.

A positive integer uses the positional representation illustrated in FIG. 2 with the restriction that for an integer of k bits, the maximum positive value is 2^(k-1) - 1. To form the negative of any value, invert each bit (i.e., change each 0 to 1 and vice versa). The most significant bit tells the sign of the integer (1 for negative integers, 0 otherwise).

Two's Complement. The set of bits is interpreted as a single field.

A positive integer uses the positional representation illustrated in FIG. 2 with the restriction that for an integer of k bits, the maximum positive value is 2^(k-1) - 1. Thus, positive integers have the same representation as in one's complement. To form a negative number, start with a positive number, subtract one, and then invert each bit. As with one's complement, the most significant bit tells the sign of the integer (1 for negative integers, 0 otherwise).

Each interpretation has interesting quirks. For example, the sign-magnitude interpretation makes it possible to create a value of negative zero, even though negative zero does not correspond to a valid mathematical value. The one's complement interpretation provides two representations of zero: all zero bits and its complement, all one bits.

Finally, the two's complement interpretation includes one more negative value than positive values (to accommodate zero).

Which interpretation is best? Programmers can debate the issue because each interpretation works well in some cases. However, programmers cannot choose because computer architects make the decision and build hardware accordingly. Each of the three representations has been used in at least one computer. Many hardware architectures use the two's complement scheme. There are two reasons. First, two's complement makes it possible to build low-cost, high-speed hardware to perform arithmetic operations. Second, as the next section explains, hardware for two's complement arithmetic can also handle unsigned arithmetic.
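
To make the negation rule concrete, the following C sketch (a minimal illustration that assumes the two's complement hardware found on virtually all modern machines) shows that inverting each bit and adding one produces the same negative value as subtracting one and then inverting each bit:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int8_t x = 6;

        /* Invert every bit and add one... */
        int8_t a = (int8_t)(~x + 1);

        /* ...or subtract one and then invert; the two rules agree. */
        int8_t b = (int8_t)(~(x - 1));

        printf("x = %d, ~x + 1 = %d, ~(x - 1) = %d\n", x, a, b);
        return 0;
    }

The program prints x = 6, ~x + 1 = -6, ~(x - 1) = -6.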

14. An Example Of Two's Complement Numbers

We said that k bits can represent 2^k possible combinations. Unlike the unsigned representation in which the combinations correspond to a continuous set of integers starting at zero, two's complement divides the combinations in half. Each combination in the first half (zero through 2^(k-1) - 1) is assigned the same value as in the unsigned representation. Combinations in the second half, each of which has the high-order bit equal to one, correspond to negative integers. Thus, at exactly one-half of the way through the possible combinations, the value changes from the largest possible positive integer to the negative integer with the largest absolute value.

An example will clarify the two's complement assignment. To keep the example small, we will consider a four-bit integer. FIG. 9 lists the sixteen possible bit combinations and the decimal equivalent when using unsigned, sign magnitude, one's complement, and two's complement representations.

FIG. 9 The decimal value assigned to each combination of four bits when using unsigned, sign-magnitude, one's complement, and two's complement interpretations.

As noted above, unsigned and two's complement have the advantage that except for overflow, the same hardware operations work for both representations. For example, adding one to the binary value 1001 produces 1010. In the unsigned interpretation, adding one to nine produces ten; in the two's complement interpretation, adding one to negative seven produces negative six.

The important point is:

A computer can use a single hardware circuit to provide unsigned or two's complement integer arithmetic; software running on the computer can choose an interpretation for each integer.
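
A short C sketch illustrates the point: the same eight-bit pattern is incremented once, and the result can be read back under either interpretation. (The cast from unsigned to signed is, strictly speaking, implementation-defined in older C standards, but behaves as shown on two's complement machines.)

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint8_t bits = 0xF9;   /* 1111 1001: 249 unsigned, -7 in two's complement */

        bits = bits + 1;       /* one addition circuit; no sign is involved */

        printf("unsigned interpretation: %u\n", bits);          /* 250 */
        printf("signed interpretation:   %d\n", (int8_t)bits);  /* -6  */
        return 0;
    }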

15. Sign Extension

Although FIG. 9 shows four-bit binary strings, the ideas can be extended to an arbitrary number of bits. Many computers include hardware for multiple integer sizes (e.g., a single computer can offer sixteen-bit, thirty-two-bit, and sixty-four-bit representations), and allow a programmer to choose one of the sizes for each integer data item.

If a computer does contain multiple sizes of integers, a situation can arise in which a value is copied from a smaller-size integer to a larger-size integer. For example, consider copying a value from a sixteen-bit integer to a thirty-two-bit integer. What should be placed in the extra bits? In two's complement, the solution consists of copying the least significant bits and then extending the sign bit -- if the original value is positive, extending the high-order bit fills the most significant bits of the larger number with zeros; if the original value is negative, extending the high-order bit fills the most significant bits of the larger number with ones. In either case, the integer with more bits will be interpreted to have the same numeric value as the integer with fewer bits†.

We can summarize:

Sign extension: in two's complement arithmetic, when an integer Q composed of k bits is copied to an integer of more than k bits, the additional high-order bits are made equal to the top bit of Q. Extending the sign bit ensures that the numeric value of the two will be the same if each is interpreted as a two's complement value.

Because two's complement hardware gives correct results when performing arithmetic operations on unsigned values, it may seem that software could use the hardware to support all unsigned operations. However, sign extension provides an exception to the rule: the hardware will always perform sign extension, which may have unexpected results. For example, if an unsigned integer is copied to a larger unsigned integer, the copy will not have the same numeric value if the high-order bit is 1. The point is:

Because two's complement hardware performs sign extension, copying an unsigned integer to a larger unsigned integer can change the value.

†Because division and multiplication by powers of two can be implemented with shift operations, sign extension occurs during a right-shift operation, which results in the correct value. Thus, shifting integer -14 right one bit results in -7, and shifting integer 14 right one bit results in 7.
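
The following C sketch demonstrates sign extension both on a copy to a larger size and on a right shift. It is illustrative only: the unsigned-to-signed conversion and the right shift of a negative value are implementation-defined in older C standards, although they behave as shown on common two's complement machines.

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void) {
        /* Copying a negative sixteen-bit value to thirty-two bits fills
           the new high-order bits with ones, preserving the value. */
        int16_t small = -7;                 /* bit pattern 0xFFF9 */
        int32_t wide  = small;              /* becomes 0xFFFFFFF9 */
        printf("%d -> %d (0x%08" PRIX32 ")\n", small, wide, (uint32_t)wide);

        /* The pitfall: the same sixteen bits read as unsigned are 65529,
           but a copy that sign-extends yields a much larger value. */
        uint16_t u = 0xFFF9;
        uint32_t extended = (uint32_t)(int32_t)(int16_t)u;
        printf("%u -> %" PRIu32 "\n", (unsigned)u, extended);  /* 65529 -> 4294967289 */

        /* Sign extension during a right shift keeps the result correct. */
        printf("-14 >> 1 = %d, 14 >> 1 = %d\n", -14 >> 1, 14 >> 1);
        return 0;
    }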

16. Floating Point

In addition to hardware that performs signed and unsigned integer arithmetic, general-purpose computers provide hardware that performs arithmetic on floating point values. Floating point representation used in computers derives from scientific notation in which each value is represented by a mantissa and an exponent. For example, scientific notation expresses the value -12345 as -1.2345 × 10^4. Similarly, a chemist might write a well-known constant, such as Avogadro's number, as:

6.023 × 10^23

Unlike conventional scientific notation, the floating point representation used in computers is based on binary. Thus, a floating point value consists of a bit string that is divided into three fields: a bit to store the sign, a group of bits that stores a mantissa, and a third group of bits that stores an exponent. Unlike conventional scientific notation, everything in floating point is based on powers of two. For example, the mantissa uses a binary positional representation to store a value, and the exponent is an integer that specifies a power of 2 rather than a power of 10. In scientific notation, we think of an exponent as specifying how many digits to shift the decimal point; in floating point, the exponent specifies how many bits to shift the binary point.

To save space and make comparison efficient, many floating point representations include three optimizations:

- The value is normalized.

- The most significant bit of the mantissa is implicit.

- The exponent is biased to simplify magnitude comparison.

The first two optimizations are related. A floating point number is normalized by adjusting the exponent to eliminate leading zeros from the mantissa. In decimal, for example, 0.003×10^4 can be normalized to 3×10^1. Interestingly, normalizing a binary floating point number always produces a leading bit of 1 (except in the special case of the number zero). Therefore, to increase the number of bits of precision in the mantissa, floating point representations do not store the most significant bit of the mantissa when a value is placed in memory. Instead, when a floating point computation is required, the hardware concatenates a 1 bit onto the front of the mantissa.

An example will clarify the concepts. The example we will use is IEEE† standard 754, which is widely used in the computer industry. The standard specifies both single precision and double precision numbers. According to the standard, a single precision value occupies thirty-two bits, and a double precision value occupies sixty-four bits.

FIG. 10 illustrates how the IEEE standard divides a floating point number into three fields.

†IEEE stands for Institute of Electrical and Electronics Engineers, an organization that creates standards used in electronic digital systems.


FIG. 10 The format of (a) a single precision and (b) a double precision floating point number according to IEEE Standard 754, with the lowest bit in each field labeled. Fields consist of a sign bit, exponent, and mantissa.

Bit numbering in the figure follows the IEEE standard, in which the least significant bit is assigned bit number zero. In single precision, for example, the twenty-three rightmost bits, which constitute a mantissa, are numbered zero through twenty-two. The next eight bits, which constitute an exponent, are numbered twenty-three through thirty, and the most significant bit, which contains a sign, is bit number thirty-one. For double precision, the mantissa occupies fifty-two bits and the exponent occupies eleven bits.

17. Range Of IEEE Floating Point Values

The IEEE standard for single precision floating point allows normalized values in which the exponent ranges from negative one hundred twenty-six through one hundred twenty-seven. Thus, the approximate range of values that can be represented is:

2^-126 to 2^127

which, in decimal, is approximately:

10^-38 to 10^38

The IEEE standard for double precision floating point provides an enormously larger range than single precision. The range is:

2^-1022 to 2^1023

which, in decimal, is approximately:

10^-308 to 10^308
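
The ranges can be verified with the constants defined in the standard C header <float.h>, which give the smallest and largest normalized magnitudes of the machine's single and double precision types:

    #include <stdio.h>
    #include <float.h>

    int main(void) {
        printf("single precision: %e to %e\n", FLT_MIN, FLT_MAX);
        printf("double precision: %e to %e\n", DBL_MIN, DBL_MAX);
        return 0;
    }

On an IEEE 754 machine, the output is approximately 1.18e-38 to 3.40e+38 for single precision and 2.23e-308 to 1.80e+308 for double precision, matching the ranges above.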

To make magnitude comparison fast, the IEEE standard specifies that an exponent field stores the exponent (a power of two) plus a bias constant. The bias constant used with single precision is one hundred twenty-seven, and the bias constant used with double precision is one thousand twenty-three†. For example, to store an exponent of three, the exponent field in a single precision value is assigned the value one hundred thirty, and an exponent of negative five is represented by one hundred twenty-two.

As an example of floating point, consider how 6.5 is represented. In binary, 6 is 110, and .5 is a single bit following the binary point, giving us 110.1 (binary). If we use binary scientific notation and normalize the value, 6.5 can be expressed:

1.101 × 2^2

To express the value as an IEEE single precision floating point number, the sign bit is zero, and the exponent must be biased by 127, making it 129. In binary, 129 is:

10000001

To understand the value in the mantissa, recall that the leading 1 bit is not stored, which means that instead of 1101 followed by zeros, the mantissa is stored as:

10100000000000000000000

FIG. 11 shows how the fields combine to form a single-precision IEEE floating point representation of 6.5.


FIG. 11 The value 6.5 (decimal) represented as a single-precision IEEE floating point constant.
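
A small C program can verify the figure by extracting the three fields from the stored bits of 6.5. The sketch assumes, as is true on virtually all current machines, that the C type float uses the IEEE single precision format:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>
    #include <string.h>

    int main(void) {
        float f = 6.5f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);            /* view the raw bits */

        uint32_t sign     = bits >> 31;            /* bit 31 */
        uint32_t exponent = (bits >> 23) & 0xFF;   /* bits 23-30, biased by 127 */
        uint32_t mantissa = bits & 0x7FFFFF;       /* bits 0-22, leading 1 implicit */

        printf("raw bits = 0x%08" PRIX32 "\n", bits);     /* 0x40D00000 */
        printf("sign     = %" PRIu32 "\n", sign);         /* 0 */
        printf("exponent = %" PRIu32 " (unbiased %d)\n",
               exponent, (int)exponent - 127);            /* 129, i.e., 2 */
        printf("mantissa = 0x%06" PRIX32 "\n", mantissa); /* 0x500000: 101 then zeros */
        return 0;
    }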

18. Special Values

Like most floating point representations, the IEEE standard follows the implicit leading bit assumption -- a mantissa is assumed to have a leading one bit that is not stored. Of course, any representation that strictly enforces the assumption of a leading one bit is useless because the representation cannot store the value zero. To handle zero, the IEEE standard makes an exception -- when all bits are zero, the implicit assumption is ignored, and the stored value is taken to be zero.

The IEEE standard includes two other special values that are reserved to represent positive and negative infinity: the exponent contains all ones and the mantissa contains all zeros. The point of including values for infinity is that some digital systems do not have facilities to handle errors such as arithmetic overflow. On such systems, it is important that a value be reserved so that the software can determine that a floating point operation failed.

†The bias constant is always 2^(k-1) - 1, where k is the number of bits in the exponent field.

19. Binary Coded Decimal Representation

Most computers employ the binary representations for integers and floating point numbers described above. Because the underlying hardware uses digital logic, binary digits of 0 and 1 map directly onto the hardware. As a result, hardware can compute binary arithmetic efficiently and all combinations of bits are valid. However, two disadvantages arise from the use of binary representations. First, the range of values is a power of two rather than a power of ten (e.g., the range of an unsigned 32-bit integer is zero to 4,294,967,295). Second, floating point values are rounded to binary fractions rather than decimal fractions.

The use of binary fractions has some unintended consequences, and their use does not suffice for all computations. For example, consider a bank account that stores U.S. dollars and cents. We usually represent cents as hundredths of dollars, writing 5.23 to denote five dollars and 23 cents. Surprisingly, one hundredth (i.e., one cent) cannot be represented exactly as a binary floating point number because it turns into a repeating binary fraction. Therefore, if binary floating point arithmetic is used for bank accounts, individual pennies are rounded, making the totals inaccurate. In a scientific sense, the inaccuracy is bounded, but humans demand that banks keep accurate records -- they become upset if a bank preserves significant digits of their account but loses pennies.
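
A short C experiment makes the problem concrete: adding one cent one hundred times in binary floating point does not produce exactly one dollar.

    #include <stdio.h>

    int main(void) {
        double total = 0.0;
        for (int i = 0; i < 100; i++)
            total += 0.01;                  /* one cent, one hundred times */

        printf("total = %.17f\n", total);   /* slightly off from 1.0 */
        printf("total == 1.0 is %s\n", total == 1.0 ? "true" : "false");
        return 0;
    }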

To accommodate banking and other computations where decimal is required, a Binary Coded Decimal (BCD) representation is used. Some computers (notably on IBM mainframes) have hardware to support BCD arithmetic; on other computers, software performs all arithmetic operations on BCD values.

Although a variety of BCD formats have been used, the essence is always the same: a value is represented as a string of decimal digits. The simplest case consists of a character string in which each byte contains the character for a single digit. However, the use of character strings makes computation inefficient and takes more space than needed. As an example, if a computer uses the ASCII character set, the integer 123456 is stored as six bytes with values†:

0x31 0x32 0x33 0x34 0x35 0x36

If a character format is used, each ASCII character (e.g., 0x31) must be converted to an equivalent binary value (e.g., 0x01) before arithmetic can be performed. Furthermore, once an operation has been performed, the digits of the result must be converted from binary back to the character format. To make computation more efficient, modern BCD systems represent digits in binary rather than as characters. Thus, 123456 could be represented as:

0x01 0x02 0x03 0x04 0x05 0x06

†Although our examples use ASCII, BCD is typically used on IBM computers that employ the EBCDIC character set.

Although the use of a binary representation has the advantage of making arithmetic faster, it also has a disadvantage: a BCD value must be converted to character format before it can be displayed or printed. The general idea is that because arithmetic is performed more frequently than I/O, keeping a binary form improves overall performance.

20. Signed, Fractional, and Packed BCD Representations

Our description of BCD omits many details found in commercial systems. For example, an implementation may limit the size of a BCD value. To handle fractions, BCD must either include an explicit decimal point or the representation must specify the location of the decimal point. Furthermore, to handle signed arithmetic, a BCD representation must include a sign. Interestingly, one of the most widely used BCD conventions places the sign byte at the right-hand end of the BCD string. Thus -123456 might be represented by the sequence:

0x01 0x02 0x03 0x04 0x05 0x06 0x2D

where 0x2D is a value used to indicate a minus sign. The advantage of placing the sign on the right arises because no scanning is required when arithmetic is performed -- all bytes except the last byte of the string correspond to decimal digits.

The final detail used with BCD encodings arises from the observation that using a byte for each digit is inefficient. Each digit only requires four bits, so placing one digit in each eight-bit byte wastes half of each byte. To reduce the storage space needed for BCD, a packed representation is used in which each digit occupies a nibble (i.e., four bits). With a packed version of BCD, the integer -123456 can be represented in four bytes:

0x01 0x23 0x45 0x6D

where the last nibble contains the value 0xD to indicate that the number is negative†.
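
The following C sketch packs a signed decimal string into this format. It is a minimal illustration: the function name pack_bcd is invented for the example, and the use of 0xC as a plus sign is an assumption (a common convention); the text above specifies only 0xD for minus.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    /* Pack a decimal string into nibbles with a trailing sign nibble.
       If the digit count plus the sign nibble is odd, a leading zero
       nibble pads the value to whole bytes. Returns the byte count. */
    static int pack_bcd(const char *s, uint8_t *out) {
        int neg = (*s == '-');
        if (neg)
            s++;

        size_t ndigits  = strlen(s);
        size_t nnibbles = ndigits + 1;       /* digits plus sign nibble */
        size_t pad      = nnibbles % 2;      /* leading zero if odd */
        size_t nib      = 0;

        if (pad) {
            out[0] = 0x00;                   /* leading zero nibble pair */
            nib = 1;
        }
        for (size_t i = 0; i < ndigits; i++, nib++) {
            uint8_t d = (uint8_t)(s[i] - '0');
            if (nib % 2 == 0)
                out[nib / 2] = (uint8_t)(d << 4);
            else
                out[nib / 2] |= d;
        }
        /* 0xD marks minus; 0xC for plus is an assumed convention. */
        uint8_t sign = neg ? 0xD : 0xC;
        if (nib % 2 == 0)
            out[nib / 2] = (uint8_t)(sign << 4);
        else
            out[nib / 2] |= sign;

        return (int)((nnibbles + pad) / 2);
    }

    int main(void) {
        uint8_t buf[16];
        int n = pack_bcd("-123456", buf);
        for (int i = 0; i < n; i++)
            printf("0x%02X ", buf[i]);       /* 0x01 0x23 0x45 0x6D */
        printf("\n");
        return 0;
    }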

21. Data Aggregates

So far, we have only considered the representation for individual data items such as characters, integers, or floating point numbers. Most programming languages allow a programmer to specify aggregate data structures that contain multiple data items, such as arrays, records, or structs. How are such values stored? In general, an aggregate value occupies contiguous bytes. Thus, on a computer that uses an eight-bit byte, a data aggregate that consists of three sixteen-bit integers occupies six contiguous bytes as FIG. 12 illustrates.

†To aid with BCD arithmetic, the x86 architecture has a condition code bit that indicates whether 4-bit addition overflows.


FIG. 12 A data aggregate consisting of three sixteen-bit integers arranged in successive bytes of memory numbered 0 through 5.

We will see later that some memory systems do not permit arbitrary data types to be contiguous. Thus, we will reconsider data aggregates when we discuss memory architecture.
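
A C struct makes the layout concrete. The sketch below is not a guarantee: compilers may insert padding into structures, although none is typically needed when, as here, every member has the same size and alignment.

    #include <stdio.h>
    #include <stdint.h>

    /* Three sixteen-bit integers, as in FIG. 12. All members share one
       size and alignment, so typical compilers insert no padding. */
    struct aggregate {
        int16_t a;
        int16_t b;
        int16_t c;
    };

    int main(void) {
        printf("sizeof(struct aggregate) = %zu bytes\n",
               sizeof(struct aggregate));   /* 6 on common platforms */
        return 0;
    }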

22. Program Representation

Modern computers are classified as stored program computers because programs as well as data are placed in memory. We will discuss program representation and storage in the next sections, including the structure of instructions the computer understands and their storage in memory. For now, it is sufficient to understand that each computer defines a specific set of operations and a format in which each is stored. On some computers, for example, each instruction is the same size as other instructions; on other computers, the instruction size varies. We will see that on a typical computer, an instruction occupies multiple bytes. Thus, the bit and byte numbering schemes that the computer uses for data values also apply to instructions.

23. Summary

The underlying digital hardware has two possible values, logical 0 and logical 1.

We think of the two values as defining a bit (binary digit), and use bits to represent data and programs. Each computer defines a byte size, and most current systems use eight bits per byte.

A set of bits can be used to represent a character from the computer's character set, an unsigned integer, a single or double precision floating point value, or a computer program. Representations are chosen carefully to maximize the flexibility and speed of the hardware while keeping the cost low. The two's complement representation for signed integers is particularly popular because a single piece of hardware can be constructed that performs operations on either two's complement integers or unsigned integers. In cases where decimal arithmetic is required, computers use Binary Coded Decimal values in which a number is represented by a string that specifies individual decimal digits.

Organizations, such as ANSI and IEEE, have created standards for representation; such standards allow hardware manufactured by two separate organizations to interoperate and exchange data.

QUIZ

1. Give a mathematical proof that a string of k bits can represent 2^k possible values (hint: argue by induction on the number of bits).

2. What is the value of the following binary string in hexadecimal?

11011110101011011011111011101111

3. Write a computer program that determines whether the computer on which it is running uses big endian or little endian representation for integers.

4. Write a computer program that prints a string of zeros and ones that represents the bits of an integer. Place a blank between each bit, and add an extra space after every four bits.

5. Write a computer program that determines whether the computer on which it is running uses one's complement, two's complement, or (possibly) some other representation for signed integers.

6. Write a computer program that determines whether the computer on which it is running uses the ASCII or EBCDIC character set.

7. Write a computer program that takes a set of integers as input and for each integer prints the two's complement, one's complement, and sign-magnitude representation of the integer.

8. Write a C program that prints a table of all possible eight-bit binary values and the two's complement interpretation of each.

9. Write a computer program that adds one to the largest possible positive integer and uses the result to determine whether the computer implements two's complement arithmetic.

10. Write a computer program to display the value of a byte in hexadecimal, and apply the program to an array of bytes. Add an extra space after every four bytes to make the output easier to read.

11. Extend the hexadecimal dump program in the previous exercise to also print the character representation of any printable character. For characters that do not have a printable representation, arrange for the program to print a period.

12. A programmer computes the sum of two unsigned 32-bit integers. Can the resulting sum be less than either of the two values? Explain.

13. Suppose you are given a computer with hardware that can only perform 32-bit arithmetic, and are asked to create functions that add and subtract 64-bit integers. How can you per form 64-bit computations with 32-bit hardware? (To simplify the problem, limit your answer to unsigned arithmetic.)

14. The C programming language allows a programmer to specify constants in decimal, binary, hexadecimal, and octal. Write a program that declares 0, 5, 65, 128, -1, and -256 in decimal, binary, hexadecimal, and octal, and uses printf to show that the values are correct. Which is the easiest representation?

15. Create a form of Binary Coded Decimal similar to the one described in the text, and write a computer program that uses the form to add two arbitrary length integers.

16. Extend the previous program to include multiplication.

17. The financial industry uses a "bankers" rounding algorithm. Read about the algorithm, and implement a program that uses decimal arithmetic to compute the sum of two decimal values with both bankers rounding and conventional rounding.
