Computer Architecture: The Essentials -- Data & Program Representation (part 1)

1. Introduction

The previous section introduces digital logic, and describes basic hardware building blocks that are used to create digital systems. This section continues the discussion of fundamentals by explaining how digital systems use binary representations to encode programs and data. We will see that representation is important for programmers as well as for hardware engineers because software must understand the format that the underlying hardware uses, and the format affects the speed with which the hardware can perform operations, such as addition.

2. Digital Logic And The Importance Of Abstraction

As we have seen, digital logic circuits contain many low-level details. The circuits use transistors and electrical voltage to perform basic operations. The main point of digital logic, however, is abstraction -- we want to hide the underlying details and use high-level abstractions whenever possible. For example, we have seen that each input or output of a 7400-series digital logic chip is restricted to two possible conditions: zero volts or five volts. When computer architects use logic gates to design computers, however, they do not think about such details. Instead, they use abstract designations of logical 0 and logical 1 from Boolean algebra. Abstraction means that complex digital systems, such as memories and processors, can be described without thinking about individual transistors or voltages. More important, abstraction means that a design can be used with a battery-operated device, such as a smart phone, that uses lower voltages to reduce power consumption.

To a programmer, the most important abstractions are the items visible to software: the representations used for data and programs. The next sections consider data representation and discuss how it is visible to programs; later sections describe how instructions are represented.

3. Definitions Of Bit And Byte

All data representation builds on digital logic. We use the abstraction binary digit (bit) to describe a digital entity that can have two possible values, and we assign the mathematical names 0 and 1 to the two values.

Multiple bits are used to represent more complex data items. For example, each computer system defines a byte to be the smallest data item larger than a bit that the hardware can manipulate.

How big is a byte? The size of a byte is not standard across all computer systems. Instead, the size is chosen by the architect who designs the computer. Early computer designers experimented with a variety of byte sizes, and some special-purpose computers still use unusual byte sizes. For example, an early computer manufactured by CDC Corporation used a six-bit byte, and a computer manufactured by BBN used a ten-bit byte. However, most modern computer systems define a byte to contain eight bits -- the size has become so widely accepted that engineers usually assume a byte size equal to eight bits, unless told otherwise. The point is:

Although computers have been designed with other byte sizes, current computer industry practice defines a byte to contain eight bits.

4. Byte Size And Possible Values

The number of bits per byte is especially important to programmers because memory is organized as a sequence of bytes. The size of the byte determines the maximum numerical value that can be stored in one byte. A byte that contains k bits can represent one of 2^k values (i.e., exactly 2^k unique strings of 1s and 0s exist that have length k). Thus, a six-bit byte can represent 64 possible values, and an eight-bit byte can represent 256 possible values. As an example, consider the eight possible combinations that can be achieved with three bits. FIG. 1 illustrates the combinations.

000 001

010 011

100 101

110 111

FIG. 1 The eight unique combinations that can be assigned to three bits.
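The enumeration in FIG. 1 is easy to reproduce. The following minimal C sketch, which is only illustrative, counts from 0 through 2^k - 1 for k = 3 and prints each value as a string of bits:

#include <stdio.h>

/* Enumerate all 2^k bit patterns for a small k (here k = 3),
   printing each pattern from most significant to least significant bit. */
int main(void) {
    const int k = 3;
    for (unsigned v = 0; v < (1u << k); v++) {
        for (int bit = k - 1; bit >= 0; bit--) {
            putchar(((v >> bit) & 1u) ? '1' : '0');
        }
        putchar('\n');
    }
    return 0;
}

Running the sketch prints the eight combinations 000 through 111 in the same order as FIG. 1.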

What does a given pattern of bits represent? The most important thing to understand is that the bits themselves have no intrinsic meaning -- the interpretation of the value is determined by the way hardware and software use the bits. For example, a string of bits could represent an alphabetic character, a string of characters, an integer, a floating point number, an audio recording (e.g., a song), a video, or a computer program.

In addition to items a computer programmer understands, computer hardware can be designed in which a set of bits can represent the status of three peripheral devices. For example:

- The first bit has the value 1 if a keyboard is connected.

- The second bit has the value 1 if a camera is connected.

- The third bit has the value 1 if a printer is connected.

Alternatively, hardware can be designed in which a set of three bits represent the current status of three pushbutton switches: the ith bit is 1 if a user is currently pushing switch i. The point is:

Bits have no intrinsic meaning -- all meaning is imposed by the way bits are interpreted.

5. Binary Weighted Positional Representation

One of the most common abstractions used to associate a meaning with each combination of bits interprets them as a numeric value. For example, an integer interpretation is taken from mathematics: bits are values in a positional number system that uses base two. To understand the interpretation, remember that in base ten, the possible digits are 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, each position represents a power of 10, and the number 123 represents 1 times 10^2 plus 2 times 10^1 plus 3 times 10^0. In the binary system, the possible digits are 0 and 1, and each bit position represents a power of two. That is, the positions represent successive powers of two: 2^0, 2^1, 2^2, and so on. FIG. 2 illustrates the positional concept for binary numbers.


FIG. 2 The value associated with each of the first six bit positions when using a positional interpretation in base two.

As an example, consider the binary number:

010101

According to the figure, the value can be interpreted as:

010101 = 0 × 2^5 + 1 × 2^4 + 0 × 2^3 + 1 × 2^2 + 0 × 2^1 + 1 × 2^0 = 21

We will discuss more about specific forms of integer representation (including negative numbers) later in the section. For now, it is sufficient to observe an important consequence of conventional positional notation: the binary numbers that can be represented in k bits start at zero instead of one. If we use the positional interpretation illustrated in FIG. 2, the binary numbers that can be represented with three bits range from zero through seven. Similarly, the binary numbers that can be represented with eight bits range from zero through two hundred fifty-five. We can summarize:

A set of k bits can be interpreted to represent a binary integer. When conventional positional notation is used, the values that can be represented with k bits range from 0 through 2^k - 1.
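To make the positional interpretation concrete, here is a small C sketch. The function name binary_to_unsigned is purely illustrative; it walks a string of '0' and '1' characters from the most significant bit to the least and accumulates the value:

#include <stdio.h>

/* Interpret a string of '0' and '1' characters as an unsigned binary
   integer using positional weights (leftmost character is most significant). */
unsigned long binary_to_unsigned(const char *bits) {
    unsigned long value = 0;
    for (; *bits == '0' || *bits == '1'; bits++) {
        value = value * 2 + (unsigned long)(*bits - '0'); /* shift left one position, add new bit */
    }
    return value;
}

int main(void) {
    printf("%lu\n", binary_to_unsigned("010101"));   /* prints 21 */
    return 0;
}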

Because familiarity with powers of two is an essential skill in the design of both software and hardware, anyone working in those fields should know the basics. FIG. 3 lists the decimal equivalents of binary numbers that hardware and software designers should know. The table includes entries for 2^32 and 2^64 (an incredibly large number). Although the smaller values in the table should be memorized, hardware and software designers only need to know the order of magnitude of the larger entries. Fortunately, it is easy to remember that 2^32 contains ten decimal digits and 2^64 contains twenty.
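The smaller entries of such a table can be regenerated with a few lines of C; the sketch below is only an illustration and prints 2^k for k from 0 through 32 (a 64-bit shift is used so the largest value does not overflow):

#include <stdio.h>

/* Print the decimal value of 2^k for k = 0 through 32.
   An unsigned 64-bit shift is used so that 2^32 does not overflow. */
int main(void) {
    for (int k = 0; k <= 32; k++) {
        printf("2^%-2d = %llu\n", k, 1ULL << k);
    }
    return 0;
}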

6. Bit Ordering

The positional notation in FIG. 2 may seem obvious. After all, when writing decimal numbers, we always write the least significant digit on the right and the most significant digit on the left. Therefore, when writing binary, it makes sense to write the Least Significant Bit (LSB) on the right and the Most Significant Bit (MSB) on the left.

When digital logic is used to store an integer, however, the concepts of "right" and "left" no longer make sense. Therefore, a computer architect must specify exactly how bits are stored, and which are the least and most significant.

The idea of bit ordering is especially important when bits are transferred from one location to another. For example, when a numeric value is moved between a register and memory, the bit ordering must be preserved. Similarly, when sending data across a network, the sender and receiver must agree on the bit ordering. That is, the two ends must agree whether the LSB or the MSB will be sent first.
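As a small illustration of least and most significant bits, the following C sketch extracts the LSB and MSB of an eight-bit value with shifts and masks; the example value 0x95 is arbitrary:

#include <stdio.h>
#include <stdint.h>

/* Extract the least significant and most significant bits of an
   eight-bit value using shifts and masks. */
int main(void) {
    uint8_t value = 0x95;                      /* binary 10010101 */
    unsigned lsb = value & 0x01;               /* rightmost bit  */
    unsigned msb = (value >> 7) & 0x01;        /* leftmost bit   */
    printf("LSB = %u, MSB = %u\n", lsb, msb);  /* prints LSB = 1, MSB = 1 */
    return 0;
}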


FIG. 3 Decimal values for commonly used powers of two.

7. Hexadecimal Notation

Although a binary number can be translated to an equivalent decimal number, programmers and engineers sometimes find the decimal equivalent difficult to understand. For example, if a programmer needs to test the fifth bit from the right, using the binary constant 010000 makes the correspondence between the constant and the bit much clearer than the equivalent decimal constant 16.

Unfortunately, long strings of bits are as unwieldy and difficult to understand as a decimal equivalent. For example, to determine whether the sixteenth bit is set in the following binary number, a human needs to count individual bits:

11011110110010010000100101001001

To aid humans in expressing binary values, a compromise has been reached: a positional numbering system with a larger base. If the base is chosen to be a power of two, translation to binary is trivial. Base eight (known as octal) has been used, but base sixteen (known as hexadecimal) has become especially popular.

Hexadecimal representation offers two advantages. First, because the representation is substantially more compact than binary, the resulting strings are shorter. Second, because sixteen is a power of two, conversion between binary and hexadecimal is straightforward and does not involve a complex arithmetic calculation (i.e., a human can perform the transformation easily and quickly, without the need for a calculator or other tools).

In essence, hexadecimal encodes each group of four bits as a single hex† digit between zero and fifteen. FIG. 4 lists the sixteen hex digits along with the binary and decimal equivalent of each. The figure and the examples that follow use uppercase letters A through F to represent hex digits above nine. Some programmers and some programming languages use lowercase letters a through f instead; the distinction is unimportant and programmers should be prepared to use either form.


FIG. 4 The sixteen hexadecimal digits and their equivalent binary and decimal values. Each hex digit encodes four bits of a binary value.

As an example of hexadecimal encoding, FIG. 5 illustrates how a binary string corresponds to its hexadecimal equivalent.


FIG. 5 Illustration of the relationship between binary and hexadecimal. Each hex digit represents four bits.

†Programmers use the term hex as an abbreviation for hexadecimal.

8. Notation For Hexadecimal And Binary Constants

Because the digits used in binary, decimal, and hexadecimal number systems overlap, constants can be ambiguous. To solve the ambiguity, an alternate notation is needed. Mathematicians and some textbooks add a subscript to denote a base other than ten (e.g., 135₁₆ specifies that the constant is hexadecimal). Computer architects and programmers tend to follow programming language notation: hex constants begin with prefix 0x, and binary constants begin with prefix 0b. Thus, to denote 135₁₆, a programmer writes 0x135. Similarly, the 32-bit constant from FIG. 5 is written:

0xDEC90949
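A short C sketch can confirm the correspondence between the hex constant and the binary string shown in FIG. 5. Because the 0b prefix for binary constants is not available in every C compiler, the binary form is generated with shifts; the formatting choices below are illustrative only:

#include <stdio.h>
#include <stdint.h>

/* Print a 32-bit constant in hexadecimal and in binary, four bits at a
   time, to show that each hex digit corresponds to one group of four bits. */
int main(void) {
    uint32_t value = 0xDEC90949;

    printf("hex:    0x%08X\n", (unsigned)value);
    printf("binary: ");
    for (int bit = 31; bit >= 0; bit--) {
        putchar(((value >> bit) & 1u) ? '1' : '0');
        if (bit % 4 == 0 && bit != 0)
            putchar(' ');                 /* space between four-bit groups */
    }
    putchar('\n');
    return 0;
}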

9. Character Sets

We said that bits have no intrinsic meaning, and that the hardware or software must determine what each bit represents. More important, more than one interpretation can be used -- a set of bits can be created and used with one interpretation and later used with another.

As an example, consider character data that has both a numeric and symbolic interpretation. Each computer system defines a character set† to be a set of symbols that the computer and I/O devices agree to use. A typical character set contains uppercase and lowercase letters, digits, and punctuation marks. More important, computer architects often choose a character set such that each character fits into a byte (i.e., each of the bit patterns in a byte is assigned one character). Thus, a computer that uses an eight-bit byte has two hundred fifty-six (2^8) characters in its character set, and a computer that uses a six-bit byte has sixty-four (2^6) characters. In fact, the relationship between the byte size and the character set is so strong that many programming languages refer to a byte as a character.

What bit values are used to encode each character? The computer architect must decide. In the 1960s, for example, IBM Corporation chose the Extended Binary Coded Decimal Interchange Code (EBCDIC) representation as the character set used on IBM computers. CDC Corporation chose a six-bit character set for use on their computers. The two character sets were completely incompatible.

As a practical matter, computer systems connect to devices such as keyboards, printers, or modems, and such devices are often built by separate companies. To interoperate correctly, peripheral devices and computer systems must agree on which bit pattern corresponds to a given symbolic character. To help vendors build compatible equipment, the American National Standards Institute (ANSI) defined a character representation known as the American Standard Code for Information Interchange (ASCII). The ASCII character set specifies the representation of one hundred twenty-eight characters, including the usual letters, digits, and punctuation marks; additional values in an eight-bit byte can be assigned for special symbols. The standard is widely accepted.

†Names of character sets are pronounced, not spelled out. For example, EBCDIC is pronounced ebb'se dick, and ASCII is pronounced ass'key.

FIG. 6 lists the ASCII representation of characters by giving a hexadecimal value and the corresponding symbolic character. Of course, the hexadecimal notation is merely a shorthand for a binary string. For example, the lowercase letter a has hexadecimal value 0x61, which corresponds to the binary value 0b01100001.


FIG. 6 The ASCII character set. Each entry shows a hexadecimal value and the graphical representation for printable characters and the meaning for others.

We said that a conventional computer uses eight-bit bytes, and that ASCII defines one hundred twenty-eight characters (i.e., a seven-bit character set). Thus, when ASCII is used on a conventional computer, one-half of the byte values are unassigned (decimal values 128 through 255). How are the additional values used? In some cases, they are not -- peripheral devices that accept or deliver characters merely ignore the eighth bit in a byte. In other cases, the computer architect or a programmer extends the character set (e.g., by adding punctuation marks for alternate languages).
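A brief C sketch can display the codes discussed above; it assumes the compiler's execution character set is ASCII, which holds on most modern systems but is still an assumption:

#include <stdio.h>

/* Print the numeric code of a few characters in hexadecimal.
   Assumes the compiler's execution character set is ASCII. */
int main(void) {
    char samples[] = { 'a', 'A', '0', ' ' };
    for (int i = 0; i < 4; i++) {
        printf("'%c' = 0x%02X\n", samples[i], (unsigned char)samples[i]);
    }
    return 0;
}

On an ASCII system, the sketch prints 'a' = 0x61, 'A' = 0x41, '0' = 0x30, and ' ' = 0x20.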

10. Unicode

Although a seven-bit character set and an eight-bit byte work well for English and some European languages, they do not suffice for all languages. Chinese, for example, contains thousands of symbols and glyphs. To accommodate such languages, extensions and alternatives have been proposed.

One of the widely accepted extended character sets is named Unicode. Unicode extends ASCII and is intended to accommodate all languages, including languages from the Far East. Originally designed as a sixteen-bit character set, later versions of Unicode have been extended to accommodate larger representations. Thus, future computers and I/O devices may base their character set on Unicode.

11. Unsigned Integers, Overflow, And Underflow

The positional representation of binary numbers illustrated in FIG. 2 is said to produce unsigned integers. That is, each of the 2^k combinations of bits is associated with a nonnegative numeric value. Because the unsigned integers used in a computer have finite size, operations like addition and subtraction can have unexpected results. For example, subtracting a positive k-bit unsigned integer from a smaller positive k-bit unsigned integer can yield a negative (i.e., signed) result. Similarly, adding two k-bit unsigned integers can produce a value that requires more than k bits to represent.

Hardware to perform unsigned binary arithmetic handles the problem in an interesting way. First, the hardware produces a result by using wraparound (i.e., the hardware adds two k-bit integers, and takes the k low-order bits of the answer). Second, the hardware sets overflow or underflow conditions to indicate whether the result exceeded k bits or was negative‡. For example, an overflow indicator corresponds to the value that would appear in the (k+1)st bit (i.e., the value commonly known as carry). FIG. 7 illustrates an addition with three-bit arithmetic that results in a carry.


FIG. 7 Illustration of addition with unsigned integers that produces overflow. The overflow indicator, which tells whether wraparound occurred, is equal to the carry bit.

‡The term underflow denotes a value that is less than the representation can hold. A negative result from unsigned integer arithmetic is classified as an underflow because negative values cannot be represented.
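The wraparound and carry behavior can be imitated in C by masking a sum to three bits; the operand values below are arbitrary and were chosen only to force a carry:

#include <stdio.h>

/* Emulate three-bit unsigned addition: keep the three low-order bits
   of the sum (wraparound) and report the carry out of the third bit,
   which serves as the overflow indicator. */
int main(void) {
    unsigned a = 0x6;                 /* binary 110 (6) */
    unsigned b = 0x5;                 /* binary 101 (5) */
    unsigned sum = a + b;             /* binary 1011 (11) */
    unsigned result = sum & 0x7;      /* low-order three bits: 011 (3) */
    unsigned carry  = (sum >> 3) & 1; /* bit that overflowed: 1 */
    printf("result = %u, carry = %u\n", result, carry);
    return 0;
}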

12. Numbering Bits And Bytes

How should a set of bits be numbered? If we view the set as a string, it makes sense to start numbering from the left, but if we view the set as a binary number, it makes sense to start numbering from the right (i.e., from the numerically least significant bit). Numbering is especially important when data is transferred over a network because the sending and receiving computers must agree on whether the least-significant or most-significant bit will be transferred first.

The issue of numbering becomes more complicated if we consider data items that span multiple bytes. For example, consider an integer that consists of thirty-two bits. If the computer uses eight-bit bytes, the integer will span four bytes, which can be transferred starting with the least-significant or the most-significant byte.

We use the term little endian to characterize a system that stores and transmits bytes of an integer from least significant to most significant, and the term big endian to characterize a system that stores and transmits bytes of an integer from most significant to least significant. Similarly, we use the terms bit little endian and bit big endian to characterize systems that transfer bits within a byte starting at the least-significant bit and most-significant bit, respectively. We can think of the bytes of an integer as being stored in an array, and the endianness determines the direction in memory. FIG. 8 uses an example integer to illustrate the two byte orders, showing positional representation and the arrangement of bytes in memory using both little endian order and big endian order.

00011101 10100010 00111011 01100111

(a) Integer 497,171,303 in binary positional representation

(b) The integer stored in little endian order

(c) The integer stored in big endian order


FIG. 8 (a) Integer 497,171,303 expressed as a 32-bit binary value, with spaces used to mark groups of eight bits, (b) the integer stored in successive memory locations using little endian order, and (c) the integer stored in successive memory locations using big endian order.

The big endian representation may seem appealing because it mimics the order humans use to write numbers. Surprisingly, little endian order has several advantages for computing. For example, little endian allows a programmer to use a single memory address to refer to all four bytes of an integer, the two low-order bytes, or only the lowest-order byte.
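A common way to observe byte order on a real machine is to store a multi-byte integer and examine the individual bytes through a character pointer. The following C sketch uses the integer from FIG. 8; the output depends on the host, so the printed order is not fixed:

#include <stdio.h>
#include <stdint.h>

/* Store the 32-bit integer from FIG. 8 and examine its bytes in memory
   to see whether the host is little endian or big endian. */
int main(void) {
    uint32_t value = 497171303;              /* 0x1DA23B67 */
    unsigned char *p = (unsigned char *)&value;

    printf("bytes in memory: %02X %02X %02X %02X\n", p[0], p[1], p[2], p[3]);
    if (p[0] == 0x67)
        printf("host is little endian\n");   /* least significant byte first */
    else
        printf("host is big endian\n");      /* most significant byte first  */
    return 0;
}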
