Computer Architecture: Data Paths and Instruction Execution


1. Introduction

Section 2 introduces digital logic and describes the basic hardware building blocks that are used to create digital systems. The section covers basic gates, and shows how gates are constructed from transistors. The section also describes the important concept of a clock, and demonstrates how a clock allows a digital circuit to perform a series of operations. Successive sections describe how data is represented in binary and cover processors and instruction sets.

This section explains how digital logic circuits can be combined to construct a computer. The section reviews functional units, such as arithmetic-logic units and memory, and shows how the units are interconnected. Finally, the section explains how the units interact to perform computation. Later sections of the text expand the discussion by examining processors and memory systems in more detail.

2. Data Paths

The topic of how hardware can be organized to create a programmable computer is complex. Rather than look at all the details of a large design, architects begin by describing the major hardware components and their interconnection. At a high level, we are only interested in how instructions are read from memory and how an instruction is executed. Therefore, the high-level description ignores many details and only shows the interconnections across which data items move as instructions are executed. For example, when we consider the addition operation, we will see the data paths across which two operands travel to reach an Arithmetic Logic Unit (ALU) and a data path that carries the result to another unit. Our diagrams will not show other details, such as power and ground connections or control connections. Computer architects use the term data paths to describe the idea and data path diagram to describe a figure that depicts the data paths.

To make the discussion of data paths clear, we will examine a simplified computer.

The simplifications include:

-- Our instruction set only contains four instructions

-- We assume a program has already been loaded into memory

-- We ignore startup and assume the processor is running

-- We assume each data item and each instruction occupies exactly 32 bits

-- We only consider integer arithmetic

-- We completely ignore error conditions, such as arithmetic overflow

Although the example computer is extremely simple, the basic hardware units we examine are exactly the same as a conventional computer. Thus, the example is sufficient to illustrate the main hardware components, and the example interconnection is sufficient to illustrate how data paths are designed.

3. The Example Instruction Set

As the previous section describes, a new computer design must begin with the design of an instruction set. Once the details of instructions have been specified, a computer architect can design hardware that performs each of the instructions. To illustrate how hardware is organized, we will consider an imaginary computer that has the following properties:

-- A set of sixteen general-purpose registers†

-- A memory that holds instructions (i.e., a program)

-- A separate memory that holds data items

Each register can hold a thirty-two bit integer value. The instruction memory contains a sequence of instructions to be executed. As described above, we ignore startup, and assume a program has already been placed in the instruction memory. The data memory holds data values. We will also assume that both memories on the computer are byte-addressable, which means that each byte of memory is assigned an address.

FIG. 1 lists the four basic instructions that our imaginary computer implements.

†Hardware engineers often use the term register file to refer to the hardware unit that implements a set of registers; we will simply refer to them as registers.

===========

Instruction | Meaning

add | Add the integers in two registers and place the result in a third register

load | Load an integer from the data memory into a register

store | Store the integer in a register into the data memory

jump | Jump to a new location in the instruction memory

FIG. 1 Four example instructions, the operands each uses, and the meaning of the instruction.

=========

The add instruction is the easiest to understand - the instruction obtains integer values from two registers, adds the values together, and places the result in a third register. For example, consider an add instruction that specifies adding the contents of registers 2 and 3 and placing the result in register 4. If register 2 contains 50 and register 3 contains 60, such an add instruction will place 110 in register 4 (i.e., the sum of the integers in registers 2 and 3).

In assembly language, such an instruction is specified by giving the instruction name followed by operands. For example, a programmer might code the add instruction described in the previous paragraph by writing:

add r4, r2, r3

where the notation rX is used to specify register X. The first operand specifies the destination register (where the result should be placed), and the other two specify source registers (where the instruction obtains the values to sum).
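The effect of this add instruction can be mimicked in a few lines of Python. The sketch below is only illustrative: the list `regs` and the function `add` are hypothetical names chosen here, and results are truncated to thirty-two bits because the example computer ignores overflow.

```python
# Sixteen general-purpose registers, each holding a 32-bit integer.
regs = [0] * 16

def add(dst, src_a, src_b):
    """Model `add rDST, rA, rB`: place the sum of two registers in a third."""
    # Keep only the low 32 bits, since the example computer ignores overflow.
    regs[dst] = (regs[src_a] + regs[src_b]) & 0xFFFFFFFF

regs[2] = 50
regs[3] = 60
add(4, 2, 3)      # corresponds to: add r4, r2, r3
print(regs[4])    # prints 110
```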

The load and store instructions move values between the data memory and the registers. Like many commercial processors, our imaginary processor requires both operands of an add instruction to be in registers. Also like commercial computers, our imaginary processor has a large data memory, but only a few registers. Consequently, to add two integers that are in memory, the two values must be loaded into registers.

The load instruction makes a copy of an integer in memory and places the copy in a register. The store instruction moves data in the opposite direction: it makes a copy of the value currently in a register and places the copy in a location in memory.

One of the operands for a load or store specifies the register to be loaded or stored.

The other operand is more interesting because it illustrates a feature found on many commercial processors: a single operand that combines two values. Instead of using a single constant to specify a memory address, memory operands contain two parts. One part specifies a register, and the other part specifies a constant that is often called an offset. When the instruction is executed, the processor reads the current value from the specified register, adds the offset, and uses the result as a memory address.

An example will clarify the idea. Consider a load instruction that loads register 1 from a value in memory. Such an instruction might be written as:

load r1, 20(r3)

where the first operand specifies that the value should be loaded into register 1. The second operand specifies that the memory address is computed by adding the offset 20 to the current contents of register 3.

Why are processors designed with operands that specify a register plus an offset? Using such a form makes it easy and efficient to iterate through an array. The address of the first element is placed in a register, and bytes of the element can be accessed by using the constant part of the operand. To move to the next element of the array, the register is incremented by the element size. For now, we only need to understand that such operands are used, and consider how to design hardware that implements them.

As an example, suppose register 3 contains the value 10000, and the load instruction shown above specifies an offset of 20. When the instruction is executed, the hardware adds 10000 and 20, treats the result as a memory address, and loads the integer from location 10020 into register 1.
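The address computation can be sketched as follows. This is a simplified model rather than the hardware itself: `data_memory` is a hypothetical dictionary that maps a byte address to the integer stored there, and the value 77 is arbitrary.

```python
regs = [0] * 16
# Hypothetical sparse data memory: byte address -> 32-bit integer stored there.
data_memory = {10020: 77}    # 77 is an arbitrary illustrative value

def effective_address(base_reg, offset):
    """Add the constant offset to the current contents of the base register."""
    return (regs[base_reg] + offset) & 0xFFFFFFFF

def load(dst, offset, base_reg):
    """Model `load rDST, offset(rBASE)`."""
    regs[dst] = data_memory[effective_address(base_reg, offset)]

regs[3] = 10000
load(1, 20, 3)                      # corresponds to: load r1, 20(r3)
print(effective_address(3, 20))     # prints 10020
print(regs[1])                      # prints 77
```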

The fourth instruction, jump, controls the flow of execution by giving the processor an address in the instruction memory. Normally, our imaginary processor works like an ordinary processor by executing an instruction and then moving to the next instruction in memory automatically. When it encounters a jump instruction, however, the processor does not move to the next instruction. Instead, the processor uses the operand in the jump instruction to compute a memory address, and then starts executing at that address.

Like the load and store instructions, our jump instruction allows both a register and offset to be specified in its operand. For example, the instruction:

jump 60(r11)

specifies that the processor should obtain the contents of register 11, add 60, treat the result as an address in the instruction memory, and make the address the next location where an instruction is executed. It is not important now to understand why processors contain a jump instruction - you only need to understand how the hardware handles the move to a new location in a program.

4. Instructions In Memory

We said that the instruction memory on our imaginary computer contains a set of instructions for the processor to execute, and that each instruction occupies thirty-two bits. A computer designer specifies the exact format of each instruction by specifying what each bit means. FIG. 2 shows the instruction format for our imaginary computer.

FIG. 2 The binary representation for each of the four instructions listed in FIG. 1. Each instruction is thirty-two bits long.

Look carefully at the fields used in each instruction. Each instruction has exactly the same format, even though some of the fields are not needed in some instructions. A uniform format makes it easy to design hardware that extracts the fields from an instruction.

The operation field in an instruction (sometimes called an opcode field) contains a value that specifies the operation. For our example, an add instruction has the operation field set to 1, a load instruction has the operation field set to 2, and so on. Thus, when it picks up an instruction, the hardware can use the operation field to decide which operation to perform.

The three fields with the term reg in their name specify three registers. Only the add instruction needs all three registers; in other instructions, one or two of the register fields are not used. The hardware ignores the unused fields when executing an instruction other than add.

The order of operands in the instructions may seem unexpected and inconsistent with the code above. For example, the code for an add instruction has the destination (the register to contain the result) on the left, and the two registers to be added on the right. In the instruction, fields that specify the two registers to be added precede the field that specifies the destination. FIG. 3 shows a statement written by a programmer and the instruction when it has been converted to bits in memory. We can summarize the point:

The order of operands in an assembly language program is chosen to be convenient to a programmer; the order of operands in an instruction in memory is chosen to make the hardware efficient.

FIG. 3 (a) An example add instruction as it appears to a programmer, and (b) the instruction stored in memory.

In the figure, the field labeled reg A contains 2 to specify register 2, the field labeled reg B contains 3 to specify register 3, and the field labeled dst reg contains 4 to specify that the result should be placed in register 4.

When we examine the hardware, we will see that the binary representation used for instructions is not capricious - the format is chosen to simplify the hardware design.

For example, if an instruction has an operand that specifies a memory address, the register in the operand is always assigned to the field labeled reg A. Thus, if the hardware must add the offset to a register, the register is always found in field reg A.

Similarly, if a value must be placed in a register, the register is found in field dst reg.
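To make the idea of a fixed instruction format concrete, the following sketch packs an add instruction into a thirty-two-bit word. Several details are assumptions made for illustration: the fields are placed from high-order to low-order bits as operation, reg A, reg B, dst reg, and offset; the widths (5, 4, 4, 4, and 15 bits) are those the text gives when it discusses decoding; and the opcodes for store and jump are guessed to continue the numbering after add = 1 and load = 2. The offset is treated as an unsigned value.

```python
# Opcodes: add = 1 and load = 2 come from the text; store = 3 and jump = 4
# are assumed to follow the same numbering.
OPCODES = {"add": 1, "load": 2, "store": 3, "jump": 4}

def encode(op, reg_a=0, reg_b=0, dst=0, offset=0):
    """Pack the fields into one 32-bit word (assumed layout, high to low bits):
    operation (5) | reg A (4) | reg B (4) | dst reg (4) | offset (15)."""
    if not all(0 <= r < 16 for r in (reg_a, reg_b, dst)):
        raise ValueError("register numbers must fit in four bits")
    if not 0 <= offset < (1 << 15):
        raise ValueError("offset must fit in fifteen bits")
    return (OPCODES[op] << 27) | (reg_a << 23) | (reg_b << 19) | (dst << 15) | offset

word = encode("add", reg_a=2, reg_b=3, dst=4)   # add r4, r2, r3
print(f"{word:032b}")   # prints 00001001000110100000000000000000
```

Note that the source registers are passed before the destination, mirroring the order of fields in memory rather than the order a programmer writes.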

5. Moving To The Next Instruction

Section 2 illustrates how a clock can be used to control the timing of a fixed sequence of steps. Building a computer requires one additional twist: instead of a fixed sequence of steps, a computer is programmable, which means that although the computer has hardware to perform every possible instruction, the exact sequence of instructions to perform is not predetermined. Instead, a programmer stores a program in memory, and the processor moves through the memory, extracting and executing successive instructions one at a time. The next sections illustrate how digital logic circuits can be arranged to enable programmability.

What pieces of hardware are needed to execute instructions from memory? One key element is known as an instruction pointer. An instruction pointer consists of a register (i.e., a set of latches) in the processor that holds the memory address of the next instruction to execute. For example, if we imagine a computer with thirty-two-bit memory addresses, an instruction pointer will hold a thirty-two-bit value. To execute instructions, the hardware repeats the following three steps.

-- Use the instruction pointer as a memory address and fetch an instruction

-- Use bits in the instruction to control hardware that performs the operation

-- Move the instruction pointer to the next instruction

One of the most important aspects of a processor that can execute instructions arises from the mechanism used to move to the next instruction. After it extracts an instruction from the instruction memory, the processor must compute the memory address of the instruction that immediately follows. Thus, once a given instruction has executed, the processor is ready to execute the next sequential instruction.
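The three steps can be sketched as a loop. The code is a minimal software analogy, not a description of the hardware: `run`, `program`, and `trace` are hypothetical names, instructions are kept as text, and jumps are ignored so the pointer always advances by four bytes (one thirty-two-bit instruction).

```python
def run(instruction_memory, execute, count):
    """Repeat the three steps `count` times, starting at address 0."""
    instruction_pointer = 0
    for _ in range(count):
        instruction = instruction_memory[instruction_pointer]   # 1. fetch
        execute(instruction)                                    # 2. perform
        instruction_pointer += 4                                # 3. advance
    return instruction_pointer

# Hypothetical program: byte address -> instruction text.
program = {0: "load r2, 0(r1)", 4: "load r3, 4(r1)", 8: "add r4, r2, r3"}
trace = []
final_pointer = run(program, trace.append, 3)
print(trace)
print(final_pointer)    # prints 12
```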

In our example computer, each instruction occupies thirty-two bits in memory. However, the memory is byte-addressable, which means that after an instruction is executed, hardware must increment the instruction pointer by four bytes (thirty-two bits) to move to the next instruction. In essence, the processor must add four to the instruction pointer and place the result back in the instruction pointer. To perform the computation, the constant 4 and the current instruction pointer value are passed to a thirty-two-bit adder. FIG. 4 illustrates the basic components used to increment an instruction pointer and shows how the components are interconnected.

FIG. 4 Hardware that increments a program counter.

The circuit in the figure appears to be an infinite loop that will simply run wild incrementing the program counter (another name for the instruction pointer) continuously. To understand why the circuit works, recall that a clock is used to control and synchronize digital circuits. In the case of the program counter, the clock only lets the increment occur after an instruction has executed. Although no clock is shown, we will assume that each component of the circuit is connected to the clock, and the component only acts according to the clock. Thus, the adder will compute a new value immediately, but the program counter will not be updated until the clock pulses. Throughout our discussion, we will assume that the clock pulses once per instruction.

Each line in the figure represents a data path that consists of multiple parallel wires. In the figure, each data path is thirty-two bits wide. That is, the adder takes two inputs, both of which are thirty-two bits. The value from the instruction pointer is obvious because the instruction pointer has thirty-two bits. The other input, marked with the label 4, represents a thirty-two-bit constant with the numeric value 4. That is, we imagine thirty-two wires that are all zero except the third wire from the low-order end (binary 100). The adder computes the sum and produces a thirty-two-bit result.
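A small model may help show why the loop between the adder and the register does not run wild. The class below is a hypothetical sketch: the adder's output is always available, but the stored value changes only when `clock_pulse` is called.

```python
MASK = 0xFFFFFFFF   # keep values to thirty-two bits

class ProgramCounter:
    """A 32-bit register whose input is wired to an adder that adds 4."""

    def __init__(self, value=0):
        self.value = value

    def adder_output(self):
        # The adder responds immediately to whatever the register holds.
        return (self.value + 4) & MASK

    def clock_pulse(self):
        # Only a clock pulse copies the adder's output into the register.
        self.value = self.adder_output()

pc = ProgramCounter()
print(pc.adder_output(), pc.value)   # prints 4 0
pc.clock_pulse()
pc.clock_pulse()
print(pc.value)                      # prints 8
print(format(4, "032b"))             # the constant 4 as thirty-two wires
```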

6. Fetching an Instruction

The next step in constructing a computer consists of fetching an instruction from memory. For our simplistic example, we will assume that a dedicated instruction memory holds the program to be executed, and that a memory hardware unit takes an address as input and extracts a thirty-two bit data value from the specified location in memory. That is, we imagine a memory to be an array of bytes that has a set of input lines and a set of output lines. Whenever a value is placed on the input lines, the memory uses the value as input to a decoder, selects the appropriate bytes, and sets the output lines to the value found in the bytes. FIG. 5 illustrates how the value in a program counter is used as an address for the instruction memory.

FIG. 5 The data path used during instruction fetch in which the value in a program counter is used as a memory address.
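The behavior of the instruction memory can be approximated with a byte array. In this sketch the two stored words are arbitrary values, and the choice of big-endian byte order is an assumption, since the text does not say how the four bytes of an instruction are ordered.

```python
# Hypothetical instruction memory: eight bytes holding two 32-bit words.
# The word values are arbitrary; big-endian byte order is an assumption.
instruction_memory = bytes.fromhex("091a0000" "11223344")

def fetch(address):
    """Return the 32-bit value stored in the four bytes starting at address."""
    return int.from_bytes(instruction_memory[address:address + 4], "big")

program_counter = 0
print(hex(fetch(program_counter)))       # prints 0x91a0000
print(hex(fetch(program_counter + 4)))   # prints 0x11223344
```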

7. Decoding an Instruction

When an instruction is fetched from memory, it consists of thirty-two bits. The next conceptual step in execution consists of instruction decoding. That is, the hardware separates fields of the instruction such as the operation, registers specified, and offset. Recall from FIG. 2 how the bits of an instruction are organized. Because we used separate bit fields for each item, instruction decoding is trivial - the hardware simply separates the wires that carry bits for the operation field, each of the three register fields, and the offset field. FIG. 6 illustrates how the output from the instruction memory is fed to an instruction decoder.

FIG. 6 Illustration of an instruction decoder connected to the output of the instruction memory.

In the figure, individual outputs from the instruction decoder do not all have thirty-two bits. The operation consists of five bits, the outputs that correspond to registers consist of four bits each, and the output labeled offset consists of fifteen bits. Thus, we can think of a line in the data path diagram as indicating one or more bits of data.

It is important to understand that the output from the decoder consists of fields from the instruction. For example, the path labeled offset contains the fifteen offset bits from the instruction. Similarly, the data path labeled reg A merely contains the four bits from the reg A field in the instruction. The point is that the data for reg A only specifies which register to use, and does not carry the value that is currently in the register. We can summarize:

Our example instruction decoder merely extracts bit fields from an instruction without interpreting the fields.
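In software, the same field extraction amounts to shifting and masking. The sketch below uses the field widths given above; the placement of the fields (operation in the high-order bits, followed by reg A, reg B, dst reg, and offset) is an assumption, and `decode` is a hypothetical name.

```python
def decode(word):
    """Split a 32-bit instruction into its fields without interpreting them."""
    return {
        "operation": (word >> 27) & 0x1F,   # 5 bits
        "reg_a":     (word >> 23) & 0xF,    # 4 bits
        "reg_b":     (word >> 19) & 0xF,    # 4 bits
        "dst_reg":   (word >> 15) & 0xF,    # 4 bits
        "offset":    word & 0x7FFF,         # 15 bits
    }

fields = decode(0x091A0000)   # an add of registers 2 and 3 into register 4
print(fields)
```

The result holds register numbers only; looking up the values stored in those registers is the job of the register unit.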

Unlike our imaginary computer, a real processor may have multiple instruction formats (e.g., the fields in an arithmetic instruction may be in different locations than the fields in a memory access instruction). Furthermore, a real processor may have variable length instructions. As a result, an instruction decoder may need to examine the operation to decide the location of fields. Nevertheless, the principle applies: a decoder extracts fields from an instruction and passes each field along a data path.

8. Connections to a Register Unit

The register fields of an instruction are used to select registers that are used in the instruction. In our example, a jump instruction uses one register, a load or store instruction uses two, and an add instruction uses three. Therefore, each of the three possible register fields must be connected to a register storage unit as FIG. 7 illustrates.

FIG. 7 Illustration of a register unit attached to an instruction decoder.

9. Control and Coordination

Although all three register fields connect to the register unit, the unit does not always use all three. Instead, a register unit contains logic that determines whether a given instruction reads existing values from registers or writes data into one of the registers. In particular, the load and add instructions each write a result to a register, but the jump and store instructions do not.

It may seem that the operation portion of the instruction should be passed to the register unit to allow the unit to know how to act. To understand why the figure does not show a connection between remaining fields of the instruction and the register unit, remember that we are only examining data paths (i.e., the hardware paths along which data can flow). In an actual computer, each of the units illustrated in the figure will have additional connections that carry control signals. For example, each unit must receive a clock signal to ensure that it coordinates to take action at the correct time (e.g., to ensure that the data memory does not store a value until the correct address has been computed).

In practice, most computers use an additional hardware unit, known as a controller, to coordinate overall data movement and each of the functional units. A controller must have one or more connections to each of the other units, and must use the operation field of an instruction to determine how each unit should operate to perform the instruction. In the diagram, for example, a connection between the controller and register unit would be used to specify whether the register unit should fetch the values of one or two registers, and whether the unit should accept data to be placed in a register. For now, we will assume that a controller exists to coordinate the operation of all units.
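One way to picture the controller's role is as a lookup from the operation to a set of control signals. The table below covers only the signals sent to the register unit and is derived from the description of which instructions read and write registers; the names `CONTROL`, `registers_read`, and `write_register` are hypothetical.

```python
# For each operation: how many register values the register unit must fetch,
# and whether it should accept data to be placed in a register.
CONTROL = {
    "add":   {"registers_read": 2, "write_register": True},
    "load":  {"registers_read": 1, "write_register": True},
    "store": {"registers_read": 2, "write_register": False},
    "jump":  {"registers_read": 1, "write_register": False},
}

def controller(operation):
    """Return the control signals the register unit needs for an operation."""
    return CONTROL[operation]

for name in CONTROL:
    print(name, controller(name))
```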

10. Arithmetic Operations and Multiplexing

Our example set of instructions illustrates an important principle: hardware that is designed to re-use functional units. Consider arithmetic. Only the add instruction performs arithmetic explicitly. A real processor will have several arithmetic and logical instructions (e.g., subtract, shift, logical and, etc.), and will use the operation field in the instruction to decide which operation the ALU should perform.

Our instruction set also has an implicit arithmetic operation associated with the load, store, and jump instructions. Each of those instructions requires an addition operation to be performed when the instruction is executed. Namely, the processor must add the offset value, which is found in the instruction itself, to the contents of a register. The resulting sum is then treated as a memory address.

The question arises: should a processor have a separate hardware unit to compute the sum needed for an address, or should a single ALU be used for both general arithmetic and address arithmetic? Such questions form the basis for key decisions in processor design. Separate functional units have the advantage of speed and ease of design. Re-using a functional unit for multiple purposes has the advantage of taking less power.

Our design illustrates re-use. Like many processors, our design contains a single Arithmetic Logic Unit (ALU) that performs all arithmetic operations†. For our sample instruction set, inputs to the ALU can come from two sources: either a pair of registers or a register and the offset field in an instruction. How can a hardware unit choose among multiple sources of input? The mechanism that accommodates two possible inputs is known as a multiplexor. The basic idea is that a multiplexor has K data inputs, one data output, and a set of control lines used to specify which input is sent to the output. To understand how a multiplexor is used, consider FIG. 8, which shows a multiplexor between the register unit and ALU. When viewing the figure, remember that each line in our diagram represents a data path with thirty-two bits. Thus, each input to the multiplexor contains thirty-two bits as does the output. The multiplexor selects all thirty-two bits from one of the two inputs and sends them to the output.

†Incrementing the program counter is a special case.

FIG. 8 Illustration of a multiplexor used to select an input for the ALU.

In the figure, inputs to the multiplexor come from the register unit and the offset field in the instruction. How does the multiplexor decide which input to pass along? Recall that our diagram only shows the data path. In addition, the processor contains a controller, and all units are connected to the controller. When the processor executes an add instruction, the controller signals the multiplexor to select the input coming from the register unit. When the processor executes other instructions, the controller specifies that the multiplexor should select the input that comes from the offset field in the instruction.
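The selection can be sketched in a few lines. Here `multiplexor` and `alu_second_input` are hypothetical names, the numbering of the inputs (0 for the register unit, 1 for the offset) is an arbitrary choice, and the two data values are made up for illustration.

```python
def multiplexor(select, *inputs):
    """Pass exactly one of the data inputs through to the output."""
    return inputs[select]

reg_b_value = 60    # value arriving from the register unit
offset_field = 20   # value arriving from the instruction's offset field

def alu_second_input(operation):
    # The controller's choice: input 0 for add, input 1 for everything else.
    select = 0 if operation == "add" else 1
    return multiplexor(select, reg_b_value, offset_field)

print(alu_second_input("add"))    # prints 60
print(alu_second_input("load"))   # prints 20
```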

Observe that the operation field of the instruction is passed to the ALU. Doing so permits the ALU to decide which operation to perform. In the case of an arithmetic or logical instruction (e.g., add, subtract, right shift, logical and), the ALU uses the operation to select the appropriate action. In the case of other instructions, the ALU performs addition.
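The ALU's use of the operation field might be modeled as shown below. The operation names are hypothetical labels rather than the numeric opcodes, results are truncated to thirty-two bits, and limiting the shift amount to the low five bits of the second input is an assumption made only for this sketch.

```python
MASK = 0xFFFFFFFF   # results are kept to thirty-two bits

def alu(operation, a, b):
    """Select an action from the operation; anything else performs addition."""
    if operation == "subtract":
        return (a - b) & MASK
    if operation == "right_shift":
        return (a >> (b & 31)) & MASK   # shift amount limited to 0..31 (assumed)
    if operation == "and":
        return a & b
    # add, and also load, store, and jump, which need an address sum
    return (a + b) & MASK

print(alu("add", 50, 60))        # prints 110
print(alu("load", 10000, 20))    # prints 10020
print(alu("and", 12, 10))        # prints 8
```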

11. Operations Involving Data In Memory

When it executes a load or store operation, the computer must reference an item in the data memory. For such operations, the ALU is used to add the offset in the instruction to the contents of a register, and the result is used as a memory address. In our simplified design, the memory used to store data is separate from the memory used to store instructions. FIG. 9 illustrates the data paths used to connect a data memory.

FIG. 9 Illustration of data paths including data memory.

12. Example Execution Sequences

To understand how computation proceeds, consider the data paths that are used for each instruction. The following paragraphs explain the sequence. In each case, the program counter gives the address of an instruction, which is passed to the instruction memory. The instruction memory fetches the value from memory, and passes bits of the value to the instruction decoder. The decoder separates fields of the instruction and passes them to other units. The remainder of the operation depends on the instruction.

Add. For an add instruction, the register unit is given three register numbers, which are passed along paths labeled reg A, reg B, and dst reg. The register unit fetches the values in the first two registers, which are passed to the ALU. The register unit also prepares to write to the third register. The ALU uses the operation code to determine that addition is required. To allow the reg B output from the register unit to reach the ALU, the controller (not shown) must set multiplexor M2 to pass the register B value from the register unit and to ignore the offset value from the decoder. The controller must set multiplexor M3 to pass the output from the ALU to the register unit's data input, and must set multiplexor M1 to ignore the output from the ALU so that the program counter receives its incremented value. Once the output from the ALU reaches the input connection on the register unit, the register unit stores the value in the register specified by the path labeled dst reg, and the operation is complete.

Store. After a store instruction has been fetched from memory and decoded, the register unit fetches the values for registers A and B, and places them on its output lines. Multiplexor M2 is set to pass the offset field to the ALU and ignore the value of register B. The controller instructs the ALU to perform addition, which adds the offset and contents of register A. The resulting sum is passed to the data memory as an address. Meanwhile, the register B value (the second output of the register unit) is passed to the data in connection on the data memory. The controller instructs the data memory to perform a write operation, which writes the value of register B into the location specified by the value on the address lines, and the operation is complete.

Load. After a load instruction has been fetched and decoded, the controller sets multiplexor M2 so the ALU receives the contents of register A and the offset field from the instruction. As with a store, the controller instructs the ALU to perform the addition, and the result is passed to the data memory as an address. The controller signals the data memory to perform a fetch operation, which means the output of the data memory is the value at the location given by the address input. The controller must set multiplexor M3 to ignore the output from the ALU and pass the output of the data memory along the data in path of the register unit. The controller signals the register unit to store its input value in the register specified by register dst reg. Once the register unit stores the value, execution of the instruction is complete.

Jump. After a jump instruction has been fetched and decoded, the controller sets multiplexor M2 to pass the offset field from the instruction, and instructs the ALU to perform the addition. The ALU adds the offset to the contents of register A. To use the result as an address, the controller sets multiplexor M3 to pass the output from the ALU and ignore the output from the data memory. Finally, the controller sets multiplexor M1 to pass the value from the ALU to the program counter. Thus, the result from the ALU becomes the input of the 32-bit program counter. The program counter receives and stores the value, and the instruction is complete. Recall that the program counter always specifies the address in memory from which to fetch the next instruction. Therefore, when the next instruction executes, the instruction will be extracted from the address that was computed in the previous instruction (i.e., the program will jump to the new location).
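The four execution sequences can be tied together in a compact simulation. The sketch below is a software approximation of the data path, not a hardware design: instructions are stored as already-decoded tuples `(operation, reg A, reg B, dst reg, offset)`, the memories are hypothetical dictionaries keyed by byte address, the sample program and its data values are invented, and the assembly syntax shown for store is assumed by analogy with load. The selections attributed to the multiplexors follow the descriptions above.

```python
MASK = 0xFFFFFFFF

def step(state):
    """Execute the instruction at the program counter (one clock pulse)."""
    # Fetch and decode. Instructions are kept as already-decoded tuples.
    op, reg_a, reg_b, dst, offset = state["imem"][state["pc"]]
    a_value = state["regs"][reg_a]
    b_value = state["regs"][reg_b]

    # M2: the ALU's second input is register B for add, else the offset.
    alu_out = (a_value + (b_value if op == "add" else offset)) & MASK

    if op == "store":
        state["dmem"][alu_out] = b_value              # write register B to memory
    elif op == "load":
        state["regs"][dst] = state["dmem"][alu_out]   # M3 passes data memory
    elif op == "add":
        state["regs"][dst] = alu_out                  # M3 passes the ALU output

    # M1: the program counter takes the ALU output for jump, else PC + 4.
    state["pc"] = alu_out if op == "jump" else (state["pc"] + 4) & MASK

state = {
    "pc": 0,
    "regs": [0] * 16,
    "dmem": {1000: 50, 1004: 60},
    "imem": {
        0:  ("load", 1, 0, 2, 0),     # load r2, 0(r1)
        4:  ("load", 1, 0, 3, 4),     # load r3, 4(r1)
        8:  ("add", 2, 3, 4, 0),      # add r4, r2, r3
        12: ("store", 1, 4, 0, 8),    # store r4, 8(r1)
        16: ("jump", 5, 0, 0, 20),    # jump 20(r5)
    },
}
state["regs"][1] = 1000
state["regs"][5] = 100
for _ in range(5):
    step(state)
print(state["regs"][4], state["dmem"][1008], state["pc"])   # prints 110 110 120
```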

13. Summary

A computer system is programmable, which means that instead of having the entire sequence of operations hardwired into digital logic, the computer executes instructions from memory. Programmability provides substantial computational power and flexibility, allowing one to change the functionality of a computer by loading a new program into memory. Although the overall design of a computer that executes instructions is complex, the basic components are not difficult to understand.

A computer consists of multiple hardware components, such as a program counter, memories, register units, and an ALU. Connections among components form the computer's data path. We examined a set of components sufficient to execute basic instructions, and reviewed hardware for the steps of instruction fetch, decode, and execute, including register and data access. The encoding used for instructions is selected to make hardware design easier -- fields from the instruction are extracted and passed to each of the hardware units.

In addition to the data path, a controller has connections to each of the hardware units. A multiplexor is an important mechanism that allows the controller to route data among the hardware units. In essence, each multiplexor acts as a switch that allows data from one of several sources to be sent to a given output. When an instruction executes, a controller uses fields of the instruction to determine how to set the multiplexors during the execution. Multiplexors permit a single ALU to compute address offsets as well as to compute arithmetic operations.

We reviewed execution of basic instructions and saw how multiplexors along the data path in a computer can control which values pass to a given hardware unit. We saw, for example, that a multiplexor selects whether the program counter is incremented by four to move to the next instruction or has the value replaced by the output of the ALU (to perform a jump).

EXERCISES

1. Does the example system follow the Von Neumann Architecture? Why or why not?

2. Consult FIG. 3, and show each individual bit when the following instruction is stored in memory:

add r1, r14, r9

3. Consult FIG. 3, and show each individual bit when the following instruction is stored in memory:

load r7, 43(r15)

4. Why is the following instruction invalid?

jump 40000(r15)

Hint: consider storing the instruction in memory.

5. The example presented in this section uses four instructions. Given the binary representation in FIG. 2, how many possible instructions (opcodes) can be created?

6. Explain why the circuit in FIG. 5 is not merely an infinite loop that runs wildly.

7. When a jump instruction is executed, what operation does the ALU perform?

8. A data path diagram, such as the diagram in FIG. 9, hides many details. If the example is changed so that every instruction is sixty-four bits long, what trivial change must be made to the figure?

9. Make a table of all instructions and show how each of the multiplexors is set when the instruction is executed.

10. Modify the example system to include additional operations right shift and subtract.

11. In FIG. 9, which input does multiplexor M1 forward during an add instruction?

12. In FIG. 9, for what instructions does multiplexor M3 select the input from the ALU?

13. Redesign the computer system in FIG. 9 to include a relative branch instruction. Assume the offset field contains a signed value, and add the value to the current program counter to produce the next value for the program counter.

14. Can the system in FIG. 9 handle multiplication? Why or why not?
