Computer Architecture: Variety of Processors & Computational Engines




1. Introduction

Previous sections describe the basic building blocks used to construct computer systems: digital logic and representations used for data types such as characters, integers, and floating point numbers. This section begins an investigation of one of three key elements of any computer system: a processor. The section introduces the general concept, describes the variety of processors, and discusses the relationship between clock rate and processing rate. The next sections extend the basic description by explaining instruction sets, addressing modes, and the functions of a general-purpose CPU.

2. The Two Basic Architectural Approaches

Early in the history of computers, architects experimenting with new designs considered how to organize the hardware. Two basic approaches emerged that are named for the groups who proposed them:

-- Harvard Architecture
-- Von Neumann Architecture

We will see that the two share ideas and differ only in how programs and data are stored and accessed.

3. The Harvard And Von Neumann Architectures

The term Harvard Architecture† refers to a computer organization with four principal components: a processor, an instruction memory, a data memory, and I/O facilities, organized as FIG. 1 illustrates.


FIG. 1 Illustration of the Harvard Architecture that uses two memories, one to hold programs and another to store data.

Although it includes the same basic components, a Von Neumann Architecture‡ uses a single memory to hold both programs and data. FIG. 2 illustrates the approach.


FIG. 2 Illustration of the Von Neumann Architecture. Both programs and data can be stored in the same memory.

†The name arises because the approach was first used on the Harvard Mark I relay computer.

‡The name is taken from John Von Neumann, a mathematician who first proposed the architecture.

The chief advantage of the Harvard Architecture arises from its ability to have one memory unit optimized to store programs and another memory unit optimized to store data. The chief disadvantage arises from inflexibility: when purchasing a computer, an owner must choose the size of the instruction memory and the size of data memory.

Once the computer has been purchased, an owner cannot use part of the instruction memory to store data nor can he or she use part of the data memory to store programs.

Although it has fallen out of favor for general-purpose computers, the Harvard Architecture is still sometimes used in small embedded systems and other specialized designs.

Unlike the Harvard Architecture, the Von Neumann Architecture offers complete flexibility: at any time, an owner can change how much of the memory is devoted to programs and how much to data. The approach has proven to be so valuable that it has become widely adopted:

Because it offers flexibility, the Von Neumann Architecture, which uses a single memory to hold both programs and data, has become pervasive: almost all computers follow the Von Neumann approach.

We say a computer that follows the Von Neumann Architecture employs a stored program approach because a program is stored in memory. More important, programs can be loaded into memory just like other data items.

Except when noted, the remainder of the text implicitly assumes a Von Neumann Architecture. There are two primary exceptions in Sections 6 and 12. Section 6, which explains data paths, uses a simplified Harvard Architecture in the example.

Section 12, which explains caching, discusses the motivation for using separate instruction and data caches.

4. Definition of a Processor

The remainder of this section considers the processor component present in both the Harvard and Von Neumann Architectures. The next sections define the term and characterize processor types. Later sections explore the subcomponents of complex processors.

Although programmers tend to think of a conventional computer and often use the term processor as a synonym for the Central Processing Unit (CPU), computer architects have a much broader meaning that includes the processors used to control the engine in an automobile, processors in hand-held remote control devices, and specialized video processors used in graphics equipment. To an architect, a processor refers to a digital device that can perform a computation involving multiple steps. Individual processors are not complete computers; they are merely one of the building blocks that an architect uses to construct a computer system. Thus, although it can compute more than the combinatorial Boolean logic circuits we examined in Section 2, a processor need not be large or fast. In particular, some processors are significantly less powerful than the general-purpose CPU found in a typical PC. The next sections help clarify the definition by examining characteristics of processors and explaining some of the ways they can be used.

5. The Range Of Processors

Because processors span a broad range of functionality and many variations exist, no single description adequately captures all the properties of processors. Instead, to help us appreciate the many designs, we need to divide processors into categories according to functionality and intended use. For example, we can use four categories to explain whether a processor can be adapted to new computations. The categories are listed in order of flexibility:

-- Fixed logic
-- Selectable logic
-- Parameterized logic
-- Programmable logic

A fixed logic processor, which is the least flexible, performs a single task. More important, all the functionality needed to perform the operation is built in when the processor is created, and the functionality cannot be altered without changing the underlying hardware†. For example, a fixed logic processor can be designed to compute a function, such as sine(x), or to perform a graphics operation needed in a video game.

A selectable logic processor has slightly more flexibility than a fixed logic processor. In essence, a selectable logic processor contains facilities needed to perform more than one function; the exact function is specified when the processor is invoked. For example, a selectable logic processor might be designed to compute either sine(x) or cosine(x).

A parameterized logic processor adds additional flexibility. Although it only computes a predetermined function, the processor accepts a set of parameters that control the computation. For example, consider a parameterized processor that computes a hash function, h(x). The hash function uses two constants, p and q, and computes the hash of x by computing the remainder of x when multiplied by p and divided by q. For example, if p is 167 and q is 163, h(26729) is the remainder of 4463743 divided by 163, or 151‡. A parameterized processor for such a hash function allows constants p and q to be changed each time the processor is invoked. That is, in addition to the input, x, the processor accepts additional parameters, p and q, that control the operation.

A programmable logic processor offers the most flexibility because it allows the sequence of steps to be changed each time the processor is invoked -- the processor can be given a program to run, typically by placing the program in memory.

†Engineers use the term hardwired for functionality that cannot be changed without altering the underlying wiring.

‡Hashing is often applied to strings. In the example, number 26729 is the decimal value of the two characters in the string "hi" when treated as an unsigned short integer.

6. Hierarchical Structure And Computational Engines

A large processor, such as a modern, general-purpose CPU, is so complex that no human can understand the entire processor as a single unit. To control the complexity, computer architects use a hierarchical approach in which subparts of the processor are designed and tested independently before being combined into the final design.

Some of the independent subparts of a large processor are so sophisticated that they fit our definition of a processor -- the subpart can perform a computation that involves multiple steps. For example, a general-purpose CPU that has instructions for sine and cosine might be constructed by first building and testing a trigonometry processor, and then combining the trigonometry processor with other pieces to form the final CPU.

How do we describe a subpiece of a large, complex processor that acts independently and performs a computation? Some engineers use the term computational engine. The term engine usually implies that the subpiece fills a specific role and is less powerful than the overall unit. For example, FIG. 3 illustrates a CPU that contains several engines.


FIG. 3 An example of a CPU that includes multiple components. The large arrow in the center of the figure indicates a central interconnect mechanism that the components use to coordinate.

The CPU in the figure includes a special-purpose graphics engine. Graphics engines, sometimes called graphics accelerators, are common because video game software is popular and many computers need a graphics engine to drive the graphics display at high speed. For example, a graphics engine might include facilities to repaint the surface of a graphical figure after it has been moved (e.g., in response to a joystick movement).

The CPU illustrated in FIG. 3 also includes a query engine. Query engines and closely related pattern engines are used in database processors. A query engine examines a database record at high speed to determine if the record satisfies the query; a pattern engine examines a string of bits to determine if the string matches a specified pattern (e.g., to test whether a document contains a particular word). In either case, a CPU has enough capability to handle the task, but a special-purpose processor can perform the task much faster.

7. Structure Of A Conventional Processor

Although the imaginary CPU described in the previous section contains many engines, most processors do not. Two questions arise. First, what engine(s) are found in a conventional processor? Second, how are the engines interconnected? This section answers the questions broadly, and later sections give more detail.

Although a practical processor contains many subcomponents with complex interconnections among them, we can view a processor as having five conceptual units:

-- Controller
-- Arithmetic Logic Unit (ALU)
-- Local data storage (typically, registers)
-- Internal interconnection(s)
-- External interface(s) (I/O buses)

FIG. 4 illustrates the concept.

Controller. The controller forms the heart of a processor. Controller hardware has overall responsibility for program execution. That is, the controller steps through the program and coordinates the actions of all other hardware units to perform the specified operations.

Arithmetic Logic Unit (ALU). We think of the ALU as the main computational engine in a processor. The ALU performs all computational tasks, including integer arithmetic, operations on bits (e.g., left or right shift), and Boolean (logical) operations (e.g., Boolean and, or, exclusive or, and not). However, an ALU does not perform multiple steps or initiate activities. Instead, the ALU only performs one operation at a time, and relies on the controller to specify exactly what operation to perform on the operand values.

Local Data Storage. A processor must have at least some local storage to hold data values such as operands for arithmetic operations and the result. As we will see, local storage usually takes the form of hardware registers -- values must be loaded into the hardware registers before they can be used in computation.


FIG. 4 The five major units found in a conventional processor. The external interface connects to the rest of the computer system.

Internal Interconnection(s). A processor contains one or more hardware mechanisms that are used to transfer values between the other hardware units. For example, the interconnection hardware is used to move data values from the local storage to the ALU or to move results from the ALU to local storage. Architects sometimes use the term data path to describe an internal interconnection.

External Interface(s). The external interface unit handles all communication between the processor and the rest of the computer system. In particular, the external interface manages communication between the processor and external memory and I/O devices.

8. Processor Categories and Roles

Understanding the range of processors is especially difficult for someone who has not encountered hardware design because processors can be used in a variety of roles.

It may help if we consider the ways that hardware devices use processors and how processors function in each role. Here are four examples:

-- Coprocessors
-- Microcontrollers
-- Embedded system processors
-- General-purpose processors

Coprocessors. A coprocessor operates in conjunction with and under the control of another processor. Usually, a coprocessor consists of a special-purpose processor that performs a single task at high speed. For example, some CPUs use a coprocessor known as a floating point accelerator to speed the execution of arithmetic operations -- when a floating point operation occurs, the CPU automatically passes the necessary values to the coprocessor, obtains the result, and then continues execution. In architectures where a running program does not know which operations are performed directly by the CPU and which operations are performed by a coprocessor, we say that the operation of a coprocessor is transparent to the software. Typical coprocessors use fixed or selectable logic, which means that the functions the coprocessor can perform are determined when the coprocessor is designed.

Microcontrollers. A microcontroller consists of a programmable device dedicated to the control of a physical system. For example, microcontrollers run physical systems such as the engine in a modern automobile, the landing gear on an airplane, and the automatic door in a grocery store. In many cases, a microcontroller performs a trivial function that does not require much traditional computation. Instead, a microcontroller tests sensors and sends signals to control devices. FIG. 5 lists an example of the steps a typical microcontroller can be programmed to perform:

do forever {
    wait for the sensor to be tripped;
    turn on power to the door motor;
    wait for a signal that indicates the door is open;
    wait for the sensor to reset;
    delay ten seconds;
    turn off power to the door motor;
}

FIG. 5 Example of the steps a microcontroller performs. In most cases, microcontrollers are dedicated to trivial control tasks.

Embedded System Processors. An embedded system processor runs sophisticated electronic devices such as a wireless router or smart phone. The processors used for embedded systems are usually more powerful than the processors that are used as microcontrollers, and often run a protocol stack used for communication. However, the processor may not contain all the functionality found on more general-purpose CPUs.

General-purpose Processors. General-purpose processors are the most familiar and need little explanation. For example, the CPU in a PC is a general-purpose processor.

9. Processor Technologies

How are processors created? In the 1960s, processors were created from digital logic circuits. Individual gates were connected together on a circuit board, which then plugged into a chassis to form a working computer. By the 1970s, large-scale integrated circuit technology arrived, which meant that the smallest and least powerful processors -- such as those used for microcontrollers -- could each be implemented on a single integrated circuit. As integrated circuit technology improved and the number of transistors on a chip increased, a single chip became capable of holding more powerful processors. Today, many of the most powerful general-purpose processors consist of a single integrated circuit.

10. Stored Programs

We said that a processor performs a computation that involves multiple steps. Although some processors have the series of steps built into the hardware, most do not. Instead, they are programmable (i.e., they rely on a mechanism known as programming). That is, the sequence of steps to be performed comprises a program that is placed in a location the processor can access; the processor accesses the program and follows the specified steps.

Computer programmers are familiar with conventional computer systems that use main memory as the location that holds a program. The program is loaded into memory each time a user runs the application. The chief advantage of using main memory to hold programs lies in the ability to change the program. The next time a user runs a program after it has been changed, the altered version will be used.

Although our conventional notion of programming works well for general-purpose processors, other types of processors use alternative mechanisms that are not as easy to change. For example, the program for a microcontroller usually resides in hardware known as Read Only Memory (ROM). In fact, a ROM that contains a program may reside on an integrated circuit along with a microcontroller that runs the program. For example, the microcontroller used in an automobile may reside on a single integrated circuit that also contains the program the microcontroller runs.

The important point is that programming is a broad notion:

To a computer architect, a processor is classified as programmable if, at some level of detail, the processor is separate from the program it runs. To a user, it may appear that the program and processor are integrated, and it may not be possible to change the program without replacing the processor.

11. The Fetch-Execute Cycle

How does a programmable processor access and perform steps of a program? The data path description in Section 6 explains the basic idea. Although the details vary among processors, all programmable processors follow the same fundamental paradigm.

The underlying mechanism is known as the fetch-execute cycle.

To implement fetch-execute, a processor has an instruction pointer that automatically moves through the program in memory, performing each step. That is, each programmable processor executes two basic functions repeatedly. Algorithm 1 presents the two fundamental steps†.

Algorithm 1

Repeat forever {
    Fetch: access the next step of the program from the location
        in which the program has been stored.

    Execute: perform the step of the program.
}

Algorithm 1 The Fundamental Steps Of The Fetch-Execute Cycle

The important point is:

At some level, every programmable processor implements a fetch-execute cycle.

Several questions arise. Exactly how is the program represented in memory, and how is such a representation created? How does a processor identify the next step of a program? What are the possible operations that can be performed during the execution phase of the fetch-execute cycle? How does the processor perform each operation? The next sections will answer each of these questions in more detail. The remainder of this section concentrates on three questions: how fast does a processor operate, how does a processor begin with the first step of a program, and what happens when the processor reaches the end of a program?

†Note that the algorithm presented here is a simplified form; when we discuss I/O, we will see how the algorithm is extended to handle device interrupts.

12. Program Translation

An important question for programmers concerns how a program is converted to the form a processor expects. A programmer uses a High Level Language (HLL) to create a computer program. We say the programmer writes source code. The programmer uses a tool to translate the source code into the representation that a processor expects.

Although a programmer invokes a single tool, such as gcc, multiple steps are required to perform the translation. First, a preprocessor expands macros, producing a modified source program. The modified source program becomes input to a compiler, which translates the program into assembly language. Although it is closer to the form needed by a processor, assembly language can be read by humans. An assembler translates the assembly language program into a relocatable object program that contains a combination of binary code and references to external library functions. A linker processes the relocatable object program by replacing external function references with the code for the functions. To do so, the linker extracts the name of a function and searches one or more libraries to find the binary code for the function. FIG. 6 illustrates the translation steps and the software tool that performs each step.


FIG. 6 The steps used to translate a source program to the binary object code representation used by a processor.

13. Clock Rate and Instruction Rate

One of the primary questions about processors concerns speed: how fast does the fetch-execute cycle operate? The answer depends on the processor, the technology used to store a program, and the time required to execute each instruction. On one hand, a processor used as a microcontroller to actuate a physical device (e.g., an electric door) can be relatively slow because a response time under one-tenth of a second seems fast to a human. On the other hand, a processor used in the highest-speed computers must be as fast as possible because the goal is maximum performance.

As we saw in Section 2, most processors use a clock to control the rate at which the underlying digital logic operates. Anyone who has purchased a computer knows that sales personnel push customers to purchase a fast clock with the argument that a higher clock rate will result in higher performance. Although a higher clock rate usually means higher processing speed, it is important to realize that the clock rate does not give the rate at which the fetch-execute cycle proceeds. In particular, in most systems, the time required for the execute portion of the cycle depends on the instruction being executed. We will see later that operations involving memory access or I/O can require significantly more time (i.e., more clock cycles) than those that do not. The time also varies among basic arithmetic operations: integer multiplication or division requires more time than integer addition or subtraction. Floating point computation is especially costly because floating point operations usually require more clock cycles than equivalent integer operations. Floating point multiplication or division stands out as especially costly -- a single floating point division can require orders of magnitude more clock cycles than an integer addition.

For now, it is sufficient to remember the general principle:

The fetch-execute cycle may not proceed at a fixed rate because the time taken to execute an instruction depends on the operation being performed. An operation such as multiplication requires more time than an operation such as addition.

14. Control: Getting Started and Stopping

So far, we have discussed a processor running a fetch-execute cycle without giving details. We now need to answer two basic questions. How does the processor start running the fetch-execute cycle? What happens after the processor executes the last step in a program? The issue of program termination is the easiest to understand: processor hardware is not designed to stop. Instead, the fetch-execute cycle continues indefinitely. Of course, a processor can be permanently halted, but such a sequence is only used to power down a computer -- in normal operations, the processor continues to execute one instruction after another.

In some cases, a program uses a loop to delay. For example, a microcontroller may need to wait for a sensor to indicate an external condition has been met before proceeding. The processor does not merely stop to wait for the sensor. Instead, the program contains a loop that repeatedly tests the sensor. Thus, from a hardware point of view, the fetch-execute cycle continues.

The notion of an indefinite fetch-execute cycle has a direct consequence for programming: software must be planned so a processor always has a next step to execute.

In the case of a dedicated system such as a microcontroller that controls a physical device, the program consists of an infinite loop -- when it finishes the last step of the program, the processor starts again at the first step. In the case of a general-purpose computer, an operating system is always present. The operating system can load an application into memory, and then direct the processor to run the application. To keep the fetch-execute cycle running, the operating system must arrange to regain control when the application finishes. When no application is running, the operating system enters a loop to wait for input (e.g., from a touch screen, keyboard, or mouse).

To summarize:

Because a processor runs the fetch-execute cycle indefinitely, a system must be designed to ensure that there is always a next step to execute.

In a dedicated system, the same program executes repeatedly; in a general-purpose system, an operating system runs when no application is running.

15. Starting the Fetch-Execute Cycle

How does a processor start the fetch-execute cycle? The answer is complex because it depends on the underlying hardware. For example, some processors have a hardware reset. On such processors, engineers arrange for a combinatorial circuit to apply voltage to the reset line until all system components are ready to operate. When voltage is removed from the reset line, the processor begins executing a program from a fixed location. Some processors start executing a program found at location zero in memory once the processor is reset. In such systems, the designer must guarantee that a valid program is placed in location zero before the processor starts.

The steps used to start a processor are known as a bootstrap. In an embedded environment, the program to be run usually resides in Read Only Memory (ROM). On a conventional computer, the hardware reads a copy of the operating system from an I/O device, such as a disk, and places the copy into memory before starting the processor.

In either case, hardware assist is needed for bootstrap because a signal must be passed to the processor that causes the fetch-execute cycle to begin.

Many devices have a soft power switch, which means that the power switch does not actually turn power on or off. Instead, the switch acts like a sensor -- the processor can interrogate the switch to determine its current position. Booting a device that has a soft switch is no different than booting other devices. When power is first applied (e.g., when a battery is installed), the processor boots to an initial state. The initial state consists of a loop that interrogates the soft power switch. Once the user presses the soft power switch, the hardware completes the bootstrap process.

16. Summary

A processor is a digital device that can perform a computation involving multiple steps. Processors can use fixed, selectable, parameterized or programmable logic. The term engine identifies a processor that is a subpiece of a more complex processor.

Processors are used in various roles, including coprocessors, microcontrollers, embedded processors, and general-purpose processors. Although early processors were created from discrete logic, a modern processor is implemented as a single VLSI chip.

A processor is classified as programmable if at some level, the processor hardware is separate from the sequence of steps that the processor performs; from the point of view of the end user, however, it might not be possible to change the program without replacing the processor. All programmable processors follow a fetch-execute cycle; the time required for one cycle depends on the operation performed. Because fetch-execute processing continues indefinitely, a designer must construct a program in such a way that the processor always has an instruction to execute.

A set of software programs are used to translate a source program, written by a programmer, into the binary representation that a processor requires. The set includes a preprocessor, compiler, assembler, and linker.

QUIZ

1. Neither FIG. 1 nor FIG. 2 has storage as a major component. Where does storage (e.g., flash or an electro-mechanical disk) fit into the figures?

2. Consider the System-on-Chip (SoC) approach described in Section 2. Besides a processor, memory, and I/O facilities, what does an SoC need?

3. Consult Wikipedia to learn about early computers. How much memory did the Harvard Mark I computer have, and what year was it created? How much memory did the IBM 360/20 computer have, and what year was it created?

4. Although CPU manufacturers brag about the graphics accelerators on their chips, some video game designers choose to keep the graphics hardware separate from the processor. Explain one possible motivation for keeping it separate.

5. Imagine a smart phone that employs a Harvard Architecture. If you purchase such a phone, what would you need to specify that you do not normally specify?

6. What aspect of a Von Neumann Architecture makes it more vulnerable to hackers than a Harvard Architecture?

7. If you have access to gcc, read the man page to learn the command line argument that al lows you to run only the preprocessor and place the preprocessed program in a file that can be viewed. What changes are made in the source program?

8. Extend the previous exercise by placing the assembly language output from the compiler in a file that can be viewed.

9. Write a computer program that compares the difference in execution times between an integer division and a floating point division. To test the program, execute each operation 100,000 times, and compare the difference in running times.
