Computer Architecture: CPUs -- Hardware Modularity

1. Introduction

Earlier sections give an overview of hardware architectures without discussing design or implementation details. This brief section considers designs that employ modularity. In particular, the section contrasts hardware modularity with software modularity, and considers why common programming abstractions do not apply to hardware. It then uses an example to illustrate how a flexible basic hardware module can be designed, and how replication of the basic module allows a designer to form a scalable hardware design.

2. Motivations for Modularity

Modular construction has two motivations: intellectual and economic. From an intellectual perspective, a modular approach allows a designer to break a large complex problem into smaller pieces. A small piece is easier to understand than the complete solution. Consequently, it is easier for a designer to ensure that the piece is correct, and easier for a designer to optimize an individual piece.

The economic motivation for modularity arises from the cost of designing and testing products. In many cases, a company does not produce one isolated product. Instead, the company creates a set of related products. One common reason for multiple products arises from size -- a company might sell a set of related products that range in size from small to large. For example, a company that sells network equipment might offer four models of a network switch, where the models connect four computers, twenty-four computers, forty-eight computers, or ninety-six computers. Alternatively, a company may offer a series of products that supply the same basic functionality, but where each product has special features. For example, a company that sells network equipment may offer one model that connects to a wireless Wi-Fi network and another model that connects to a wired Ethernet.

Because designing a product is expensive, a company can save money if a basic module can be designed once and then re-used in multiple products. Further savings arise because once a basic module has been tested thoroughly, successive designs that use the module can assume it works correctly.

3. Software Modularity

Modularity has played a key role in software design since early computers. The principal abstraction consists of a subroutine (also called a procedure, subprogram, or function). The early motivation for using subroutines arose from limited memory size -- instead of repeating sections of code at multiple places throughout the program, a single copy of the code could be placed in memory, and then used (i.e., called) at several places in the program.

As software became more complex, subprograms became an important tool for handling complexity. In particular, the use of a subprogram abstraction made it possible to have an expert build a piece of software that other programmers could use without understanding the details. For example, an expert who understands numerical mathematics can create a set of trigonometric functions, such as sin(x) and cos(x), that are both efficient and accurate. Other programmers can invoke the functions without writing the code themselves and without needing to understand the algorithms being used. By raising the level of abstraction and hiding details, subprograms allow programmers to work at a higher level, meaning that they can be much more productive and the resulting software will contain fewer errors.

4. Parameterized Invocation of Subprograms

How can a basic building block be used for multiple purposes? The answer for software is well known. When creating a subprogram, a programmer specifies a set of formal parameters. Then, when writing code that invokes the subprogram, the programmer specifies actual arguments that are substituted in place of formal parameters.

The key point is:

When building modularized software, a single copy of each subprogram exists. The only change among invocations consists of the actual arguments supplied when the subprogram is invoked.
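The distinction between formal parameters and actual arguments can be sketched in a few lines of Python. The function name hypotenuse and the values passed to it are hypothetical, chosen only for illustration; the point is that one copy of the code serves every invocation.

```python
import math


def hypotenuse(a, b):
    """Return the hypotenuse of a right triangle.

    a and b are the formal parameters; they stand in for whatever
    values a caller supplies.
    """
    return math.sqrt(a * a + b * b)


# Two invocations of the same single copy of the code.
# Only the actual arguments differ.
first = hypotenuse(3.0, 4.0)
second = hypotenuse(5.0, 12.0)
```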

5. Hardware Scaling and Parallelism

Although it works well with software, the paradigm of parameterized function invocation cannot be used with hardware. The reason is that software can invoke a function iteratively, but hardware requires separate physical instantiations that can be controlled in parallel. For example, consider controlling a set of N items. In software, the items can be stored in an array, a function can be written to perform an operation on one item, and the program can iterate through the array, calling the function for each element. The program can scale to a larger array merely by changing the bound on the iteration.
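The software side of the contrast can be sketched as follows. The names reset_item and reset_all, and the operation they perform, are assumptions made for the example; any per-item operation would serve equally well.

```python
def reset_item(item):
    """Perform an operation on one item (hypothetical: mark it as reset)."""
    return {"name": item, "state": "reset"}


def reset_all(items):
    """Iterate through the array, calling the function for each element.

    A single copy of reset_item is invoked once per element, one
    invocation after another.
    """
    return [reset_item(item) for item in items]


# Scaling from 2 items to 100 items changes only the size of the
# array, not the code that processes it.
small = reset_all(["backend0", "backend1"])
large = reset_all([f"backend{i}" for i in range(100)])
```

Nothing in the code above is duplicated as the set grows, which is exactly the property that hardware lacks.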

When hardware is created to control a set of items, each item requires some hardware dedicated to the item. If additional elements are added to the set, additional hardware must be added to the design. In other words, scaling a hardware design always requires adding more pieces of hardware. As a consequence:

When hardware designers think about a modular design, they look for ways to make it possible to add additional hardware to the design, not for ways to invoke a given piece of hardware iteratively.

6. Basic Block Replication

The fundamental technique used to make it possible to scale hardware consists of defining a basic building block that can be replicated as needed. We have already seen trivial examples. For instance, a latch circuit can be replicated N times to form an N-bit register, and a full adder is replicated N-1 times and combined with a half adder to build a circuit to compute the sum of two N-bit integers.
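The adder example can be modeled in software to make the replication explicit. The sketch below is only a simulation under stated assumptions: bits are held in Python lists with the least significant bit first, and the helper names (half_adder, full_adder, ripple_add, to_bits, from_bits) are hypothetical. The loop stands in for N-1 physical copies of the full adder; in real hardware all copies exist at once rather than being visited one at a time.

```python
def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Return (sum, carry_out) for two input bits plus a carry in."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2


def ripple_add(x_bits, y_bits):
    """Add two N-bit integers given as bit lists, least significant first.

    Bit 0 uses a half adder; each of the remaining N-1 bit positions
    uses its own full adder. Returns (N sum bits, final carry).
    """
    n = len(x_bits)
    if n < 1 or n != len(y_bits):
        raise ValueError("operands must have the same nonzero width")
    s, carry = half_adder(x_bits[0], y_bits[0])
    result = [s]
    for i in range(1, n):  # one full adder per remaining bit position
        s, carry = full_adder(x_bits[i], y_bits[i], carry)
        result.append(s)
    return result, carry


def to_bits(value, n):
    """Convert an integer to a list of n bits, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Convert a list of bits, least significant first, to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, carry_out = ripple_add(to_bits(5, 4), to_bits(6, 4))
```

Widening the adder from 4 bits to 8 bits means supplying longer bit lists, which corresponds to adding four more full adders to the circuit.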

In the trivial cases described above, replication involves a small circuit (i.e., a few gates), and the number of replications is fixed. Although replication of a small circuit is an important aspect of design, the approach can be applied to significantly larger circuits and used to scale a design. For example, a chip manufacturer may use a multicore architecture to produce a series of products that have two cores, four cores, eight cores, and so on. Replication is especially important in designs where the number of inputs or outputs visible to a user varies across a series of products.

7. An Example Design (Rebooter)

An example will clarify the idea. Rather than choose a hypothetical design, we will consider a piece of hardware used in the author's lab. The lab, which is used for operating system and networking research, has a large set of backend computers that are available for researchers and students in classes. The lab facilities allow a user to create an operating system, allocate a backend computer, download the operating system into the backend computer's memory, and start the computer running. The user then can interact with the backend computer.

Unfortunately, experimental work on operating systems often results in crashes or leaves the computer hardware in a state that cannot respond to further input. In such situations, the backend computer must be power-cycled to regain control. Therefore, we have created a special-purpose hardware system that can power-cycle individual backend computers as needed. We call the system a rebooter. Several generations of rebooter hardware have been used in the lab; we will review one design.

8. High-level Rebooter Design

In principle, the rebooter hardware follows a straightforward approach. A rebooter has a set of outputs that each supply power to a backend computer. The inputs to the rebooter consist of a binary value that specifies one of the outputs to reboot plus an enable input that tells the rebooter to act. To use the rebooter, a binary value is placed on the input lines (to specify one of the outputs) and the enable input is set to 1, which causes the rebooter to power-cycle the specified output†. FIG. 1 illustrates the inputs and outputs.

[Figure: Rebooter Hardware Unit. Inputs: an N-bit binary input value and an enable input. Outputs: power connections for 2^N backend computers.]

FIG. 1 The conceptual organization of rebooter hardware.

How many outputs should a rebooter have? The question is important because the rebooter needs a physical connection for each output. Initially, the lab had only one backend, but the size evolved quickly to two and then eight. To plan for the future, we needed a rebooter circuit to accommodate at least 40 backends, and perhaps 100. The situation illustrates a standard hardware dilemma:

-- A design with too few outputs will not accommodate future needs

-- A design with too many outputs is wasteful

†The exact details of how the rebooter circuit is used are irrelevant to the discussion that follows; it is only important to understand the basics.

9. A Building Block to Accommodate a Range of Sizes

Rather than choose a specific size, we used a modular approach. That is, we chose a basic building block and devised a way to interconnect basic blocks to form a larger rebooter. The modular approach allowed us to construct a small rebooter, and then add additional outputs as needed.

Our basic building block consists of a sixteen-output rebooter, as FIG. 2 illustrates.

FIG. 2 Illustration of the basic building block used for the rebooter.

Look carefully at the figure. The binary input value comprises eight bits, but there are only sixteen outputs. Thus, only four bits are needed to select one of the outputs.

Why are extra input bits present? They are used to allow multiple copies of the building block to be combined to form a larger rebooter.

10. Parallel Interconnection

Our design uses a parallel approach common to many hardware systems. That is, the inputs connect to all modules in parallel. Conceptually, each building block passes a copy of its inputs (including the enable input) on to the next building block. FIG. 3 illustrates the idea.

FIG. 3 Illustration of a basic building block passing all inputs to the next stage of the rebooter.

11. An Example Interconnection

FIG. 4 illustrates how the building blocks can be connected.

FIG. 4 An example interconnection of four copies of the basic building block that provides 64 outputs.

12. Module Selection

As FIG. 4 indicates, the inputs are passed in parallel to all four modules. A question arises: if the input specifies power-cycling computer number 5, does each module power-cycle its output number 5? The answer is no. Only output 5 on module 0 is affected.

To understand how modules respond to inputs, it is necessary to know that each module is assigned a unique ID (0, 1, 2, and 3 in our example). A module includes hardware that checks the four high-order bits of the input to see if they match the assigned ID. If the input does not match the ID, the input is ignored. In other words, the hardware interprets the four high-order bits as a module selection and the four low-order bits as an output selection.

As an example, FIG. 5 illustrates how the hardware interprets the input value 5 as module 0 and output 5.

FIG. 5 The interpretation of input 5 by the rebooter in FIG. 4.

As FIG. 5 shows, input 5 means the four high-order bits contain 0000 and the four low-order bits contain 0101. The high-order bits match the ID assigned to module 0, but not the ID of any other module. Therefore, only module 0 responds to the input.
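The interpretation of the eight input bits can be written out as a pair of bit operations. The sketch below assumes the layout described above (four high-order bits for the module, four low-order bits for the output); the names OUTPUT_BITS, INPUT_BITS, and split_input are hypothetical.

```python
OUTPUT_BITS = 4  # low-order bits that select one of sixteen outputs
INPUT_BITS = 8   # total width of the binary input value


def split_input(value):
    """Split an 8-bit input into (module ID, output number)."""
    if not 0 <= value < (1 << INPUT_BITS):
        raise ValueError("input must fit in eight bits")
    module_id = value >> OUTPUT_BITS             # four high-order bits
    output = value & ((1 << OUTPUT_BITS) - 1)    # four low-order bits
    return module_id, output


# Input 5 is 0000 0101 in binary: module 0, output 5.
module_id, output = split_input(5)
```

Because the number of outputs per module is a power of two, the split needs only a shift and a mask; no division is required.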

Using the high-order bits of the input to select a module makes the hardware extremely efficient. The module selection bits can be passed to a comparator chip along with the ID of the module. As the name implies, a comparator compares two sets of inputs, and sets an output line high if the two are equal. Thus, very little additional hardware is needed to perform module selection.
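Module selection in a parallel interconnection can also be simulated. The following is a behavioral sketch, not a description of the actual circuit: the class RebooterModule, its present method, and the broadcast helper are hypothetical names, and the equality test stands in for the comparator chip. Every module receives the same input value and enable signal, and each decides independently whether to act.

```python
class RebooterModule:
    """Behavioral model of one sixteen-output building block."""

    def __init__(self, module_id):
        self.module_id = module_id  # unique ID assigned to this block
        self.cycled = []            # outputs power-cycled so far

    def present(self, value, enable):
        """Apply the shared inputs; return True if this module acted."""
        high = (value >> 4) & 0xF   # module selection bits
        low = value & 0xF           # output selection bits
        # The comparator: true only when the selection bits equal the ID.
        match = high == self.module_id
        if match and enable:
            self.cycled.append(low)
            return True
        return False


def broadcast(modules, value, enable=1):
    """Pass the same inputs to all modules, as the parallel wiring does.

    Returns the IDs of the modules that responded.
    """
    return [m.module_id for m in modules if m.present(value, enable)]


# Four copies of the building block, with IDs 0 through 3.
modules = [RebooterModule(i) for i in range(4)]
responders = broadcast(modules, 5)
```

In the simulation, as in the hardware, adding a fifth module requires no change to the existing ones; the new module is simply given the next unused ID.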

13. Summary

Both hardware and software engineers use modularity. In software, the fundamental abstraction for modularity is a subprogram. In hardware, the fundamental abstraction is the replication of a basic building block.

One method used to accommodate a range of hardware sizes consists of structuring a module (i.e., a building block) to accept a set of N input lines that control a set of 2^N outputs. When building blocks are replicated, each is assigned a unique ID. Additional input lines are added to the design, which means the high-order bits of the input can be used to select one of the modules, and the low-order bits can be used to select an output on the module.

EXERCISES

1. In engineering, what is the relationship between modularity and re-use?

2. How does the ability to pass arguments to functions help programmers control the complexity of software?

3. When a software engineer and a hardware engineer think about the design of a crypto system that processes 128-bit integers, they each start with a bias. A software engineer might imagine an algorithm that iterates through the integer, working on 32 bits at a time. What will a hardware engineer envision?

4. Mathematically, one can have an arbitrary number of outputs from a module and use arithmetic to extract a module number and an input for the module (e.g., for seven outputs per module divide the input value by 7 to get a module number and use the remainder to select an output within the module). However, hardware engineers always choose to make outputs a power of two. Explain.

5. What are the tradeoffs to consider when choosing how many outputs a piece of hardware should have?

6. Suppose a basic building block contains 4 outputs, and a design must scale to 64 outputs. How many building blocks will be used?

7. If each building block contains 8 outputs and the input has 16 bits, how many total outputs can be controlled, and how many building block chips will be used?

8. In the previous exercise, draw a diagram similar to the one in FIG. 5 that shows how bits of the input are interpreted.

9. Look up comparator chips. How many pairs of inputs does a single comparator have?

10. In the previous exercise, suppose a comparator chip can compare K pairs of inputs and a designer needs to compare 2K pairs. How can multiple chips be used?
