Computer Architecture: CPUs -- Programmed and Interrupt Driven I/O





1. Introduction

Earlier sections introduce I/O. The previous section explains how a bus provides the connection between a processor and a set of I/O devices. The section discusses the bus address space, and shows how an address space can hold a combination of both memory and I/O devices. Finally, the section explains that a bus uses the fetch-store paradigm, and shows how fetch and store operations can be used to interrogate or control an external device.

This section continues the discussion. The section describes and compares the two basic styles of interaction between a processor and an I/O device. It focuses on interrupt-driven I/O, and explains how device driver software in the operating system interacts with an external device.

The next section takes a different approach to the subject by examining I/O from a programmer's perspective. The section looks at individual devices, and describes how they interact with the processor.

2. I/O Paradigms

We know from the previous section that I/O devices connect to a bus, and that a processor can interact with the device by issuing fetch and store operations to bus addresses that have been assigned to the device. Although the basic mechanics of I/O are easy to specify, several questions remain unanswered. What control operations should each device support? How does application software running on the processor access a given device without understanding the hardware details? Can the interaction between a processor and I/O devices affect overall system performance?

3. Programmed I/O

The earliest computers took a straightforward approach to I/O: an external device consisted of basic digital circuits that controlled the hardware in response to fetch and store operations; the CPU handled all the details. For example, to write data on a disk, the CPU activated a set of circuits in the device, one at a time. The circuits positioned the disk arm and caused the head to write a block of data. To capture the idea that an early peripheral device consisted only of basic circuits that respond to commands from the CPU, we say that the device contained no intelligence, and we characterize the form of interaction by saying that the I/O is programmed.

4. Synchronization

It may seem that writing software to perform programmed I/O is trivial: a program merely assigns a value to an address on the bus. To understand I/O programming, however, we need to remember two things. First, a nonintelligent device cannot remember a list of commands. Instead, circuits in the device perform each command precisely when the processor sends the command. Second, a processor operates much faster than an I/O device - even a slow processor can execute thousands of instructions in the time it takes for a motor or mechanical actuator to move a physical mechanism.

As an example of a mechanical device, consider a printer. The print mechanism can spray ink across the page, but can only print a small vertical band at any time.

Printing starts at the top of the page. After printing one horizontal band, the paper must be advanced before the next horizontal band can be printed. If a processor merely issues instructions to print an item, advance the paper, and print another item, the second item may be printed while the paper is still moving, resulting in a smear. In the worst case, if the print mechanism is not designed to operate while the paper advance mechanism operates, the hardware may be damaged.

To prevent problems, programmed I/O relies on synchronization. That is, once it issues a command, the processor must interact with the device to wait until the device is ready for another command. We can summarize:

Because a processor operates orders of magnitude faster than an I/O device, programmed I/O requires the processor to synchronize with the device that is being controlled.

5. Polling

The basic form of synchronization that a processor uses with an I/O device is known as polling. In essence, polling requires the processor to ask repeatedly whether an operation has completed before the processor starts the next operation. Thus, to perform a print operation, a processor can use polling at each step. FIG. 1 shows an example.

-- Test to see if the printer is powered on

-- Cause the printer to load a blank sheet of paper

-- Poll to determine when the paper has been loaded

-- Specify data in memory that tells what to print

-- Poll to wait for the printer to load the data

-- Cause the printer to start spraying a band of ink

-- Poll to determine when the ink mechanism finishes

-- Cause the printer to advance the paper to the next band

-- Poll to determine when the paper has advanced

-- Repeat the above six steps for each band to be printed

-- Cause the printer to eject the page

-- Poll to determine when the page has been ejected


FIG. 1 Illustration of synchronization between a processor and an I/O device. The processor must wait for each step to complete.

6. Code for Polling

How does software perform polling? Because a bus follows the fetch-store paradigm, polling must use a fetch operation. That is, one or more of the addresses assigned to the device correspond to status information - when the processor fetches a value from the address, the device responds by giving its current status.

To understand how polling appears to a programmer, we need to know the exact details of a hardware device. Unfortunately, most devices are incredibly complex. For example, many vendors sell three-in-one printers that can function as scanners or fax machines as well as printers. To keep an example simple, we will imagine a simple printing device, and create a programming interface for the device. Although our example device is indeed much simpler than commercial devices, the general approach is exactly the same.

Recall that a device is assigned addresses in the address space, and the device is engineered to respond to fetch and store instructions to those addresses. When a device is created, the designer does not specify the addresses that will be used, but instead writes a relative specification by giving addresses 0 through N-1. Later, when the device is installed in a computer, actual addresses are assigned. The use of relative addresses in the specification means a programmer can write software to control the device without knowing the actual address. Once the device is installed, the actual address can be passed to the software as an argument.

An example will clarify the concept. Our imaginary printer defines thirty-two contiguous bytes of addresses. Furthermore, the design has grouped the addresses into eight words that are each thirty-two bits long. The use of words is typical. The specification in FIG. 2 shows how the device interprets fetch and store operations for each of the addresses.


FIG. 2 A specification for the bus interface on an imaginary printing device. A processor uses fetch and store to control the device and determine its status.

The figure gives the meaning of fetch and store operations on addresses assigned to our imaginary I/O device. As described above, addresses in the specification start at zero because they are relative. When the device is connected to a bus, the device will be assigned thirty-two bytes somewhere in the bus address space, and software will use the actual addresses when communicating with the device.

Once a programmer is given a hardware specification similar to the one in FIG. 2, writing code that controls a device is straightforward. For example, assume our printing device has been assigned the starting bus address 0x110000. Addresses 0 through 3 in the figure will correspond to actual addresses 0x110000 through 0x110003.

To determine whether the printer is powered on, the processor merely needs to access the value in addresses 0x110000 through 0x110003. If the value is zero, the printer is off. The code to access the device status appears to be a memory fetch. In C, the status test code can be written:

volatile int *p = (volatile int *)0x110000;

if (*p != 0) {  /* Test whether printer is on */
    /* printer is on */
} else {
    /* printer is off */
}

The example assumes an integer size of four bytes. The code declares p to be a pointer to a volatile integer, initializes p to 0x110000, and then uses *p to obtain the value at address 0x110000. The volatile qualifier informs the compiler that the value at the address can change outside the program, which prevents the compiler from optimizing away the fetch.

Now that we understand how software communicates with a device, we can consider a sequence of steps and synchronization. FIG. 3 shows C code that performs some of the steps found in FIG. 1.


FIG. 3 Example C code that uses polling to carry out some of the steps from FIG. 1 on the imaginary printing device specified in FIG. 2.

Code in the figure assumes the device has been assigned address 0x110000, and that data structure mydata contains the data to be printed in exactly the form the printer expects. To understand the use of pointers, remember that the C programming language defines pointer arithmetic: adding K to an integer pointer advances the pointer by K×N bytes, where N is the number of bytes in an integer. Thus, if variable p has the value 0x110000, p+1 equals 0x110004.

The example code illustrates another feature of many devices that may seem strange to a programmer: multiple steps for a single operation. In our example, the data to be printed is located in memory and two steps are used to specify data. In the first step, the address of the data is passed to the printer. In the second step, the printer is instructed to load a copy of the data. Having two steps may seem unnecessary - why doesn't the printer start loading data automatically once an address has been specified? To understand, remember that each fetch and store instruction controls circuits in the device. A device designer might choose such a design because he or she finds it easier to build hardware that has two separate circuits, one to accept a memory address and one to load data from memory.

Programmers who have not written a program to control a device may find the code shocking because it contains four occurrences of a while statement that each appear to be an infinite loop. If such a statement appeared in a conventional application program, the statement would be in error because the loop continually tests the value at a memory location without making any changes. In the example, however, pointer p references a device instead of a memory location. Thus, when the processor fetches a value from location p+6, the request passes to a device, which interprets it as a request for status information. So, unlike a value in memory, the value returned by the device will change over time -- if the processor polls enough times, the device will complete its current operation and return zero as the status value. The point is:

Although polling code appears to contain infinite loops, the code can be correct because values returned by a device can change over time.

7. Control and Status Registers

We use the term Control and Status Registers (CSRs) to refer to the set of addresses that a device uses. More specifically, a control register corresponds to a contiguous set of addresses (usually the size of an integer) that respond to a store operation, and a status register corresponds to a contiguous set of addresses that respond to a fetch operation.

In practice, CSRs are usually more complicated than the simplified version listed in FIG. 2. For example, a typical status register assigns meanings to individual bits (e.g., the low-order bit of the status word specifies whether the device is in motion, the next bit specifies whether an error has occurred, and so on). More important, to conserve addresses, many devices combine control and status functions into a single set of addresses. That is, a single address can serve both functions - a store operation to the address controls the device, and a fetch operation to the same address reports the device status.

As a final detail, some devices interpret a fetch operation as both a request for status information and a control operation. For example, a trackpad delivers bytes to indicate motion of a user's fingers. The processor uses a fetch operation to obtain data from the trackpad. Furthermore, each fetch automatically resets the hardware to measure the next motion.

8. Using a Structure to Define CSRs

The example code in FIG. 3 uses a pointer and pointer arithmetic to reference individual items. In practice, programmers usually create a C struct that defines the CSRs, and then use named members of the struct to reference items in the CSRs. For example, FIG. 4 shows how the code from FIG. 3 appears when a struct is used to define the CSRs.


FIG. 4 The code from FIG. 3 rewritten to use a C struct.

As the example shows, code that uses a struct is much easier to read and debug. Because members of the struct can be given meaningful names, a programmer reading the code can guess at its purpose without being intimately familiar with the device. In addition, using a struct improves program organization because all the offsets of individual CSRs are specified in one place instead of being embedded throughout the code.

To summarize:

Instead of distributing CSR references throughout the code, a programmer can improve readability by declaring a structure that defines all the CSRs for a device and then referencing fields in the structure.
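For the imaginary printer, such a struct might be declared as shown below. The member names are illustrative guesses based on the steps in FIG. 1; a real declaration would follow the device's published CSR layout. Declaring the members volatile tells the compiler that each reference must actually fetch from or store to the device.

```c
/* Hypothetical struct overlay for the imaginary printer's eight
   32-bit CSR words; member names are illustrative. */
struct printer_csrs {
    volatile int power;      /* word 0: nonzero when printer is on      */
    volatile int load_paper; /* word 1: store starts loading a sheet    */
    volatile int data_addr;  /* word 2: memory address of data to print */
    volatile int load_data;  /* word 3: store starts copying the data   */
    volatile int print_band; /* word 4: store starts spraying a band    */
    volatile int advance;    /* word 5: store advances the paper        */
    volatile int status;     /* word 6: nonzero while device is busy    */
    volatile int eject;      /* word 7: store ejects the page           */
};
```

A driver then overlays the struct on the device's bus address, for example struct printer_csrs *csrs = (struct printer_csrs *)0x110000; and polls with while (csrs->status != 0) instead of pointer arithmetic.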

9. Processor Use and Polling

The chief advantage of a programmed I/O architecture arises from the economic benefit: because they do not contain sophisticated digital circuits, devices that rely on programmed I/O are inexpensive. The chief disadvantage of programmed I/O arises from the computational overhead: each step requires the processor to interact with the I/O device.

To understand why polling is especially undesirable, we must recall the fundamental mismatch between I/O devices and computation: because they are electromechanical, I/O devices operate several orders of magnitude slower than a processor. Furthermore, if a processor uses polling to control an I/O device, the amount of time the processor waits is fixed, and is independent of the processor speed.

The important point can be summarized:

Because a typical processor is much faster than an I/O device, the speed of a system that uses polling depends only on the speed of the I/O device; using a fast processor will not increase the rate at which I/O is performed.

Turning the statement around produces a corollary: if a processor uses polling to wait for an I/O device, using a faster processor merely means that the processor will execute more instructions waiting for the device (i.e., loops, such as those in FIG. 3, will run faster). Thus, a faster processor merely wastes more cycles waiting for an I/O device -- if the processor did not need to poll, the processor could be performing computation instead.

10. Interrupt-Driven I/O

In the 1950s and 1960s, computer architects became aware of the mismatch between the speed of processors and I/O devices. The difference was particularly important when the first generation of computers, which used vacuum tubes, was replaced by a second generation that used solid-state devices. Although the use of solid-state devices (i.e., transistors) increased the speed of processors, the speed of I/O devices remained approximately the same. Thus, architects explored ways to overcome the mismatch between I/O and processor speeds.

One approach emerged as superior, and led to a revolution in computer architecture that produced the third generation of computers. Known as an interrupt mechanism, the facility is now standard in processor designs.

The central premise of interrupt-driven I/O is straightforward: instead of wasting time polling, allow a processor to continue to perform computation while an I/O device operates. When the device finishes, arrange for the device to inform the processor so that the processor can handle the device. As the name implies, the hardware temporarily interrupts the computation in progress to handle I/O. Once the device has been serviced, the processor resumes the computation exactly where it was interrupted.

In practice, interrupt-driven I/O requires that all aspects of a computer system be designed to support interrupts, including:

-- I /O device hardware

-- Bus architecture and functionality

-- Processor architecture

-- Programming paradigm

I/O Device Hardware. Instead of merely operating under control of a processor, an interrupt-driven I/O device must operate independently once it has started. Later, when it finishes, a device must be able to interrupt the processor.

Bus Architecture and Functionality. A bus must support two-way communication that allows a processor to start an operation on a device and allows the device to interrupt the processor when the operation completes.

Processor Architecture. A processor needs a mechanism that can cause the processor to suspend normal computation temporarily, handle a device, and then resume the computation.

Programming Paradigm. Perhaps the most significant change involves a shift in the programming paradigm. Polling uses a sequential, synchronous style of programming in which the programmer specifies each step of the operation an I/O device performs. As we will see in the next section, interrupt-driven programming uses an asynchronous style of programming in which the programmer writes code to handle events.

11. An Interrupt Mechanism and Fetch-Execute

As the term interrupt implies, device events are temporary. When a device needs service (e.g., when an operation completes), hardware in the device sends an interrupt signal over the bus to the processor. The processor temporarily stops executing instructions, saves the state information needed to resume execution later, and handles the device. When it finishes handling an interrupt, the processor reloads the saved state and resumes executing exactly at the point the interrupt occurred. That is:

An interrupt mechanism temporarily borrows the processor to handle an I/O device. Hardware saves the state of the computation when an interrupt occurs and resumes the computation once interrupt processing finishes.

From an application programmer's point of view, an interrupt is transparent, which means a programmer writes application code as if interrupts do not exist. The hardware is designed so that the result of computation is the same if no interrupts occur, one interrupt occurs, or many interrupts occur during the execution of the instructions.

How does I/O hardware interrupt a processor? In fact, devices only request service. Interrupts are implemented by a modified fetch-execute cycle that allows a processor to respond to a request. As Algorithm 1 explains, an interrupt occurs between the execution of two instructions.

---------------

Algorithm 1

Repeat forever {

Test: if any device has requested interrupt, handle the interrupt, and then continue with the next iteration of the loop.

Fetch: access the next step of the program from the location in which the program has been stored.

Execute: Perform the step of the program.

}

-------------------------

Algorithm 1 -- A Fetch-Execute Cycle That Handles Interrupts.

12. Handling an Interrupt

To handle an interrupt, processor hardware takes the five steps that FIG. 5 lists.

-- Save the current execution state

-- Determine which device interrupted

-- Call the function that handles the device

-- Clear the interrupt signal on the bus

-- Restore the current execution state


FIG. 5 Five steps that processor hardware performs to handle an interrupt. The steps are hidden from a programmer.

Saving and restoring state is easiest to understand: the hardware saves information when an interrupt occurs (usually in memory), and a special return from interrupt instruction reloads the saved state. In some architectures, the hardware saves complete state information, including the contents of all general-purpose registers. In other architectures, the hardware saves basic information, such as the instruction counter, and requires software to explicitly save and restore values, such as the general-purpose registers. In any case, saving and restoring state are symmetric operations - hardware is designed so the instruction that returns from an interrupt reloads exactly the same state information that the hardware saves when an interrupt occurs. We say that the processor temporarily switches the execution context when it handles an interrupt.

13. Interrupt Vectors

How does the processor know which device is interrupting? Several mechanisms have been used. For example, some architectures use a special-purpose coprocessor to handle all I/O. To start a device, the processor sends requests to the coprocessor.

When a device needs service, the coprocessor detects the situation and interrupts the processor.

Most architectures use control signals on a bus to inform the processor when an interrupt is needed. The processor checks the bus on each iteration of the fetch-execute cycle. When it detects an interrupt request, interrupt hardware in the processor sends a special command over the bus to determine which device needs service. The bus is arranged so that exactly one device can respond at a time. Typically, each device is assigned a unique number, and the device responds by giving its number.

Numbers assigned to devices are not random. Instead, numbers are configured in a way that allows the processor hardware to interpret the number as an index into an array of pointers at a reserved location in memory. An item in the array, which is known as an interrupt vector, is a pointer to software that handles the device; we say that the interrupts are vectored. The software is known as an interrupt handler. FIG. 6 illustrates the data structure.

The figure shows the simplest interrupt vector arrangement in which each physical device is assigned a unique interrupt vector. In practice, computer systems designed to accommodate many devices often use a variation in which multiple devices share a common interrupt vector. After the interrupt occurs, code in the interrupt handler uses the bus a second time to determine which physical device interrupted. Once it determines the physical device, the handler chooses an interaction that is appropriate for the device. The chief advantage of sharing an interrupt vector among multiple devices arises from scale - a processor with a fixed set of interrupt vectors can accommodate an arbitrary number of devices.


FIG. 6 Illustration of interrupt vectors in memory. Each vector points to an interrupt handler for the device.

14. Interrupt Initialization and Disabled Interrupts

How are values installed in an interrupt vector table? Software must initialize interrupt vectors because neither the processor nor the device hardware enters or modifies the table. Instead, the hardware blindly assumes that the interrupt vector table has been initialized - when an interrupt occurs, the processor saves state, uses the bus to request a vector number, uses the value as an index into the table of vectors, and then branches to the code at that address. No matter what address is found in a vector, the processor will jump to the address and attempt to execute the instruction.

To ensure that no interrupts occur before the table has been initialized, most processors start in a mode that has interrupts disabled. That is, the processor continues to run the fetch-execute cycle without checking for interrupts. Later, once the software (usually the operating system) has initialized the interrupt vectors, the software must execute a special instruction that explicitly enables interrupts. In many processors, the interrupt status is controlled by the mode of the processor; interrupts are automatically enabled when the processor changes from the initial startup mode to a mode suitable for executing programs.

15. Interrupting an Interrupt Handler

Once an interrupt occurs and an interrupt handler is running, what happens if another device becomes ready and requests an interrupt? The simplest hardware follows a straightforward policy: once an interrupt occurs, further interrupts are automatically disabled until the current interrupt completes and returns. Thus, there is never any confusion.

The most sophisticated processors offer a multiple-level interrupt mechanism, also known as multiple interrupt priorities. Each device is assigned an interrupt priority level, typically in the range 1 through 7. At any given time, the processor is said to be operating at one of the priority levels. Priority zero means the processor is not currently handling an interrupt (i.e., it is running an application); a priority N greater than zero means the processor is currently handling an interrupt from a device that has been assigned to level N.

The rule is:

When operating at priority level K, a processor can only be interrupted by a device that has been assigned to level K+1 or higher.

Note that when an interrupt happens at priority K, no more interrupts can occur at priority K or lower. The consequence is that at most one interrupt can be in progress at each priority level.

16. Configuration of Interrupts

We said that each device must be assigned an interrupt vector and (possibly) an interrupt priority. Both the hardware in the device and the software running on the processor must agree on the assignments -- when a device returns an interrupt vector number, the corresponding interrupt vector must point to the handler for the device.

How are interrupt assignments made? Two approaches have been used:

-- Manual assignment, used only for small embedded systems

-- Automated assignment, used on most computer systems

Manual Assignment. Some small embedded systems still use the method that was used on early computers: a manual approach in which computer owners configure both the hardware and software. For example, some devices are manufactured with physical switches on the circuit board, and the switches are used to enter an interrupt vector address. Of course, the operating system must be configured to match the values chosen for devices.

Automated Assignment. Automated interrupt vector assignment is the most widely used approach because it eliminates manual configuration and allows devices to be installed without requiring the hardware to be modified. When the computer boots, the processor uses the bus to determine which devices are attached. The processor assigns an interrupt vector number to each device, places a copy of the appropriate device handler software in memory, and builds the interrupt vector in memory. Of course, automated assignment means a longer delay when booting the computer.

17. Dynamic Bus Connections and Pluggable Devices

Our description of buses and interrupt configuration has assumed that devices are attached to a bus while a computer is powered down, that interrupt vectors are assigned at startup, and that all devices remain in place as the computer operates. Early buses were indeed designed as we have described. However, more recent buses have been invented that permit devices to be connected and disconnected while the computer is running. We say that such buses support pluggable devices. For example, a Universal Serial Bus (USB) permits a user to plug in a device at any time.

How does a USB operate? In essence, a USB appears as a single device on the computer's main bus. When the computer boots, the USB is assigned an interrupt vector as usual, and a handler is placed in memory. Later, when a user plugs in a new device, the USB hardware generates an interrupt, and the processor executes the handler. The handler, in turn, sends a request over the USB bus to interrogate devices and determine which device has been attached. Once it identifies the device, the USB handler loads a secondary device-specific handler. When a device needs service, the device requests an interrupt. The USB handler receives the interrupt, determines which device interrupted, and passes control to the device-specific handler.

18. Interrupts, Performance, and Smart Devices

Why did the interrupt mechanism cause a revolution in computer architecture? The answer is easy. First, I/O is an important aspect of computing that must be optimized. Second, interrupt-driven I/O automatically overlaps computation and I/O without requiring a programmer to take any special action. That is, interrupts adapt automatically to processors and I/O devices of any speed. Because a programmer does not need to estimate how many instructions can be performed during an I/O operation, interrupts never underestimate or overestimate. We can summarize:

A computer that uses interrupts is easier to program and offers better overall performance than a computer that uses polling. In addition, interrupts allow a processor of any speed to adapt automatically to I/O devices of any speed.

Interestingly, once the basic interrupt mechanism had been invented, architects realized that further improvements are possible. To understand the improvements, consider a disk device. The underlying hardware requires several steps to read data from the disk and place it in memory. FIG. 7 summarizes the steps.

-- If disk is not spinning, bring it to full speed

-- Compute the cylinder that contains the requested block and move the disk arm to the cylinder

-- Wait for the disk to rotate to the correct sector

-- Read bytes of data from a block on the disk and place them in a hardware FIFO

-- Transfer bytes of data from the FIFO into memory


FIG. 7 Example of the steps required to read a block from a disk device.

Early hardware required the processor to handle each step by starting the operation and waiting for an interrupt. For example, the processor had to verify that the disk was spinning. If the disk was idle, the processor had to issue a command that started the motor and wait for an interrupt.

The key insight is that the more digital logic an I/O device contains, the less the device needs to rely on the processor. Informally, architects use the term dumb device to refer to a device that requires a processor to handle each step and the term smart device to characterize a device that can perform a series of steps on its own. A smart version of a disk device contains sufficient logic (perhaps even an embedded processor) to handle all the steps involved in reading a block. Thus, a smart device does not interrupt as often, and does not require the processor to handle each step. FIG. 8 lists an example interaction between a processor and a smart disk device.

-- The processor uses the bus to send the disk a location in memory and request a read operation

-- Disk device performs all steps required, including moving bytes into memory, and interrupts only after the operation completes


FIG. 8 The interaction between a processor and a smart disk device when reading a disk block.

Our discussion of device interaction has omitted many details. For example, most I/O devices detect and report errors (e.g., a disk does not spin or a flaw on a surface prevents the hardware from reading a disk block). Thus, interrupt processing is more complex than described: when an interrupt occurs, the processor must interrogate the CSRs associated with the disk to determine whether the operation was successful or an error occurred. Furthermore, for devices that report soft errors (i.e., temporary errors), the processor must retry the operation to determine whether an error was temporary or permanent.

19. Direct Memory Access (DMA)

The discussion above implies that a smart I/O device can transfer data into memory without using the CPU. Indeed, such transfers are not only possible but key to high-speed I/O. The technology that allows an I/O device to interact with memory is known as Direct Memory Access (DMA).

To understand DMA, recall that in most architectures, both memory and I/O devices attach to a central bus. Thus, there is a direct path between an I/O device and memory. If we imagine that a smart I/O device contains an embedded processor, the idea behind DMA should be clear: the embedded processor in the I/O device issues fetch or store requests to which the memory responds. Of course, the bus design must make it possible for multiple processors (the main processor and an embedded processor in each smart device) to take turns sharing the bus and prevent them from sending multiple requests simultaneously. If the bus supports such a mechanism, an I/O device can transfer data between the memory and the device without using the processor.

To summarize:

A technology known as Direct Memory Access (DMA) allows a smart I/O device to access memory directly. DMA improves performance by allowing a device to transfer data between the device and memory without using the processor.

20. Extending DMA with Buffer Chaining

It may seem that a smart device using DMA is sufficient to guarantee high performance: data can be transferred between the device and memory without using the processor, and the device does not interrupt for each step of the operation. However, an optimization has been discovered that further improves performance.

To understand how DMA can be improved, consider a high-speed network. Packets tend to arrive from the network in bursts, which means a set of packets arrives back-to-back with minimum time between successive packets. If the network interface device uses DMA, the device will interrupt the processor after accepting an incoming packet and placing the packet in memory. The processor must then specify the location of a buffer for the next packet and restart the device. The sequence of events must occur quickly (i.e., before the next packet arrives). Unfortunately, other devices on the system may also be generating interrupts, which means the processor may be delayed slightly. For the highest-speed networks, a processor may not be able to service an interrupt in time to capture the next packet.

To solve the problem of back-to-back arrivals, some smart I/O devices use a technique known as buffer chaining. The processor allocates multiple buffers, and creates a linked list in memory. The processor then passes the list to the I/O device, and allows the device to fill each buffer. Because a smart device can use the bus to read values from memory, the device can follow the linked list and place incoming packets in successive buffers. FIG. 9 illustrates the concept†.


FIG. 9 Illustration of buffer chaining. A processor passes a list of buffers to a smart I/O device, and the device fills each buffer on the list without waiting for the processor.

The network example given above describes the use of buffer chaining for high-speed input. A buffer chain can also be used with output: a processor places data in a set of buffers, links the buffers on a list, passes the address of the linked list to a smart I/O device, and starts the device. The device moves through the list, taking the data from each buffer in memory and writing the data to the output.

21. Scatter Read and Gather Write Operations

Buffer chaining is especially helpful for computer systems in which the buffer size used by software is smaller than the size of a data block used by an I/O device. On input, chained buffers allow a device to divide a large data transfer into a set of smaller buffers. On output, chained buffers allow a device to extract data from a set of small buffers and combine the data into a single block. For example, some operating systems create a network packet by placing the packet header in one buffer and the packet payload in another buffer. Buffer chaining allows the operating system to send the packet without the overhead of copying all the bytes into a single, large buffer.

We use the term scatter read to capture the idea of dividing a large block of incoming data into multiple small buffers, and the term gather write to capture the idea of combining data from multiple small buffers into a single output block. Of course, to make buffer chaining useful, a linked list of output buffers must specify the size of each buffer (i.e., the number of bytes to write). Similarly, a linked list of input buffers must include a length field that the device can set to specify how many bytes were deposited in the buffer.

†Although the figure shows three buffers, network devices typically use a chain of 32 or 64 buffers.

22. Operation Chaining

Although buffer chaining handles situations in which a given operation is repeated over many buffers, further optimization is possible in cases where a device can perform multiple operations. To understand, consider a disk device that offers read and write operations on individual blocks. To optimize performance, we need to start another operation as soon as the current operation completes. Unfortunately, the pending operations may be an arbitrary mixture of reads and writes, which a chain of buffers alone cannot express.

The technology used to start a new operation without delay is known as operation chaining. Like buffer chaining, a processor that uses operation chaining must create a linked list in memory, and must pass the list to a smart device. Unlike buffer chaining, however, nodes on the linked list specify a complete operation: in addition to a buffer pointer, the node contains an operation and necessary parameters. For example, a node on the list used with a disk might specify a read operation and a disk block. FIG. 10 illustrates operation chaining.


FIG. 10 Illustration of operation chaining for a smart disk device. Each node specifies an operation (R or W), a disk block number, and a buffer in memory.

23. Summary

Two paradigms can be used to handle I/O devices: programmed I/O and interrupt-driven I/O. Programmed I/O requires a processor to handle each step of an operation by polling the device. Because a processor is much faster than an I/O device, the processor spends many cycles waiting for the device.

Third-generation computers introduced interrupt-driven I/O that allows a device to perform a complete operation before informing the processor. A processor that uses interrupts includes extra hardware that tests once during each execution of a fetch-execute cycle to see whether any device has requested an interrupt.

Interrupts are vectored, which means the interrupting device supplies a unique integer that the processor uses as an index into an array of pointers to handlers. To guarantee that interrupts do not affect a running program, the hardware saves and restores state information during an interrupt. Multilevel interrupts are used to give some devices priority over others.

Smart I/O devices contain additional logic that allows them to perform a series of steps without assistance from the processor. Smart devices use the techniques of buffer chaining and operation chaining to further optimize performance.

EXERCISES

1. Assume a RISC processor takes two microseconds to execute each instruction and an I/O device can wait at most 1 millisecond before its interrupt is serviced. What is the maximum number of instructions that can be executed with interrupts disabled?

2. List and explain the two I/O paradigms.

3. Expand the acronym CSR and explain what it means.

4. A software engineer is trying to debug a device driver, and discovers what appears to be an infinite loop:

    while (*csrptr->tstbusy != 0)
        ;   /* do nothing */

When the software engineer shows you the code, how do you respond?

5. Read about devices on a bus and the interrupt priorities assigned to each. Does a disk or mouse have higher priority? Why?

6. In most systems, part or all of the device driver code must be written in assembly language. Why?

7. Conceptually, what data structure is an interrupt vector, and what does one find in each entry of an interrupt vector?

8. What is the most significant advantage of a device that uses chained operations?

9. What is the chief advantage of interrupts over polling?

10. Suppose a user installs ten devices that all perform DMA into a single computer and at tempts to operate the devices simultaneously. What components in the computer might become a bottleneck?

11. If a smart disk device uses DMA and blocks on the disk each contain 512 bytes, how many times will the disk interrupt when the processor transfers 2048 bytes (four separate blocks)?

12. When a device uses chaining, what is the type of the data structure that a device driver places in memory to give a set of commands to the device?


Updated: Wednesday, April 26, 2017 21:23 PST