Computer Architecture: CPUs -- A Programmer's View of Devices, I/O, and Buffering





1. Introduction

Previous sections cover the hardware aspects of I/O. They explain the bus architecture that is used to interconnect devices, processors, and memory, as well as the interrupt mechanism that an external device uses to inform a processor when an operation completes.

This section changes the focus to software, and considers I/O from a programmer's perspective. The section examines both the software needed to control a device and the application software that uses I/O facilities. We will understand the important concept of a device driver, and see how a driver implements operations like read and write. We will learn that devices can be divided into two broad types, character-oriented and block-oriented, and we will understand the interaction used with each.

Although few programmers write device drivers, understanding how a device driver operates and how low-level I/O occurs can help programmers write more efficient applications. Once we have looked at the mechanics of device drivers, we will focus on the concept of buffering, and see why it is essential for programmers to use buffering.

2. Definition of a Device Driver

The previous section explains the basic hardware interrupt mechanism. We are now ready to consider how low-level software uses the interrupt mechanism to perform I/O operations. We use the term device driver to refer to software that provides an interface between an application program and an external hardware device. In most cases, a computer system has a device driver for each external device, and all applications that access a given device use the same driver. Typically, device drivers are part of the computer's operating system, which means any application running on the computer uses a device driver when communicating with a device.

Because a device driver understands the details of a particular hardware device, we say that a driver contains low-level code. The driver interacts with the device over a bus, understands the device's Control And Status Registers (CSRs), and handles interrupts from the device.

3. Device Independence, Encapsulation, and Hiding

The primary purpose of a device driver is device independence. That is, the device driver approach removes all hardware details from application programs and relegates them to a driver.

To understand why device independence is important, we need to know how early software was built. Each application program was designed for a specific brand of computer, a specific memory size, and a specific set of I/O devices. An application contained all the code needed to use the bus to communicate with particular devices.

Unfortunately, a program written to use a specific set of devices could not be used with any other devices. For example, upgrading a printer to a newer model required all application programs to be rewritten.

A device driver solves the problem by providing a device-independent interface to applications. For example, because all applications that use a printer rely on the printer's device driver, an application does not have detailed hardware knowledge built in. Consequently, changing a printer only requires changing the device driver; all applications remain unchanged. We say that the device driver hides hardware details from applications or that the device driver encapsulates the hardware details.

To summarize:

A device driver consists of software that understands and handles all the low-level details of communication with a particular device. Because the device driver provides a high-level interface to applications, an application program does not need to change if a device changes.

4. Conceptual Parts of a Device Driver

A device driver contains multiple functions that all must work together, including code to communicate over a bus, code to handle device details, and code to interact with an application. Furthermore, a device driver must interact with the computer's operating system. To help manage complexity, programmers think of a device driver as partitioned into three parts:

-- A lower half comprised of a handler that is invoked when an interrupt occurs

-- An upper half comprised of functions that are invoked by applications to request I/O operations

-- A set of shared variables that hold state information used to coordinate the two halves

The names upper half and lower half reflect the view of a programmer who writes device drivers: hardware is low level and application programs are high level. Thus, a programmer thinks of applications at the top of a hierarchy and hardware at the bottom.

FIG. 1 illustrates a programmer's view.


FIG. 1 The conceptual division of a device driver into three parts. A device driver provides the interface between applications that operate at a high level and the device hardware that operates at a low level.

5. Two Categories of Devices

Before we can understand more about device drivers, we need to know more about the interface the hardware presents to the driver. Devices can be divided into two broad categories, depending on the style of interface the device uses:

-- Character-oriented devices

-- Block-oriented devices

A character-oriented device transfers a single byte of data at a time. For example, the serial interface used to connect a keyboard to a computer transfers one character (i.e., byte) for each keystroke. From a device driver's point of view, a character-oriented device generates an interrupt each time a character is sent or received -- sending or receiving a block of N characters generates N interrupts.

A block-oriented device transfers an entire block of data at a time. In some cases, the underlying hardware specifies a block size, B, and all blocks must contain exactly B bytes. For example, a disk device defines a block size equal to the disk's sector size.

In other cases, however, blocks are of variable size. For example, a network interface defines a block to be as large as a packet (although it places an upper bound on packet size, packet-switching hardware allows packet sizes to vary from one packet to the next). From a device driver's point of view, a block-oriented device generates only one interrupt each time a block is sent or received.

6. Example Flow Through a Device Driver

The details of programming device drivers are beyond the scope of this text. However, to help us understand the concept, we will consider how a device driver might handle basic output. For our example, we will assume that an application sends data over the Internet. The application specifies data to be sent, and the protocol software creates a packet and transfers the packet to the device driver for the network device.

FIG. 2 illustrates the modules involved in a packet transfer, and lists the steps that are taken for output.

As the figure shows, even a straightforward operation requires a complex sequence of steps. When an application sends data, the application process enters the operating system and control passes to protocol software that creates a packet. The protocol software, in turn, passes the outgoing packet to the upper half of the appropriate device driver. The device driver places the packet in the shared variables section, starts the device performing packet transmission, and returns to the protocol software, which returns to the application process.

Although control has returned from the operating system, the outgoing packet remains in the shared variables data area where the device can use DMA to access it.

Once the device completes sending the packet, the device interrupts and control passes to the lower half. The lower half then removes the packet from the shared area.

----------------------

Steps Taken

1. The application sends data over the Internet

2. Protocol software passes a packet to the driver

3. The driver stores the outgoing packet in the shared variables

4. The upper half specifies the packet location and starts the device

5. The upper half returns to the protocol module

6. The protocol software returns to the application

7. The device interrupts and the lower half of the driver executes

8. The lower half removes the copy of the packet from the shared variables


FIG. 2 A simplified example of the steps that occur when an application requests an output operation. A device driver located in the operating system handles all communication with the device.

----------------------

7. Queued Output Operations

Although the design used in our example driver is feasible, the approach is too inefficient to use in a production system. In particular, if our application sends another packet before the device has finished sending the first one, the device driver must poll until the device finishes using the packet. To avoid waiting, device drivers used in production systems implement a queue of requests. On output, the upper half does not wait for the device to be ready. Instead, the upper half deposits the data to be written in a queue, ensures that the device will generate an interrupt, and returns to the application. Later, when the device finishes its current operation and generates an interrupt, the lower half extracts the next request from the queue, starts the device, and returns from the interrupt. FIG. 3 illustrates the conceptual organization.


FIG. 3 The conceptual organization of a device driver that uses a queue of requests. On output, the upper half deposits items in the re quest queue without waiting for the device, and the lower half controls the device.

A device driver that uses an output queue is elegant -- the queue of requests provides coordination between the upper and lower halves of the driver. FIG. 4 lists the steps that each half of a device driver takes for output.

Initialization (computer system starts):

1. Initialize output queue to empty

Upper half (application performs write):

1. Deposit data item in queue

2. Use the CSR to request an interrupt

3. Return to application

Lower half (interrupt occurs):

1. If the queue is empty, stop the device from interrupting

2. If the queue is nonempty, extract an item and start output

3. Return from interrupt


FIG. 4. The steps that the upper and lower halves of a device driver take for an output operation when queueing is used. The upper half forces an interrupt, but does not start output on the device.

As the figure indicates, the steps for each half of the device driver are straightforward. Notice that the lower half performs most of the work: in addition to handling interrupts from the device, the lower half checks the queue and, if the queue is not empty, extracts the next item and starts the device. Because the device interrupts each time it completes an operation, the lower half will be invoked once per output operation, which allows it to start the next operation. Thus, the lower half will continue to be invoked until the queue is empty.

What happens after the last item has been removed from the queue? The lower half will be invoked after the last output operation completes, but will find the queue empty. At that point, the device will be idle. To prevent useless interrupts, the lower half controls the device to stop all interrupts. Later, when an application calls the upper half to place a new item in the queue, the upper half will start the device interrupting again, and output will proceed.
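The queued-output logic of FIG. 4 can be sketched as a small C simulation. The sketch is only illustrative: the queue, the interrupt-enable flag, and the device are modeled with ordinary variables, whereas a real driver would manipulate CSRs and run its lower half in interrupt context.

```c
#include <stddef.h>

#define QSIZE 16

/* Shared variables that coordinate the two halves. */
static int queue[QSIZE];
static size_t qhead = 0, qcount = 0;

/* Models the CSR bit that requests device interrupts. */
static int device_interrupting = 0;

/* Models the item most recently handed to the device for output. */
static int last_started = -1;

/* Upper half: deposit the item, request an interrupt, and return. */
int upper_half_write(int item) {
    if (qcount == QSIZE)
        return -1;                          /* queue full */
    queue[(qhead + qcount) % QSIZE] = item;
    qcount++;
    device_interrupting = 1;                /* "set the CSR bit" */
    return 0;
}

/* Lower half: runs each time the device interrupts. */
void lower_half_interrupt(void) {
    if (qcount == 0) {                      /* nothing left to send */
        device_interrupting = 0;            /* stop useless interrupts */
        return;
    }
    last_started = queue[qhead];            /* extract the next request */
    qhead = (qhead + 1) % QSIZE;
    qcount--;                               /* device now busy sending it */
}
```

Note that the upper half never touches the device itself; it only deposits data and requests an interrupt, exactly as FIG. 4 prescribes.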

8. Forcing a Device to Interrupt

Because a request queue is used in so many device drivers, engineers have designed hardware that works well with the programming paradigm outlined in FIG. 4. In particular, a device often includes a CSR bit that a processor can set to force the device to interrupt. Recall from Section 16 that the code required to set a CSR bit is trivial -- it consists of a single assignment statement. Software does not need to check the current device status. Instead, the mechanism is designed so that setting the bit has no effect if the device is already active:

-- A device has a CSR bit, B, that is used to force the device to interrupt

-- If the device is idle, setting bit B causes the device to generate an interrupt

-- If the device is currently performing an operation, setting bit B has no effect

In other words, if an interrupt is already destined to occur when the current operation completes, the device waits for the operation to complete and generates an interrupt as usual; if no operation is in progress, the device generates an interrupt immediately.

The concept -- arranging the hardware so that setting a CSR bit will not affect a busy device until the operation completes -- greatly simplifies programming. To see why, look at the steps FIG. 4 lists. The upper half does not need to test whether the device is busy (i.e., whether an operation is in progress). Instead, the upper half always sets the CSR bit. If an operation is already in progress, the device hardware ignores the bit being set, and waits until the operation completes. If the device is idle, setting the bit causes the device to interrupt immediately, which forces the lower half to select the next request in the queue and start the device.
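In code, forcing an interrupt is indeed a single assignment. The sketch below is hypothetical: the register layout and the bit position are invented for illustration, and a real memory-mapped device would be reached through a pointer to its bus address rather than the local structure used here.

```c
#include <stdint.h>

/* Hypothetical device with one 32-bit control and status register. */
typedef struct {
    volatile uint32_t csr;
} device_regs_t;

#define CSR_FORCE_INTR 0x0001u   /* invented position of bit B */

/* Upper half: always set the bit; there is no need to test whether
   the device is busy, because the hardware ignores the bit then. */
void request_interrupt(device_regs_t *dev) {
    dev->csr |= CSR_FORCE_INTR;
}

/* Small demonstration helper: returns the CSR after the request. */
uint32_t demo_csr(void) {
    device_regs_t dev = { 0 };
    request_interrupt(&dev);
    return dev.csr;
}
```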

9. Queued Input Operations

A device driver can also use queueing for input. However, additional coordination is required for two reasons. First, a device driver is configured to accept input before an application is ready to read the input (e.g., in case a user types ahead). Therefore, an input queue must be created when the device is initialized. Second, if input does not arrive before an application reads, the device driver must temporarily block the application until input does arrive. FIG. 5 lists the steps a device driver uses to handle input when a queue is present.

Initialization (computer system starts):

1. Initialize input queue to empty

2. Force the device to interrupt

Upper half (application performs read):

1. If input queue is empty, temporarily stop the application

2. Extract the next item from the input queue

3. Return the item to the application

Lower half (interrupt occurs):

1. If the queue is not full, start another input operation

2. If an application is stopped, allow the application to run

3. Return from interrupt


FIG. 5 The steps that the upper and lower halves of a device driver take for an input operation when queueing is used. The upper half temporarily stops an application until data becomes available.

Although our description of device drivers omits many details, it gives an accurate picture of the general approach that device drivers use. We can summarize:

A production device driver uses input and output queues to store items. The upper half places a request in the queue, and the lower half handles the details of communication with a device.
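The input flow of FIG. 5 can be modeled the same way as output. In the sketch below, "blocking" is simulated by a -1 return value; a real driver would suspend the calling process and its lower half would awaken the process when input arrives.

```c
#include <stddef.h>

#define IQSIZE 8

/* Shared variables: the input queue. */
static int inq[IQSIZE];
static size_t inq_head = 0, inq_count = 0;

/* Lower half: a device interrupt delivers one arriving item. */
int lower_half_arrival(int item) {
    if (inq_count == IQSIZE)
        return -1;                          /* queue full: refuse item */
    inq[(inq_head + inq_count) % IQSIZE] = item;
    inq_count++;
    return 0;                               /* would also wake a blocked app */
}

/* Upper half: application read; -1 models "block until input arrives". */
int upper_half_read(void) {
    if (inq_count == 0)
        return -1;                          /* a real driver would block here */
    int item = inq[inq_head];
    inq_head = (inq_head + 1) % IQSIZE;
    inq_count--;
    return item;
}
```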

10. Asynchronous Device Drivers and Mutual Exclusion

In Section 16, we said that an interrupt mechanism implies an asynchronous programming model. We can now understand why. Polling is synchronous because, like a conventional program, control passes through the code from beginning to end. A device driver that handles interrupts is asynchronous because the programmer writes separate pieces of code that respond to events. One of the upper-half routines is invoked when an application requests I/O, a lower-half routine is invoked when an interrupt occurs (i.e., when an I/O operation completes), and an initialization routine is invoked when a device is started.

Asynchronous programming is more challenging than synchronous programming.

Because events can occur in any order, a programmer must use shared variables to encode the current state of the computation (i.e., the events that have occurred in the past and their effect). It can be difficult to test asynchronous programs because a programmer cannot easily control the sequence of events. More important, applications running on the processor and device hardware can generate events simultaneously. Simultaneous events make programming asynchronous device drivers especially difficult. For example, consider a smart device that uses command chaining. The processor creates a linked list of operations in memory, and the device follows the list and performs the operations automatically.

A programmer must coordinate the interaction between a processor and a smart device. To understand why, imagine a smart device extracting items from a list at the same time the upper half of a driver is adding items. A problem can occur if the smart device reaches the end of the list and stops processing just before the device driver adds a new item. Similarly, if two independent pieces of hardware attempt to manipulate pointers in the list simultaneously, links can become invalid.

To avoid errors caused by simultaneous access, a device driver that interacts with a smart device must implement mutual exclusion. That is, a device driver must ensure that the smart device will not access the list until changes have been completed, and the smart device must ensure that the device driver will not access the list until changes have been completed. A variety of schemes are used to ensure exclusive access. For example, some devices have special CSR values that the processor can set to temporarily stop the device from accessing the command list. Other systems have a facility that allows the processor to temporarily restrict use of the bus (if it cannot use the bus, a smart device cannot make changes to a list in memory). Finally, some processors offer test-and-set instructions that can be used to provide mutual exclusion.
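As one concrete illustration of the last scheme, C11's atomic_flag provides a test-and-set operation that can guard a shared command list. The lock below is only a sketch of the idea: a real driver must also consider interrupt context, where spinning on a lock can deadlock the system.

```c
#include <stdatomic.h>

/* A test-and-set spin lock protecting a shared command list. */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;
static int list_length = 0;          /* stands in for the command list */

void list_acquire(void) {
    /* test-and-set returns the previous value, so the loop spins
       until this caller is the one that changed the flag from 0 to 1 */
    while (atomic_flag_test_and_set(&list_lock))
        ;
}

void list_release(void) {
    atomic_flag_clear(&list_lock);
}

/* Append a command while holding the list exclusively. */
int list_append(void) {
    list_acquire();
    list_length++;                   /* safe: no simultaneous access */
    list_release();
    return list_length;
}
```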

11. I/O as Viewed by an Application

The sections above describe how a device driver is programmed. We said earlier that few programmers write device drivers. Thus, the details of CSR addresses, interrupt vectors, and request queues remain hidden from a typical programmer. The motivation for considering device drivers and low-level I/O is background: it helps us understand how to create applications that use low-level services efficiently.

Because they tend to use high-level languages, few programmers invoke low-level I/O facilities directly -- to express I/O operations, the programmer uses abstractions that the programming language offers. For example, application programs seldom use a disk device. Instead, the programming language or the underlying system presents a programmer with a high-level file abstraction. Similarly, instead of exposing a programmer to display hardware, most systems present the programmer with a window abstraction.

The point is:

In many programming systems, I/O is hidden from the programmer.

Instead of manipulating hardware devices, such as disks and display screens, a programmer only uses abstractions such as files and windows.

Even in embedded systems that do allow application programmers to control I/O devices, the software is usually designed to hide as many details as possible from the programmer. In particular, an application can only specify generic, high-level I/O operations. When a compiler translates the program into a binary form for use on a specific computer, the compiler maps each high-level I/O operation into a sequence of low-level steps.

Interestingly, a typical compiler does not translate each I/O operation directly into a sequence of basic machine instructions. Instead, the compiler generates code that invokes library functions to perform I/O operations. Therefore, before it can be executed, the program must be combined with the appropriate library functions.

We use the term run-time library to refer to the set of library functions that accompany a compiled program. Of course, the compiler and run-time library must be designed to work together -- the compiler must know which functions are available, the exact arguments used by each function, and the meaning of the function.

Application programmers seldom interact with device drivers directly.

Instead, they rely on a run-time library to act as an intermediary.

The chief advantage of using a run-time library as an intermediary arises from the flexibility and ease of change. Only the run-time library functions understand how to use the underlying I/O mechanisms (i.e., the device drivers). If the I/O hardware and/or the device drivers change, only the run-time library needs to be updated -- the compiler can remain unchanged. In fact, separating a run-time library from a compiler allows code to be compiled once and then combined with various run-time libraries to produce images for more than one version of an operating system.

12. The Library/Operating System Dichotomy

We know that a device driver resides in the operating system and the run-time library functions that an application uses to perform I/O reside outside the operating system (because they are linked with the application). Conceptually, we imagine three layers of software on top of the device hardware as FIG. 6 illustrates.


FIG. 6 The conceptual arrangement of application code, run-time library code, and a device driver with interfaces labeled.

Several questions arise. What services does each layer of software provide? What is the interface between an application and the run-time library, or the interface between the run-time library and the operating system? What are the relative costs of using the two interfaces?

13. I/O Operations that the OS Supports

We begin by examining the interface between the run-time library and the operating system. In a low-level programming language such as C, the operating system interface is directly available to applications. Thus, a programmer can choose to use an I/O library or make operating system calls directly†.

Although the exact details of I/O operations depend on the operating system, a general approach has become popular. Known as the open/read/write/close paradigm, the approach offers six basic functions. FIG. 7 lists the functions with the names used by the Unix operating system.

Operation -- Meaning

open -- Prepare a device for use (e.g., power up)

read -- Transfer data from the device to the application

write -- Transfer data from the application to the device

close -- Terminate use of the device

seek -- Move to a new location of data on the device

ioctl -- Miscellaneous control functions (e.g., change volume)


FIG. 7 Six basic I/O functions that comprise the open/read/write/close paradigm. The names are taken from the Unix operating system.

†A later section discusses the standard I/O library used with C.

As an example, consider a device that can read or write a Digital Video Disk (DVD). The open function can be used to start the drive motor and ensure that a disc has been inserted. Once the drive has been started, the read function can be used to read data from the disc, and the write function can be used to write data onto the disc.

The seek function can be used to move to a new position (e.g., a specific video segment), and the close function can be used to power down the drive. Finally, the ioctl function (an abbreviation of I/O control) can be used for all other operations (e.g., the eject function).

Of course, each of the operations takes arguments that specify details. For example, a write operation needs arguments that specify the device to use, the location of data, and the amount of data to write. More important, the device driver must understand how to map each operation and arguments to operations on the underlying device.

For example, when the driver receives a control operation, such as an eject, the driver must know how to implement the operation with the device hardware (e.g., how to assign values to the device's CSR registers).
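On a Unix-like system, the paradigm is visible directly in the system-call interface. The sketch below writes a few bytes to a file and reads them back using open, write, read, and close; the path is an arbitrary example, and error handling is reduced to a status code.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 if the bytes written are read back intact, -1 on error. */
int roundtrip(const char *path) {
    /* open: prepare the "device" (here, a file) for use */
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write: transfer data from the application */
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return -1;
    }
    close(fd);                       /* close: terminate use */

    /* open again and read the data back */
    char buf[5];
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, sizeof buf);
    close(fd);

    return (n == 5 && memcmp(buf, "hello", 5) == 0) ? 0 : -1;
}
```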

14. The Cost of I/O Operations

When an application program invokes a function in the run-time library, the cost is exactly the same as calling any other function, because a copy of the code for the library function is incorporated into the application when the program is built. Thus, the cost of invoking library functions is relatively low.

When an application program or a run-time library function invokes an I/O operation such as read or write, however, control must pass through a system call† to the appropriate device driver in the operating system. Unfortunately, invoking an operating system function through a system call incurs extremely high overhead. There are three reasons. First, the processor must change privilege mode because the operating system runs with greater privilege than an application. Second, the processor must change the address space from the application's virtual address space to the operating system's address space. Third, the processor must copy data between the application's address space and the operating system's address space.

We can summarize:

The overhead involved in using a system call to communicate with a device driver is high; a system call is much more expensive than a conventional function call, such as the call used to invoke a library function.

More important, much of the system call overhead is associated with making the call rather than the work performed by the driver. Therefore, to optimize performance, programmers seek ways to minimize the number of system calls.

†Some computer architectures use the term trap in place of system call.

15. Reducing System Call Overhead

To understand how we can reduce the overhead of system calls, consider a worst-case example. Suppose an application needs to print a document, and suppose printing requires the application to send a total of N bytes of data to the printer. The highest cost occurs if the application makes a separate system call to transfer each byte of data because the application will make a total of N system calls. As an alternative, if the application generates a complete line of text and then makes a system call to transfer the entire line, the overhead is reduced from N system calls to L system calls, where L is the number of lines in the document (i.e., L < N).

Can we further reduce the overhead of printing a document? Yes, we can. The application can be redesigned to allocate enough memory to hold an entire page of the document, generate the page, and then make one system call to transfer the entire page to the device driver. The result is an application that only makes P system calls, where P is the number of pages in the document (presumably P << N).

A general principle can be stated:

To reduce overhead and optimize I/O performance, a programmer must reduce the number of system calls that an application invokes.

The key to reducing system calls involves transferring more data per system call.
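The savings are easy to quantify. A small helper that computes the number of system calls needed to move n bytes in chunks of c bytes per call shows how quickly the count falls as the transfer unit grows:

```c
/* Number of system calls needed to transfer n bytes, c bytes per call. */
long calls_needed(long n, long c) {
    return (n + c - 1) / c;     /* ceiling of n / c */
}
```

For a document of 8192 bytes, one call per byte costs 8192 system calls, one call per 80-byte line costs 103, and one call per 8192-byte page costs exactly one.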

Of course, it is not always possible to reduce the number of system calls used for I/O. For example, an application like a text editor or email composer displays characters as the user enters them. The application cannot wait until the user enters an entire line of text or an entire page because each character must appear on the screen immediately. Similarly, input from a keyboard often requires a program to accept one character at a time without waiting for a user to enter an entire line or page. Fortunately, such applications often involve user interaction in which I/O is relatively slow, so optimization is unimportant.

16. The Key Concept of Buffering

The above discussion shows that an application programmer can optimize I/O performance by rewriting code in such a way that the number of system calls is lower.

The optimization is so important for high-speed I/O that it has been incorporated into most computer software. Instead of requiring a programmer to rewrite code, I/O run-time libraries have been designed to handle the optimization automatically.

We use the term buffering to describe the concept of accumulating data before an I/O transfer, and the term buffer to refer to the area of memory in which the data is placed.

The buffering principle: to reduce the number of system calls, accumulate data in a buffer, and transfer a large amount of data each time a system call is made.

To automate buffering, library routines need a scheme that works for any application. Thus, instead of lines or pages, library functions use a fixed-size buffer. To take advantage of buffering, an application must call library functions instead of the operating system. In the case of a programming language that contains built-in I/O facilities, the run-time library implements buffering, and the compiler generates code that invokes the appropriate library routines; in the case of a programming language that does not have built-in I/O facilities, the programmer must call buffered I/O library routines instead of making system calls directly.

Library routines that implement buffering usually provide the five conceptual operations that FIG. 8 lists.

Operation -- Meaning

setup -- Initialize the buffer

input -- Perform an input operation

output -- Perform an output operation

terminate -- Discontinue use of the buffer

flush -- Force contents of buffer to be written


FIG. 8 The conceptual operations provided by a typical library that offers buffered I/O.

The operations listed in the figure are analogous to those that an operating system offers as an interface to a device. In fact, we will see that at least one implementation of a buffered I/O library uses function names that are variants of open, read, write, and close. FIG. 8 uses alternate terminology to help clarify the distinction.

17. Implementation of Buffered Output

To understand how buffering works, consider how an application uses the buffered output functions in FIG. 8. When it begins, the application calls a setup function to initialize buffering. Some implementations provide an argument that allows the application to specify a buffer size; in other implementations, the buffer size is a constant†. In any case, we will assume setup allocates a buffer, and initializes the buffer to empty. Once the buffer has been initialized, the application can call the output function to transfer data. On each call, the application supplies one or more bytes of data. Finally, when it finishes transferring data, the application calls the terminate function.

(Note: a later section describes the use of function flush).

†Typical buffer sizes range from 8 Kbytes to 128 Kbytes, depending on the computer system.

The amount of code required to implement buffered output is trivial. FIG. 9 describes the steps used to implement each output function. In a language such as C, each step can be implemented with one or two lines of code.

The motivation for a terminate function should now be clear: because output is buffered, the buffer may be partially full when the application finishes. Therefore, the application must force the remaining contents of the buffer to be written.

Setup(N):

1. Allocate a buffer of N bytes.

2. Create a global pointer, p, and initialize p to the address of the first byte of the buffer.

Output(D):

1. Place data byte D in the buffer at the position given by pointer p, and move p to the next byte.

2. If the buffer is full, make a system call to write the contents of the entire buffer, and reset pointer p to the start of the buffer.

Terminate:

1. If the buffer is not empty, make a system call to write the contents of the buffer prior to pointer p.

2. If the buffer was allocated dynamically, de-allocate it.


FIG. 9 The steps taken to achieve buffered output.
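The steps of FIG. 9 translate almost line for line into C. In the sketch below, the system call is replaced by a function that captures output in an array so that the logic can be tested; a real library would hand the full buffer to write(). The function names are ours, not those of any particular library.

```c
#include <stdlib.h>
#include <string.h>

/* Stands in for the device: flushed bytes are captured here. */
static char written[256];
static size_t written_len = 0;

static char *buf = NULL;               /* the output buffer */
static size_t bufsize = 0;
static char *p = NULL;                 /* the global pointer of FIG. 9 */

/* Stands in for a system call that writes n bytes. */
static void do_write(const char *d, size_t n) {
    memcpy(written + written_len, d, n);
    written_len += n;
}

void buf_setup(size_t n) {             /* Setup(N) */
    buf = malloc(n);
    bufsize = n;
    p = buf;                           /* p -> first byte of buffer */
}

void buf_output(char d) {              /* Output(D) */
    *p++ = d;                          /* place byte, advance p */
    if ((size_t)(p - buf) == bufsize) {
        do_write(buf, bufsize);        /* buffer full: "system call" */
        p = buf;                       /* reset p to start of buffer */
    }
}

void buf_terminate(void) {             /* Terminate */
    if (p != buf)
        do_write(buf, (size_t)(p - buf));  /* write partial buffer */
    free(buf);                         /* de-allocate the buffer */
    buf = p = NULL;
}
```

With a 4-byte buffer, the first three bytes accumulate silently; the fourth triggers one write of the entire buffer, and terminate writes whatever remains.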

18. Flushing a Buffer

It may seem that output buffering cannot be used with some applications. For example, consider an application that allows two users to communicate over a computer network. When it emits a message, an application assumes the message will be transmitted and delivered to the other end. Unfortunately, if buffering is used, the message may wait in the buffer unsent.

Of course, a programmer can rewrite an application to buffer data internally and make system calls directly. However, designers of general-purpose buffering libraries have devised a way to permit applications that use buffered I/O to specify when a system call is needed. The mechanism consists of the flush function that an application can call to force data to be sent even if the buffer is not full. Programmers use the phrase flushing a buffer to describe the process of forcing output of a partially full buffer. If a buffer is empty when an application calls flush, the call has no effect. If the buffer contains data, however, the flush function makes a system call to write the data, and then resets the global pointer to indicate that the buffer is empty. FIG. 10 lists the steps of a flush operation.

Flush

1. If the buffer is currently empty, return to the caller without taking any action.

2. If the buffer is not currently empty, make a system call to write the contents of the buffer and set the global pointer p to the address of the first byte of the buffer.


FIG. 10 The steps required to implement a flush function in a buffered I/O library. Flush allows an application to force data to be written before the buffer is full.
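The two steps in FIG. 10 can also be sketched in C. The globals mirror the buffered-output scheme of FIG. 9; buf_flush and the variable names are illustrative, not from any real library.

```c
#include <unistd.h>

static char  obuf[8192];      /* the output buffer                    */
static char *op  = obuf;      /* next free slot; op == obuf => empty  */
static int   ofd = 1;         /* descriptor to write to               */

/* Flush: write the current contents, if any, and mark the
   buffer empty by resetting the pointer to the start. */
void buf_flush(void) {
    if (op == obuf)                      /* step 1: empty, no action  */
        return;
    if (write(ofd, obuf, op - obuf) < 0) {  /* step 2: write contents */
        /* real code would handle the error */
    }
    op = obuf;                           /* buffer is now empty       */
}
```

Because a flush on an empty buffer returns immediately, an application can call it freely whenever data must not be delayed.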

Look back at the implementation of the terminate function given in FIG. 9. If the library offers a flush function, the first step of terminate can be replaced by a call to the flush function.

To summarize:

A programmer uses a flush function to specify that outgoing data in a buffer should be sent even if the buffer is not full. A flush operation has no effect if a buffer is currently empty.

19. Buffering on Input

The descriptions above explain how buffering can be used with output. In many cases, buffering can also be used to reduce the overhead on input. To understand how, consider reading data sequentially. If an application reads N bytes of data, one byte at a time, the application will make N system calls.

Assuming the underlying device allows transfer of more than one byte of data, buffering can be used to reduce the number of system calls. The application (or the run-time library) allocates a large buffer, makes one system call to fill the buffer, and then satisfies subsequent requests from the buffer. FIG. 11 lists the steps required.

As with output buffering, the implementation is straightforward. In a language such as C, each step can be implemented with a trivial amount of code.

20. Effectiveness of Buffering

Why is buffering so important? Because even a small buffer can have a large effect on I/O performance. To see why, observe that when buffered I/O is used, a system call is needed only once per buffer†. As a result, a buffer of N bytes reduces the number of system calls by a factor of N. Thus, if an unbuffered application would make S system calls, a buffer of only 8 Kbytes reduces the number of system calls to S / 8192.

†Our analysis ignores situations in which an application calls flush frequently.
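The arithmetic is easy to check. The helper below is hypothetical (it does not appear in the text); it computes how many buffered system calls replace s one-byte calls when the buffer holds n bytes, rounding up to account for a final partial buffer.

```c
/* With an n-byte buffer, s one-byte operations require one system
   call per full buffer, plus one for any final partial buffer. */
long buffered_calls(long s, long n) {
    return (s + n - 1) / n;   /* ceiling of s / n */
}
```

For example, 8,192,000 one-byte writes through an 8 Kbyte buffer require only 1000 system calls.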

Setup(N):

1. Allocate a buffer of N bytes.

2. Create a global pointer, p, and initialize p to indicate that the buffer is empty.

Input(N):

1. If the buffer is empty, make a system call to fill the entire buffer, and set pointer p to the start of the buffer.

2. Extract a byte, D, from the position in the buffer given by pointer p, move p to the next byte, and return D to the caller.

Terminate:

1. If the buffer was dynamically allocated, deallocate it.

FIG. 11 The steps required to achieve buffered input.
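The steps in FIG. 11 can be sketched in C as well. As before, the names inbuf_setup and inbuf_getc are invented for illustration, and the Unix read call stands in for "make a system call to fill the buffer."

```c
#include <unistd.h>

static char  ibuf[8192];      /* the input buffer                  */
static char *ip   = ibuf;     /* next byte to hand to the caller   */
static char *iend = ibuf;     /* one past the last valid byte      */
static int   in_fd;

/* Setup: record the descriptor and mark the buffer empty. */
void inbuf_setup(int fd) {
    in_fd = fd;
    ip = iend = ibuf;
}

/* Input: refill the buffer when empty, then extract one byte. */
int inbuf_getc(void) {
    if (ip == iend) {                           /* buffer empty?   */
        ssize_t n = read(in_fd, ibuf, sizeof ibuf);
        if (n <= 0)
            return -1;                          /* end of input    */
        ip = ibuf;
        iend = ibuf + n;
    }
    return (unsigned char)*ip++;                /* extract byte D  */
}
```

A subtle point: the refill may obtain fewer than 8192 bytes, so the code tracks the end of the valid data (iend) rather than assuming a full buffer.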

Buffering is not limited to run-time libraries. The technique is so important that device drivers often implement buffering. For example, in some disk drivers, the driver maintains a copy of the disk block in memory, and allows an application to read or write data from the block. Of course, buffering in an operating system does not eliminate system calls. However, such buffering does improve performance because external data transfers are slower than system calls. The important point is that buffering can be used to reduce I/O overhead whenever a less expensive operation can be substituted for an expensive operation.

We can summarize the importance of buffering:

Using a buffer of N bytes can reduce the number of calls to the underlying system by a factor of N. A large buffer can mean the difference between an I/O mechanism that is fast and one that is intolerably slow.

21. Relationship to Caching

Buffering is closely related to the concept of caching that is described in Section 12. The chief difference arises from the way items are accessed: a cache system is optimized to accommodate random access, and a buffering system is optimized for sequential access.

In essence, a cache stores items that have been referenced, and a buffer stores items that will be referenced (assuming sequential references). Thus, in a virtual memory system, a cache stores entire pages of memory -- when any byte on the page is referenced, the entire page is placed in the cache. In contrast, a buffer stores sequential bytes. Thus, when a byte is referenced, a buffering system preloads the next bytes -- if the referenced byte lies at the end of a page, the buffering system preloads bytes from the next page.

22. An Example: The C Standard I/O Library

One of the best-known examples of a buffering I/O library was created for the C programming language and the Unix operating system. Known as the standard I/O library (stdio), the library supports both input and output buffering. FIG. 12 lists a few of the functions found in the Unix standard I/O library along with their purpose.

Function - Meaning

fopen - Set up a buffer

fgetc - Buffered input of one byte

fread - Buffered input of multiple bytes

fwrite - Buffered output of multiple bytes

fprintf - Buffered output of formatted data

fflush - Flush operation for buffered output

fclose - Terminate use of a buffer

FIG. 12 Examples of functions included in the standard I/O library used with the Unix operating system. The library includes additional functions not listed here.
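Because the functions in FIG. 12 are real stdio calls, a short example can use them directly. The stdio function tmpfile (not listed in the figure) supplies a scratch stream here so the sketch is self-contained; a real application would use fopen to attach a named file.

```c
#include <stdio.h>

/* Write formatted data through a stdio buffer, flush it,
   and read the first byte back with buffered input. */
int stdio_demo(void) {
    FILE *fp = tmpfile();              /* scratch stream           */
    if (fp == NULL)
        return -1;
    fprintf(fp, "count=%d\n", 42);     /* buffered formatted output */
    fflush(fp);                        /* force the buffer out now  */
    rewind(fp);
    int c = fgetc(fp);                 /* buffered input, one byte  */
    fclose(fp);                        /* flushes and frees buffer  */
    return c;                          /* first byte: 'c'           */
}
```

Note that fclose performs the same duties as the terminate function of FIG. 9: it flushes any remaining output and releases the buffer.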

23. Summary

Two aspects of I/O are pertinent to programmers. A systems programmer who writes device driver code must understand the low-level details of the device, and an application programmer who uses I/O facilities must understand the relative costs.

A device driver is divided into three parts: an upper half that interacts with application programs, a lower half that interacts with the device itself, and a set of shared variables. A function in the upper half receives control when an application reads or writes data; the lower half receives control when the device generates an input or output interrupt.

The fundamental technique programmers use to optimize sequential I/O performance is known as buffering. Buffering can be used for both input and output, and is often implemented in a run-time library. Because it gives an application control over when data is transferred, a flush operation allows buffering to be used with arbitrary applications.

Buffering reduces system call overhead by transferring more data per system call.

Buffering provides significant performance improvement because a buffer of N bytes reduces the number of system calls that an application makes by a factor of N.

EXERCISES

1. What does a device driver provide, and how do device drivers make it easier to write applications?

2. Name the three conceptual parts of a device driver and state how each is used.

3. Explain the use of an output queue in a device driver by describing how and when items are inserted in the queue, as well as how and when they are removed.

4. A user invokes an app that writes a file. The app displays a progress bar that shows how much of the file has been written. Just as the progress bar reaches 50%, the battery fails and the device crashes. When the user reboots the device, he or she discovers that less than 20% of the file has actually been written. Explain why the app reported writing 50%.

5. When a program calls fputc, what does the program invoke?

6. What is a flush operation, and why is it needed?

7. To increase the performance of an app, a programmer rewrites the app so that instead of reading one byte at a time, the app reads eight thousand bytes and then processes them.

What technique is the programmer using?

8. Compare the time needed to copy a large file using write and fwrite.

9. The standard I/O function fseek allows random access. Measure the difference in the time required to use fseek within a small region of a file and within a large region.

10. Build an output buffering routine, bufputc, that accepts as an argument a character to be printed. On each call to bufputc, store the character in a buffer; when the buffer fills, call write once for the entire buffer. Compare the performance of your buffered routine to a program that uses write for each character.


Updated: Wednesday, April 26, 2017 21:40 PST