Computer Architecture: CPUs -- Physical Memory and Physical Addressing [part 1]





1. Introduction

The previous section introduces the topic of memory, lists characteristics of memory systems, and explains the concept of a memory hierarchy. This section explains how a basic memory system operates, considering both the underlying technologies used to construct a typical computer memory and the organization of the memory into bytes and words. The next section expands the discussion to consider virtual memory.

2. Characteristics of Computer Memory

Engineers use the term Random Access Memory (RAM) to denote the type of memory used as the primary memory system in most computers. As the name implies, RAM is optimized for random (as opposed to sequential) access. In addition, RAM offers read-write capability that makes access and update equally inexpensive. Finally, we will see that most RAM is volatile - values do not persist after the computer is powered down.

3. Static and Dynamic RAM Technologies

The technologies used to implement Random Access Memory can be divided into two broad categories. Static RAM (SRAM) is the easiest type for programmers to understand because it is a straightforward extension of digital logic. Conceptually, SRAM stores each data bit in a latch, a miniature digital circuit composed of multiple transistors similar to the latch discussed in Section 2. Although the internal implementation is beyond the scope of this text, FIG. 1 illustrates the three external connections used for a single bit of RAM.


FIG. 1 Illustration of a miniature static RAM circuit that stores one data bit. The circuit contains multiple transistors.

In the figure, the circuit has two inputs and one output. When the write enable input is on (i.e., logical 1), the circuit sets the output value equal to the input (0 or 1); when the write enable input is off (i.e., logical 0), the circuit ignores the input and keeps the output at the last setting. Thus, to store a value, the hardware places the value on the input, turns the write enable line on, and then turns the enable line off again.
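As a concrete illustration, the following C sketch (a software model invented for this discussion, not a description of actual hardware) mimics the write-enable behavior of a one-bit SRAM cell:

    #include <stdio.h>

    /* Software model of a one-bit SRAM cell (illustrative only):
     * when write_enable is 1, the stored value follows the input;
     * when write_enable is 0, the input is ignored and the cell
     * holds its previous value. */
    struct sram_bit {
        int stored;                       /* the latched value (0 or 1) */
    };

    static void sram_apply(struct sram_bit *cell, int input, int write_enable)
    {
        if (write_enable)
            cell->stored = input;         /* capture the input */
        /* otherwise: keep the output at the last setting */
    }

    int main(void)
    {
        struct sram_bit cell = { 0 };

        sram_apply(&cell, 1, 1);          /* place 1 on the input, enable on */
        sram_apply(&cell, 0, 0);          /* enable off: input is ignored    */
        printf("output = %d\n", cell.stored);   /* prints: output = 1 */
        return 0;
    }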

Although it performs at high speed, SRAM has a significant disadvantage: high power consumption (which generates heat). The miniature SRAM circuit contains multiple transistors that operate continuously. Each transistor consumes a small amount of power, and therefore, generates a small amount of heat.

The alternative to static RAM, known as Dynamic RAM (DRAM), consumes less power. The internal working of dynamic RAM is surprising and can be confusing. At the lowest level, to store information, DRAM uses a circuit that acts like a capacitor, a device that stores electrical charge. When a value is written to DRAM, the hardware charges or discharges the capacitor to store a 1 or 0. Later, when a value is read from DRAM, the hardware examines the charge on the capacitor and generates the appropriate digital value.

The conceptual difficulty surrounding DRAM arises from the way a capacitor works: because physical systems are imperfect, a capacitor gradually loses its charge. In essence, a DRAM chip is an imperfect memory device - as time passes, the charge dissipates and a stored 1 becomes a 0. More important, DRAM loses its charge in a short time (e.g., in some cases, under a second).

How can DRAM be used as a computer memory if values can quickly become zero? The answer lies in a simple technique: devise a way to read a bit from memory before the charge has time to dissipate, and then write the same value back again. Writing a value causes the capacitor to start again with the appropriate charge. So, reading and then writing a bit will reset the capacitor without changing the value of the bit.

In practice, computers that use DRAM contain an extra hardware circuit, known as a refresh circuit, that performs the task of reading and then writing a bit. FIG. 2 illustrates the concept.


FIG. 2 Illustration of a bit in dynamic RAM. An external refresh circuit must periodically read the data value and write it back again, or the charge will dissipate and the value will be lost.

The refresh circuit is more complex than the figure implies. To keep the hardware small, architects do not build a separate refresh circuit for each bit. Instead, they design a single, small refresh mechanism that cycles through the entire memory: as it reaches each bit, the refresh circuit reads the bit, writes the value back, and moves on.
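The sweep can be sketched in software; the hypothetical C fragment below mimics one pass (real refresh hardware operates on entire rows of cells and is driven by a timer, not by a program loop):

    /* Illustrative model of a refresh sweep: visit every bit, read its
     * value, and write it back, restoring the charge before it can
     * dissipate. Real hardware refreshes whole rows of cells on a
     * timer rather than looping in software. */
    #define MEM_BITS 4096

    static int memory[MEM_BITS];          /* each entry models one capacitor */

    static void refresh_sweep(void)
    {
        for (int i = 0; i < MEM_BITS; i++) {
            int value = memory[i];        /* read the bit         */
            memory[i] = value;            /* write the value back */
        }                                 /* ...then move on      */
    }

    int main(void)
    {
        refresh_sweep();                  /* hardware repeats this continuously */
        return 0;
    }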

Complexity also arises because a refresh circuit must coordinate with normal memory operations. First, the refresh circuit must not interfere with or delay normal memory operations. Second, the refresh circuit must ensure that a normal write operation does not change the bit between the time the refresh circuit reads the bit and the time it writes the same value back. Despite the need for a refresh circuit, the cost and power consumption advantages of DRAM are so significant that most computer memory is composed of DRAM rather than SRAM.

4. The Two Primary Measures of Memory Technology

Architects use several quantitative measures to assess memory technology; two stand out:

-- Density

-- Latency and cycle times

5. Density

In a strict sense, the term density refers to the number of memory cells per square area of silicon. In practice, however, density often refers to the number of bits that can be represented on a standard-size chip or plug-in module. For example, a Dual In-line Memory Module (DIMM) might contain a set of chips that offer 128 million locations of 64 bits per location, which equals 8.192 billion bits, or approximately one gigabyte; informally, such a module is known as a 1 gig module. Higher density is usually desirable because it means more memory can be held in the same physical space. However, higher density has the disadvantages of increased power utilization and increased heat generation.
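The arithmetic behind the example is easy to verify; this small, illustrative C program computes the capacity of such a module:

    #include <stdio.h>

    int main(void)
    {
        /* A module with 128 million locations of 64 bits per location,
         * using the decimal "million" of the text. */
        unsigned long long locations = 128ULL * 1000 * 1000;
        unsigned long long bits      = locations * 64;

        /* Prints: 8192000000 bits = 1024000000 bytes (about one gigabyte) */
        printf("%llu bits = %llu bytes\n", bits, bits / 8);
        return 0;
    }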

The density of memory chips is related to the size of transistors in the underlying silicon technology, which has followed Moore's Law. Thus, memory density tends to double approximately every eighteen months.

6. Separation of Read and Write Performance

A second measure of a memory technology focuses on speed: how fast can the memory respond to requests? It may seem that speed should be easy to measure, but it is not. For example, as the previous section discusses, some memory technologies take much longer to write values than to read them. To choose an appropriate memory technology, an architect needs to understand both the cost of access and the cost of update.

Thus, a principle arises:

In many memory technologies, the time required to fetch information from memory differs from the time required to store information in memory, and the difference can be dramatic. Therefore, any measure of memory performance must give two values: the performance of read operations and the performance of write operations.

7. Latency and Memory Controllers

In addition to separating read and write operations, we must decide exactly what to measure. It may seem that the most important measure is latency (i.e., the time that elapses between the start of an operation and the completion of the operation). However, latency is a simplistic measure that does not provide complete information.

To see why latency does not suffice as a measure of memory performance, we need to understand how the hardware works. In addition to the memory chips themselves, additional hardware known as a memory controller provides an interface between the processor and memory. FIG. 3 illustrates the organization.


FIG. 3 Illustration of the hardware used for memory access. A controller sits between the processor and physical memory.

To access memory, a device (typically a processor) presents a read or write request to the controller. The controller translates the request into signals appropriate for the underlying memory, and passes the signals to the memory chips. To minimize latency, the controller returns an answer as quickly as possible (i.e., as soon as the memory responds). However, after it responds to a device, a controller may need additional clock cycle(s) to reset hardware circuits and prepare for the next operation.

A second principle of memory performance arises:

Because a memory system may need extra time between operations, latency is an insufficient measure of performance; a performance measure needs to measure the time required for successive operations.

That is, to assess the performance of a memory system, we need to measure how fast the system can perform a sequence of operations. Engineers use the term memory cycle time to capture the idea. Specifically, two separate measures are used: the read cycle time (abbreviated tRC) and the write cycle time (abbreviated tWC).

We can summarize:

The read cycle time and write cycle time are used as measures of memory system performance because they assess how quickly the memory system can handle successive requests.
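One way to observe the distinction is to time a long sequence of operations and divide by the count. The C sketch below is a rough, illustrative benchmark (the names and sizes are arbitrary, and on a real machine caches, which later sections discuss, will heavily influence the result); it uses the POSIX clock_gettime function:

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    #define N (1 << 22)                     /* number of successive operations */

    static volatile uint32_t mem[N];        /* volatile keeps accesses in the loops */

    static double elapsed_ns(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
    }

    int main(void)
    {
        struct timespec t0, t1;
        uint32_t sink = 0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            sink += mem[i];                 /* a sequence of read operations  */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("approximate read cycle time:  %.2f ns\n", elapsed_ns(t0, t1) / N);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            mem[i] = (uint32_t)i;           /* a sequence of write operations */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("approximate write cycle time: %.2f ns\n", elapsed_ns(t0, t1) / N);

        return (int)(sink & 1);             /* use sink so reads are not removed */
    }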

8. Synchronous and Multiple Data Rate Technologies

Like most other digital circuits in a computer, a memory system uses a clock that controls exactly when a read or write operation begins. As FIG. 3 indicates, a memory system must also coordinate with a processor. The controller may also coordinate with I/O devices. What happens if the processor's clock differs from the clock used for memory? The system still works because the controller can hold a request from the processor or a response from the memory until the other side is ready.

Unfortunately, the difference in clock rates can impact performance - although the delay is small, if delay occurs on every memory reference, the accumulated effect can be large. To eliminate the delay, some memory systems use a synchronous clock system. That is, the clock pulses used with the memory system are aligned with the clock pulses used to run the processor. As a result, a processor does not need to wait for memory references to complete. Synchronization can be used with DRAM or SRAM; the two technologies are named:

SDRAM - Synchronous Dynamic Random Access Memory

SSRAM - Synchronous Static Random Access Memory

In practice, synchronization has been effective; most computers now use synchronous DRAM as the primary memory technology.

In many computer systems, memory is the bottleneck - increasing memory performance improves overall performance. As a result, engineers have concentrated on finding memory technologies with lower cycle times. One approach uses a technique that runs the memory system at a multiple of the normal clock rate (e.g., double or quadruple). Because the clock runs faster, the memory can deliver data faster. The technologies are sometimes called fast data rate memories, typically double data rate or quadruple data rate. Fast data rate memories have been successful, and are now standard on most computer systems, including consumer systems such as laptops.

Although we have covered the highlights, our discussion of RAM technology does not begin to illustrate the range of choices available to an architect or the detailed differences among them. For example, FIG. 4 lists a few commercially available RAM technologies:

==============

Technology    Description

DDR-DRAM      Double Data Rate Dynamic RAM
DDR-SDRAM     Double Data Rate Synchronous Dynamic RAM
FCRAM         Fast Cycle RAM
FPM-DRAM      Fast Page Mode Dynamic RAM
QDR-DRAM      Quad Data Rate Dynamic RAM
QDR-SRAM      Quad Data Rate Static RAM
SDRAM         Synchronous Dynamic RAM
SSRAM         Synchronous Static RAM
ZBT-SRAM      Zero Bus Turnaround Static RAM
RDRAM         Rambus Dynamic RAM
RLDRAM        Reduced Latency Dynamic RAM

FIG. 4 Examples of commercially available RAM technologies. Many other technologies exist.

==============

9. Memory Organization

Recall that there are two key aspects of memory: the underlying technology and the memory organization. As we have seen, an architect can choose from a variety of memory technologies; we will now consider the second aspect. Memory organization refers to both the internal structure of the hardware and the external addressing structure that the memory presents to a processor. We will see that the two are related.

10. Memory Access and Memory Bus

To understand how memory is organized, we need to examine the access paradigm. Recall from FIG. 3 that a memory controller provides the interface between a physical memory and a processor that uses the memory†. Several questions arise. What is the structure of the connection between a processor and memory? What values pass across the connection? How does the processor view the memory system? To achieve high performance, memory systems use parallelism: the connection between the processor and controller consists of many wires that are used simultaneously. Each wire can transfer one data bit at any time. FIG. 5 illustrates the concept.


FIG. 5 The parallel connection between a processor and memory. A connection that contains N wires allows N bits of data to be transferred simultaneously.

†In later sections, we will learn that I/O devices also access memory through the memory controller; for now, we will use a processor in the examples.

The technical name for the hardware connection between a processor and memory is a bus (more specifically, a memory bus). We will learn about buses in the sections on I/O; for now, it is sufficient to understand that a bus provides parallel connections.

11. Words, Physical Addresses, and Memory Transfers

The parallel connections of a memory bus are pertinent to programmers as well as computer architects. From an architectural standpoint, using parallel connections can improve performance. From a programming point of view, the parallel connections define a memory transfer size (i.e., the amount of data that can be read from or written to memory in a single operation). We will see that transfer size is a crucial aspect of memory organization.

To permit parallel access, the bits of a physical memory are divided into blocks of N bits each, where N is the memory transfer size. A block of N bits is sometimes called a word, and the transfer size is called the word size or the width of a word. We can think of memory as being organized into an array of words. Each entry in the array is assigned a unique index known as a physical memory address; the approach is known as word addressing. FIG. 6 illustrates the idea and shows that a physical memory address is exactly like an array index.


FIG. 6 Physical memory addressing on a computer where a word is thirty-two bits. We think of the memory as an array of words.

12. Physical Memory Operations

The controller for physical memory supports two operations: read and write. In the case of a read operation, the processor specifies an address; in the case of a write operation, the processor specifies an address as well as data to be written. The fundamental idea is that the controller always accepts or delivers an entire word; physical memory hardware does not provide a way to read or write less than a complete word (i.e., the hardware does not allow the processor to access or alter part of a word).

The point is:

Physical memory is organized into words, where a word is equal to the memory transfer size. Each read or write operation applies to an entire word.
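A minimal C model makes the organization concrete (the array, function names, and sizes are hypothetical, chosen only for illustration): memory is an array of words, and the only operations transfer an entire word:

    #include <stdint.h>

    #define WORDS 1024                      /* hypothetical memory size in words */

    static uint32_t physical_mem[WORDS];    /* memory as an array of 32-bit words */

    /* A read operation delivers an entire word. */
    static uint32_t read_word(uint32_t word_addr)
    {
        return physical_mem[word_addr];
    }

    /* A write operation replaces an entire word; there is no operation
     * that stores less than a complete word. */
    static void write_word(uint32_t word_addr, uint32_t value)
    {
        physical_mem[word_addr] = value;
    }

    int main(void)
    {
        write_word(2, 0x11223344);          /* store a complete word    */
        return (int)(read_word(2) & 0xFF);  /* fetch the same word back */
    }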

13. Memory Word Size and Data Types

Recall that the parallel connection between a processor and a memory is designed for high performance. In theory, performance can be increased by adding more parallel wires. For example, an interface that has 128 wires can transfer data at twice the rate of an interface that has 64 wires. The question arises: how many wires should an architect choose? That is, what word size is optimal? The question is complicated by several factors. First, because memory is used to store data, the word size should accommodate common data values (e.g., the word should be large enough to hold an integer).

Second, because memory is used to store programs, the word size should accommodate frequently used instructions. Third, because the connection of a processor to a memory requires pins on the processor, adding wires to the interface increases the pin requirements (the number of pins can be a limiting factor in the design of a CPU chip). Thus, the word size is chosen as a compromise between performance and various other considerations. A word size of thirty-two bits is popular, especially for low-power systems; many high-performance systems use a sixty-four-bit word size.

In most cases, an architect designs all parts of a computer system to work together.

Thus, if an architect chooses a memory word size equal to thirty-two bits, the architect will make a standard integer and a single-precision floating point value each occupy thirty-two bits. As a result, a computer system is often characterized by stating the word size (e.g., a thirty-two-bit processor).

14. Byte Addressing and Mapping Bytes to Words

Programmers who use a conventional computer may be surprised to learn that physical memory is organized into words because most programmers are familiar with an alternate form of addressing known as byte addressing. Byte addressing is especially convenient for programming because it gives a programmer an easy way to access small data items such as characters.

Conceptually, when byte addressing is used, memory must be organized as an array of bytes rather than an array of words. The choice of byte addressing has two important consequences. First, because each byte of memory is assigned an address, byte addressing requires more addresses than word addressing. Second, because byte addressing allows a program to read or write a single byte, the memory controller must support byte transfer.

A larger word size results in higher performance because many bits can be transferred at the same time. Unfortunately, if the word size is equal to an eight-bit byte, only eight bits can be transferred at one time. That is, a memory system built for byte addressing will have lower performance than a memory system built for a larger word size. Interestingly, even when byte addressing is used, many transfers between a processor and memory involve multiple bytes. For example, an instruction occupies multiple bytes, as does an integer, a floating point value, and a pointer.

Can we devise a memory system that combines the higher speed of word addressing with the programming convenience of byte addressing? The answer is yes. To do so, we need an intelligent memory controller that can translate between the two addressing schemes. The controller accepts requests from the processor that specify a byte address and size. The controller uses word addressing to access the appropriate word(s) in the underlying memory and extract the specified bytes. FIG. 7 shows an example of the mapping used between byte addressing and word addressing for a word size of thirty-two bits.


FIG. 7 Example of a byte address assigned to each byte of memory even though the underlying hardware uses word addressing and a thirty-two-bit word size.

To implement the mapping shown in the figure, a controller must convert byte addresses issued by the processor to word addresses used by the memory system. For example, if the processor requests a read operation for byte address 17, the controller must issue a read request for word 4 and then extract the second byte from the word.

Because the memory can only transfer an entire word at a time, a byte write operation is expensive. For example, if a processor writes byte 11, the controller must read word 2 from memory, replace the rightmost byte, and then write the entire word back to memory.

Mathematically, the translation of addresses is straightforward. To translate a byte address, B, to the corresponding word address, W, the controller divides B by N, the number of bytes per word, and ignores the remainder. Similarly, to compute a byte offset, O, within a word, the controller computes the remainder of B divided by N.

That is, the word address is given by:

W = ⌊ B / N ⌋

and the offset is given by:

O = B mod N

As an example, consider the values in FIG. 7, where N = 4. A byte address of 11 translates to a word address of 2 and an offset of 3, which means that byte 11 is found in word 2 at byte offset 3†.
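The translation, including the expensive read-modify-write needed for a byte store, can be sketched in C (the helper names are hypothetical, and the assumption that offset 0 is the leftmost byte of a word follows FIG. 7; real controllers may number bytes differently):

    #include <stdint.h>
    #include <stdio.h>

    #define BYTES_PER_WORD 4                /* N in the formulas above */
    #define WORDS 1024

    static uint32_t physical_mem[WORDS];

    /* Whole-word operations, as in the earlier sketch. */
    static uint32_t read_word(uint32_t w)              { return physical_mem[w]; }
    static void     write_word(uint32_t w, uint32_t v) { physical_mem[w] = v; }

    /* Byte read: fetch the containing word, then extract one byte.
     * W = B / N and O = B mod N, exactly as derived above. */
    static uint8_t read_byte(uint32_t b)
    {
        uint32_t word = read_word(b / BYTES_PER_WORD);
        uint32_t off  = b % BYTES_PER_WORD;
        return (uint8_t)(word >> ((BYTES_PER_WORD - 1 - off) * 8));
    }

    /* Byte write: an expensive read-modify-write sequence, because the
     * underlying memory transfers only whole words. */
    static void write_byte(uint32_t b, uint8_t value)
    {
        uint32_t w     = b / BYTES_PER_WORD;
        uint32_t shift = (BYTES_PER_WORD - 1 - (b % BYTES_PER_WORD)) * 8;
        uint32_t word  = read_word(w);                 /* read the word  */
        word = (word & ~(0xFFu << shift))              /* clear the byte */
             | ((uint32_t)value << shift);             /* merge new byte */
        write_word(w, word);                           /* write it back  */
    }

    int main(void)
    {
        write_byte(11, 0xAB);               /* touches word 2, offset 3 */
        printf("byte 11 = 0x%02X\n", read_byte(11));
        return 0;
    }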

15. Using Powers of Two

Performing a division or computing a remainder is time-consuming and requires extra hardware (e.g., an Arithmetic Logic Unit). To avoid computation, architects organize memory using powers of two. Doing so means that the hardware can perform the two computations above simply by extracting bits. In FIG. 7, for example, N = 2², which means that the offset can be computed by extracting the two low-order bits of a byte address, and the word address can be computed by extracting everything except the two low-order bits. FIG. 8 illustrates the idea:


FIG. 8 An example of a mapping from byte address 17 to word address 4 and offset 1. Using a power of two for the number of bytes per word avoids arithmetic calculations.

We can summarize:

To avoid arithmetic calculations, such as division or remainder, physical memory is organized such that the number of bytes per word is a power of two, which means the translation from a byte address to a word address and offset can be performed by extracting bits.
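In C, the bit-extraction version of the translation is a shift and a mask; the tiny program below (illustrative only) reproduces the mapping in FIG. 8:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t B = 17;        /* byte address used in FIG. 8           */
        uint32_t W = B >> 2;    /* drop the two low-order bits: word 4   */
        uint32_t O = B & 0x3;   /* keep the two low-order bits: offset 1 */

        printf("byte %u -> word %u, offset %u\n", B, W, O);
        return 0;
    }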

16. Byte Alignment and Programming

Knowing how the underlying hardware works helps explain a concept that programmers encounter: byte alignment. We say that an integer value is aligned if the bytes of the integer correspond to a word in the underlying physical memory. In FIG. 7, for example, an integer composed of bytes 12, 13, 14, and 15 is aligned, but an integer composed of bytes 6, 7, 8, and 9 is not.

†The offset is measured from zero.

On some architectures, byte alignment is required - the processor raises an error if a program attempts an integer access using an unaligned address. On other processors, arbitrary alignment is allowed, but unaligned accesses result in lower performance than aligned accesses. We can now understand why an unaligned address requires more accesses of physical memory: the memory controller must convert each processor request into operations on the underlying memory. If an integer spans two words, the controller must perform two read operations to obtain the requested bytes. Thus, even if the processor permits unaligned access, programmers are strongly encouraged to align data values.

We can summarize:

The organization of physical memory affects programming: even if a processor allows unaligned memory access, aligning data on boundaries that correspond to the physical word size can improve program performance.
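For example, a programmer can test alignment with the same bit trick; the following C sketch (hypothetical helper name; it assumes a four-byte word, matching the figures) checks whether an address falls on a word boundary:

    #include <stdint.h>
    #include <stdio.h>

    /* Returns 1 if the address falls on a four-byte word boundary.
     * An aligned integer occupies a single physical word; an unaligned
     * one may span two words and cost two memory operations. */
    static int is_word_aligned(const void *p)
    {
        return ((uintptr_t)p & 0x3) == 0;
    }

    int main(void)
    {
        uint32_t words[4];                  /* word-aligned by definition */
        char *base = (char *)words;

        printf("base+12 aligned? %d\n", is_word_aligned(base + 12));  /* 1 */
        printf("base+6  aligned? %d\n", is_word_aligned(base + 6));   /* 0 */
        return 0;
    }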
