Computer Architecture: CPUs -- Power and Energy




1. Introduction

The related topics of power consumption and total energy consumption have become increasingly important in the design of computing systems. For portable devices, designs strive for a balance between maximizing battery life and maximizing features that users desire. For large data centers, the power consumed and the consequent cooling required are now critical factors in the design and scale.

This brief section introduces the topic without going into much detail. It defines terminology, explains the types of power that digital circuits consume, and describes the relationship between power and energy. Most important, the section describes how software systems can be used to shut down parts of a system to reduce power consumption.

2. Definition of Power

We define power to be the rate at which energy is consumed (e.g., transferred or transformed). For an electronic circuit, power is the product of voltage and current.

Taking the definitions from physics, power is measured in units of watts, where a watt is defined as one joule per second (J/s). The higher the wattage of an electronic device, the more power it consumes; some devices use kilowatts (10^3 watts) of power. For a large data center cluster, the aggregate power consumed by all the computers in the cluster is so large that it is measured in megawatts (10^6 watts). For small hand-held devices, such as cell phones, the power requirements are so minimal that they are measured in milliwatts (10^-3 watts).

It is important to note that the amount of power a system uses can vary over time.

For example, a smart phone uses less power when the display is turned off than when the screen is on. Therefore, to be precise, we define the instantaneous power at time t, P(t), to be the product of the voltage at time t, V(t), and the current at time t, I(t):

P(t) = V(t) × I(t)   (eqn. 1)

We will see that the ability of a system to vary its power usage over time can be important for both extremely large and extremely small computing systems (e.g., powerful computers in a data center and small battery powered devices).

The maximum power that a system uses is especially important for large systems, such as a cluster of computers in a data center. We use the term peak instantaneous power to specify the maximum power a system will need. Peak power is especially important when constructing a large computing system because the designer must arrange to meet the peak power requirements. For example, when planning a data center, a designer must guarantee that an electric utility can supply sufficient power to meet the peak instantaneous power demand.

3. Definition of Energy

From the above, the total energy that a system uses is computed as the power consumed over a given time, measured in joules. Electrical energy is usually reported in multiples of watts multiplied by a unit of time. Typically, the time unit is an hour, and the multiples of watts are kilowatts, megawatts, or milliwatts. Thus, the energy consumed by a data center during a week might be reported in kilowatt hours (kWh) or megawatt hours (MWh), and the energy consumed by a battery during a week might be reported in milliwatt hours (mWh).

If power utilization is constant, the energy consumed can be computed easily by multiplying power utilization, P, by the time the power is used. For example, during the time period from t0 to t1, the energy used is given by:

E = P × ( t1 - t0 ) (eqn.2)

A system that uses exactly 6 kilowatts for one hour has an energy consumption of 6 kWh, as does a system that uses 3 kilowatts for a period of two hours.
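The arithmetic in Equation 2 can be sketched directly; the wattages below mirror the example in the text:

```python
def energy_kwh(power_kw, hours):
    """Energy (eqn. 2) in kilowatt hours for a constant power draw."""
    return power_kw * hours

# 6 kW for one hour and 3 kW for two hours yield the same energy:
print(energy_kwh(6, 1))  # 6 kWh
print(energy_kwh(3, 2))  # 6 kWh
```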

As we described above, most systems do not consume power at a constant rate.

Instead, the power consumption varies over time. To capture the idea that power varies continuously, we define energy to be the integral of instantaneous power over time:

E = ∫ P(t) dt, integrated over the interval from t0 to t1   (eqn. 3)

Although power is defined to be an instantaneous measure that can change over time, some electronic systems specify a value known as the average power. Recall that power is the rate at which energy is used, which means the average power over a time interval can be computed by taking the amount of energy used during the interval and dividing by the time:

Pavg = E / ( t1 - t0 )   (eqn. 4)
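Equations 3 and 4 can be approximated numerically when power is sampled at regular intervals. The power trace below is hypothetical, chosen only to illustrate the calculation:

```python
def energy(power_samples, dt):
    """Riemann-sum approximation of eqn. 3: E = integral of P(t) dt."""
    return sum(power_samples) * dt

def avg_power(E, t0, t1):
    """Average power per eqn. 4: total energy divided by elapsed time."""
    return E / (t1 - t0)

# A device that draws 2 W for 3 seconds, then 0.5 W for 1 second (dt = 1 s):
samples = [2.0, 2.0, 2.0, 0.5]
E = energy(samples, dt=1.0)        # 6.5 joules
print(avg_power(E, 0.0, 4.0))      # 1.625 watts
```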

4. Power Consumption by a Digital Circuit

Recall that a digital circuit is created from logic gates. At the lowest level, all logic gates are composed of transistors, and transistors consume power in two ways†:

-- Switching or dynamic power (denoted Ps or Pd )

-- Leakage power (denoted Pleak )

Switching Power. The term switching refers to a change in the output in response to an input. When one or more inputs of a gate change, the output may change. A change in output can only occur because electrons flow through transistors. Individual transistors consume more power during switching, which means that the total power for the system increases.

Leakage Power. Although we think of a digital circuit as having a binary value (on or off), solid state physicists realize that transistors are imperfect switches. That is, even when a transistor is off, a few electrons can penetrate the semiconductor boundary.

Therefore, whenever power is supplied to a digital circuit, some amount of current will always flow, even if the outputs are not switching. We use the term leakage to refer to current that flows when a circuit is not operating.

For a given transistor, the amount of leakage current is insignificant. However, a single processor can have a billion transistors, meaning that the aggregate leakage current can be quite high. In fact, for some digital systems, the leakage current accounts for more than half of the power utilization. The point can be summarized:

In a typical computing system, 40 to 60 percent of the power the system uses is leakage power.

A further point is important in the discussion of power management. The basic principle is that leakage always occurs when power is present:

Leakage current can only be eliminated by removing power from a circuit.

†In addition to the two major sources, a minor amount of short-circuit power is consumed because CMOS transistors form a brief connection between the power source and ground when switching.

5. Switching Power Consumed by a CMOS Digital Circuit

Our focus is using software to manage the power use of a digital circuit. To understand power management techniques, we need a few basic concepts. First, we will consider the total energy consumed by switching. The energy required for a single change of a gate's output is denoted Ed and is given by:

Ed = ½ C Vdd^2   (eqn. 5)

where C is a value of capacitance that depends on the underlying CMOS technology, and Vdd is the voltage at which the circuit operates†.

To understand the power consequences of Equation 5, consider a clock. The clock generates a square wave at a fixed frequency. Suppose the clock signal is connected to an inverter. The inverter output will change twice during a clock cycle, once when the clock goes from zero to one and once when the clock goes from one back to zero. Therefore, if the clock has period Tclock , the average power used is:

Pavg = 2 Ed / Tclock = C Vdd^2 / Tclock   (eqn. 6)

The frequency of the clock is the inverse of the period:

Fclock = 1 / Tclock   (eqn. 7)

which means we can rewrite Equation 6 in terms of clock frequency:

Pavg = C Vdd^2 Fclock   (eqn. 8)

One additional term is needed to compute the average power: the fraction of the circuit whose outputs are switching. We use a to denote the fraction, 0 ≤ a ≤ 1, which gives the final form of Equation 8 for average power:

Pavg = a C Vdd^2 Fclock   (eqn. 9)

Equation 9 captures the three main components of power that are pertinent to the following discussion. Constant C is a property of the underlying technology and cannot be changed easily. Thus, the three components that can be controlled are:

-- The fraction of the circuit that is active, a

-- The clock frequency, Fclock

-- The voltage in the circuit, Vdd

†The notation Vdd is used to specify the voltage used to operate a CMOS circuit; the notation V (voltage) can be used if the context is understood.
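Equation 9 can be evaluated directly. The constants below (effective capacitance, voltage, frequency, and activity fraction) are purely illustrative, not taken from any real chip:

```python
def dynamic_power(a, C, vdd, f_clock):
    """Average switching power per eqn. 9: Pavg = a * C * Vdd^2 * Fclock."""
    return a * C * vdd**2 * f_clock

# Example: 20% of outputs switching, 1 nF effective capacitance,
# 1.0 V supply, 2 GHz clock:
p = dynamic_power(a=0.2, C=1e-9, vdd=1.0, f_clock=2e9)
print(p)  # 0.4 watts
```

Note how each controllable factor enters: power scales linearly in a and Fclock, but quadratically in Vdd, which is why voltage reduction has the largest effect.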

6. Cooling, Power Density, and the Power Wall

Recall that instantaneous power use is often associated with data centers or other large installations where a key aspect is peak power utilization. In addition to the question of whether an electric utility is able to deliver the megawatts needed during peak use, designers focus on two other aspects of power use: cooling and power density.

Cooling. When a digital device operates, it generates heat. A huge power load means many devices are operating, and each device is generating heat. Consequently, the heat being produced is related to the power being consumed. All electronic circuits must be cooled or circuits will overheat and burn out. For the smallest devices, enough heat escapes to the surrounding air that no further cooling is needed. For medium-size devices, cooling requires a fan that blows cold air across the circuits constantly; the air must be brought in through a Heating, Ventilation, and Air Conditioning (HVAC) system. In the most extreme cases, air cooling is insufficient, and a form of liquid cooling is required.

Power Density. Although the total amount of heat a circuit produces dictates the total cooling capacity required, another aspect of heat is important: the concentration of heat in a small area. In a data center, for example, if many computers are placed adjacent to one another, they can overheat. Thus, spacing is added between computers and between racks of computers to permit cool air to flow through the racks and remove heat.

Power density is also important on an individual integrated circuit, where power density refers to the amount of power that is dissipated per unit area of silicon. For many years, the semiconductor industry followed Moore's Law. The size of an individual transistor continued to shrink, and every eighteen months, the number of transistors that fit on a single chip doubled. However, following Moore's Law had a negative aspect: power density also increased. As power density increases, the amount of heat generated per unit area increases, which means that a modern processor produces much more heat per square centimeter than earlier processors.

Consequently, packing transistors closer together has led to a major problem: we are reaching the limits of the rate at which heat can be removed from a chip. Engineers refer to the limit as the power wall because it means power cannot be increased. With current cooling technologies, the limit can be approximated:

Power Wall ≈ 100 watts / cm^2   (eqn. 10)

7. Energy Use

Unlike power, which measures instantaneous flow of current, energy measures the total power consumed over a given time interval. A focus on energy is especially pertinent to portable devices that use batteries. We can think of a battery as a bucket of energy, and imagine the device extracting energy as needed. The total time a battery can power a device is derived from the amount of energy in the battery (measured in milliwatt hours).

Modeling a battery as a bucket of energy (analogous to a bucket of water) is overly simplistic. However, three aspects of water buckets apply to batteries. First, like water in a bucket, the energy stored in a battery can evaporate. In the case of a battery, chemical and physical processes are imperfect: internal resistance allows a trivial amount of current to flow inside the battery. Although the flow is almost imperceptible, allowing a battery to sit for a long time (e.g., a year) will result in loss of charge. Second, just as some water spills when poured from a bucket, some energy is lost when it is extracted from a battery. Third, energy can be removed from a battery at various rates, just as water can be extracted from a bucket at various rates. The important idea behind the third property is that a battery becomes more efficient at lower current levels (i.e., lower power levels). Thus, designers look for ways to minimize the power that a battery operated device consumes.

8. Power Management

The above discussion shows that reducing power consumption is desirable in all cases. In a large data center, reducing power consumption reduces the heat generated.

For a small portable device, reducing power consumption extends the battery life. Two questions arise: what methods can be used to reduce power consumption, and which of the power reduction techniques can be controlled by software? Recall from Equation 9 that three primary factors contribute to power consumption: a, the fraction of a circuit that is active, Fclock , the clock frequency, and Vdd, the voltage used to operate a circuit. The next sections describe how voltage and frequency can be used to reduce power consumption; a later section considers the fraction of a circuit that is active.

8.1 Voltage and Delay

Because power utilization depends on the square of the voltage, lowering voltage will produce the largest reduction in power. However, voltage is not an independent variable. First, decreasing voltage increases gate delay, the time a gate takes to change its outputs after inputs change. A processor is designed carefully so that all hardware units operate according to the clock. If the delay for a single gate becomes sufficiently large, the delay across an entire hardware unit (many gates) will exceed the design specification.

For current technology, the delay can be estimated by:

delay = K × Vdd / ( Vdd - VTH )^β   (eqn. 11)

where Vdd is the voltage used, VTH is a threshold voltage determined by the underlying CMOS technology, K is a constant that depends on the technology, and β is a constant (approximately 1.3 for current technology).
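The trend that Equation 11 describes can be sketched as follows; the threshold voltage, K, and β values below are assumptions chosen only to show the shape of the curve:

```python
def gate_delay(vdd, vth=0.3, K=1.0, beta=1.3):
    """Relative gate delay per eqn. 11: delay grows sharply as Vdd
    approaches the threshold voltage VTH (illustrative constants)."""
    return K * vdd / (vdd - vth) ** beta

# Lowering the supply from 1.0 V toward the threshold increases delay:
print(gate_delay(1.0))  # baseline delay
print(gate_delay(0.6))  # noticeably slower at the lower voltage
```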

A second aspect of power is related to voltage: leakage current. The leakage current depends on the temperature of a circuit and the threshold voltage of the CMOS technology. Lowering voltage decreases leakage current, but has an interesting consequence: lower voltage means increased delay, so a computation takes longer, and leakage flows during the entire, longer running time, which can result in more total energy being consumed. To understand why the effect can be significant, recall that leakage can account for 40% to 60% of the power a circuit uses. The point is:

Although power depends on the square of voltage, reducing voltage increases delay, which can increase total energy usage.

Despite the problems, voltage is the most significant factor in power reduction.

Therefore, researchers who work on solid state physics and silicon technologies have devised transistors that operate correctly at much lower voltages. For example, although early digital circuits operated at 5 volts, current technologies used in cell phones operate at lower voltages. A fully charged cell phone battery provides about 4 volts, and the circuits continue to operate as the battery discharges. In fact, some cell phones that use NiMH battery technology can still receive calls with a battery that provides only 1.2 volts, and the phone only declares a battery dead when the voltage falls below 0.8 volts. (Lithium-based batteries tend to die at approximately 3.65 volts.)

8.2 Decreasing Clock Frequency

Clock frequency forms a second factor in power utilization. In theory, power is proportional to clock frequency, so slowing the clock will save power. In practice, reducing the clock frequency lowers performance, which may be critical in systems that have real-time requirements (e.g., a system that displays video or plays music).

Interestingly, adjusting the clock frequency can be used in conjunction with a reduction in voltage. That is, a slower clock can accommodate the increased delays that a lower voltage causes. Thus, if a designer decreases the clock frequency as voltage is decreased, performance will suffer but the circuit will operate correctly.

When both clock frequency and voltage are reduced, the resulting reduction in power can be dramatic. In one specific case, reducing the frequency to one-half the original rate allowed the voltage to be divided by 1.7. Because voltage is squared in the power equation (Equation 9), reducing the voltage allows the resulting power to be reduced dramatically. For the example, the resulting power was approximately 15% of the original power. Although the savings depend on the technology being used, the general idea can be summarized:

If a circuit can deliver adequate performance with a reduced clock frequency, power can be cut dramatically because reducing the clock frequency also allows voltage to be reduced.
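The arithmetic behind the example above follows directly from Equation 9, in which power scales with Vdd squared times Fclock:

```python
def power_ratio(v_scale, f_scale):
    """Fraction of original power (eqn. 9) when voltage is multiplied
    by v_scale and clock frequency by f_scale (a and C unchanged)."""
    return v_scale**2 * f_scale

# Halve the clock and divide the voltage by 1.7, as in the text:
r = power_ratio(1 / 1.7, 0.5)
print(round(r, 3))  # about 0.17, i.e., roughly 15-17% of the original power
```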

Intel has invented an interesting twist on reduced clock frequency by permitting dynamic changes. The idea is straightforward. When the processor is busy, the operating system sets the clock frequency high. If the processor exceeds a preset thermal limit (i.e., overheats) or a power limit (e.g., would drain a battery quickly), the operating system reduces the clock frequency until the processor operates within the prescribed limits. For example, clock frequency might be increased or decreased dynamically by multiples of 100 MHz. If the processor is idle, the clock frequency can also be reduced to save energy. Instead of advertising the capability as dynamic speed reduction, Intel marketing turns the situation around and advertises the feature as Turbo Boost.

8.3 Slower Clock Frequency and Multicore Processors

In the early 2000s, at the same time power utilization was becoming a problem, chip vendors introduced multicore processors. On the surface, a shift to multicore architectures seems counterproductive because two cores will require twice as much power as a single core. Of course, the cores may share some of the circuitry (e.g., a memory or bus interface), which means the power consumption of a dual-core chip will not be exactly double the power consumption of a single core chip. However, a second core adds substantial additional power requirements.

Why would vendors introduce more cores if reducing power consumption is important? To understand, look carefully at clock frequency. Before multicore chips appeared, clock frequency increased every few years as new processors appeared. We know from the above discussion that slowing down a clock to one-half of its original speed allows voltage to be lowered and cuts power consumption significantly. Now consider a dual-core chip. Suppose that each core runs at one-half the clock frequency of a single-core chip. The computational power of the dual-core version is still approximately the same as a single core that runs twice as fast. In terms of power utilization, however, the voltage can be reduced, which means that each of the two cores takes a fraction, F, of the power required by the single-core version. As a result, the multicore chip takes approximately 2F as much power as the single core version. Provided F is less than 50%, the slower dual-core chip consumes less power. In the example above, F is 15%, which means a dual-core chip will provide equivalent computational power at only 30% of the original power requirements. We can summarize:

A multicore chip in which each core runs at a slower clock frequency and lower voltage can deliver approximately the same computational capability as a single core chip while incurring significantly lower power utilization.
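The dual-core arithmetic from the discussion above is a one-line check; F = 0.15 is the per-core power fraction carried over from the earlier voltage-and-frequency example:

```python
# Each slow core uses fraction F of the single-core power; two cores use 2F.
F = 0.15              # per-core fraction from the earlier example
dual_core = 2 * F     # total power relative to the original single core
print(dual_core)      # 0.3, i.e., 30% of the original power
```

Provided F stays below 0.5, the slower dual-core chip comes out ahead.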

Of course, the discussion above makes an important assumption about multicore processing. Namely, it assumes that computation can be divided among multiple cores.

Unfortunately, Section 18 points out that experience with parallelism has not been promising. For computations where a parallel approach is not feasible, a slow clock can make the system unusable. Even in cases where some parallelism is feasible, memory contention and other inefficiencies can result in disappointing performance. When parallel processing is used to handle multiple input items at the same time, overall throughput from two cores can be the same as that of a single, faster core. However, latency (i.e., the time required to process a given item) is higher. Finally, one should remember that the discussion has focused on switching power - leakage can still be a significant problem.

9. Software Control of Energy Use

Software on a system usually has little or no ability to make minor increases or decreases in the voltage used. Instead, software is often restricted to two basic operations:

-- Clock gating

-- Power gating

Clock Gating. The term refers to reducing the clock frequency to zero, which effectively stops a processor. Before a processor can be stopped, a programmer must arrange for a way to restart it. Typically, the code image is kept in memory, and the memory retains power. Thus, the image remains ready whenever the processor restarts.

Power Gating. The term refers to cutting off power from the processor. A special solid state device that has extremely low leakage current is used to cut off power. As with clock gating, a programmer must arrange for a restart, either by saving and then restoring a copy of the memory image or by ensuring that the memory remains powered on so the image is retained.

Systems that offer power gating capabilities do not apply gating across the entire system. Instead, the system is divided into islands, and gating is applied to some islands while others continue to operate normally. Memory cache forms a particularly important power island - if power is removed from a memory cache, all cached data will be lost. We know from Section 12 that caching is important for performance.

Therefore, a memory cache can be placed in a power island that is not shut down when power is removed from other parts of the processor.

Some processors extend the idea to provide a set of low power modes that software can use to reduce power consumption. Vendors use a variety of names to describe the modes, such as sleep, deep sleep, and hibernation. We will use the generic names LPM0, LPM1, LPM2, LPM3, and LPM4. In general, low power modes are arranged in a hierarchy. LPM0 turns off the least amount of circuitry and has the fastest recovery.

LPM4, the deepest sleep mode, turns off almost the entire processor. As a consequence, restarting from LPM4 takes much longer than other low power modes.

10. Choosing When to Sleep and When to Awaken

Two questions must be answered: when should a system enter a sleep mode, and when should it awaken? Choosing when to awaken from sleep mode is usually straightforward: wake up on demand. That is, the hardware waits until an event occurs that requires the processor, and the hardware then moves the processor out of sleep mode. For example, a screen saver restarts the display whenever a user moves a mouse, touches a touch-sensitive screen, or presses a key on a keyboard.

The question of when to enter a low power mode is more complex. The motivation is to reduce power utilization. Therefore, we want to gate power to a subsystem (i.e., turn it off) if the subsystem will not be needed for a reasonably long time. Because we usually cannot know future requirements, most systems employ a heuristic to estimate when a subsystem will be needed: if a sufficiently long period of inactivity occurs, assume the subsystem will remain inactive for a while longer. Typically, if a processor or a device remains inactive for N seconds, the processor or device enters a sleep mode. The heuristic can also be applied to cause deeper sleep -- if a processor remains in a light sleep state for K seconds, the hardware moves the processor to a deeper sleep state (i.e., additional parts of the processor are turned off).

What value of N should be used as a timeout for sleep mode? Subsystems that provide interaction with a human user typically allow the user to choose a timeout. For example, a screen saver allows a user to specify how long the input devices should remain idle before the screen saver runs. Allowing users to specify a timeout means that each user can tailor the system to their needs.

Choosing a timeout for a system that does not involve human preference requires a more careful analysis. A simplified model will help illustrate the calculation. For the model, we will assume two states: a RUN state in which the processor runs with full power and an OFF state in which all power is removed. When the processor makes a transition, some time elapses, which we denote Tshutdown and Twakeup. FIG. 1 illustrates the simplified model.


FIG. 1 A simplified model of transitions among low power modes.

Power is used for each transition (i.e., to save state information or prepare I/O devices for the transition). To make calculations easier, we will assume the power used during a transition is constant. Therefore, the energy required for a transition can be calculated by multiplying the power used by the time that elapses:

Eshutdown = Es = Pshutdown × Tshutdown (eqn. 12)

and Ewakeup = Ew = Pwakeup × Twakeup (eqn. 13)

Understanding the energy required for transitions and the energy used when the system runs and when it is shut down allows us to assess potential energy savings. In essence, shutting down is beneficial if shutdown, sleep, and later wakeup consume less energy than continuing to run over the same time interval.

Let t be the time interval being considered. If we assume the power used by the running system is constant, the energy consumed when the system remains running for time t is:

Erun = Prun × t (eqn.14)

The energy consumed if the system is put into sleep mode for time t consists of the energy required for the two transitions plus the energy drawn while the processor is shut down: the off-state power, Poff (if any), times the time spent off:

Esleep = Es + Ew + Poff × ( t - Tshutdown - Twakeup )   (eqn. 15)

Shutting down the system will be beneficial if:

Esleep < Erun   (eqn. 16)

By using Equations 12 through 15, the inequality can be expressed in terms of a single free variable, the time interval t. Therefore, it is possible to compute a break-even point that specifies the minimum value of t for which shutting down saves energy.
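Solving the inequality symbolically and plugging in numbers can be sketched as follows; all power and timing values are hypothetical, chosen only to demonstrate the calculation:

```python
def break_even_t(P_run, P_off, E_s, E_w, T_s, T_w):
    """Minimum interval t for which sleeping saves energy (eqn. 16).

    From P_run * t > E_s + E_w + P_off * (t - T_s - T_w):
        t > (E_s + E_w - P_off * (T_s + T_w)) / (P_run - P_off)
    """
    return (E_s + E_w - P_off * (T_s + T_w)) / (P_run - P_off)

# Illustrative values: 2 W running, 0.1 W off-state residual power,
# 1.5 J to shut down, 2.5 J to wake, 0.2 s and 0.5 s transition times.
t_min = break_even_t(P_run=2.0, P_off=0.1, E_s=1.5, E_w=2.5,
                     T_s=0.2, T_w=0.5)
print(t_min)  # sleeping only pays off for idle intervals longer than this
```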

Of course, the analysis above is based on a simplified model. Power usage may not remain constant; the time and power required for transitions may depend on the state of the system. More important, the analysis focuses on energy consumed by switching and ignores leakage. However, the analysis does illustrate a basic point:

Even for a simplified model with only one low power state, details such as the energy used during state transitions complicate the decision about when to move to low power mode.

11. Sleep Modes and Network Devices

Many devices have a low power mode that is used to save energy. For example, a printer usually sleeps after N minutes of inactivity. Similarly, wireless network adapters can enter a sleep mode to reduce power consumption. For a network adapter, handling output (transmission) is trivial because the adapter can be awakened whenever an application generates an outgoing packet. However, input (reception) poses a difficult challenge for low power mode because a computer cannot know when another computer will send a packet.

As an example, the Wi-Fi (802.11) standard includes a Power Saving Polling (PSP) mode. To save power, laptops and other devices using Wi-Fi shut down their radios and only wake up periodically. We use the term duty cycle to characterize the repeated cycle of a device running and then being shut down. A device's radio must be awake when an access point transmits to it. A Wi-Fi base station periodically sends a beacon that includes a list of recipients for which the base station has undelivered packets. The beacon is frequent enough that a device is guaranteed to receive the beacon during the part of the duty cycle when it is awake. If a device finds itself on the recipient list, the device remains awake to receive the packet.

Two basic approaches have been used to allow a network adapter to sleep without missing packets indefinitely. In one approach, each device synchronizes its sleep cycles with the base station. In the other approach, a base station transmits each packet many times until the receiver wakes up and receives it.
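The power benefit of a duty cycle can be estimated as a weighted average of awake and asleep power. The radio power figures below are illustrative assumptions, not values from the 802.11 standard:

```python
def duty_cycle_power(P_on, P_sleep, d):
    """Average power for a radio awake a fraction d of each beacon
    interval at P_on watts and asleep the rest at P_sleep watts."""
    return d * P_on + (1 - d) * P_sleep

# Example: 300 mW awake, 5 mW asleep, awake 5% of the time:
print(duty_cycle_power(P_on=300e-3, P_sleep=5e-3, d=0.05))  # about 20 mW
```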

12. Summary

Power is an instantaneous measure of the rate at which energy is used; energy is the total amount of power used over a given time. A digital circuit uses dynamic or switching power (i.e., an output changes in response to the change of an input) and leakage power. Leakage can account for 40 to 60 percent of the power a circuit consumes.

Power consumption can be reduced by making parts of a circuit inactive, reducing the clock frequency, and reducing the voltage. Reducing the voltage has the largest effect, but also increases delay. Power density refers to the concentration of power in a given space; power density is related to heat. The power wall refers to the limit of approximately 100 watts per cm^2 that gives the maximum power density for which heat can be removed from a silicon chip using current cooling technologies.

Clock gating and power gating can be used to turn off a circuit (or part of a circuit). For devices that use battery power, the overall goal of power management systems is a reduction in total energy use. Because moving into and out of a low power (sleep) mode consumes energy, sleeping is only justified if the energy required for sleep mode is less than the energy required to remain running. A simplified model shows that the computation involves the cost to shut down and the cost to wake up.

Devices can also use low-power modes. Network interfaces pose a challenge because the interface must be awake to receive packets and a computer does not always know when packets will arrive. The Wi-Fi standard includes a Power Saving Polling mode.

EXERCISES

1. Estimate the amount of power required for the Tianhe-2 supercomputer described on page 376. Hint: start by finding an estimate of the number of watts used by a single processor.

2. Suppose the frequency of a clock is reduced by 10% and all other parameters remain the same. How much is the power reduced?

3. Suppose the voltage, Vdd, is reduced by 10% and all other parameters remain the same.

How much is the power reduced?

4. Use Equation 16 to find a break-even value for t.

5. Extend the model in FIG. 1 to a three-state system in which the processor has both a sleep mode and a deep sleep mode.

