Design of VLSI Systems

Chapter 7
LOW-POWER VLSI CIRCUITS AND SYSTEMS





7.1 Introduction

The increasing prominence of portable systems and the need to limit power consumption (and hence, heat dissipation) in very-high-density ULSI chips have led to rapid and innovative developments in low-power design in recent years. The driving forces behind these developments are portable applications requiring low power dissipation and high throughput, such as notebook computers, portable communication devices and personal digital assistants (PDAs). In most of these cases, the requirements of low power consumption must be met along with equally demanding goals of high chip density and high throughput. Hence, low-power design of digital integrated circuits has emerged as a very active and rapidly developing field of CMOS design.

The limited battery lifetime typically imposes very strict demands on the overall power consumption of the portable system. Although new rechargeable battery types such as Nickel-Metal Hydride (NiMH) are being developed with higher energy capacity than that of the conventional Nickel-Cadmium (NiCd) batteries, a revolutionary increase in energy capacity is not expected in the near future. The energy density (amount of energy stored per unit weight) offered by the new battery technologies (e.g., NiMH) is about 30 Watt-hour/pound, which is still low in view of the expanding applications of portable systems. Therefore, reducing the power dissipation of integrated circuits through design improvements is a major challenge in portable systems design.

The need for low-power design is also becoming a major issue in high-performance digital systems, such as microprocessors, digital signal processors (DSPs) and other applications. Increasing chip density and higher operating speed lead to the design of very complex chips with high clock frequencies. Typically, the power dissipation of the chip, and thus, the temperature, increase linearly with the clock frequency. Since the dissipated heat must be removed effectively to keep the chip temperature at an acceptable level, the cost of packaging, cooling and heat removal becomes a significant factor. Several high-performance microprocessor chips designed in the early 1990s (e.g., Intel Pentium, DEC Alpha, PowerPC) operate at clock frequencies in the range of 100 to 300 MHz, and their typical power consumption is between 20 and 50 W.

ULSI reliability is yet another concern which points to the need for low-power design. There is a close correlation between the peak power dissipation of digital circuits and reliability problems such as electromigration and hot-carrier induced device degradation. Also, the thermal stress caused by heat dissipation on chip is a major reliability concern. Consequently, the reduction of power consumption is also crucial for reliability enhancement.

The methodologies which are used to achieve low power consumption in digital systems span a wide range, from device/process level to algorithm level. Device characteristics (e.g., threshold voltage), device geometries and interconnect properties are significant factors in lowering the power consumption. Circuit-level measures such as the proper choice of circuit design styles, reduction of the voltage swing and clocking strategies can be used to reduce power dissipation at the transistor level. Architecture-level measures include smart power management of various system blocks, utilization of pipelining and parallelism, and design of bus structures. Finally, the power consumed by the system can be reduced by a proper selection of the data processing algorithms, specifically to minimize the number of switching events for a given task.

In this chapter, we will primarily concentrate on the circuit- or transistor-level design measures which can be applied to reduce the power dissipation of digital integrated circuits. Various sources of power consumption will be discussed in detail, and design strategies will be introduced to reduce the power dissipation. The concept of adiabatic logic will be given a special emphasis since it emerges as a very effective means for reducing the power consumption.



7.2 Overview of Power Consumption

In the following, we will examine the various sources (components) of time-averaged power consumption in CMOS circuits. The average power consumption in conventional CMOS digital circuits can be expressed as the sum of three main components, namely, (1) the dynamic (switching) power consumption, (2) the short-circuit power consumption, and (3) the leakage power consumption. If the system or chip includes circuits other than conventional CMOS gates that have continuous current paths between the power supply and the ground, a fourth (static) power component should also be considered. We will limit our discussion to the conventional static and dynamic CMOS logic circuits.

Switching Power Dissipation

This component represents the power dissipated during a switching event, i.e., when the output node voltage of a CMOS logic gate makes a power consuming transition. In digital CMOS circuits, dynamic power is dissipated when energy is drawn from the power supply to charge up the output node capacitance. During the charge-up phase, the output node voltage typically makes a full transition from 0 to VDD, and the energy used for the transition is relatively independent of the function performed by the circuit. To illustrate the dynamic power dissipation during switching, consider the circuit example given in Fig. 7.1. Here, a two-input NOR gate drives two NAND gates, through interconnection lines. The total capacitive load at the output of the NOR gate consists of (1) the output capacitance of the gate itself, (2) the total interconnect capacitance, and (3) the input capacitances of the driven gates.

Figure-7.1: A NOR gate driving two NAND gates through interconnection lines.

The output capacitance of the gate consists mainly of the junction parasitic capacitances, which are due to the drain diffusion regions of the MOS transistors in the circuit. The important aspect to emphasize here is that the amount of capacitance is approximately a linear function of the junction area. Consequently, the size of the total drain diffusion area dictates the amount of parasitic capacitance. The interconnect lines between the gates contribute to the second component of the total capacitance. The estimation of parasitic interconnect capacitance was discussed thoroughly in Chapter 4. Note that especially in sub-micron technologies, the interconnect capacitance can become the dominant component, compared to the transistor-related capacitances. Finally, the input capacitances are mainly due to gate oxide capacitances of the transistors connected to the input terminal. Again, the amount of the gate oxide capacitance is determined primarily by the gate area of each transistor.

Figure-7.2: Generic representation of a CMOS logic gate for switching power calculation.

Any CMOS logic gate making an output voltage transition can thus be represented by its nMOS network, pMOS network, and the total load capacitance connected to its output node, as seen in Fig. 7.2. The average power dissipation of the CMOS logic gate, driven by a periodic input voltage waveform with ideally zero rise- and fall-times, can be calculated from the energy required to charge up the output node to VDD and charge down the total output load capacitance to ground level.

$P_{avg} = \frac{1}{T}\left[\int_{0}^{T/2} v_{out}\left(-C_{load}\,\frac{dv_{out}}{dt}\right)dt + \int_{T/2}^{T}\left(V_{DD}-v_{out}\right)\left(C_{load}\,\frac{dv_{out}}{dt}\right)dt\right]$    (7.1)

Evaluating this integral yields the well-known expression for the average dynamic (switching) power consumption in CMOS logic circuits.

$P_{avg} = \frac{C_{load}\,V_{DD}^2}{T}$    (7.2)

or

$P_{avg} = C_{load}\,V_{DD}^2\,f_{CLK}$    (7.3)

Note that the average switching power dissipation of a CMOS gate is essentially independent of all transistor characteristics and transistor sizes. Hence, given an input pattern, the switching delay times have no relevance to the amount of power consumption during the switching events as long as the output voltage swing is between 0 and VDD.

Equation (7.3) shows that the average dynamic power dissipation is proportional to the square of the power supply voltage, hence, any reduction of VDD will significantly reduce the power consumption. Another way to limit the dynamic power dissipation of a CMOS logic gate is to reduce the amount of switched capacitance at the output. This issue will be discussed in more detail later. First, let us briefly examine the effect of reducing the power supply voltage VDD upon switching power consumption and dynamic performance of the gate.

Although the reduction of power supply voltage significantly reduces the dynamic power dissipation, the inevitable design trade-off is the increase of delay. This can be seen by examining the following propagation delay expressions for the CMOS inverter circuit.

$\tau_{PHL} = \frac{C_{load}}{k_n\left(V_{DD}-V_{T,n}\right)}\left[\frac{2\,V_{T,n}}{V_{DD}-V_{T,n}} + \ln\!\left(\frac{4\left(V_{DD}-V_{T,n}\right)}{V_{DD}} - 1\right)\right]$

$\tau_{PLH} = \frac{C_{load}}{k_p\left(V_{DD}-\left|V_{T,p}\right|\right)}\left[\frac{2\left|V_{T,p}\right|}{V_{DD}-\left|V_{T,p}\right|} + \ln\!\left(\frac{4\left(V_{DD}-\left|V_{T,p}\right|\right)}{V_{DD}} - 1\right)\right]$    (7.4)

Assuming that the power supply voltage is being scaled down while all other variables are kept constant, it can be seen that the propagation delay time will increase. Figure 7.3 shows the normalized variation of the delay as a function of VDD, where the threshold voltages of the nMOS and the pMOS transistor are assumed to be constant, VT,n = 0.8 V and VT,p = - 0.8 V, respectively. The normalized variation of the average switching power dissipation as a function of the supply voltage is also shown on the same plot.

Figure-7.3: Normalized propagation delay and average switching power dissipation of a CMOS inverter, as a function of the power supply voltage VDD.

Notice that the dependence of circuit speed on the power supply voltage may also influence the relationship between the dynamic power dissipation and the supply voltage. Equation (7.3) suggests a quadratic improvement (reduction) of power consumption as the power supply voltage is reduced. However, this interpretation assumes that the switching frequency (i.e., the number of switching events per unit time) remains constant. If the circuit is always operated at the maximum frequency allowed by its propagation delay, on the other hand, the number of switching events per unit time (i.e., the operating frequency) will obviously drop as the propagation delay becomes larger with the reduction of the power supply voltage. The net result is that the dependence of switching power dissipation on the power supply voltage becomes stronger than a simple quadratic relationship, shown in Fig. 7.3.
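As a quick numerical illustration of these two effects, the following Python sketch evaluates the normalized delay and switching power of an inverter over a range of supply voltages. It uses the simplified first-order delay model tau ~ VDD / (VDD - VT)^2 as a stand-in for the full expression in (7.4); the 0.8 V threshold and the 5 V reference supply are illustrative assumptions.

# Illustrative sketch: normalized delay and switching power versus VDD,
# using the simplified delay model tau ~ VDD / (VDD - VT)^2 (a stand-in for (7.4)).

VT   = 0.8      # threshold voltage magnitude (V), illustrative assumption
VREF = 5.0      # reference supply voltage (V), illustrative assumption

def delay(vdd):
    """Normalized propagation delay, tau ~ VDD / (VDD - VT)^2."""
    return vdd / (vdd - VT) ** 2

def power_const_f(vdd):
    """Switching power at a fixed clock frequency, P ~ VDD^2, as in (7.3)."""
    return (vdd / VREF) ** 2

def power_max_f(vdd):
    """Switching power when the clock always runs at the maximum rate
    allowed by the propagation delay, P ~ VDD^2 / tau."""
    return power_const_f(vdd) * delay(VREF) / delay(vdd)

for vdd in (5.0, 4.0, 3.3, 2.5, 2.0, 1.5):
    print(f"VDD = {vdd:3.1f} V   delay x{delay(vdd) / delay(VREF):5.2f}   "
          f"P (fixed f) x{power_const_f(vdd):5.2f}   "
          f"P (max f) x{power_max_f(vdd):5.2f}")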

The analysis of switching power dissipation presented above is based on the assumption that the output node of a CMOS gate undergoes one power-consuming transition (0-to-VDD transition) in each clock cycle. This assumption, however, is not always correct; the node transition rate can be smaller than the clock rate, depending on the circuit topology, logic style and the input signal statistics. To better represent this behavior, we will introduce αT (the node transition factor), which is the effective number of power-consuming voltage transitions experienced per clock cycle. Then, the average switching power consumption becomes

$P_{avg} = \alpha_T\,C_{load}\,V_{DD}^2\,f_{CLK}$    (7.5)

The estimation of switching activity and various measures to reduce its rate will be discussed in detail in Section 7.4. Note that in most complex CMOS logic gates, a number of internal circuit nodes also make full or partial voltage transitions during switching. Since there is a parasitic node capacitance associated with each internal node, these internal transitions contribute to the overall power dissipation of the circuit. In fact, an internal node may undergo several transitions while the output node voltage of the circuit remains unchanged, as illustrated in Fig. 7.4.

Figure-7.4: Switching of the internal node in a two-input NOR gate results in dynamic power dissipation even if the output node voltage remains unchanged.

In the most general case, the internal node voltage transitions can also be partial transitions, i.e., the node voltage swing may be only Vi which is smaller than the full voltage swing of VDD. Taking this possibility into account, the generalized expression for the average switching power dissipation can be written as

$P_{avg} = \left(\sum_{i}\alpha_{T_i}\,C_i\,V_i\right)V_{DD}\,f_{CLK}$    (7.6)

where Ci represents the parasitic capacitance associated with each node and αTi represents the corresponding node transition factor associated with that node.
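A direct numerical reading of (7.6), as reconstructed above, is sketched below in Python; the node list and all capacitance, swing and activity values are illustrative assumptions, and the formula assumes that a partial swing Vi draws a charge Ci Vi from the VDD supply.

# Sketch of the generalized switching-power estimate of (7.6):
#   P_avg = VDD * f_CLK * sum_i( alpha_Ti * C_i * V_i )
# All node data below are illustrative assumptions.

VDD   = 3.3       # supply voltage (V)
F_CLK = 100e6     # clock frequency (Hz)

# (node name, switched capacitance C_i in F, voltage swing V_i in V, transition factor alpha_Ti)
nodes = [
    ("output",    50e-15, 3.3, 0.25),
    ("internal1", 10e-15, 3.3, 0.40),   # full-swing internal node
    ("internal2",  8e-15, 1.2, 0.40),   # partial-swing internal node
]

P_avg = VDD * F_CLK * sum(alpha * c * v for _, c, v, alpha in nodes)
print(f"Estimated switching power: {P_avg * 1e6:.2f} uW")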

Short-Circuit Power Dissipation

The switching power dissipation examined above is purely due to the energy required to charge up the parasitic capacitances in the circuit, and the switching power is independent of the rise and fall times of the input signals. Yet, if a CMOS inverter (or a logic gate) is driven with input voltage waveforms with finite rise and fall times, both the nMOS and the pMOS transistors in the circuit may conduct simultaneously for a short amount of time during switching, forming a direct current path between the power supply and the ground, as shown in Fig. 7.5.

The current component which passes through both the nMOS and the pMOS devices during switching does not contribute to the charging of the capacitances in the circuit, and hence, it is called the short-circuit current component. This component is especially prevalent if the output load capacitance is small, and/or if the input signal rise and fall times are large, as seen in Fig. 7.5. Here, the input/output voltage waveforms and the components of the current drawn from the power supply are illustrated for a symmetrical CMOS inverter with small capacitive load. The nMOS transistor in the circuit starts conducting when the rising input voltage exceeds the threshold voltage VT,n. The pMOS transistor remains on until the input reaches the voltage level (VDD - |VT,p|). Thus, there is a time window during which both transistors are turned on. As the output capacitance is discharged through the nMOS transistor, the output voltage starts to fall. The drain-to-source voltage drop of the pMOS transistor becomes nonzero, which allows the pMOS transistor to conduct as well. The short-circuit current is terminated when the input voltage transition is completed and the pMOS transistor is turned off. A similar event is responsible for the short-circuit current component during the falling input transition, when the output voltage starts rising while both transistors are on.

Note that the magnitude of the short-circuit current component will be approximately the same during both the rising-input transition and the falling-input transition, assuming that the inverter is symmetrical and the input rise and fall times are identical. The pMOS transistor also conducts the current which is needed to charge up the small output load capacitance, but only during the falling-input transition (the output capacitance is discharged through the nMOS device during the rising-input transition). This current component, which is responsible for the switching power dissipation of the circuit (current component to charge up the load capacitance), is also shown in Fig. 7.5. The average of both of these current components determines the total amount of power drawn from the supply.

For a simple analysis, consider a symmetric CMOS inverter with k = kn = kp and VT = VT,n = |VT,p|, and with a very small capacitive load. If the inverter is driven with an input voltage waveform with equal rise and fall times (τ = trise = tfall), it can be derived that the time-averaged short-circuit current drawn from the power supply is

$I_{avg}(\mathrm{short\text{-}circuit}) = \frac{1}{12}\cdot\frac{k\,\tau}{T}\cdot\frac{\left(V_{DD} - 2V_T\right)^3}{V_{DD}}$    (7.7)

Figure-7.5: Input-output voltage waveforms, the supply current used to charge up the load capacitance and the short-circuit current in a CMOS inverter with small capacitive load. The total current drawn from the power supply is the sum of both current components.

Hence, the short-circuit power dissipation becomes

$P_{avg}(\mathrm{short\text{-}circuit}) = V_{DD}\,I_{avg}(\mathrm{short\text{-}circuit}) = \frac{k}{12}\left(V_{DD} - 2V_T\right)^3\frac{\tau}{T}$    (7.8)

Note that the short-circuit power dissipation is linearly proportional to the input signal rise and fall times, and also to the transconductance of the transistors. Hence, reducing the input transition times will obviously decrease the short-circuit current component.
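The linear dependence on the input transition time can be illustrated with the short Python sketch below, which evaluates (7.8) as reconstructed above for a lightly loaded symmetric inverter; all device and waveform parameters are illustrative assumptions.

# Sketch of the short-circuit power estimate of (7.8):
#   P_SC = (k / 12) * (VDD - 2*VT)^3 * tau / T
# All parameter values are illustrative assumptions.

k   = 100e-6    # transconductance parameter kn = kp (A/V^2)
VDD = 3.3       # supply voltage (V)
VT  = 0.8       # threshold voltage magnitude (V)
F   = 100e6     # input switching frequency 1/T (Hz)

def p_short_circuit(tau):
    """Average short-circuit power for input rise/fall time tau (s)."""
    return (k / 12.0) * (VDD - 2.0 * VT) ** 3 * tau * F

for tau in (0.2e-9, 0.5e-9, 1.0e-9, 2.0e-9):
    print(f"tau = {tau * 1e9:3.1f} ns  ->  P_SC = {p_short_circuit(tau) * 1e6:6.2f} uW")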

Now consider the same CMOS inverter with a larger output load capacitance and smaller input transition times. During the rising input transition, the output voltage will effectively remain at VDD until the input voltage completes its swing, and the output will start to drop only after the input has reached its final value. Although both the nMOS and the pMOS transistors are on simultaneously during the transition, the pMOS transistor cannot conduct a significant amount of current since the voltage drop between its source and drain terminals is approximately equal to zero. Similarly, the output voltage will remain approximately equal to 0 V during a falling input transition and it will start to rise only after the input voltage completes its swing. Again, both transistors will be on simultaneously during the input voltage transition, yet the nMOS transistor will not be able to conduct a significant amount of current since its drain-to-source voltage is approximately equal to zero. This situation is illustrated in Fig. 7.6, which shows the simulated input and output voltage waveforms of the inverter as well as the short-circuit and dynamic current components drawn from the power supply. Notice that the peak value of the supply current to charge up the output load capacitance is larger in this case. The reason for this is that the pMOS transistor remains in saturation during the entire input transition, as opposed to the previous case shown in Fig. 7.5, where the transistor leaves the saturation region before the input transition is completed.

Figure-7.6: Input-output voltage waveforms, the supply current used to charge up the load capacitance and the short-circuit current in a CMOS inverter with larger capacitive load and smaller input transition times. The total current drawn from the power supply is approximately equal to the charge-up current.

The discussion concerning the magnitude of the short-circuit current may suggest that the short-circuit power dissipation can be reduced by making the output voltage transition times larger and/or by making the input voltage transition times smaller. Yet this goal should be balanced carefully against other performance goals such as propagation delay, and the reduction of the short-circuit current should be considered as one of the many design requirements that must be satisfied by the designer.

Leakage Power Dissipation

The nMOS and pMOS transistors used in a CMOS logic gate generally have nonzero reverse leakage and subthreshold currents. In a CMOS VLSI chip containing a very large number of transistors, these currents can contribute to the overall power dissipation even when the transistors are not undergoing any switching event. The magnitude of the leakage currents is determined mainly by the processing parameters.

Of the two main leakage current components found in a MOSFET, the reverse diode leakage occurs when the pn-junction between the drain and the bulk of the transistor is reverse-biased. The reverse-biased drain junction then conducts a reverse saturation current which is eventually drawn from the power supply. Consider a CMOS inverter with a high input voltage, where the nMOS transistor is turned on and the output node voltage is discharged to zero. Although the pMOS transistor is turned off, there will be a reverse potential difference of VDD between its drain and the n-well, causing a diode leakage through the drain junction. The n-well region of the pMOS transistor is also reverse-biased with VDD, with respect to the p-type substrate. Therefore, another significant leakage current component exists due to the n-well junction (Fig. 7.7).

Figure-7.7: Reverse leakage current paths in a CMOS inverter with high input voltage.

A similar situation can be observed when the input voltage is equal to zero, and the output voltage is charged up to VDD through the pMOS transistor. Then, the reverse potential difference between the nMOS drain region and the p-type substrate causes a reverse leakage current which is also drawn from the power supply (through the pMOS transistor).

The magnitude of the reverse leakage current of a pn-junction is given by the following expression

$I_{reverse} = A \cdot J_S \cdot \left(1 - e^{-q\,V_{bias}/kT}\right) \;\approx\; A \cdot J_S$    (7.9)

where Vbias is the magnitude of the reverse bias voltage across the junction, JS is the reverse saturation current density and the A is the junction area. The typical magnitude of the reverse saturation current density is 1 - 5 pA/mm2, and it increases quite significantly with temperature. Note that the reverse leakage occurs even during the stand-by operation when no switching takes place. Hence, the power dissipation due to this mechanism can be significant in a large chip containing several million transistors.
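A rough chip-level estimate of this stand-by component can be made by multiplying an assumed per-junction leakage by the number of reverse-biased junctions; the Python sketch below uses illustrative values only.

# Rough stand-by power estimate due to reverse junction leakage.
# The per-junction current and the junction count are illustrative assumptions;
# under strong reverse bias the per-junction current approaches A * J_S.

I_PER_JUNCTION = 1e-12       # assumed reverse leakage per drain junction (A)
N_JUNCTIONS    = 5_000_000   # assumed number of reverse-biased junctions on the chip
VDD            = 3.3         # supply voltage (V)

I_total = N_JUNCTIONS * I_PER_JUNCTION   # total leakage current drawn from the supply
P_leak  = VDD * I_total                  # stand-by leakage power

print(f"Total reverse leakage current: {I_total * 1e6:.1f} uA")
print(f"Stand-by leakage power:        {P_leak * 1e6:.1f} uW")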

Another component of leakage currents which occur in CMOS circuits is the subthreshold current, which is due to carrier diffusion between the source and the drain region of the transistor in weak inversion. A MOS transistor in the subthreshold operating region behaves similarly to a bipolar device, and the subthreshold current exhibits an exponential dependence on the gate voltage. The subthreshold current may become significant when the gate-to-source voltage is smaller than, but very close to, the threshold voltage of the device. In this case, the power dissipation due to subthreshold leakage can become comparable in magnitude to the switching power dissipation of the circuit. The subthreshold leakage current is illustrated in Fig. 7.8.

Figure-7.8: Subthreshold leakage current path in a CMOS inverter with high input voltage.

Note that the subthreshold leakage current also occurs when there is no switching activity in the circuit, and this component must be carefully considered for estimating the total power dissipation in the stand-by operation mode. The subthreshold current expression is given below, in order to illustrate the exponential dependence of the current on terminal voltages.

equation-7.10 (7.10)

One relatively simple measure to limit the subthreshold current component is to avoid very low threshold voltages, so that the VGS of the nMOS transistor remains safely below VT,n when the input is logic zero, and the |VGS| of the pMOS transistor remains safely below |VT,p| when the input is logic one.
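The exponential sensitivity of the off-state current to the threshold voltage can be illustrated with the commonly used subthreshold-swing model I_off ~ I0 * 10^((VGS - VT)/S), which is only a simplified stand-in for the full expression in (7.10); the values of I0 and S below are illustrative assumptions.

# Illustrative subthreshold leakage model (a simplified stand-in for (7.10)):
#   I_sub ~ I0 * 10**((VGS - VT) / S), where S is the subthreshold swing.
# All parameter values are illustrative assumptions.

I0 = 1e-6     # drain current at VGS = VT (A), illustrative
S  = 0.090    # subthreshold swing (V per decade), illustrative

def i_sub(vgs, vt):
    """Subthreshold drain current of a nominally 'off' transistor (VGS < VT)."""
    return I0 * 10.0 ** ((vgs - vt) / S)

# Off-state leakage (VGS = 0) for different threshold voltages:
for vt in (0.8, 0.5, 0.3, 0.2, 0.1):
    print(f"VT = {vt:4.2f} V  ->  I_off = {i_sub(0.0, vt):9.3e} A")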

In addition to the three major sources of power consumption in CMOS digital integrated circuits discussed here, some chips may also contain components or circuits which actually consume static power. One example is pseudo-nMOS logic, which utilizes a pMOS transistor as the pull-up device. The presence of such circuit blocks should also be taken into account when estimating the overall power dissipation of a complex system.



7.3 Low-Power Design Through Voltage Scaling

The switching power dissipation in CMOS digital integrated circuits is a strong function of the power supply voltage. Therefore, reduction of VDD emerges as a very effective means of limiting the power consumption. Given a certain technology, the circuit designer may utilize on-chip DC-DC converters and/or separate power pins to achieve this goal. As we have already discussed briefly in Section 7.2, however, the savings in power dissipation comes at a significant cost in terms of increased circuit delay. When considering drastic reduction of the power supply voltage below the new standard of 3.3 V, the issue of time-domain performance should also be addressed carefully. In the following, we will examine reduction of the power supply voltage with a corresponding scaling of threshold voltages, in order to compensate for the speed degradation. At the system level, architectural measures such as the use of parallel processing blocks and/or pipelining techniques also offer very feasible alternatives for maintaining the system performance (throughput) despite aggressive reduction of the power supply voltage.

The propagation delay expression (7.4) clearly shows that the negative effect of reducing the power supply voltage upon delay can be compensated for, if the threshold voltage of the transistor is scaled down accordingly. However, this approach is limited due to the fact that the threshold voltage cannot be scaled to the same extent as the supply voltage. When scaled linearly, reduced threshold voltages allow the circuit to produce the same speed-performance at a lower VDD. Figure 7.9 shows the variation of the propagation delay of a CMOS inverter as a function of the power supply voltage, and for different threshold voltage values.

Figure-7.9: Variation of the normalized propagation delay of a CMOS inverter, as a function of the power supply voltage VDD and the threshold voltage VT.

We can see, for example, that reducing the threshold voltage from 0.8 V to 0.2 V can improve the delay at VDD = 2 V by a factor of 2. The influence of threshold voltage reduction upon propagation delay is especially pronounced at low power supply voltages. It should be noted, however, that the threshold voltage reduction approach is restricted by the concerns on noise margins and the subthreshold conduction. Smaller threshold voltages lead to smaller noise margins for the CMOS logic gates. The subthreshold conduction current also sets a severe limitation against reducing the threshold voltage. For threshold voltages smaller than 0.2 V, leakage power dissipation due to subthreshold conduction may become a very significant component of the overall power consumption.

In certain types of applications, the reduction of circuit speed which comes as a result of voltage scaling can be compensated for at the expense of more silicon area. In the following, we will examine the use of architectural measures such as pipelining and hardware replication to offset the loss of speed at lower supply voltages.

Pipelining Approach

First, consider the single functional block shown in Fig. 7.10 which implements a logic function F(INPUT) of the input vector, INPUT. Both the input and the output vectors are sampled through register arrays, driven by a clock signal CLK. Assume that the critical path in this logic block (at a power supply voltage of VDD) allows a maximum sampling frequency of fCLK; in other words, the maximum input-to-output propagation delay tP,max of this logic block is equal to or less than TCLK = 1/fCLK. Figure 7.10 also shows the simplified timing diagram of the circuit. A new input vector is latched into the input register array at each clock cycle, and the output data becomes valid with a latency of one cycle.

Figure-7.10: Single-stage implementation of a logic function and its simplified timing diagram.

Let Ctotal be the total capacitance switched every clock cycle. Here, Ctotal consists of (i) the capacitance switched in the input register array, (ii) the capacitance switched to implement the logic function, and (iii) the capacitance switched in the output register array. Then, the dynamic power consumption of this structure can be found as

$P_{single\,stage} = C_{total}\,V_{DD}^2\,f_{CLK}$    (7.11)

Now consider an N-stage pipelined structure for implementing the same logic function, as shown in Fig. 7.11. The logic function F(INPUT) has been partitioned into N successive stages, and a total of (N-1) register arrays have been introduced, in addition to the original input and output registers, to create the pipeline. All registers are clocked at the original sample rate, fCLK. If all stages of the partitioned function have approximately equal delay of

$t_{P,\,stage} \approx \frac{t_{P,max}}{N}$    (7.12)

then the logic blocks between two successive registers can operate N times slower while maintaining the same functional throughput as before. This implies that the power supply voltage can be reduced to a value of VDD,new, to effectively slow down the circuit by a factor of N. The supply voltage to achieve this reduction can be found by solving (7.4).

Figure-7.11: N-stage pipeline structure realizing the same logic function as in Fig. 7.10. The maximum pipeline stage delay is equal to the clock period, and the latency is N clock cycles.

The dynamic power consumption of the N-stage pipelined structure with a lower supply voltage and with the same functional throughput as the single-stage structure can be approximated by

$P_{pipeline} = \left[\,C_{total} + (N-1)\,C_{reg}\,\right]V_{DD,new}^2\,f_{CLK}$    (7.13)

where Creg represents the capacitance switched by each pipeline register. Then, the power reduction factor achieved in an N-stage pipeline structure is

$\frac{P_{pipeline}}{P_{single\,stage}} = \frac{\left[\,C_{total}+(N-1)\,C_{reg}\,\right]V_{DD,new}^2}{C_{total}\,V_{DD}^2} = \left[\,1+(N-1)\,\frac{C_{reg}}{C_{total}}\,\right]\left(\frac{V_{DD,new}}{V_{DD}}\right)^2$    (7.14)

As an example, consider replacing a single-stage logic block (VDD = 5 V, fCLK = 20 MHz) with a four-stage pipeline structure, running at the same clock frequency. This means that the propagation delay of each pipeline stage can be increased by a factor of 4 without sacrificing the data throughput. Assuming that the magnitude of the threshold voltage of all transistors is 0.8 V, the desired speed reduction can be achieved by reducing the power supply voltage from 5 V to approximately 2 V (see Fig. 7.9). With a typical ratio of (Creg/Ctotal) = 0.1, the overall power reduction factor is found from (7.14) as 1/5. This means that replacing the original single-stage logic block with a four-stage pipeline running at the same clock frequency and reducing the power supply voltage from 5 V to 2 V will provide a dynamic power savings of about 80%, while maintaining the same throughput as before.
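This 80% figure can be checked with a few lines of Python that simply evaluate the ratio in (7.14) as reconstructed above.

# Sketch: power reduction factor of an N-stage pipeline, following (7.14):
#   P_pipeline / P_single = (1 + (N-1) * Creg/Ctotal) * (VDD_new / VDD)**2

def pipeline_power_ratio(n_stages, creg_over_ctotal, vdd_old, vdd_new):
    cap_overhead = 1.0 + (n_stages - 1) * creg_over_ctotal
    return cap_overhead * (vdd_new / vdd_old) ** 2

# Example from the text: N = 4, Creg/Ctotal = 0.1, VDD scaled from 5 V to 2 V.
ratio = pipeline_power_ratio(4, 0.1, 5.0, 2.0)
print(f"P_pipeline / P_single = {ratio:.2f}   (about 1/5, i.e. roughly 80% savings)")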

The architectural modification described here has a relatively small area overhead. A total of (N-1) register arrays have to be added to convert the original single-stage structure into a pipeline. While trading off area for lower power, this approach also increases the latency from one to N clock cycles. Yet in many applications such as signal processing and data encoding, latency is not a very significant concern.

Parallel Processing Approach (Hardware Replication)

Another possibility of trading off area for lower power dissipation is to use parallelism, or hardware replication. This approach could be useful especially when the logic function to be implemented is not suitable for pipelining. Consider N identical processing elements, each implementing the logic function F(INPUT) in parallel, as shown in Fig. 7.12. Assume that the consecutive input vectors arrive at the same rate as in the single-stage case examined earlier. The input vectors are routed to all the registers of the N processing blocks. Gated clock signals, each with a clock period of (N TCLK), are used to load each register every N clock cycles. This means that the clock signals to each input register are skewed by TCLK, such that each one of the N consecutive input vectors is loaded into a different input register. Since each input register is clocked at a lower frequency of (fCLK / N), the time allowed to compute the function for each input vector is increased by a factor of N. This implies that the power supply voltage can be reduced until the critical path delay equals the new clock period of (N TCLK). The outputs of the N processing blocks are multiplexed and sent to an output register which operates at a clock frequency of fCLK, ensuring the same data throughput rate as before. The timing diagram of this parallel arrangement is given in Fig. 7.13.

Since the time allowed to compute the function for each input vector is increased by a factor of N, the power supply voltage can be reduced to a value of VDD,new, to effectively slow down the circuit. The new supply voltage can be found, as in the pipelined case, by solving (7.4). The total dynamic power dissipation of the parallel structure (neglecting the dissipation of the multiplexor) is found as the sum of the power dissipated by the input registers and the logic blocks operating at a clock frequency of (fCLK / N), and the output register operating at a clock frequency of fCLK.

$P_{parallel} = N\left(C_{total}-C_{reg}\right)\frac{f_{CLK}}{N}\,V_{DD,new}^2 + C_{reg}\,f_{CLK}\,V_{DD,new}^2$    (7.15)

Figure-7.12: N-block parallel structure realizing the same logic function as in Fig. 7.10. Notice that the input registers are clocked at a lower frequency of (fCLK / N).

Note that there is also an additional overhead which consists of the input routing capacitance, the output routing capacitance and the capacitance of the output multiplexor structure, all of which are increasing functions of N. If this overhead is neglected, the amount of power reduction achievable in an N-block parallel implementation is

$\frac{P_{parallel}}{P_{single\,stage}} \approx \left(\frac{V_{DD,new}}{V_{DD}}\right)^2$    (7.16)

The lower bound of dynamic power reduction realizable with architecture-driven voltage scaling is found, assuming zero threshold voltage, as


$\left.\frac{P_{parallel}}{P_{single\,stage}}\right|_{V_T = 0} \approx \frac{1}{N^2}$    (7.17)

Figure-7.13: Simplified timing diagram of the N-block parallel structure shown in Fig. 7.12.

Two obvious consequences of this approach are the increased area and the increased latency. A total of N identical processing blocks must be used to slow down the operation (clocking) speed by a factor of N. In fact, the silicon area will grow even faster than the number of processors because of signal routing and the overhead circuitry. The timing diagram in Fig. 7.13 shows that the parallel implementation has a latency of N clock cycles, as in the N-stage pipelined implementation. Considering its smaller area overhead, however, the pipelined approach offers a more efficient alternative for reducing the power dissipation while maintaining the throughput.
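The voltage and power numbers involved in the parallel approach can be estimated with the Python sketch below. It reuses the simplified delay model tau ~ VDD / (VDD - VT)^2 introduced earlier (an assumption standing in for (7.4)) and neglects the routing and multiplexor overhead, so that the power ratio reduces to (VDD,new / VDD)^2 as in (7.16).

# Sketch: supply voltage and power ratio for an N-block parallel implementation.
# Assumes the simplified delay model tau ~ VDD / (VDD - VT)**2 and neglects
# routing / multiplexor overhead, so P_parallel / P_single ~ (VDD_new / VDD)**2.

VT = 0.8    # threshold voltage (V), illustrative assumption

def delay(vdd):
    return vdd / (vdd - VT) ** 2

def vdd_for_slowdown(n, vdd_old, steps=60):
    """Find VDD_new such that delay(VDD_new) ~ n * delay(vdd_old), by bisection."""
    target = n * delay(vdd_old)
    v_lo, v_hi = VT + 0.05, vdd_old
    for _ in range(steps):
        v_mid = 0.5 * (v_lo + v_hi)
        if delay(v_mid) > target:
            v_lo = v_mid          # too slow already: raise the voltage
        else:
            v_hi = v_mid          # still too fast: lower the voltage
    return 0.5 * (v_lo + v_hi)

for n in (2, 4, 8):
    v_new = vdd_for_slowdown(n, 5.0)
    print(f"N = {n}:  VDD_new ~ {v_new:4.2f} V,  "
          f"P_parallel / P_single ~ {(v_new / 5.0) ** 2:5.2f}")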



7.4 Estimation and Optimization of Switching Activity

In the previous section, we have discussed methods for minimizing dynamic power consumption in CMOS digital integrated circuits by supply voltage scaling. Another approach to low power design is to reduce the switching activity and the amount of the switched capacitance to the minimum level required to perform a given task. The measures to accomplish this goal can range from optimization of algorithms to logic design, and finally to physical mask design. In the following, we will examine the concept of switching activity, and introduce some of the approaches used to reduce it. We will also examine the various measures used to minimize the amount of capacitance which must be switched to perform a given task in a circuit.

The Concept of Switching Activity

It was already discussed in Section 7.2 that the dynamic power consumption of a CMOS logic gate depends, among other parameters, also on the node transition factor αT, which is the effective number of power-consuming voltage transitions experienced by the output capacitance per clock cycle. This parameter, also called the switching activity factor, depends on the Boolean function performed by the gate, the logic family, and the input signal statistics.

Assuming that all input signals have an equal probability to assume a logic "0" or a logic "1" state, we can easily investigate the output transition probabilities for different types of logic gates. First, we will introduce two signal probabilities, P0 and P1. P0 corresponds to the probability of having a logic "0" at the output, and P1 = (1 - P0) corresponds to the probability of having a logic "1" at the output. Therefore, the probability that a power-consuming (0-to-1) transition occurs at the output node is the product of these two output signal probabilities. Consider, for example, a static CMOS NOR2 gate. If the two inputs are independent and uniformly distributed, the four possible input combinations (00, 01, 10, 11) are equally likely to occur. Thus, we can find from the truth table of the NOR2 gate that P0 = 3/4, and P1 = 1/4. The probability that a power-consuming transition occurs at the output node is therefore

$\alpha_{0\to1} = P_0 \cdot P_1 = \frac{3}{4}\cdot\frac{1}{4} = \frac{3}{16}$    (7.18)

The transition probabilities can be shown on a state transition diagram which consists of the only two possible output states and the possible transitions among them (Fig. 7.14). In the general case of a CMOS logic gate with n input variables, the probability of a power-consuming output transition can be expressed as a function of n0, which is the number of zeros in the output column of the truth table.

$\alpha_{0\to1} = P_0 \cdot P_1 = \frac{n_0}{2^n}\cdot\frac{2^n - n_0}{2^n} = \frac{n_0\left(2^n - n_0\right)}{2^{2n}}$    (7.19)

Figure-7.14: State transition diagram and state transition probabilities of a NOR2 gate.

The output transition probability is shown as a function of the number of inputs in Fig. 7.15, for different types of logic gates and assuming equal input probabilities. For a NAND or NOR gate, the truth table contains only one "0" or "1", respectively, regardless of the number of inputs. Therefore, the output transition probability drops as the number of inputs is increased. In a XOR gate, on the other hand, the truth table always contains an equal number of logic "0" and logic "1" values. The output transition probability therefore remains constant at 0.25.
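These trends can be reproduced directly from (7.19); the short Python sketch below evaluates the output transition probability of n-input NAND, NOR and XOR gates with independent, equally likely inputs.

# Sketch: output transition probability alpha = n0 * (2^n - n0) / 2^(2n), as in (7.19),
# for n-input NAND, NOR and XOR gates with independent, equally likely inputs.

def alpha_from_n0(n_inputs, n_zeros):
    total = 2 ** n_inputs
    return n_zeros * (total - n_zeros) / total ** 2

def alpha(gate, n_inputs):
    if gate == "NAND":                     # output is 0 only for the all-ones input
        n_zeros = 1
    elif gate == "NOR":                    # output is 1 only for the all-zeros input
        n_zeros = 2 ** n_inputs - 1
    elif gate == "XOR":                    # half of all input combinations give 0
        n_zeros = 2 ** (n_inputs - 1)
    else:
        raise ValueError(gate)
    return alpha_from_n0(n_inputs, n_zeros)

for n in range(2, 9):
    print(f"n = {n}:  NAND {alpha('NAND', n):6.4f}   "
          f"NOR {alpha('NOR', n):6.4f}   XOR {alpha('XOR', n):6.4f}")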

Figure-7.15: Output transition probabilities of different logic gates, as a function of the number of inputs. Note that the transition probability of the XOR gate is independent of the number of inputs.

In multi-level logic circuits, the distribution of input signal probabilities is typically not uniform, i.e., one cannot expect to have equal probabilities for the occurrence of a logic "0" and a logic "1". Then, the output transition probability becomes a function of the input probability distributions. As an example, consider the NOR2 gate examined above. Let P1,A represent the probability of having a logic "1" at the input A, and P1,B represent the probability of having a logic "1" at the input B. The probability of obtaining a logic "1" at the output node is

$P_1 = \left(1 - P_{1,A}\right)\left(1 - P_{1,B}\right)$    (7.20)

Using this expression, the probability of a power-consuming output transition is found as a function of P1,A and P1,B.

$\alpha_{0\to1} = P_0 \cdot P_1 = \left[\,1 - \left(1-P_{1,A}\right)\left(1-P_{1,B}\right)\right]\cdot\left(1-P_{1,A}\right)\left(1-P_{1,B}\right)$    (7.21)
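Evaluating (7.21) for arbitrary input probabilities is straightforward, as the small Python helper below shows; it assumes independent inputs.

# Sketch of (7.21): 0->1 transition probability of a NOR2 gate whose independent
# inputs have '1'-probabilities p1a and p1b.

def alpha_nor2(p1a, p1b):
    p1_out = (1.0 - p1a) * (1.0 - p1b)   # output is 1 only when both inputs are 0
    p0_out = 1.0 - p1_out
    return p0_out * p1_out

print(alpha_nor2(0.5, 0.5))   # 0.1875, the equal-probability case of (7.18)
print(alpha_nor2(0.1, 0.1))   # inputs mostly at logic 0
print(alpha_nor2(0.9, 0.9))   # inputs mostly at logic 1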

Figure 7.16 shows the distribution of the output transition probability in a NOR2 gate, as a function of two input probabilities. It can be seen that the evaluation of switching activity becomes a complicated problem in large circuits, especially when sequential elements, reconvergent nodes and feedback loops are involved. The designer must therefore rely on computer-aided design (CAD) tools for correct estimation of switching activity in a given network.

Figure-7.16: Output transition probability of NOR2 gate as a function of two input probabilities.

In dynamic CMOS logic circuits, the output node is precharged during every clock cycle. If the output node was discharged (i.e., if the output value was equal to "0") in the previous cycle, the pMOS precharge transistor will draw a current from the power supply during the precharge phase. This means that the dynamic CMOS logic gate will consume power every time the output value equals "0", regardless of the preceding or following values. Therefore, the power consumption of dynamic logic gates is determined by the signal-value probability of the output node and not by the transition probability. From the discussion above, we can see that signal-value probabilities are always larger than transition probabilities, hence, the power consumption of dynamic CMOS logic gates is typically larger than static CMOS gates under the same conditions.

Reduction of Switching Activity

Switching activity in CMOS digital integrated circuits can be reduced by algorithmic optimization, by architecture optimization, by proper choice of logic topology and by circuit-level optimization. In the following, we will briefly review some of the measures that can be applied to optimize the switching probabilities, and hence, the dynamic power consumption.

Algorithmic optimization depends heavily on the application and on the characteristics of the data, such as dynamic range, correlation, and statistics of data transmission. Some of the techniques can be applied only in specific domains such as digital signal processing (DSP) and cannot be used for general-purpose processing. One possibility is choosing a proper vector quantization (VQ) algorithm that results in minimum switching activity. For example, the number of memory accesses, the number of multiplications and the number of additions can be reduced by about a factor of 30 if a differential tree search algorithm is used instead of the full search algorithm.

The representation of data may also have a significant impact on switching activity at the system level. In applications where data bits change sequentially and are highly correlated (such as the address bits to access instructions), for example, the use of Gray coding leads to a reduced number of transitions compared to simple binary coding. Another example is using sign-magnitude representation instead of the conventional two's complement representation for signed data. A change in sign will cause transitions of the higher-order bits in the two's complement representation, whereas only the sign bit will change in sign-magnitude representation. Therefore, the switching activity can be reduced by using the sign-magnitude representation in applications where the data sign changes are frequent.
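The benefit of Gray coding on a sequentially counting address bus can be verified by counting bit flips, as in the Python sketch below; the 8-bit address range is an illustrative assumption.

# Sketch: total number of bit transitions on an 8-bit address bus when counting
# sequentially through 256 addresses, with plain binary versus Gray coding.

def transitions(codes):
    """Total number of bit flips between consecutive code words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))

binary = list(range(256))
gray   = [i ^ (i >> 1) for i in range(256)]   # binary-reflected Gray code

print("binary coding:", transitions(binary), "bit transitions")
print("Gray coding  :", transitions(gray),   "bit transitions")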

An important architecture-level measure to reduce switching activity is based on delay balancing and the reduction of glitches. In multi-level logic circuits, the finite propagation delay from one logic block to the next can cause spurious signal transitions, or glitches as a result of critical races or dynamic hazards. In general, if all input signals of a gate change simultaneously, no glitching occurs. But a dynamic hazard or glitch can occur if input signals change at different times. Thus, a node can exhibit multiple transitions in a single clock cycle before settling to the correct logic level (Fig. 7.17). In some cases, the signal glitches are only partial, i.e., the node voltage does not make a full transition between the ground and VDD levels, yet even partial glitches can have a significant contribution to dynamic power dissipation.

Figure-7.17: Signal glitching in multi-level static CMOS circuits.

Glitches occur primarily due to a mismatch or imbalance in the path lengths in the logic network. Such a mismatch in path length results in a mismatch of signal timing with respect to the primary inputs. As an example, consider the simple parity network shown in Fig. 7.18. Assuming that all XOR blocks have the same delay, it can be seen that the network in Fig. 7.18(a) will suffer from glitching due to the wide disparity between the arrival times of the input signals for the gates. In the network shown in Fig. 7.18(b), on the other hand, all arrival times are identical because the delay paths are balanced. Such redesign can significantly reduce the glitching transitions, and consequently, the dynamic power dissipation in complex multi-level networks. Also notice that the tree structure shown in Fig. 7.18(b) results in smaller overall propagation delay. Finally, it should be noted that glitching is not a significant issue in multi-level dynamic CMOS logic circuits, since each node undergoes at most one transition per clock cycle.
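The difference between the chain and tree implementations of Fig. 7.18 can be mimicked with a unit-delay simulation; the Python sketch below assumes that every XOR gate has exactly one unit of delay (an idealization) and counts output transitions when all four inputs change simultaneously.

# Unit-delay sketch of the four-input parity networks of Fig. 7.18. Each XOR gate
# is assumed to have exactly one unit of delay; the chain structure can glitch at
# its output, while the balanced tree does not.

def step(structure, inputs, state):
    """Advance all gate outputs by one unit delay."""
    a, b, c, d = inputs
    y1, y2, _ = state
    if structure == "chain":        # y3 = ((a ^ b) ^ c) ^ d
        return (a ^ b, y1 ^ c, y2 ^ d)
    else:                           # tree:  y3 = (a ^ b) ^ (c ^ d)
        return (a ^ b, c ^ d, y1 ^ y2)

def output_transitions(structure, old, new, steps=6):
    """Count output (y3) transitions after the inputs switch simultaneously."""
    state = (0, 0, 0)
    for _ in range(steps):          # settle the network with the old input vector
        state = step(structure, old, state)
    count, out = 0, state[2]
    for _ in range(steps):          # apply the new inputs and watch the output
        state = step(structure, new, state)
        if state[2] != out:
            count, out = count + 1, state[2]
    return count

old, new = (0, 0, 0, 0), (1, 1, 1, 1)
print("chain output transitions:", output_transitions("chain", old, new))  # glitches
print("tree  output transitions:", output_transitions("tree",  old, new))  # no glitch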

Figure-7.18: (a) Implementation of a four-input parity (XOR) function using a chain structure. (b) Implementation of the same function using a tree structure which will reduce glitching transitions.



7.5 Reduction of Switched Capacitance

It was already established in the previous sections that the amount of switched capacitance plays a significant role in the dynamic power dissipation of the circuit. Hence, reduction of this parasitic capacitance is a major goal for low-power design of digital integrated circuits.

At the system level, one of the approaches to reduce the switched capacitance is to limit the use of shared resources. A simple example is the use of a global bus structure for data transmission between a large number of operational modules (Fig. 7.19). If a single shared bus is connected to all modules as in Fig. 7.19(a), this structure results in a large bus capacitance due to (i) the large number of drivers and receivers sharing the same transmission medium, and (ii) the parasitic capacitance of the long bus line. Obviously, driving the large bus capacitance will require a significant amount of power consumption during each bus access. Alternatively, the global bus structure can be partitioned into a number of smaller dedicated local busses to handle the data transmission between neighboring modules, as shown in Fig. 7.19(b). In this case, the switched capacitance during each bus access is significantly reduced, yet multiple busses may increase the overall routing area on chip.
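A rough energy comparison between the two bus organizations of Fig. 7.19 is sketched below in Python; the per-port and per-millimetre capacitance values, module counts and bus lengths are all illustrative assumptions.

# Rough sketch: energy per bus line transition, E = C_bus * VDD^2, for a single
# global bus versus a smaller partitioned local bus. All values are illustrative.

VDD        = 3.3       # supply voltage (V)
C_PER_PORT = 30e-15    # assumed driver/receiver loading per attached module (F)
C_PER_MM   = 200e-15   # assumed wire capacitance per millimetre of bus line (F/mm)

def bus_energy(n_modules, length_mm):
    c_bus = n_modules * C_PER_PORT + length_mm * C_PER_MM
    return c_bus * VDD ** 2            # energy drawn per full-swing transition

e_global = bus_energy(n_modules=16, length_mm=10.0)   # one shared global bus
e_local  = bus_energy(n_modules=4,  length_mm=2.0)    # one small local bus

print(f"global bus: {e_global * 1e12:5.1f} pJ per line transition")
print(f"local  bus: {e_local * 1e12:5.1f} pJ per line transition")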

Figure-7.19: (a) Using a single global bus structure for connecting a large number of modules on chip results in large bus capacitance and large dynamic power dissipation. (b) Using smaller local busses reduces the amount of switched capacitance, at the expense of additional chip area.

The type of logic style used to implement a digital circuit also affects the physical capacitance of the circuit. The physical capacitance is a function of the number of transistors that are required to implement a given function. For example, one approach to reduce the physical capacitance is to use transfer gates instead of conventional CMOS logic gates to implement logic functions. Pass-gate logic design is attractive since fewer transistors are required to implement certain functions such as XOR and XNOR. In many arithmetic operations where binary adders and multipliers are used, pass-transistor logic offers significant advantages. Similarly, multiplexors and other key building blocks can also be simplified using this design style.

The amount of parasitic capacitance that is switched (i.e., charged up or charged down) during operation can also be reduced at the physical design level, or mask level. The parasitic gate and diffusion capacitances of MOS transistors in the circuit typically constitute a significant amount of the total capacitance in a combinational logic circuit. Hence, a simple mask-level measure to reduce power dissipation is keeping the transistors (especially the drain and source regions) at minimum dimensions whenever possible and feasible, thereby minimizing the parasitic capacitances. Designing a logic gate with minimum-size transistors certainly affects the dynamic performance of the circuit, and this trade-off between dynamic performance and power dissipation should be carefully considered in critical circuits. Especially in circuits driving large extrinsic capacitive loads, e.g., large fan-out or routing capacitances, the transistors must be designed with larger dimensions. Yet in many other cases where the load capacitance of a gate is mainly intrinsic, the transistor sizes can be kept at minimum. Note that most standard cell libraries are designed with larger transistors in order to accommodate a wide range of capacitive loads and performance requirements. Consequently, a standard-cell based design may have considerable overhead in terms of switched capacitance in each cell.



7.6 Adiabatic Logic Circuits

In conventional level-restoring CMOS logic circuits with rail-to-rail output voltage swing, each switching event causes an energy transfer from the power supply to the output node, or from the output node to the ground. During a 0-to-VDD transition of the output, the total output charge Q = Cload VDD is drawn from the power supply at a constant voltage. Thus, an energy of Esupply = Cload VDD^2 is drawn from the power supply during this transition. Charging the output node capacitance to the voltage level VDD means that at the end of the transition, the amount of energy Estored = Cload VDD^2 / 2 is stored on the output node. Thus, half of the energy injected from the power supply is dissipated in the pMOS network, while the other half is delivered to the output node. During a subsequent VDD-to-0 transition of the output node, no charge is drawn from the power supply and the energy stored in the load capacitance is dissipated in the nMOS network.

To reduce the dissipation, the circuit designer can minimize the switching events, decrease the node capacitance, reduce the voltage swing, or apply a combination of these methods. Yet in all cases, the energy drawn from the power supply is used only once before being dissipated. To increase the energy efficiency of logic circuits, other measures must be introduced for recycling the energy drawn from the power supply. A novel class of logic circuits called adiabatic logic offers the possibility of further reducing the energy dissipated during switching events, and the possibility of recycling, or reusing, some of the energy drawn from the power supply. To accomplish this goal, the circuit topology and the operation principles have to be modified, sometimes drastically. The amount of energy recycling achievable using adiabatic techniques is also determined by the fabrication technology, switching speed and voltage swing.

The term "adiabatic" is typically used to describe thermodynamic processes that have no energy exchange with the environment, and therefore, no energy loss in the form of dissipated heat. In our case, the electric charge transfer between the nodes of a circuit will be viewed as the process, and various techniques will be explored to minimize the energy loss, or heat dissipation, during charge transfer events. It should be noted that fully adiabatic operation of a circuit is an ideal condition which may only be approached asymptotically as the switching process is slowed down. In practical cases, energy dissipation associated with a charge transfer event is usually composed of an adiabatic component and a non-adiabatic component. Therefore, reducing all energy loss to zero may not be possible, regardless of the switching speed.

Adiabatic Switching

Consider the simple circuit shown in Fig. 7.20 where a load capacitance is charged by a constant current source. This circuit is similar to the equivalent circuit used to model the charge-up event in conventional CMOS circuits, with the exception that in conventional CMOS, the output capacitance is charged by a constant voltage source and not by a constant current source. Here, R represents the on-resistance of the pMOS network. Also note that a constant charging current corresponds to a linear voltage ramp. Assuming that the capacitance voltage VC is equal to zero initially, the variation of the voltage as a function of time can be found as

$V_C(t) = \frac{I_{source}}{C}\; t$    (7.22)

Hence, the charging current can be expressed as a simple function of VC and time t.

$I_{source} = \frac{C\,V_C(t)}{t}$    (7.23)

The amount of energy dissipated in the resistor R from t = 0 to t = T can be found as

$E_{diss} = R\int_{0}^{T} I_{source}^2\; dt = R\,I_{source}^2\,T$    (7.24)

Combining (7.23) and (7.24), the dissipated energy can also be expressed as follows.

$E_{diss} = \left(\frac{RC}{T}\right) C\,V_C^2(T)$    (7.25)

Figure-7.20: Constant-current source charging a load capacitance C, through a resistance R.

Now, a number of simple observations can be made based on (7.25). First, the dissipated energy is smaller than for the conventional case if the charging time T is larger than 2 RC. In fact, the dissipated energy can be made arbitrarily small by increasing the charging time, since Ediss is inversely proportional to T. Also, we observe that the dissipated energy is proportional to the resistance R, as opposed to the conventional case where the dissipation depends on the capacitance and the voltage swing. Reducing the on-resistance of the pMOS network will reduce the energy dissipation.
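The comparison with conventional charging can be made concrete by evaluating (7.25) for different charging times, as in the Python sketch below; the resistance, capacitance and voltage values are illustrative assumptions.

# Sketch of (7.25): energy dissipated when charging C through R with a constant
# current over a time T, E_diss = (R*C/T) * C * V^2, compared with the C*V^2/2
# dissipated in a conventional constant-voltage charge-up. Values are illustrative.

R = 1e3        # effective charging resistance (ohm)
C = 100e-15    # load capacitance (F)
V = 3.3        # final capacitor voltage (V)

E_conv = 0.5 * C * V ** 2                    # conventional CMOS charge-up loss

for T_over_RC in (1, 2, 5, 10, 100):
    T = T_over_RC * R * C
    E_adiabatic = (R * C / T) * C * V ** 2   # constant-current (adiabatic) loss
    print(f"T = {T_over_RC:3d} RC :  E_adiabatic / E_conventional = "
          f"{E_adiabatic / E_conv:5.2f}")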

We have seen that the constant-current charging process efficiently transfers energy from the power supply to the load capacitance. A portion of the energy thus stored in the capacitance can also be reclaimed by reversing the current source direction, allowing the charge to be transferred from the capacitance back into the supply. This possibility is unique to adiabatic operation, since in conventional CMOS circuits the energy is dissipated after being used only once. The constant-current power supply must certainly be capable of retrieving the energy back from the circuit. Adiabatic logic circuits thus require non-standard power supplies with time-varying voltage, also called pulsed-power supplies. The additional hardware overhead associated with these specific power supply circuits is one of the design trade-offs that must be considered when using adiabatic logic.

Adiabatic Logic Gates

In the following, we will examine simple circuit configurations which can be used for adiabatic switching. Note that most of the research on adiabatic logic circuits is relatively recent; therefore, the circuits presented here should be considered as examples only. Other circuit topologies are also possible, but the overall approach of energy recycling should still be applicable, regardless of the specific circuit configuration.

First, consider the adiabatic amplifier circuit shown in Fig. 7.21, which can be used to drive capacitive loads. It consists of two CMOS transmission gates and two nMOS clamp transistors. Both the input (X) and the output (Y) are dual-rail encoded, which means that the inverses of both signals are also available, to control the CMOS T-gates.

Figure-7.21: Adiabatic amplifier circuit which transfers the complementary input signals to its complementary outputs through CMOS transmission gates.

When the input signal X is set to a valid value, one of the two transmission gates becomes transparent. Next, the amplifier is energized by applying a slow voltage ramp VA, rising from zero to VDD. The load capacitance at one of the two complementary outputs is adiabatically charged to VDD through the transmission gate, while the other output node remains clamped to ground potential. When the charging process is completed, the output signal pair is valid and can be used as an input to other, similar circuits. Next, the circuit is de-energized by ramping the voltage VA back to zero. Thus, the energy that was stored in the output load capacitance is retrieved by the power supply. Note that the input signal pair must be valid and stable throughout this sequence.

Figure-7.22: (a) The general circuit topology of a conventional CMOS logic gate. (b) The topology of an adiabatic logic gate implementing the same function. Note the difference in charge-up and charge-down paths for the output capacitance.

The simple circuit principle of the adiabatic amplifier can be extended to allow the implementation of arbitrary logic functions. Figure 7.22 shows the general circuit topology of a conventional CMOS logic gate and an adiabatic counterpart. To convert a conventional CMOS logic gate into an adiabatic gate, the pull-up and pull-down networks must be replaced with complementary transmission-gate networks. The T-gate network implementing the pull-up function is used to drive the true output of the adiabatic gate, while the T-gate network implementing the pull-down function drives the complementary output node. Note that all inputs should also be available in complementary form. Both networks in the adiabatic logic circuit are used to charge up as well as charge down the output capacitances, which ensures that the energy stored at the output node can be retrieved by the power supply at the end of each cycle. To allow adiabatic operation, the DC voltage source of the original circuit must be replaced by a pulsed-power supply with ramped voltage output. Note that the circuit modifications which are necessary to convert a conventional CMOS logic circuit into an adiabatic logic circuit increase the device count by a factor of two. Also, the reduction of energy dissipation comes at the cost of slower switching speed, which is the ultimate trade-off in all adiabatic methods.

Stepwise Charging Circuits

We have seen earlier that the dissipation during a charge-up event can be minimized, and in the ideal case be reduced to zero, by using a constant-current power supply. This requires that the power supply be able to generate linear voltage ramps. Practical supplies can be constructed by using resonant inductor circuits to approximate the constant output current and the linear voltage ramp with sinusoidal signals. But the use of inductors presents several difficulties at the circuit level, especially in terms of chip-level integration and overall efficiency.

An alternative to using pure voltage ramps is to use stepwise supply voltage waveforms, where the output voltage of the power supply is increased and decreased in small increments during charging and discharging. Since the energy dissipation depends on the average voltage drop traversed by the charge that flows onto the load capacitance, using smaller voltage steps, or increments, should reduce the dissipation considerably.

Figure 7.23 shows a CMOS inverter driven by a stepwise supply voltage waveform. Assume that the output voltage is equal to zero initially. With the input voltage set to logic low level, the power supply voltage VA is increased from 0 to VDD, in n equal voltage steps (Fig. 7.24). Since the pMOS transistor is conducting during this transition, the output load capacitance will be charged up in a stepwise manner. The on-resistance of the pMOS transistor can be represented by the linear resistor R. Thus, the output load capacitance is being charged up through a resistor, in small voltage increments. For the ith time increment, the amount of capacitor current can be expressed as

$i_C = C\,\frac{dV_{out}}{dt} = \frac{V_A - V_{out}}{R}$    (7.26)

Solving this differential equation with the initial condition Vout(ti) = VA(i) yields

equation-7.27 (7.27)

Figure-7.23: A CMOS inverter circuit with a stepwise-increasing supply voltage.

Figure-7.24: Equivalent circuit, and the input and output voltage waveforms of the CMOS inverter circuit in Fig. 7.23 (stepwise charge-up case).

Here, n is the number of steps of the supply voltage waveform. The amount of energy dissipated during one voltage step increment can now be found as

$E_{diss,\,step} = \frac{1}{2}\,C\left(\frac{V_{DD}}{n}\right)^2$    (7.28)

Since n steps are used to charge up the capacitance to VDD, the total dissipation is

$E_{diss,\,total} = n\cdot\frac{1}{2}\,C\left(\frac{V_{DD}}{n}\right)^2 = \frac{1}{n}\cdot\frac{C\,V_{DD}^2}{2}$    (7.29)

According to this simplified analysis, charging the output capacitance with n voltage steps, or increments, reduces the energy dissipation per cycle by a factor of n. Therefore, the total power dissipation is also reduced by a factor of n using stepwise charging. This result implies that if the voltage steps can be made very small and the number of voltage steps n approaches infinity (i.e., if the supply voltage is a slow linear ramp), the energy dissipation will approach zero.
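The 1/n behavior of (7.29) is easy to check numerically; the Python sketch below compares stepwise charging against the conventional single-step charge-up, using illustrative component values.

# Sketch of (7.28) and (7.29): energy dissipated when charging C to VDD in n equal
# voltage steps, compared with the conventional single full-swing step (n = 1).

C   = 100e-15    # load capacitance (F), illustrative
VDD = 3.3        # supply voltage (V)

E_conventional = 0.5 * C * VDD ** 2        # single full-swing charge-up (n = 1)

for n in (1, 2, 4, 8, 16):
    e_step  = 0.5 * C * (VDD / n) ** 2     # energy lost per voltage step, as in (7.28)
    e_total = n * e_step                   # n steps per charge-up, as in (7.29)
    print(f"n = {n:2d}:  E_total = {e_total * 1e15:7.3f} fJ "
          f"({e_total / E_conventional:.2f} x conventional)")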

Another example for simple stepwise charging circuits is the stepwise driver for capacitive loads, implemented with nMOS devices as shown in Fig. 7.25. Here, a bank of n constant voltage supplies with evenly distributed voltage levels is used. The load capacitance is charged up by connecting the constant voltage sources V1 through VN to the load successively, using an array of switch devices. To discharge the load capacitance, the constant voltage sources are connected to the load in the reverse sequence.

The switch devices are shown as nMOS transistors in Fig. 7.25, yet some of them may be replaced by pMOS transistors to prevent the undesirable threshold-voltage drop problem and the substrate-bias effects at higher voltage levels. One of the most significant drawbacks of this circuit configuration is the need for multiple supply voltages. A power supply system capable of efficiently generating n different voltage levels would be complex and expensive. Also, the routing of n different supply voltages to each circuit in a large system would create a significant overhead. In addition, the concept is not easily extensible to general logic gates. Therefore, stepwise charging driver circuits can be best utilized for driving a few critical nodes in the circuit that are responsible for a large portion of the overall power dissipation, such as output pads and large busses.

In general, we have seen that adiabatic logic circuits can offer significant reduction of energy dissipation, but usually at the expense of switching times. Therefore, adiabatic logic circuits can be best utilized in cases where delay is not critical. Moreover, the realization of unconventional power supplies needed in adiabatic circuit configurations typically results in an overhead both in terms of overall energy dissipation and in terms of silicon area. These issues should be carefully considered when adiabatic logic is used as a method for low-power design.

Figure-7.25: Stepwise driver circuit for capacitive loads. The load capacitance is successively connected to constant voltage sources Vi through an array of switch devices.






This chapter edited by Y. Leblebici