Low-Power Adder Design for Nano-Scale CMOS
(Short Paper)

S. R. Talebian* and S. Hosseini-Khayat*

Abstract: A fast low-power 1-bit full adder circuit suitable for nano-scale CMOS implementation is presented. Out of the three modules in a common full-adder circuit, we have replaced one with a new design and optimized another, both with the goal of reducing static power consumption. The design has been simulated and evaluated using the 65 nm PTM models.

Keywords: Nano-Scale CMOS Technology, Static Power Consumption, Adder Subcomponents.

1 Introduction
Adders are important building blocks in arithmetic units [1]. Any optimization of their speed and power consumption can therefore have a considerable impact on the power efficiency and speed of the overall system. Some of the classical designs of 1-bit full adder circuits use standard static CMOS and complementary pass-transistor logic [2]. There are newer designs and optimizations of full adder circuits for deep submicron technology [3-7]; however, they target 0.18-μm or 0.35-μm CMOS technology. These designs perform better than the classical designs, especially those presented in [6] and [7]. Any new design should therefore be compared to these newer designs.

The migration towards deep submicron technologies has drastically changed the face of low-power design. The static leakage current is now a significant component of the overall power consumption. As a result, special attention must be paid to minimizing the static power. Therefore, in this paper the focus is on static power consumption and suitable techniques are used to reduce leakage power consumption.

It has been shown that placing more than one transistor in series decreases the static power consumption [8]. In this paper, this technique is used to decrease the leakage power consumption of the 1-bit full adder circuit. Because the focus is on deep submicron technology, only the versions of the 1-bit full adder subcomponents that are optimized for leakage power consumption are presented.

2 Adder Subcomponents
A 1-bit full adder can be divided into three main modules [3-7]. This division follows from the logical equations of the 1-bit full adder circuit. Equations (1) and (2) give the logical equations of the full adder outputs.

\[ \text{Sum} = A \oplus B \oplus C_{in} \tag{1} \]  
\[ C_{out} = A \cdot B + C_{in} \cdot (A \oplus B) \tag{2} \]

Defining H as the XOR of the A and B signals, the output equations become Eq. (3) and Eq. (4).

\[ \text{Sum} = H \oplus C_{in} = H' \cdot C_{in} + H \cdot C_{in}' \tag{3} \]  
\[ C_{out} = A \cdot H' + C_{in} \cdot H \tag{4} \]

These equations show the three main subcomponents of the full adder circuit. The main one is an XOR/XNOR circuit (module I) that produces the H and H' signals. The other parts are an XOR circuit (module II) and a MUX circuit (module III), which implement Eq. (3) and Eq. (4), respectively. The XOR circuit is in effect a MUX that selects between H and H' according to $C_{in}$, so module II can also be implemented as a MUX. Fig. 1 shows the full adder circuit subcomponents. The goal is to design modules that minimize the static power consumption while providing enough drive current for the next stages. This is done by avoiding paths from the power supply to ground that contain too few transistors. We optimize module I and present a new circuit for module II.
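
The decomposition can be checked exhaustively with a small behavioral model. The following is a minimal sketch in Python, not a circuit-level description: the function names `module_i`, `mux`, and `full_adder` are hypothetical and introduced only for illustration, and the model says nothing about power or delay.

```python
from itertools import product


def module_i(a, b):
    """Module I: XOR/XNOR stage producing H and its complement H'."""
    h = a ^ b
    return h, 1 - h


def mux(select, when_0, when_1):
    """2-to-1 multiplexer on single bits."""
    return when_1 if select else when_0


def full_adder(a, b, c_in):
    """1-bit full adder built from the three modules (behavioral model)."""
    h, h_bar = module_i(a, b)
    # Module II: Sum = H xor C_in, realized as a MUX that passes H
    # when C_in = 0 and H' when C_in = 1.
    total = mux(c_in, h, h_bar)
    # Module III: C_out = A when H = 0 and C_in when H = 1.
    c_out = mux(h, a, c_in)
    return total, c_out


def check_all():
    """Compare the modular adder with integer addition for all 8 inputs."""
    for a, b, c_in in product((0, 1), repeat=3):
        total, c_out = full_adder(a, b, c_in)
        assert total == (a + b + c_in) % 2
        assert c_out == (a + b + c_in) // 2
    return True
```

Calling `check_all()` runs through all eight input combinations and raises an `AssertionError` if the MUX-based form ever disagrees with ordinary binary addition.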

3 Optimization of Module I
Fig. 2 shows the common designs for module I [3-7]. Circuit (c) from [7] has potentially high static power consumption in deep submicron technology because it uses an inverter. An inverter has a direct path between the power supply and ground through only two transistors, which makes it one of the elements with the highest static power. Circuit (a) from [6] does not have sufficient speed and drive capability compared to the alternative designs. Circuit (b) is an enhanced version of circuit (a) with the following additions: two PMOS transistors to increase its speed when the inputs are "00" and two NMOS transistors to increase its speed when the inputs are "11" [7]. However, this circuit needs a larger area. The delay in the "00" case is larger than in the "11" case, because the "00" case relies on two PMOS transistors whereas the "11" case relies on two NMOS transistors. The speed-up transistors for the "11" case, N3 and N4, can therefore be removed. This idea is implemented in circuit (d).

Table 1 shows the HSPICE simulation results obtained with the 65 nm PTM (Predictive Technology Model) models [9]. PDP is the power-delay product, and power²×delay is included to give extra weight to power consumption. Our optimized circuit consumes less power than the other circuits, and its delay falls in the middle of the range; speed is thus maintained while power consumption is reduced. The leakage power reported here is obtained with the HSPICE .op analysis with all inputs at logic low. All circuits are simulated with a signal frequency of 2 GHz and a capacitive load of 2 fF on each output, as suggested by the ITRS technology roadmap [10]. Although the gate capacitance is very small, the interconnect capacitance is considerable in nanometer technologies.

According to Table 1, the leakage power consumption of module I improves by at least 7% compared with the circuit proposed in [7]. The test configuration and the simulation waveforms are shown in Fig. 3.

<table>
<thead>
<tr>
<th>65 nm Tech.</th>
<th>Comparison of Main Circuits for Module I</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Circuit (b)</td>
</tr>
<tr>
<td>No. of Tr.</td>
<td>10</td>
</tr>
<tr>
<td>Leakage Power (nW)</td>
<td>135.8</td>
</tr>
<tr>
<td>Power (μW)</td>
<td>38.498</td>
</tr>
<tr>
<td>Delay (ps)</td>
<td>39.051</td>
</tr>
<tr>
<td>PDP (fJ)</td>
<td>1.5033</td>
</tr>
<tr>
<td>Power²×Delay</td>
<td>57.8740</td>
</tr>
</tbody>
</table>
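
The two figures of merit in Table 1 can be recomputed from the power and delay entries. The sketch below is an illustration only: the function name `power_delay_metrics` is hypothetical, and the factor of 10³ applied to power²×delay is an assumption inferred from the tabulated numbers, since the table does not state a unit for that row.

```python
def power_delay_metrics(power_uw, delay_ps):
    """Return (PDP in fJ, power^2 x delay in 10^3 uW^2*ps).

    1 uW * 1 ps = 1e-18 J = 1e-3 fJ, hence the 1e-3 factor for the PDP.
    The 1e-3 scaling of the second metric is an assumed display scale.
    """
    pdp_fj = power_uw * delay_ps * 1e-3
    p2d_scaled = power_uw ** 2 * delay_ps * 1e-3
    return pdp_fj, p2d_scaled


# Circuit (b) column of Table 1: 38.498 uW average power, 39.051 ps delay.
pdp_b, p2d_b = power_delay_metrics(38.498, 39.051)
```

For the Circuit (b) column this gives a PDP of about 1.503 fJ and a power²×delay of about 57.88, which agree with the tabulated 1.5033 and 57.8740 to within rounding of the inputs.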

4 Design of Module II
Two of the most common circuits for module II are shown in Fig. 4(a) and Fig. 4(b) [3-7]. However, these circuits are not well optimized for deep submicron CMOS technology. Circuit (a) lacks sufficient drive current at the output. Circuit (b) uses an inverter at its output to increase the drive current, but this also increases its static power consumption because of the direct path between VDD and GND.

We first describe a method for reducing the static power consumption. The main cause of static power consumption is leakage current, and it has been shown that placing more than one transistor in series decreases the leakage current [8]. This technique can therefore reduce static power consumption. However, too many series transistors reduce speed and increase dynamic power consumption, so the number of series transistors must be chosen carefully.
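
The series-transistor argument can be made slightly more concrete with a commonly used first-order subthreshold model. The expression below is a textbook approximation and is not taken from [8] or from the simulations in this paper; the symbols $I_0$ (a process-dependent prefactor), $n$ (subthreshold slope factor), $V_T = kT/q$ (thermal voltage), and the intermediate node voltage $V_x$ are assumptions introduced here for illustration.

\[ I_{sub} \approx I_0 \, \exp\!\left( \frac{V_{GS} - V_{th}}{n V_T} \right) \left( 1 - \exp\!\left( -\frac{V_{DS}}{V_T} \right) \right) \]

For two off NMOS transistors in series, the node between them settles at some $V_x > 0$. The upper transistor then sees $V_{GS} = -V_x < 0$ and a reduced $V_{DS} = V_{DD} - V_x$, and its $V_{th}$ rises through the body effect and the weaker drain-induced barrier lowering. Each of these changes shrinks the exponential term, which is why a stack is expected to leak less than a single off transistor.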

Our proposed circuit, shown in Fig. 5, uses only two serial transistors. These extra transistors in the path from $V_{DD}$ to ground help to decrease the overall static power consumption in deep submicron technology while implementing the logical equation for the $Sum$ output. The circuit also eliminates the drawbacks of the previous designs. In this circuit, the input $H$ is the selector: if $H = 0$ then $Sum = C_{in}$, otherwise $Sum = C_{in}'$.

We performed HSPICE simulations using the PTM 65 nm technology models and compared the performance of our proposed circuit with that of the two commonly used circuits shown in Fig. 4(a) and Fig. 4(b). The simulation test setup is the same as for module I. The simulation results are shown in Table 2. Compared with the circuit in Fig. 4(b), power consumption is about 6% lower and delay is about 30% lower. The delay of the new circuit is about 10% larger than that of the circuit in Fig. 4(a), which is a result of the additional serial transistors in our circuit. The leakage power consumption of the new circuit is about 11% lower than that of circuit (a) and about 27% lower than that of circuit (b).

### Table 2. Module II HSPICE simulation results at 65 nm, $V_{dd}=1.1$ V, $C_L=2$ fF

<table>
<thead>
<tr>
<th>65 nm Tech.</th>
<th>Circuit (a)</th>
<th>Circuit (b)</th>
<th>Proposed circuit</th>
</tr>
</thead>
<tbody>
<tr>
<td>No. of Tr.</td>
<td>4</td>
<td>6</td>
<td>6</td>
</tr>
<tr>
<td>Leakage Power (nW)</td>
<td>121.0</td>
<td>148.5</td>
<td>108.1</td>
</tr>
<tr>
<td>Power ($\mu W$)</td>
<td>23.933</td>
<td>24.609</td>
<td>23.09</td>
</tr>
<tr>
<td>Delay (ps)</td>
<td>27.136</td>
<td>42.338</td>
<td>29.821</td>
</tr>
<tr>
<td>PDP ($fJ$)</td>
<td>0.6494</td>
<td>1.0418</td>
<td>0.6885</td>
</tr>
<tr>
<td>Power$^2\times$Delay</td>
<td>15.5420</td>
<td>25.6376</td>
<td>15.8974</td>
</tr>
</tbody>
</table>
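
The relative changes implied by Table 2 can be recomputed directly from its entries. This is a small sketch for checking the arithmetic; the helper names `improvement_pct` and `compare` and the dictionary layout are hypothetical, and the numbers are copied from Table 2.

```python
def improvement_pct(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent.

    A negative result means the proposed circuit is worse on that metric.
    """
    return 100.0 * (reference - proposed) / reference


# Values copied from Table 2 (65 nm, 1.1 V supply, 2 fF load).
TABLE_2 = {
    "leakage_nW": {"a": 121.0, "b": 148.5, "proposed": 108.1},
    "power_uW": {"a": 23.933, "b": 24.609, "proposed": 23.09},
    "delay_ps": {"a": 27.136, "b": 42.338, "proposed": 29.821},
}


def compare(table):
    """Percent improvement of the proposed circuit over circuits (a) and (b)."""
    return {
        metric: {
            ref: improvement_pct(values[ref], values["proposed"])
            for ref in ("a", "b")
        }
        for metric, values in table.items()
    }


results = compare(TABLE_2)
```

With these inputs, leakage power improves by roughly 11% over circuit (a) and 27% over circuit (b), while the delay entry for circuit (a) comes out negative (about -10%), reflecting the extra serial transistors.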

5 Simulation Results

With these newly designed subcomponents, we can build a new 1-bit full adder. We select the module III design presented in [7] to complete the new 1-bit full adder circuit, which is shown in Fig. 6. The test configuration of this circuit is the same as the one used in [7].

The new circuit is compared with other full adder circuits that use the hybrid-CMOS logic style for 1-bit full adder cells [5-7]. The hybrid-CMOS logic style is suitable for deep submicron technology; however, the circuits proposed so far have been simulated in 0.18 $\mu m$ CMOS technology. Our simulation uses the 65 nm PTM technology model with a 1.1 V power supply. The capacitive load, as in the test setup of module I, is 2 fF. Fig. 7 shows the full adder circuit presented in [7]. As shown in Table 3, the overall performance (power-delay product, $PDP$) of the circuit improves by at least 21%, because both delay and power consumption are reduced. The new full adder circuit also improves leakage power consumption by at least 11% compared with the other circuits.

### Table 3. Full adder HSPICE simulation results at 65 nm, $V_{dd}=1.1$ V, $C_L=2$ fF

<table>
<thead>
<tr>
<th></th>
<th></th>
<th>Full adder in [5]</th>
<th>Full adder in [6]</th>
<th>Full adder in [7]</th>
<th>Proposed full adder</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Leakage Power (nW)</td>
<td>263.2</td>
<td>259.0</td>
<td>281.8</td>
<td>230.3</td>
</tr>
<tr>
<td></td>
<td>Power (μW)</td>
<td>93.310</td>
<td>64.039</td>
<td>60.984</td>
<td>63.005</td>
</tr>
<tr>
<td></td>
<td>Delay (ps)</td>
<td>133.33</td>
<td>76.88</td>
<td>92.394</td>
<td>61.63</td>
</tr>
<tr>
<td></td>
<td>PDP (fJ)</td>
<td>12.4410</td>
<td>4.9233</td>
<td>5.6345</td>
<td>3.8829</td>
</tr>
<tr>
<td></td>
<td>Power²×Delay</td>
<td>1160.86</td>
<td>315.28</td>
<td>343.61</td>
<td>224.64</td>
</tr>
</tbody>
</table>

### Table 4. 4-bit full adder HSPICE simulation results at 65 nm, $V_{dd}=1.1$ V, $C_L=2$ fF

<table>
<thead>
<tr>
<th>65 nm Tech.</th>
<th>Comparison of 4-bit full adder Circuits</th>
<th>4-bit full adder composed from the adder in [5]</th>
<th>4-bit full adder composed from the adder in [6]</th>
<th>4-bit full adder composed from the adder in [7]</th>
<th>4-bit full adder from the proposed adder</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>Leakage Power (nW)</td>
<td>536.6</td>
<td>519.7</td>
<td>610.9</td>
<td>405.1</td>
</tr>
<tr>
<td></td>
<td>Power (mW)</td>
<td>0.13757</td>
<td>0.13726</td>
<td>0.15333</td>
<td>0.13092</td>
</tr>
<tr>
<td></td>
<td>Delay (ps)</td>
<td>148.91</td>
<td>188.29</td>
<td>198.08</td>
<td>169.52</td>
</tr>
<tr>
<td></td>
<td>PDP (fJ)</td>
<td>20.4855</td>
<td>25.8446</td>
<td>30.3716</td>
<td>22.1935</td>
</tr>
<tr>
<td></td>
<td>Power²×Delay</td>
<td>2.8181</td>
<td>3.5474</td>
<td>4.6568</td>
<td>2.9055</td>
</tr>
</tbody>
</table>

### 6 Adder Chaining

We used our optimized 1-bit full adder in a 4-bit ripple-carry adder circuit to evaluate it under realistic operating conditions. The 4-bit adder test circuit is shown in Fig. 8 and the results are presented in Table 4. As shown in Table 4, the leakage power consumption improves by at least 22% compared with the other circuits. This is fully consistent with the main goal of this paper, namely suitability for nano-scale CMOS technology, which is achieved through the low-leakage design presented here. The overall power consumption of the proposed circuit also improves by about 5% (4.6% relative to the lowest-power competing design in Table 4).
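
The ripple-carry arrangement can be illustrated with a short behavioral model in which the carry output of each 1-bit stage feeds the carry input of the next. This is a sketch of the logic only, not of the test circuit in Fig. 8; the function names `ripple_carry_add`, `to_bits`, and `from_bits` and the LSB-first bit ordering are assumptions made for the example.

```python
def full_adder(a, b, c_in):
    """Behavioral 1-bit full adder using the H = A xor B decomposition."""
    h = a ^ b
    total = h ^ c_in               # Sum = H xor C_in
    c_out = c_in if h else a       # C_out = A when H = 0, C_in when H = 1
    return total, c_out


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists (LSB first); return (sum_bits, carry_out)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        # The carry ripples from one stage to the next.
        total, carry = full_adder(a, b, carry)
        sum_bits.append(total)
    return sum_bits, carry


def to_bits(value, width=4):
    """Convert a non-negative integer to a list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert an LSB-first bit list back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `ripple_carry_add(to_bits(9), to_bits(8))` returns the sum bits for 1 together with a carry-out of 1, since 9 + 8 = 17 overflows four bits.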

### 7 Conclusion

We have presented a new design for a 1-bit full adder circuit obtained by optimizing its subcomponents for leakage current reduction. The simulation results show that the adder is suitable for ultra deep submicron CMOS technologies and that it has acceptable performance in 65 nm CMOS technology.

### References


Seyyed Reza Talebiyan received the B.Sc. degree in electronics engineering from Ferdowsi University of Mashhad in 2000 and the M.Sc. degree in electronics engineering from Semnan University in 2002. Since 2002, he has been working toward the Ph.D. degree at Ferdowsi University of Mashhad. His research interests include VLSI architecture design for basic building blocks of signal processing and communication systems.

Saied Hosseini Khayat holds a B.S. degree (1986) in electrical engineering from Shiraz University, Iran, and M.S. (1991) and Ph.D. (1997) degrees in electrical engineering from Washington University in St. Louis, USA. He has held full-time positions at Globespan Semiconductor Inc., Redbank, NJ, Network Programs Inc., Piscataway, NJ, and Erlang Technology Inc., St. Louis, MO, USA. His expertise lies in the area of hardware/software design for communication, networking and cryptography. At present, he is an assistant professor at Ferdowsi University of Mashhad, Iran.