Minimum Energy CMOS Circuit Design Considering Variation
Chris Ferguson, Max Korbel, and David Money Harris
Harvey Mudd College, Claremont, CA 91711
Abstract. Near-threshold and sub-threshold voltage operation is a potentially very useful way to lower power consumption in chips that must be highly energy efficient but do not require intensive computation. Most research has focused on lowering the total energy by dropping the supply voltage to a very low value and extending the cycle time as necessary to reduce the failure rate. Random dopant fluctuation, however, can cause intrinsic errors due to mismatch between consecutive devices, and these errors cannot be solved by simply increasing the cycle time. Taking this process variation into account may raise the optimal minimum-energy voltage to a higher near-threshold value than previously expected.
1 Introduction
A growing body of applications, including wireless sensor networks and energy-scavenging systems, has modest computational requirements but demands extremely low energy per operation. Such systems are typically operated at a relatively low voltage while actively computing and then turned off to stop leakage when the computation is complete. The best operating voltage is a balance of dynamic and leakage energy: operating at a lower voltage reduces dynamic energy but increases the computation time and hence the total leakage energy. Early work in this field concluded that minimum energy is achieved by using minimum-sized transistors nearly everywhere and operating sub-threshold at a supply voltage of about 300 mV.
Much of this work is based on the assumption that, at this sizing and voltage, all circuits will function properly given a sufficiently long delay. This assumption fails to consider the possibility that mismatch between devices could cause a time-independent failure that can never be corrected by waiting longer. This type of failure would be caused by mismatch between consecutive devices, which arises primarily from stochastic effects such as random dopant fluctuation (RDF). As the supply voltage is lowered and sizing is kept small, these random effects are more likely to cause a catastrophic mismatch. If these failures force a chip to operate away from the minimum-energy voltage, then it is possible that a circuit design created to prevent these failures might allow for lower-energy operation.
2 Background
Previous work suggests that the optimal operating voltage for minimum energy is near 300 mV [2] and predicts that, even when considering process variation, the nominal delay is 15 FO4 delays [1]. One paper [4] even claims that functional operation is possible at 60 mV. Most previous work does not treat RDF as a significant issue and predicts that the inclusion of RDF will simply require longer hold times. This study shows, however, that many errors are intrinsic and will cause failures regardless of how far the clock cycle is extended. This research suggests that these intrinsic failures push the optimal voltage up, closer to 500 mV. Some papers, such as [3], claim that larger sizing is necessary to account for process variation, but this research suggests that minimum sizing at a higher voltage is actually the lower-energy option. Other research has focused on designing circuits specifically for operation in the sub-threshold or near-threshold regions.
3 Simulation Methods
This study was carried out using a 65 nm IBM process. The investigation primarily focused on examining the circuit design space in the form of circuit sizing and supply voltage. By investigating both the failure rate and the energy of circuits in this design space, a reliability-focused minimum-energy design could be found. To best characterize circuits in this technology, several different circuits were simulated and examined: a nand2, a nor2, an inverter, a chain of 12 fanout-of-four inverters, and a latch. The design of the latch can be seen in Figure 1.
Figure 1 – The design of latch used in this study.
Based on preliminary investigation, current through the pMOS transistors causes the circuits to fail. Thus, from the perspective of this study, changing device sizing means changing the width of the pMOS transistors. For the combinational circuits, this sizing change is simply a multiple of the minimum width applied to the pMOS transistor(s).
3.1 Failure Criteria
Because of the high degree of variation in the circuits involved, the definition of what constitutes a device failure was carefully considered. The criteria were based on the static noise margins of an ideal inverter at a given voltage. For any measured circuit, a successful high value was at or above the V_OH of an ideal inverter at the given operating voltage, and a successful low value was below the V_OL under the same conditions. Applying these criteria to an individual device's output is intended to provide the best indication of when that device would cause an overall failure in a chip.
In addition to these criteria on the output, DC noise was added to the inputs (and clock) of all circuits under test. This DC offset was set to the same levels as the output failure criteria, V_OH and V_OL. This allows the simulation to reflect the worst-case scenario, in which a marginal gate precedes the device under test.
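The pass/fail rule and the worst-case input levels described above can be sketched as follows. This is a minimal illustration, not the study's actual measurement script: the function names and the numeric V_OH and V_OL values are hypothetical, since the paper does not list the thresholds it used.

```python
def output_passes(v_out, expect_high, v_oh, v_ol):
    """Return True if a settled output voltage meets the failure criteria.

    A high output must be at or above V_OH; a low output must be below V_OL.
    """
    if expect_high:
        return v_out >= v_oh
    return v_out < v_ol


def worst_case_input(logic_high, v_oh, v_ol):
    """Input level with the DC noise offset applied (a marginal driving gate)."""
    return v_oh if logic_high else v_ol


# Hypothetical thresholds for a 0.3 V supply (illustrative numbers only).
V_OH, V_OL = 0.27, 0.03

print(output_passes(0.28, True, V_OH, V_OL))   # high output checked against V_OH
print(output_passes(0.05, False, V_OH, V_OL))  # low output checked against V_OL
print(worst_case_input(True, V_OH, V_OL))      # degraded logic-high input level
```

Driving the inputs at exactly the pass/fail thresholds means a device under test is only counted as working if it can restore a signal that is itself on the edge of failing.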
3.2 Latch Sizing Effects
An initial investigation determined that the optimal way to resize the latch is to upsize all of the pMOS transistors in the different devices in the latch, rather than changing any specific inverter or tristate. This was found by measuring the different failure modes detailed in section REF for a minimum-sized latch and comparing them to the failure rates when each of the individual devices is upsized. The simulation was performed as a 500-point Monte Carlo simulation at 0.3 V, including local mismatch and global variation as determined by the IBM model. Table 1 details the results from this simulation. As shown, upsizing device X1 improves the write-low failure rate, X2 improves the write-high failure rate, X3 improves the hold-high failure rate, and X4 improves the hold-low failure rate.
Failure mode | X1 1x | X1 2x | X2 1x | X2 2x | X3 1x | X3 2x | X4 1x | X4 2x | All 1x | All 2x
Write-high | 8e-3 | 8e-3 | 8e-3 | 0 | 8e-3 | 8e-3 | 8e-3 | 8e-3 | 8e-3 | 0
Hold-high | 26e-3 | 26e-3 | 26e-3 | 18e-3 | 26e-3 | 10e-3 | 26e-3 | 34e-3 | 26e-3 | 2e-3
Write-low | 8e-3 | 2e-3 | 8e-3 | 12e-3 | 8e-3 | 10e-3 | 8e-3 | 8e-3 | 8e-3 | 2e-3
Hold-low | 30e-3 | 24e-3 | 30e-3 | 30e-3 | 30e-3 | 46e-3 | 30e-3 | 12e-3 | 30e-3 | 4e-3
Total Failures | 56e-3 | 50e-3 | 56e-3 | 52e-3 | 56e-3 | 56e-3 | 56e-3 | 46e-3 | 56e-3 | 8e-3
Table 1 – Latch device resizing results for 0.3 V, simulation size N=500. Each column pair gives the failure rate with the named device (or all devices, "All") at 1x and 2x sizing. Failure rates are represented as the fraction of the total number of runs that are failures, i.e. a rate of 5e-1 means that half of the runs had a device failure.
From these results, it is clear that upsizing any single device in the latch does little to change the total failure rate: the best single-device result, upsizing X4, only lowers it from 56e-3 to 46e-3. However, if the pMOS transistors in all of the devices are upsized, the total failure rate of the latch drops substantially, from 56e-3 to 8e-3.
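To make the scale of these rates concrete, the short sketch below converts the total failure rates from Table 1 back into numbers of failing runs out of N=500. The rates and N are taken from the table; the helper name `failure_count` is only illustrative.

```python
N_RUNS = 500  # Monte Carlo points per configuration (from Table 1)

# Total failure rates from Table 1 as (1x, 2x) pairs.
TOTAL_FAILURE_RATE = {
    "X1": (56e-3, 50e-3),
    "X2": (56e-3, 52e-3),
    "X3": (56e-3, 56e-3),
    "X4": (56e-3, 46e-3),
    "All": (56e-3, 8e-3),
}


def failure_count(rate, n_runs=N_RUNS):
    """Convert a failure rate (fraction of runs) into a whole number of runs."""
    return round(rate * n_runs)


for device, (base, upsized) in TOTAL_FAILURE_RATE.items():
    print(device, failure_count(base), "->", failure_count(upsized))
```

With only 500 runs, a rate of 8e-3 corresponds to just 4 failing runs, so small differences between columns in Table 1 rest on a handful of samples.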
3.3 Timing Effects
Because this study was concerned with failures caused by mismatch and not by timing, a suitable simulation cycle time had to be selected so that even the slowest circuits would have finished switching. However, because the energy of the circuit is dominated by leakage energy, a cycle time that is too long gives circuit designs that leak less an advantage over other designs solely because of the overly long leakage time. The cycle time for the simulation was therefore set to the worst-case delay (µ + 3σ) of 50 fanout-of-4 (FO4) inverters. This delay was calculated using a chain of 50 FO4 inverters with local and global variation. A different value was calculated for each circuit design (sizing and voltage) that was tested. This provided a consistent timing method for the simulations across all sizes and voltages.
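The µ + 3σ cycle-time rule can be sketched as below, assuming the per-run delays of the 50-inverter chain are already available as a list. The sample delays are made-up numbers, and the use of the sample (rather than population) standard deviation is an assumption; the paper does not say which was used.

```python
import statistics


def worst_case_delay(delays, n_sigma=3.0):
    """Worst-case delay estimate: mean plus n_sigma standard deviations."""
    mu = statistics.mean(delays)
    sigma = statistics.stdev(delays)  # sample standard deviation (assumption)
    return mu + n_sigma * sigma


# Made-up Monte Carlo delays (ns) of a 50-stage FO4 chain at one design point.
sample_delays_ns = [98.0, 101.5, 100.2, 104.8, 99.1, 102.4]
cycle_time_ns = worst_case_delay(sample_delays_ns)
print(cycle_time_ns)
```

The same function would be re-run for every sizing and voltage pair, since the delay distribution changes with both.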
3.4 Final Simulation Setup
For the final major simulation, the grid of voltage and sizing was split into seven blocks, and each block was given approximately 600,000 Monte Carlo simulations in total. The grid was broken up in this way so that the most detailed results could be obtained where they mattered most, in the least amount of simulation time. From previous smaller simulations, it was possible to predict where the most interesting data would probably be and where the minimum energy was likely to occur. The blocks were chosen as shown in Table 2. Each set of runs gave a failure rate and an average energy. Each Monte Carlo run simulated a latch, an inverter, a nand2, a nor2, and a fanout-of-4 inverter chain. The simulations were run with a clock period equal to the worst-case delay of 50 fanout-of-4 inverters, as described in the previous section.
Voltage (mV) | 1x sizing | 1.25x sizing | 1.5x sizing | 2x sizing
400 | 40001 | 40001 | 40001 | 85715
425 | 40001 | 40001 | 40001 | 85715
450 | 40001 | 40001 | 40001 | 85715
475 | 40001 | 40001 | 40001 | 85715
500 | 300000 | 40001 | 40001 | 85715
525 | 300001 | 300001 | 120001 | 85715
550 | 300001 | 300001 | 120001 | 85715
575 | 100001 | 100001 | 120001 | 200001
600 | 100001 | 100001 | 120001 | 200001
625 | 100001 | 100001 | 120001 | 200001
Table 2 – The number of Monte Carlo simulations run for each sizing-voltage pair. Note: the point (1x, 500 mV) was run again at a higher number of simulations to get more accurate results near the minimum energy location.
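As a quick consistency check on the "approximately 600,000 runs per block" budget, the sketch below sums the entries of Table 2 grouped by per-cell run count. Grouping by run count is an assumption about how the blocks were formed, since the paper does not list the block boundaries explicitly.

```python
from collections import defaultdict

# Voltage (mV) -> runs for sizings 1x, 1.25x, 1.5x, 2x (values from Table 2).
RUNS = {
    400: (40001, 40001, 40001, 85715),
    425: (40001, 40001, 40001, 85715),
    450: (40001, 40001, 40001, 85715),
    475: (40001, 40001, 40001, 85715),
    500: (300000, 40001, 40001, 85715),
    525: (300001, 300001, 120001, 85715),
    550: (300001, 300001, 120001, 85715),
    575: (100001, 100001, 120001, 200001),
    600: (100001, 100001, 120001, 200001),
    625: (100001, 100001, 120001, 200001),
}


def totals_by_run_count(runs):
    """Sum the runs over all grid cells that share the same per-cell count."""
    totals = defaultdict(int)
    for row in runs.values():
        for n in row:
            totals[n] += n
    return dict(totals)


for per_cell, total in sorted(totals_by_run_count(RUNS).items()):
    print(per_cell, total)
```

Under this grouping, the 85715, 100001, 120001, and 200001 cells each sum to just over 600,000, and the four 300001 cells sum to about 1,200,000, which is consistent with two blocks. The 40001 cells sum to 560,014 because the (1x, 500 mV) point appears in the table with its re-run count of 300000 instead.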
4 Results
The results of this research are primarily based on these final simulation runs. A minimum energy exists at a reduced voltage because dynamic energy falls as the voltage decreases, but below a certain point leakage energy dominates and causes the total energy to rise again at very low voltages. Figure 2 shows the total average energy as a function of voltage for different sizings of a latch. The minimum-energy voltage moves lower as the sizing increases (from 500 mV for 1x sizing to 425 mV for 2x sizing), but the energy at the minimum increases with sizing, with 1x having the lowest energy despite its higher voltage. This makes sense, since larger sizes increase leakage energy if the clock cycle is kept constant.
Figure 2 – Energy versus voltage for sizings 1x, 1.25x, 1.5x, and 2x of the latch.
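The trade-off between dynamic and leakage energy can be illustrated with a simple textbook-style model in which dynamic energy scales as Vdd squared and the leakage energy per cycle grows exponentially as Vdd falls, because the cycle time stretches. This is only a toy sketch: the functional form and the parameters `c_eff`, `k_leak`, and `n_vt` are assumptions chosen for illustration, not the 65 nm model or the results of this study.

```python
import math


def total_energy(vdd, c_eff=1.0, k_leak=1.0e4, n_vt=0.039):
    """Toy energy per operation (arbitrary units) at supply voltage vdd.

    Dynamic energy is c_eff * vdd^2. Leakage energy is leakage power times a
    cycle time that grows exponentially as vdd drops, lumped into k_leak.
    """
    e_dyn = c_eff * vdd ** 2
    e_leak = k_leak * c_eff * vdd ** 2 * math.exp(-vdd / n_vt)
    return e_dyn + e_leak


def minimum_energy_voltage(v_lo=0.2, v_hi=0.7, steps=501):
    """Grid search for the supply voltage that minimizes total_energy."""
    grid = [v_lo + i * (v_hi - v_lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=total_energy)


v_min = minimum_energy_voltage()
print(v_min, total_energy(v_min))
```

In this toy model the minimum falls strictly inside the swept range: lowering Vdd further makes the leakage term grow faster than the dynamic term shrinks. It does not include the mismatch-induced failures that are the subject of this paper.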
Figure 3 again shows energy versus voltage, this time for 1x sizing of the inverter, nand2, and nor2. It is apparent that for these combinational structures, just as for the latch, higher voltages such as 500 mV and 525 mV are the most energy efficient. Also like the latch, the minimum energies for the combinational logic increase with transistor size.
Figure 3 – Energy versus voltage for the inverter, NAND2, and NOR2; all for 1x sizing.
[Figure 2 plot: "Energy vs. Vdd for Different Latch Sizings"; x-axis Vdd (V), 0.4 to 0.625; y-axis Energy (fJ), 0.5 to 1.1; series 1x, 1.25x, 1.5x, 2x]
[Figure 3 plot: "Energy vs. Vdd for Combinational Logic for 1x Sizing"; x-axis Vdd (V), 0.4 to 0.625; y-axis Energy (fJ), 0.1 to 0.22; series Inverter, NAND2, NOR2]
Failure rate data were taken at each of these points as well. As expected, for any given device, the failure rate decreases as the voltage increases. However, the detected failure rates behave oddly: beyond a certain voltage they suddenly drop to zero, and this drop is not due to a lack of sufficiently high resolution. For example, the lowest non-zero detected failure rate is 7.27 x 10^-4, at 525 mV for the inverter at 1x sizing, but 25 mV higher no failures are detected. The inverter simulation at 550 mV and 1x sizing had about 300,000 Monte Carlo runs, which gave it a failure rate resolution of about 3.33 x 10^-6. The real failure rate could therefore have been two orders of magnitude lower than the rate at 525 mV and would probably still have been detected as non-zero. The same observation was made for the failure rate trends of almost all devices and sizes, with no non-zero failure rate lower than 7.27 x 10^-4 despite the high resolution. Figure 4 shows, for 1x sizing, how sudden the failure rate drop-offs are. All simulations were run up to 625 mV, so wherever a curve ends before 625 mV (in this case for the latch, nand2, and inverter), the failure rate has suddenly dropped to zero.
Figure 4 – Failure rate versus voltage on a logarithmic scale for 1x sizing.
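The resolution argument above can be checked with a few lines of arithmetic. The sketch assumes each Monte Carlo run is an independent trial with a fixed failure probability, which is a modelling assumption and not something the paper states; the run count is the Table 2 entry for 1x sizing at 550 mV.

```python
def resolution(n_runs):
    """Smallest non-zero failure rate observable with n_runs: one failure."""
    return 1.0 / n_runs


def prob_no_failures(true_rate, n_runs):
    """Probability of observing zero failures in n_runs independent trials."""
    return (1.0 - true_rate) ** n_runs


N_550 = 300001      # runs at 1x sizing, 550 mV (Table 2)
RATE_525 = 7.27e-4  # lowest non-zero failure rate, inverter at 525 mV

print(resolution(N_550))                        # failure rate resolution
print(prob_no_failures(RATE_525 / 100, N_550))  # rate two orders lower
print(prob_no_failures(RATE_525, N_550))        # rate unchanged from 525 mV
```

Under this assumption, a true failure rate one hundred times lower than the 525 mV value would still produce at least one failure in roughly nine out of ten such experiments, which is why an observed rate of exactly zero is hard to attribute to limited resolution.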
Further investigation of the sudden drop-offs shows that the failure rate plateaus at a constant value just before it drops to zero. Simulations were run at successively closer voltages to the drop-off points to confirm this. For the inverter, for example, the failure rate was exactly the same at 525 mV and 530 mV, but at 540 mV there were suddenly zero failures detected. The curve does not appear to be continuous. It was also confirmed that the simulations were, in fact, still random and unique each time they were run. Changing the simulator options to make the output more accurate did not solve the problem. The reason for these strange results has not yet been determined, but they may be the result of some kind of intersection between two non-continuous models used by the simulation software in the voltage range of interest.
[Figure 4 plot: "Failure Rate vs. Voltage for 1x sizing"; x-axis Voltage (V), 0.4 to 0.6; y-axis Failure Rate (log scale), 1.00E-04 to 1.00E+00; series Inverter, NOR2, Latch, NAND2]
5 Conclusions
Our results suggest that the optimal minimum-energy voltage is closer to 500 mV than to the 300 mV that other research has suggested. The accuracy of these results cannot be guaranteed, however, since our simulations were discovered to be flawed. It is possible that our results relating to failure rates are incorrect and that the real trends differ from those we observed. Future research may determine the source of the simulation issues and show that the results are correct, which would imply that the intrinsic failures caused by RDF and mismatched consecutive devices push the optimal minimum-energy voltage to a higher value than was previously expected.
References
[1] B. Zhai, S. Hanson, D. Blaauw, D. Sylvester, "Analysis and Mitigation of Variability in Subthreshold Design", ISLPED 2005
[2] G. Chen, D. Blaauw, T. Mudge, D. Sylvester, N. Kim, "Yield-Driven Near-Threshold SRAM Design", ICCAD 2007
[3] J. Kwong, A. Chandrakasan, "Variation-Driven Sizing for Minimum Energy Sub-threshold Circuits", ISLPED 2006
[4] S. Gupta, A. Raychowdhury, K. Roy, "Digital Computation in Subthreshold Region for Ultralow-Power Operation: A Device-Circuit-Architecture Codesign Perspective", Proceedings of the IEEE, 2010