Power Grid Analysis in VLSI Designs






A Thesis


Submitted for the Degree of


Master of Science (Engineering)

In the Faculty of Engineering




By

Kalpesh Shah





Acknowledgements

My sincere gratitude to both my guides, Prof. S K Nandy and Dr. Vish Visvanathan. Prof. Nandy, thank you for your guidance right from the start of the MS curriculum till the end. I would not have dreamt of the final chapters had it not been for your timely guidance. To Vish, thank you for bearing with me and guiding me from beginning to end, despite your busy schedule at the office. You are the one who encouraged me from enrolling in this program until its end. Thank you for your valuable inputs and comments on the material. My sincere thanks to IISc, and specifically the SERC staff, who helped me through various administrative work.

To my colleagues and managers at Texas Instruments, thank you for your cooperation; you are a team I am proud of. Thanks for your support and the camaraderie. A special thanks to Harinath for approving my MS program and to Venugopal Puvvada, my manager when most of this work happened. Discussions with him made this work relevant to multi-million gate designs and helped it find real application.

Thanks to the many friends with whom I discussed topics related to my research throughout this period: Ananth, Gokul, Mallik, Suravi, Saby, Bram, Ashish, Aishwarya and Sumedha. A special thanks to Anjana Ghose for all that you did for me while I was not in Bangalore.

Thanks to my family for having stood behind me like a rock. To my parents, thanks for your support and affection; your unrelenting persistence helped me complete the last step. To Pratiksha, thank you for being my invisible strength. Your constant reassuring presence and confidence in me drove me to this point in the journey. To Bhavesh and Deepti, thank you for being my saviors at times of load at home. Without you folks, this thesis would not have materialized. And finally, thanks to little Harsh, who came into this world halfway through my MS, and Darsh, who saw my MS from the age of 1 year; you kept giving me unasked but needed breaks and made everything so lively.



Table of Contents

Acknowledgements
Abstract
1 Introduction
  1.1 Motivation
    1.1.1 Power Estimation
    1.1.2 Power Supply Noise
    1.1.3 MTCMOS Analysis
  1.2 Terms
  1.3 Thesis outline and Contribution
2 Toggle Activity Estimation
  2.1 Overview
  2.2 Toggle Activity Estimation
  2.3 Multi-million gate solution
    2.3.1 Deriving automatic toggle frequency values
    2.3.2 Hierarchical Modeling
  2.4 Validation and Results
  2.5 Summary
3 Power Estimation
  3.1 Overview
  3.2 Current approaches to Power Analysis
  3.3 Power analysis Tools
    3.3.1 Power Compiler [67]
    3.3.2 Power Mill (or Nano Sim) [4][68]
    3.3.3 Prime Power [66]
    3.3.4 Other Tools
  3.4 Validation Flow
    3.4.1 Netlist Setup
    3.4.2 Vector Generation
    3.4.3 Interconnect setup
  3.5 Validation and Results
  3.6 Power estimation applications
    3.6.1 Average power/ground bus currents
    3.6.2 Average power dissipation
    3.6.3 Electro migration failures
    3.6.4 Power Routing
    3.6.5 Gate Oxide Integrity Analysis
  3.7 Summary
4 Power Supply Noise Analysis
  4.1 Overview
  4.2 Cell Characterization
    4.2.1 Current Characterization Methodology
    4.2.2 Current Characterization Flow
  4.3 Power Grid network modeling
    4.3.1 Power Grid Current Waveform Modeling
  4.4 Complete Flow
    4.4.1 Timing Information Generation
    4.4.2 Power Grid Generator
    4.4.3 SPICE Simulation
  4.5 Validation and Results
    4.5.1 Peak Power Results
    4.5.2 Peak Dynamic IR Drop Results
  4.6 Summary
5 Power Up Analysis
  5.1 Switched PG Networks
  5.2 Switch Network Analysis
    5.2.1 Switch Characterization
    5.2.2 Current or Switch Prediction
  5.3 Results and Analysis
  5.4 Summary
6 Conclusion
  6.1 Summary
  6.2 Scope of Future Work
7 References
Appendix A Sample SDC file
Appendix B Sample SPEF Format
Appendix C Power Waveforms Analysis
Appendix D Current Characterization sample spice deck
Appendix E Waveform transformation example




Table of Figures

Figure 1.1 Power Dissipation in CMOS designs
Figure 1.2 Power Density trend in CMOS designs
Figure 1.3 Leakage and Dynamic Power Dissipation [2]
Figure 1.4 Schematic of Power Grid in CMOS designs
Figure 1.5 Normalized delay and normalized delay to voltage ratio
Figure 1.6 Total power break up into leakage and active
Figure 2.1 Schematic of Logic Circuit 1
Figure 2.2 Schematic of Logic Circuit 2
Figure 2.3 Gated clock example
Figure 2.4 Gate Level Netlist for 'simple' design
Figure 2.5 Timing Arcs in extracted model of 'simple' design
Figure 3.1 Venn diagram of Power Components
Figure 3.2 Power Estimation in Design Stages
Figure 3.3 Power Estimation Validation Flow
Figure 3.4 Legends for Validation Flow
Figure 4.1 Voltage over time representation at an internal design node
Figure 4.2 Schematic circuit for instantaneous voltage drop analysis
Figure 4.3 Inverter waveforms measured at different nodes
Figure 4.4 Transition time vs. peak power for Inverter
Figure 4.5 Transition time vs. peak power for NAND gate
Figure 4.6 Load vs. peak power for AND gate
Figure 4.7 Load vs. peak power for OR gate
Figure 4.8 State Dependency on cell switching
Figure 4.9 Cell Characterization Flow
Figure 4.10 Power Grid Modeling
Figure 4.11 Peak IR drop Computation Flow
Figure 4.12 Prime Time flow for arrival time computation
Figure 4.13 Power Grid Generation Flow
Figure 4.14 PSN waveform of Proposed Method
Figure 4.15 PSN Reference Waveform
Figure 5.1 Gated Power Supply ([74])
Figure 5.2 Layout of 1M gate with switch network
Figure 5.3 Current Glitch and Voltage Ramp at arbitrary switch output
Figure 5.4 Typical PG network with Power Switches
Figure 5.5 Schematic Switch network Analysis Flow
Figure 5.6 Analysis model of Virtual Power Network
Figure 5.7 Infinitesimal Time Division for Current Prediction
Figure 5.8 Reduced Switch Network for validation
Figure 5.9 Voltage Ramp up over Time for various nodes
Figure 5.10 Current comparison over time
Figure 1 1MHz, Peak: 838.9 uW
Figure 2 100MHz, Peak: 840.7 uW
Figure 3 1GHz, Peak: 838.2 uW
Figure 4 1MHz base Waveform, 830.4 uW
Figure 5 100MHz Transformation, 830.4 uW
Figure 6 1GHz Transformation for 1MHz, 830.4 uW




List of Tables

Table 1.1 Consolidation of ITRS2003 Predictions
Table 1.2 Generic Term Definitions
Table 2.1 Comparison of Static vs Dynamic approaches for Power Estimation
Table 3.1 Power Modeling for CMOS gates
Table 3.2 ISCAS89 circuit description
Table 3.3 Runtime comparison between vector less and SPICE
Table 3.4 Clock Power vs. Total Power
Table 3.5 Power Estimation across various tools
Table 4.1 Comparison of Peak power Dissipation
Table 4.2 Comparison of percentage peak instantaneous IR drop
Table 4.3 Comparison of percentage peak IR drop on ISCAS89 circuits
Table 5.1 Switch Prediction by proposed algorithm
Table 5.2 Voltage Prediction
Table 5.3 Power Up analysis - Runtime Comparison





Abstract

Power has become an important design closure parameter in today's ultra-deep submicron digital designs. The impact of the increase in power spans multiple disciplines, with research ranging from power supply design, power converter or voltage regulator design, system, board and package thermal analysis, and power grid design and signal integrity analysis, to minimizing power itself. This work focuses on the challenges that the increase in power poses for power grid design and analysis.

Challenges arising from smaller geometries and higher power are well researched, yet there is still a lot of scope for further work. Traditionally, designs go through average IR drop analysis. Average IR drop analysis is highly dependent on the estimation of current dissipation. This work proposes a vectorless probabilistic toggle estimation, which is an extension of one of the approaches proposed in the literature. We have further used the toggles computed with this approach to estimate the power of ISCAS89 benchmark circuits. This provides insight into the quality of the toggles being generated. The power estimation work is further extended to cover the various state-of-the-art methodologies available, i.e., SPICE-based power estimation, logic-simulation-based power estimation, and comparisons of commercially available tools. We finally arrive at an optimum flow recommendation that can be used according to design need and schedule.

Today's design complexity (high frequencies, high logic densities, and multiple levels of clock and power gating) has forced the design community to look beyond average IR drop. A high rate of switching activity induces power supply fluctuations at the cells in a design, which is known as instantaneous IR drop. However, there is no good methodology in place to analyze this phenomenon. Ad hoc decoupling planning and on-chip intrinsic decoupling capacitance help to contain this noise, but there is no guarantee. This work also applies the average toggle computation approach to instantaneous IR drop analysis of designs. Instantaneous IR drop is also known as dynamic IR drop or power supply noise. We propose a cell characterization methodology for standard cells. This data is used to build a power grid model of the design. Finally, the power network is solved to compute the instantaneous IR drop.

Leakage power minimization has forced design teams to adopt complex power gating, i.e., multi-level MTCMOS usage in the power grid. This poses an additional analysis challenge for the power grid in terms of ON/OFF sequencing and the noise it injects. This work explains the state of the art here and highlights some of the issues and trade-offs of using MTCMOS logic. It further suggests a simple approach to quickly assess the impact of MTCMOS gates on the power grid in terms of peak currents and IR drop. The suggested approach also helps in MTCMOS gate optimization, and the overhead of early leakage optimization can be computed with it.


1 Introduction

1.1 Motivation

The VLSI industry is facing one of the biggest challenges in its evolution: power integrity closure, the next after the crosstalk-induced integrity issues of the previous decade. Power dissipation has increased phenomenally over the years, as shown in Figure 1.1, giving rise to this challenge. Figure 1.2 shows the increase in power density due to aggressive scaling, which increases the number of components packed into a unit area.


[Figure: Power (Watts, log scale from 0.1 to 100000) plotted against Year (1971 to 2008) for processors from the 4004 through the Pentium® proc, with annotated points at 500W, 1.5KW, 5KW and 18KW]

Figure 1.1 Power Dissipation in CMOS designs


[Figure: Power Density (W/cm2, log scale from 1 to 10000) plotted against Year (1970 to 2010) for processors from the 4004 through the P6, with reference levels marked Hot Plate, Nuclear Reactor and Rocket Nozzle]

Figure 1.2 Power Density trend in CMOS designs

Table 1.1 below consolidates the ITRS2003 [1] predictions on power, together with its impact on design and on operating voltages.



Year                    2003     2004     2005     2006     2007     2008     2009     2010     2012
Technology node                  (90 nm)                    (65 nm)                    (45 nm)
Vdd, High Perf (V)      1.2      1.2      1.1      1.1      1.1      1        1        1        0.9
Vdd, Low Power (V)      1        0.9      0.9      0.9      0.8      0.8      0.8      0.7      0.7
High Perf Power (W)     149      158      167      180      189      200      210      218      240
Battery Operated (W)    2.1      2.2      2.3      2.4      2.5      2.6      2.7      2.8      3
PG Pads                 1700     1800     2000     2100     2200     2300     2400     2400     2600

Table 1.1 Consolidation of ITRS2003 Predictions


Further, Figure 1.3 shows that both the leakage and the dynamic components of power are continuously increasing, with leakage dominating dynamic power in newer technology nodes [2]. The next sections describe how these trends give rise to challenges in power grid analysis and lead to the work done here.


Figure 1.3 Leakage and Dynamic Power Dissipation [2]

1.1.1 Power Estimation

One of the challenges in power integrity analysis is to accurately predict the power dissipation of a design, both average and peak. Power estimation is required for package thermal analysis, power minimization, and power grid design.

The earliest proposed techniques for estimating power dissipation were strongly pattern-dependent and based on circuit simulation, e.g. SPICE or fast SPICE simulators [3-6]. Besides being strongly pattern-dependent, these techniques are too slow to be used on modern very large-scale integrated (VLSI) circuits, for which high power dissipation is a major problem.

To improve computational efficiency, other simulation-based techniques were proposed using various kinds of timing, switch-level, and logic simulation [7-9]. In these approaches, lookup tables are obtained by electrical simulation of the basic library elements, and the collected data are then used during gate-level simulation. These techniques generally assume that the power supply and ground voltages are fixed, and only the supply current waveform is estimated. While they are indeed more efficient than traditional circuit simulation, at the cost of some loss in accuracy, they remain strongly pattern-dependent and are still too slow for modern multi-million gate designs, where the whole chip cannot be simulated together.

To overcome the shortcomings of simulation-based techniques, research has focused on probabilistic and statistical techniques for toggle estimation. The use of probabilities to estimate power was first proposed in [11]. In that work, a zero-delay model was assumed so that the transition probabilities could be estimated from signal probabilities. A probabilistic power estimation approach that does compute the toggle power and makes neither the zero-delay nor the temporal independence assumption, called probabilistic simulation, was proposed in a few papers. In this technique, the use of probabilities was expanded to allow the specification of probability waveforms. This approach assumed spatial independence, and was not restricted only to synchronous circuits.
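As a rough illustration of the zero-delay idea described above, the following sketch propagates signal probabilities through a few gate types and converts them into per-cycle transition probabilities. It assumes spatially independent gate inputs and temporally independent consecutive cycles, under which a node with signal probability p toggles with probability 2p(1 - p). The function names and the small example circuit are hypothetical and are not taken from [11].

```python
# Zero-delay signal probability propagation (illustrative sketch only).
# Assumptions: gate inputs are spatially independent, and the values of a
# node in consecutive clock cycles are temporally independent.

def and_prob(p_a, p_b):
    """Probability that a 2-input AND output is 1."""
    return p_a * p_b

def or_prob(p_a, p_b):
    """Probability that a 2-input OR output is 1."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_b)

def not_prob(p_a):
    """Probability that an inverter output is 1."""
    return 1.0 - p_a

def transition_prob(p):
    """Per-cycle toggle probability: P(0->1) + P(1->0) = 2 * p * (1 - p)."""
    return 2.0 * p * (1.0 - p)

# Hypothetical circuit: y = (a AND b) OR (NOT c), all inputs at p = 0.5.
p_a, p_b, p_c = 0.5, 0.5, 0.5
p_n1 = and_prob(p_a, p_b)      # internal node n1
p_n2 = not_prob(p_c)           # internal node n2
p_y = or_prob(p_n1, p_n2)      # primary output y
t_y = transition_prob(p_y)     # toggles per cycle expected at y

print(f"P(y=1) = {p_y}, transition probability = {t_y}")
```

Because every gate is evaluated once with no notion of delay, glitches caused by unequal path delays are not captured by this kind of estimate.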

Another probabilistic approach was proposed by Farid N. Najm [12], who introduced the transition density measure of circuit activity. An algorithm was also presented for propagating the transition density into the circuit. This approach does not make a zero-delay assumption and makes only the spatial independence assumption. As a result of this independence assumption, the computed density values are insensitive to the internal circuit delays.
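The propagation step can be sketched in a few lines. The sketch below uses the relation commonly stated for the transition density approach, D(y) = sum over i of P(dy/dx_i) * D(x_i), where dy/dx_i is the Boolean difference of the gate output with respect to input i, and it assumes spatially independent inputs. The helper names, the brute-force enumeration and the example numbers are assumptions made for illustration; they are not the implementation used in this thesis.

```python
from itertools import product

def boolean_difference_prob(func, probs, i):
    """P(dy/dx_i = 1): probability that flipping input i flips the output.

    Enumerates the other inputs, weighting each assignment by its
    probability under the spatial independence assumption.
    """
    n = len(probs)
    others = [k for k in range(n) if k != i]
    total = 0.0
    for values in product([0, 1], repeat=len(others)):
        assignment = [0] * n
        weight = 1.0
        for k, v in zip(others, values):
            assignment[k] = v
            weight *= probs[k] if v else 1.0 - probs[k]
        assignment[i] = 0
        out0 = func(*assignment)
        assignment[i] = 1
        out1 = func(*assignment)
        if out0 != out1:          # output is sensitive to input i here
            total += weight
    return total

def transition_density(func, probs, densities):
    """D(y) = sum_i P(dy/dx_i) * D(x_i) for a single gate."""
    return sum(
        boolean_difference_prob(func, probs, i) * densities[i]
        for i in range(len(probs))
    )

def and2(a, b):
    return a & b

def or2(a, b):
    return a | b

# Hypothetical inputs: signal probabilities and densities (transitions/s).
d_and = transition_density(and2, [0.5, 0.5], [1.0e6, 2.0e6])
d_or = transition_density(or2, [0.2, 0.5], [1.0e6, 2.0e6])
print(d_and, d_or)
```

For the 2-input AND gate the Boolean difference with respect to one input is simply the other input, so the result reduces to P(b) * D(a) + P(a) * D(b). The exhaustive enumeration is exponential in the number of gate inputs, which is acceptable for small library cells but is only a sketch.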

Yet another probabilistic approach was presented by A. Ghosh et al. in [13], where Binary Decision Diagrams (BDDs) were used to take into account internal node correlations and toggle power, at the cost of increased computation; this approach can become computationally expensive. Apart from that, recent literature describes more accurate toggle estimation methods based on Bayesian networks [14-16], but these are limited in their ability to handle high gate count designs. All of the above probabilistic and statistical techniques are applicable only to combinational circuits. They require the user to specify information on the activity at the latch outputs.

This work addresses the toggle computation problem, or pattern dependence problem, for multi-million gate designs by extending Najm's approach [12]. Using this approach, average power estimation has been performed at various stages of the design.

1.1.2 Power Supply Noise

With the phenomenal rise in switching speed in VLSI circuits, the probability of a large number of cells switching in a short period of time increases. A large number of simultaneous switching events in a short period of time can cause a considerable amount of noise in the power supply network of a circuit. Power supply noise means a decrease in the voltage seen by the cell power and ground nodes. A schematic of the power network grid is shown in Figure 1.4. The resistive parasitic R in the power distribution network is responsible for the resistive noise, which is the IR voltage drop in the PG network. Apart from R, on-chip decoupling capacitance also plays a big role. The switching noise in the power distribution network must be contained to a tolerable level to ensure the reliability and performance of a circuit.
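To make the resistive part of this noise concrete, the sketch below computes the static IR drop along a single power rail fed from one pad. It is a deliberately simplified model: one rail, purely resistive segments, constant cell currents, and no decoupling capacitance or inductance. The function name and all numerical values are hypothetical and chosen only for illustration.

```python
def rail_voltages(vdd, seg_res, tap_currents):
    """Voltage at each tap of a rail fed from a single pad held at vdd.

    seg_res[k] is the resistance (ohms) of the segment leading to tap k,
    tap_currents[k] is the current (amps) drawn by the cell at tap k.
    Each segment carries the current of its own tap plus all taps
    further from the pad, so the drop accumulates along the rail.
    """
    voltages = []
    v = vdd
    remaining = sum(tap_currents)   # current entering the first segment
    for r, i_tap in zip(seg_res, tap_currents):
        v -= r * remaining          # IR drop across this segment
        voltages.append(v)
        remaining -= i_tap          # this tap's current leaves the rail
    return voltages

# Hypothetical rail: 1.2 V supply, three 0.1 ohm segments, three cells.
volts = rail_voltages(1.2, [0.1, 0.1, 0.1], [0.01, 0.02, 0.03])
worst_drop = 1.2 - min(volts)
print(volts, worst_drop)
```

The cell farthest from the pad sees the largest drop, which is why the placement of power pads and the width of the grid straps matter. A dynamic analysis would replace the constant tap currents with time-varying waveforms and add the decoupling capacitance.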


Figure 1.4 Schematic of Power Grid in CMOS designs

Excessive voltage drops manifest themselves as glitches on the PG buses and cause:

• Erroneous logic signals
• Degradation in switching speeds
• Reduction in noise margin and driving capability of the gates

According to a study on the Pentium®4 [26], power supply noise can reduce clock frequency by 6.5% at the 130 nm node and by 8% at the 90 nm node. All these effects are handled through various margins in the design flow, as there are no efficient solutions available to address the dynamic voltage drop problem in the design flow.

Some work has been done to estimate peak power as well as decoupling capacitance in this regard. In [27], a pattern-independent, linear-time algorithm is described that estimates the maximum current waveforms at various contact points in the circuit. The algorithm is first demonstrated for simple gate delay and current models. Expressions for modeling the delays and current waveforms of a general gate are derived, and the way to extend the algorithm to more general models is also described. The authors improved the work in [28]. In [29], measures of peak power are proposed in the context of sequential circuits, and a procedure is presented to obtain lower bounds on these measures, together with the actual input vectors that attain such bounds. Automatic generation of a functional vector loop for near-worst-case power consumption is attained.

Paper [30] presents a statistical method for estimating the peak power dissipation in VLSI circuits. The method is based on the theory of extreme order statistics and its application to the probabilistic distributions of the cycle-by-cycle power consumption, maximum-likelihood estimation, and Monte Carlo simulation. It can be used to predict the maximum power of a VLSI circuit over a constrained set of input vector pairs as well as over the complete set of all possible input vector pairs. The simulation-based nature of the method avoids the limitations of a gate-level delay model and a gate-level circuit structure. The method also produces maximum power estimates that satisfy user-specified error and confidence levels. Experimental results show that this method typically produces maximum power estimates within 5% of the actual value at a 90% confidence level while simulating fewer than 2500 input vectors. Another technique, described in [31], computes the peak power of a design while maintaining the accuracy of the current waveform. It models logic gates by breaking them into various nodes, and then models the various currents in terms of these nodes, which are evaluated quickly during logic simulation to measure power. However, because it is based on logic simulation, it is extremely difficult to scale.

Chen and Ling [36] proposed an approach to estimate the power supply noise based on an integrated package-level and chip-level power bus model. Chang, Gupta, and Breuer [37] proposed an analytical model to estimate the ground bounce caused by switching in the internal circuitry of sub-micron VLSI circuits. Jiang, Cheng, and Deng [38] proposed a Genetic Algorithm-based approach that considers the dependence of switching noise on input patterns under a distributed RC model of the PG network. Zhao, Roy, and Kho proposed an event-driven, simulation-based approach to calculate the worst-case power supply noise under a distributed RLC model [39].


There are still more challenges in this area where very little work has been done.

First, to analyze Power Ground (PG) noise, worst-case vectors are required with which the parasitic network of the chip is simulated. Not only does this approach need a lot of data and memory, but today's SPICE simulators cannot handle such complexity in terms of runtime and capacity. Moreover, determining the worst-case vectors is almost never straightforward.



Second, today's designs have huge PG networks. It is known that the voltages seen at various nodes in this network vary. The resultant voltage across the power-ground bus of a macro impacts its delay, as shown in Figure 1.5. Note that delay is non-linear at low voltages. Further, the ratio of change in delay to change in voltage is even more non-linear than the delay itself. This is very important to designers, as it can cause delay issues or design failures. Due to the strong dependency of delay on voltage, dynamic V drop in the PG network is fast becoming a critical concern for chip designers [41][59-60].

[Figure: normalized delay and normalized delay-to-voltage change plotted against voltage (0.8 to 1.2). Curves: Rise Delay, Fall Delay, risedelay2voltage_change, falldelay2voltage_change.]

Figure 1.5 Normalized delay and normalized delay to voltage ratio

The third aspect of the PG noise problem is that it is an iterative phenomenon [41]. When the voltage across a cell decreases due to a sudden rise in switching activity, the cell delays change, and hence so does the simultaneous switching. This in turn can reduce or increase the dynamic noise. It can reduce the noise in the sense that the simultaneous switching may diminish altogether, or increase it because a hot spot of the design can move to some other location. Handling this is not a trivial task from an analysis perspective.



Fourth, design methodologies today expect the analysis to meet predefined PG noise targets. In reality, any voltage drop is acceptable as long as the required timing goals are met. However, this is not exploited due to a lack of analysis data.

Fifth, it has been found that devices often fail on testers due to excessive simultaneous switching during scan testing. This creates serious testability issues, and hence dynamic V drop must be analyzed not only for functional mode but also for other modes such as test.

This work addresses the dynamic PG noise problem, which is also described as the dynamic V drop problem in some literature. Based on the issues mentioned above, the goal is to address the dynamic V drop problem with a runtime efficient enough for today's multi-million gate designs. A further goal is to evaluate the impact of dynamic V drop on timing.

1.1.3 MTCMOS Analysis

Leakage power accounts for more than half of the total power in today's ultra sub-micron designs. See Figure 1.6 below.




transistors and controlling signals and are used to dynamically switch the power supply to a specific region of the chip off or on. This work studies the challenges associated with using power switches and proposes a fast analysis technique to estimate peak currents during the power ramp-up of the logic.

1.2 Terms

Generic terms used in this report are described below.

ASIC: Acronym for Application Specific Integrated Circuit. A custom or semi-custom integrated circuit, such as a cell or gate array, created for a specific application. The complexity of ASICs typically requires significant use of CAD techniques.

Block: Also known as functional block or module. Any block within the design hierarchy, instantiated one or more times, that will be laid out separately is referred to as a block module. Block modules are defined divisions of a chip based on functionality and can be worked on independently of other functional blocks.

Netlist: A description of the circuit. The description can be gate-level or Register-Transfer Level (RTL). It can also be written in different languages such as Verilog, VHDL or SPICE.

Physical Design: A portion of a chip or circuit corresponding to a block module that is laid out separately using a Physical Design tool. It is also referred to as a physical block, layout region, or layout block.

RTL: Acronym for Register Transfer Level.

Characterization: Electrical analysis performed for the purpose of determining typical device performance characteristics and/or parametric limits.

CMOS: Acronym for Complementary Metal Oxide Semiconductor. An MOS technology in which both P-channel and N-channel devices are fabricated on the same die.

Die: A single square or rectangular piece of silicon into which a specific semiconductor circuit has been diffused.

Electromigration: Particle migration in aluminum or copper thin-film or polysilicon conductors at grain boundaries as a result of high current densities. Electromigration can lead to either an open circuit condition in a conductor or a short between adjacent conductors.

Interconnect: The metallization connecting two or more active elements on the surface of a die; also, the wires connecting the die to the package leads.

Timing Window: A timing window specifies, for each circuit node, the interval in which transition activity is anticipated. For a single clock domain, the interval lies within a clock period. There can be more than one interval, or overlapping intervals, depending on the complexity of the paths converging on the node.

Table 1.2 Generic Term Definitions

1.3 Thesis outline and Contribution

There are 3 distinct problems addressed in this work.

First, average power estimation using probabilistic toggle estimation for multi-million gate designs. Unless specified by the user, the approach calculates switching probabilities as well as switching rates at different nodes in the circuit (including primary inputs). We studied switching activity calculation methods, for which a lot of literature is already available, and enhanced one of the techniques to meet multi-million gate design needs. This work helps in average dynamic power estimation and also addresses the challenges of toggle estimation, which has varied applications such as peak power estimation, power supply noise analysis and reliability analysis.

Second, dynamic power supply noise estimation. Here, a prototype flow is developed in conjunction with the Prime Time STA flow and SPICE to measure power supply noise. The work describes a gate characterization methodology that involves a one-time SPICE simulation, and how the PG network is modeled using the characterized data.

The third problem addressed is power grid analysis where MTCMOS gates are inserted. The work focuses on MTCMOS analysis challenges and the key factors to consider when a group of logic turns ON from the OFF state. Here, a flow is developed to estimate peak currents or to optimize MTCMOS resistance and switches.

We restrict our scope to CMOS circuits mapped onto a predefined cell library, and we follow the two-step paradigm of library modeling followed by analysis of the design using the modeled information. Library modeling involves describing the cells and their functional, structural or electrical behavior as needed for block or design analysis, and is done once for all designs. Electrical behavior is modeled through characterization using a circuit simulator (e.g. SPICE [3]).

The document is organized as follows. The toggle estimation problem is addressed in Chapter 2. Chapter 3 describes the various power estimation techniques and tools available in industry and compares their power numbers with those obtained using the above toggle estimation method. Chapter 4 describes power supply noise estimation and Chapter 5 describes MTCMOS power-up analysis. Finally, an extensive list of publications is given at the end for further reference.


2 Toggle Activity Estimation

2.1 Overview

In CMOS technologies, chip components draw power supply current only during a logic transition, if we ignore the small leakage current. The current is also proportional to the supply voltage seen by the cell or macro. While this is considered an attractive low-power feature of these technologies, it makes power estimation and voltage drop highly dependent on the switching activity inside these circuits [11][97]. That is, a more active circuit consumes more current and hence contributes a higher voltage drop. The activity of a circuit is determined by running simulation patterns and analyzing the data. This pattern-dependence problem is serious. Often, the power of a functional block needs to be estimated when the rest of the chip has not yet been designed, or even completely specified. In such a case, very little may be known about the inputs to this functional block, and complete and specific information about its inputs would be impossible to obtain.

This motivates the pattern-independent toggle activity estimation problem, often referred to as the vectorless approach. Since the vectorless approach does not require patterns, it is also called 'static', whereas the vector-based approach is called 'dynamic'. Table 2.1 compares these two approaches.

STATIC | DYNAMIC
Uses a probabilistic approach as described in [12] or a zero-delay simulation based approach. | Uses logic simulation to generate switching activity, or SPICE simulation to calculate power.
Vectorless approach. | Vector-based approach; hence quality is only as good as the input vectors. Consider the number of patterns possible for a block with 100 inputs.
Often gives an upper bound. | Gives accurate results.
Modeling of certain elements (hard macros, complex blocks) is difficult. | Since it is vector based, functional models can be used during simulation.
Very fast (a few minutes to hours). | Very slow (a few days to weeks).
A lot of research has gone into products for average power estimation. | Can give instantaneous power.
Synopsys has: Power Compiler | Synopsys has: Power Mill (Nano Sim)

Table 2.1 Comparison of Static vs Dynamic approaches for Power Estimation

This work describes the approach used for toggle frequency estimation and its limitations. It then proposes solutions to these limitations, which make the approach usable for big designs.

A few terms are defined below to clarify the discussion:

Transition Density: If a logic signal x(t) makes n(T) transitions in a time interval of length T, then the transition density of x(t) is defined as:

D(x) = n(T)/T, where T is very large (ideally infinite)

For large T, D(x) becomes time-invariant, and hence there is no need to account for temporal correlation.

Toggle Frequency: If a node x toggles n(T) times over a time interval of length T, then the toggle frequency F(x) is defined as:

F(x) = n(T)/(2*T), where T is very large (ideally infinite)

For example, if a node is switching at 20 MHz, the node is expected to switch 2 times in 50 ns. The toggle frequency can be converted to transition density, or switching activity, by the following equation:

Toggle density = # of transitions / Period = Switching Activity

All three terms mentioned above are used interchangeably in this document.
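The two definitions above can be checked numerically with a minimal sketch. The function names are illustrative assumptions and do not come from the Toggle Frequency Calculator tool; a finite observation window stands in for the ideally infinite T.

```python
def transition_density(num_transitions: int, interval_s: float) -> float:
    """D(x) = n(T) / T, in transitions per second."""
    return num_transitions / interval_s


def toggle_frequency(num_transitions: int, interval_s: float) -> float:
    """F(x) = n(T) / (2 * T), in hertz (two transitions make one full cycle)."""
    return num_transitions / (2.0 * interval_s)


# Example from the text: 2 transitions observed in a 50 ns window.
f_x = toggle_frequency(2, 50e-9)    # about 20 MHz
d_x = transition_density(2, 50e-9)  # about 40 million transitions per second
```

With these definitions, D(x) is always twice F(x), which is why the two quantities can be converted into each other freely.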

It should be noted that the toggle frequency of a node has no direct relation to the clock domain(s) in which the node (or logic) exists. We have used the clock domain frequency as an upper bound on the toggle frequency calculated by our approach.

Signal Probability: The signal probability P(x) at a node x is defined as the average fraction of clock periods in which the steady state value of x is logic high.

2.2 Toggle Activity Estimation

This section gives an overview of Farid Najm's work.

The Boolean difference of the output is computed with respect to each input pin. The Boolean difference of a function y (the output) with respect to x (one of its inputs) is defined as:

$$\frac{dy}{dx} = y|_{x=1} \oplus y|_{x=0} \qquad (1)$$

It was shown in [5] that, if the inputs x_i to a Boolean logic block are (spatially) independent, then the density of its output y is given by:

$$D(y) = \sum_{i=1}^{n} P\left(\frac{dy}{dx_i}\right) D(x_i) \qquad (2)$$

Equation (2) assumes that all inputs are independent. This can lead to inaccuracy where primary inputs diverge and then reconverge on the way to primary outputs, since the reconverging signals are not really spatially independent. However, at the boundary of a block, the primary inputs can be considered largely independent, and hence the above approach becomes more accurate if the Boolean difference of the whole block is computed.

Given the signal probability and toggle density values at the primary inputs of a logic circuit, a single pass over the circuit using (2) gives the density at every node. Note that apart from estimating toggle densities at the output node, we also need to calculate output signal probabilities in order to estimate toggle densities in the subsequent logic. This is simple for a two-input AND gate:

P(Y) = P(A)*P(B)

or, for a NAND gate:

P(Y) = 1 - P(A)*P(B)
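Equations (1) and (2) can be illustrated for a single gate with a short sketch. It enumerates the gate's truth table, so it only suits small gates, and it assumes spatially independent inputs as (2) does. All function names are hypothetical and are not taken from the thesis tool.

```python
from itertools import product
from typing import Callable, Sequence


def _weight(bits: Sequence[int], probs: Sequence[float]) -> float:
    """Probability of one input assignment, assuming independent inputs."""
    w = 1.0
    for bit, p in zip(bits, probs):
        w *= p if bit else 1.0 - p
    return w


def signal_probability(func: Callable[..., int], probs: Sequence[float]) -> float:
    """P(y = 1) for a gate function with the given input signal probabilities."""
    return sum(_weight(bits, probs)
               for bits in product((0, 1), repeat=len(probs))
               if func(*bits))


def boolean_difference_probability(func: Callable[..., int],
                                   probs: Sequence[float], i: int) -> float:
    """P(dy/dx_i = 1): probability that toggling input i toggles the output."""
    others = list(probs[:i]) + list(probs[i + 1:])
    total = 0.0
    for bits in product((0, 1), repeat=len(others)):
        low = bits[:i] + (0,) + bits[i:]    # assignment with x_i = 0
        high = bits[:i] + (1,) + bits[i:]   # assignment with x_i = 1
        if bool(func(*low)) != bool(func(*high)):  # equation (1)
            total += _weight(bits, others)
    return total


def output_density(func: Callable[..., int], probs: Sequence[float],
                   densities: Sequence[float]) -> float:
    """Equation (2): D(y) = sum_i P(dy/dx_i) * D(x_i)."""
    return sum(boolean_difference_probability(func, probs, i) * densities[i]
               for i in range(len(probs)))


def and2(a: int, b: int) -> int:
    return a & b


def nand2(a: int, b: int) -> int:
    return 1 - (a & b)
```

For a two-input AND gate with P(A) = 0.5, P(B) = 0.25, D(A) = 2 and D(B) = 4, `signal_probability(and2, [0.5, 0.25])` reproduces P(A)*P(B), and `output_density` evaluates P(B)*D(A) + P(A)*D(B). The NAND gate has the same output density and the complementary signal probability.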

2.3 Multi-million gate solution

The above approach gives good results for designs that are small, can be analyzed flat, and are dominated by combinational logic. Besides, it is not always possible to run flat due to logistical concerns: blocks may be designed first, the rest of the design may be done hierarchically, or the design may contain reusable IPs that do not have a netlist. The approach described in the previous section was extended to handle such requirements.

We also came across several issues while applying this approach to some large designs (>5M gates) and implementing the tool, Toggle Frequency Calculator. In this section, we discuss solutions that address each of these problems in detail.

2.3.1 Deriving automatic toggle frequency values

1. Primary Input Handling

The toggle rate at a primary input is not known. Since primary inputs are driven externally, there is no easy way to predict their toggle rate. The same is true for the primary input signal probability. Consider Figure 2.1 and Figure 2.2.



Figure 2.1 Schematic of logic circuit 1

Figure 2.2 Schematic of Logic Circuit 2


In the circuits above, the Clk or D input going to the block can be a primary input. Unless the user gives the toggle rate, it is very difficult to compute. We used static timing analysis [24][25] specifications to derive these inputs. They are:

Input Delay Specification: A constraint that specifies the minimum or maximum amount of delay from a clock edge to the arrival of a signal at a specified input port. The input delay specification is with respect to a clock that triggers events on that signal.

Clock Specification: Specifies the characteristics of a clock, including the clock name, source, period and waveform.

Mode Specifications: Specify the constant values applied on certain ports or pins to drive timing analysis in a specific mode. This means that these pins or ports are not toggling during the analysis. They also specify the constant value to which the port or pin is tied.

For clock inputs, we used the toggle rate given by the clock specification.

For non-clock inputs, we used the clock named in the input delay specification.

For constant ports, we used a toggle rate of 0 and a static probability based on the tied value: if the port is constant 0, the static probability is 0; otherwise it is 1.


A sample SDC file with the above commands is shown in Appendix A. Note that an SDC file is a collection of commands in Tcl format, so we have shown only the commands that are primarily required.

2. Sequential element modeling (e.g. flip-flops, latches)

Sequential elements do not switch arbitrarily when their input switches. Hence, we cannot apply equations (1) and (2) directly.

We used the following formula to compute the toggle frequency at the output of sequential cells. Note that by sequential cells we mean latches and basic flip-flops, not complex macros, which are dealt with separately.

Qout = min(DataInput, clock/2)

The upper bound of clock/2 is required since we identified certain cases where the data input toggles more than clock/2. This is explained below. In cases where the data input does not toggle more than clock/2, the output cannot toggle more than the data input. The above equation takes care of both facts.

3. Boolean gates that did not reflect realistic scenarios: exor/exnor gates, mux

For such gates, equations (1) and (2) can compute a toggle rate higher than the clock toggle rate, and the rate can grow even further if there are more such gates in the transitive fanout. We found that this does not happen on actual designs, and in many cases it was not the intended behavior. We treated such cells as exceptions and clipped their toggle rate to half of the clock toggle rate.

In a similar fashion, we treated mux cells as exceptions and assigned to the output the maximum toggle rate of all inputs.
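The special-case rules for sequential cells, exor/exnor gates and muxes can be summarized in a small sketch. The function names are hypothetical. The unclipped exor rate is taken as the sum of the input rates, which is what equation (2) gives when the Boolean difference with respect to every input is 1. Whether the mux select pin counts among "all inputs" is not stated in the text, so the caller decides what to pass in.

```python
from typing import Sequence


def flop_output_toggle(data_toggle: float, clock_toggle: float) -> float:
    """Qout = min(DataInput, clock/2) for latches and basic flip-flops."""
    return min(data_toggle, clock_toggle / 2.0)


def xor_output_toggle(input_toggles: Sequence[float], clock_toggle: float) -> float:
    """Exor/exnor: apply equation (2), then clip to half the clock toggle rate."""
    unclipped = sum(input_toggles)  # P(dy/dx_i) = 1 for every exor input
    return min(unclipped, clock_toggle / 2.0)


def mux_output_toggle(input_toggles: Sequence[float]) -> float:
    """Mux: output takes the maximum toggle rate among the inputs supplied."""
    return max(input_toggles)
```

For a 100 MHz clock, a flop whose data input toggles at 80 MHz is bounded to 50 MHz, and an exor gate with 30 MHz and 40 MHz inputs is clipped from 70 MHz to 50 MHz.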


4. Complex loop handling

These were handled by breaking the loops. We broke each loop at the first point where we found the loop forming.

5. Unconnected inputs going into logic

This was handled by tracing to the first sequential cell encountered in the transitive fanout of the unconnected inputs. This gives the clock that controls the toggle rate further down the line.

If the unconnected inputs are clocks, we assigned the worst toggle rate of the block itself.

6. Gated clocks or generated clocks

A gated clock is a clock signal that can be modified by logic within the design, such as a clock that can be turned off to save power. A schematic of a gated clock is shown in Figure 2.3.

Figure 2.3 Gated clock example

We made the gating elements transparent for toggle propagation: a clock gating cell is handled like a buffer.

7. Design Constraints

Guidelines for realistic, usable toggle activity estimation

Some care is still needed despite all the above solutions. For example, toggle estimation must be done based on the targeted application. This drives certain inputs used in items 1-6 above. In the implementation, we kept certain hooks to give control to the user.

2.3.2 Hierarchical Modeling

1. A huge portion of the design is occupied by memories; however, calculating memory output switching activity is not straightforward.

2. Complex functionality: hard macros.

3. Multi-million gate designs cannot afford flat analysis due to cycle time and the inherent limitations of probabilistic approaches. We needed to devise a method for hierarchical analysis by modeling sub-blocks and using them as black boxes.

We used the timing modeling approach to handle (1), (2) and (3).

All standard library components are presently modeled in a Liberty file [69]. Static timing analysis tools can generate a similar Liberty file for blocks after completing the analysis [25].

This file contains the following information:

• Input-pin-to-output-pin timing arcs
• Setup and hold constraints for the data input and clock input
• Output timing with respect to either an input pin or the related clock

We derive the output toggle frequency f(out) as below.

In the case of an input-to-output timing arc:

f(out) = maximum(toggle rates of all controlling inputs)

In the case of a clock-to-output timing arc:

f(out) = average switching activity of the clock domain

Figure 2.4 shows the gate-level netlist of a design called 'simple'. Figure 2.5 shows the timing arcs that are extracted by Prime Time, a leading industry timing analysis tool [25]. The timing arc information is used to compute the output toggle rate as explained below.


Figure 2.5 Timing Arcs in extracted model of 'simple' design

There are combinational arcs from i3 to out2 and from i1 to out2. Hence, the output toggle rate at out2 is controlled by the same clock as i3 or i1. In this case, we assign the maximum of the i3 and i1 toggle rates to the output pin. The other timing arc is clk2->out1. In this case, out1 is assigned the average switching activity of clk2.

Thus, using timing model information, we generate output toggle rates for memories, complex hard macros and blocks.
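The black-box rule can be sketched as follows for the 'simple' design. The data layout (a list of arcs tagged "comb" or "clock") and the function name are assumptions made for illustration and do not reflect the actual Liberty parsing in the tool. Taking the maximum when an output has arcs of both kinds is also an assumption, since the text does not cover that case. The toggle values are made-up inputs.

```python
from typing import Dict, List, Tuple

# (from_pin, to_pin, kind), where kind is "comb" (input-to-output arc)
# or "clock" (clock-to-output arc). Hypothetical representation.
Arc = Tuple[str, str, str]


def model_output_toggles(arcs: List[Arc],
                         input_toggle: Dict[str, float],
                         domain_activity: Dict[str, float]) -> Dict[str, float]:
    """Derive f(out) for each output pin of a black-box timing model."""
    candidates: Dict[str, List[float]] = {}
    for src, dst, kind in arcs:
        if kind == "comb":
            rate = input_toggle[src]       # toggle rate of a controlling input
        elif kind == "clock":
            rate = domain_activity[src]    # average activity of the clock domain
        else:
            raise ValueError(f"unknown arc kind: {kind}")
        candidates.setdefault(dst, []).append(rate)
    # Maximum over all arcs ending at the pin (assumed for mixed arc kinds).
    return {pin: max(rates) for pin, rates in candidates.items()}


simple_arcs = [("i3", "out2", "comb"), ("i1", "out2", "comb"), ("clk2", "out1", "clock")]
toggles = model_output_toggles(
    simple_arcs,
    input_toggle={"i1": 10e6, "i3": 25e6},   # illustrative values
    domain_activity={"clk2": 12e6},          # illustrative value
)
```

Here out2 receives the larger of the i1 and i3 rates, and out1 receives the average switching activity supplied for the clk2 domain.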

2.4 Validation and Results

The above changes were incorporated into executable code and applied to ISCAS89 circuits. The results were compared through power estimation, as discussed in the next chapter.


2.5 Summary

In this work, we address real issues faced by large designs. Automatic toggle generation eases usability and improves accuracy. Hierarchical analysis supports hierarchical design, which is a common methodology for handling design complexity.


3 Power Estimation

3.1 Overview

Accurate power estimates are necessary at various stages of the design in order to make correct architectural, implementation and cost tradeoffs [61]. Architectural-level tradeoffs are made at a higher level and involve software or instruction-level power modeling, or high-level activity numbers for different blocks, to make implementation tradeoffs. Weighted averages are often used to identify the best cost options [62-65]. Once the design is converted to a structural netlist and physical design starts, power estimation mainly drives package design, PG network design and lower-level power minimization. In this case, power dissipation is described as below.

$$P = (A \cdot C \cdot V^2 \cdot f) + (\tau \cdot A \cdot V \cdot I_{short}) + (V \cdot I_{leak})$$

Where

A = activity factor: this specifies the amount of switching at the various internal nodes of the design. Note that 'f' is the clock frequency, which is readily available for most designs. The activity factor specifies how much a node toggles per 'f' transitions of the clock. The activity factor can be derived from simulation patterns of the logic.

C = capacitance: interconnect load capacitance or wire capacitance

V = dynamic voltage: voltage at which the logic operates

f = frequency: clock frequency at which the logic operates

Ishort = short-circuit current during switching: during a transition in CMOS logic, both the NMOS and PMOS transistors are ON for a brief moment. During this time, current finds a direct path from power supply to ground. This is called short-circuit current. It depends on the input transition duration of the CMOS gate.

τ = duration of short-circuit current

Ileak = leakage current [72-80][32]


Figure 3.1 shows the various components of power and their relationship and contribution to total power.



[Figure: Venn diagram of power components, with the following labels. Dynamic power consists of switching power and short-circuit power. Switching power (70-80%): power dissipated by the charging and discharging of the load capacitance, VDD^2 * Σ over Cell(i) of (Cload(i) * TR(i)). Short-circuit power: power dissipated by a momentary short circuit between the P and N transistors of a gate during switching. Internal power: cell internal switching power can vary based on macro size. Static (leakage) power (5%): power dissipated by a gate when it is not switching, Σ over Cell(i) of PCellLeakage(i). The ASIC flow characterizes libraries for average and leakage power.]

Figure 3.1 Venn diagram of Power Components


In this work, the above power components and their computation are studied extensively. To address the problem in a systematic manner, power estimation has been simplified in the following way. These assumptions are acceptable given the global analysis that we are considering.

Power supply and ground voltage levels throughout the chip are assumed fixed, so that the power can be computed simply by estimating the current drawn by every sub-circuit at a given fixed power supply voltage. Note that this does not mean that different blocks cannot be at different voltage levels. This allows library components to be pre-characterized at the required voltage points.

The circuit is built of logic gates and latches or reusable IPs, and has the popular and well-structured design style of a synchronous sequential circuit. In other words, it consists of flops driven by a common clock and combinational logic blocks whose inputs (outputs) are derived from flop outputs (inputs). It is also assumed that the flops are edge-triggered and that, with the use of CMOS design technology, the circuit draws no steady-state supply current. This allows the average power dissipation of the circuit to be broken down into two components:

• The power consumed by the flops
• The power consumed by the combinational logic blocks

This chapter is organized as follows. The next section explains cell-based power analysis further. The section after that briefly introduces the tools used for comparison against power estimation based on the toggle computation described in the previous chapter. Validation and results are described last.


3.2 Current approaches to Power Analysis

Cell-based power estimation consists of cell characterization and logic simulation or activity estimation. The characterization phase entails a set of electrical simulations of each library cell for all possible input transitions and for a wide range of fanin and fanout conditions. The timing and power information obtained in this way is used to construct lookup tables for the basic library elements [46][69].

The total leakage power of a circuit is obtained by summing the leakage power of the design's constituent library cells:

$$P_{leakageTotal} = \sum_{Cell(i)} P_{CellLeakage}(i) \qquad (3)$$

where P_CellLeakage(i) is the leakage power dissipation of each cell. Technology library developers annotate the library cells with the approximate total leakage power dissipated by each cell. There is usually a single static power number per library cell, but sometimes leakage power depends on the logic state of the cell. In this case, the library cell is annotated with a state-dependent static power.

A cell's internal power is the sum of the internal power of all of the cell's inputs and outputs as modeled in the technology library:

$$P_{Internal} = \sum_{Pin(i)} E_i \cdot A(i) \cdot f(i) \qquad (4)$$

where Ei is the internal energy of each pin. In practice, the internal energy of a pin is characterized in the technology library and can be accessed by a simple table lookup. Depending on the required accuracy, different lookup tables can be provided by the library designers, as explained in Table 3.1.

Lookup Table | Pin Direction | Indices
One-dimensional | Input/Output | Input transition OR output load capacitance
Two-dimensional | Output | Input transition and output load capacitance
Three-dimensional | Output | Input transition and output load capacitance of the two outputs that have equal or opposite logic values

Table 3.1 Power Modeling for CMOS gates

The switching power is calculated in the following way:

$$P_{switching} = VDD^2 \cdot \sum_{Cell(i)} \big( C_{load}(i) \cdot A(i) \cdot f(i) \big) \qquad (5)$$

where Cload(i) is the capacitive load of net i. Without any physical information, the load capacitance Cload(i) is calculated using the wire load model of the net and the fanout of the driving pin. Usually, this approach achieves relative accuracy.
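Equations (3), (4) and (5) can be combined into a small cell-based power roll-up, sketched below. The `Pin` and `Cell` records and all field names are hypothetical stand-ins for library and netlist data. In a real flow, the internal energy would come from the lookup tables of Table 3.1 rather than from a fixed number.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pin:
    internal_energy_j: float  # Ei, energy per transition from the library
    activity: float           # A(i)
    frequency_hz: float       # f(i)


@dataclass
class Cell:
    leakage_w: float          # PCellLeakage(i)
    load_cap_f: float         # Cload(i) of the net driven by the cell
    activity: float           # A(i) of the driven net
    frequency_hz: float       # f(i)
    pins: List[Pin] = field(default_factory=list)


def leakage_power(cells: List[Cell]) -> float:
    """Equation (3): sum of per-cell leakage power."""
    return sum(c.leakage_w for c in cells)


def internal_power(cells: List[Cell]) -> float:
    """Equation (4), accumulated over the pins of every cell."""
    return sum(p.internal_energy_j * p.activity * p.frequency_hz
               for c in cells for p in c.pins)


def switching_power(cells: List[Cell], vdd: float) -> float:
    """Equation (5): VDD^2 * sum(Cload(i) * A(i) * f(i))."""
    return vdd ** 2 * sum(c.load_cap_f * c.activity * c.frequency_hz for c in cells)


def total_power(cells: List[Cell], vdd: float) -> float:
    return leakage_power(cells) + internal_power(cells) + switching_power(cells, vdd)


# Illustrative numbers only: one cell driving 10 fF at 1.0 V, 100 MHz, A = 0.2.
demo = [Cell(leakage_w=1e-9, load_cap_f=10e-15, activity=0.2, frequency_hz=100e6,
             pins=[Pin(internal_energy_j=1e-15, activity=0.2, frequency_hz=100e6)])]
```

Keeping the three components as separate functions makes it easy to report each contribution individually, which mirrors the breakdown in Figure 3.1.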

Apart from the approaches
mentioned above, the following factors are also important for
accurate power estimation.


1. Temperature dependency of power. Power consumption in CMOS depends on mobility, threshold voltage and doping concentration. These factors are temperature dependent. Hence power also varies with temperature.

2. Voltage dependency of power. The voltage dependency of power is well known (P = C*V*V*f), and it holds for CMOS technology as well. If we model the CMOS component as a capacitor, it is clear that power varies with the supply voltage.

3. Power increases with the frequency of operation. In fact, many designs nowadays have different modes of operation: a high-frequency mode when the device is operational and a low-frequency mode when the device is in standby. The impact of frequency on power estimation has already been discussed in the previous section.

4. Nowadays, most designs contain a significant number of flops or registers. According to one statistic, flops make up around 40-50% of the logic in a design. If all the flops are clocked throughout operation, the clock network consumes almost 50% of total power. It is therefore sometimes helpful to analyze power consumption on the clock network. This work analyzes the clock power contribution to total power.

5. The process corner also impacts currents and power consumption. This is especially true for leakage power. A typical VLSI process has a leakage power variation on the order of 4-6 times from the worst process corner to the best.


Based on the power sensitivity and tool study analysis in this section, we propose a power estimation flow for a typical design cycle, as shown in Figure 3.2 below. Note that the power analysis varies from RTL design to pre-layout netlist to post-layout netlist.

[Figure: power estimation flow across design stages, ranked from "Recommended" to "Least Preferred". Labels recoverable from the figure: Architecture; RTL; Unplaced Netlist; Placed Netlist; Detailed Route Over; RC SPICE Netlist; Power Estimation (spreadsheet); Toggle Frequency Calculator; Forward SAIF* or Frequency Constraints; Power Estimation in Power Compiler (wire load, global SPEF, detailed SPEF); Logic Simulation; PIF File Generation; PrimePower; NanoSim. * SAIF: Switching Activity File based approach.]

Figure 3.2 Power Estimation in Design Stages

3.3 Power Analysis Tools

3.3.1 Power Compiler [67]

Formerly known as Design Power, Power Compiler is currently the most widely used Synopsys tool. Power Compiler, typically used during synthesis, does power optimization as well as power estimation. This tool has static algorithms for calculating switching activity at various circuit nodes and propagating it. It is a known fact that Power Compiler cannot estimate switching activity well for sequential cells. It should also be noted that most ASIC vendors base their cell power modeling on Synopsys Liberty syntax, so it is highly important for single cell power estimation to be close to the Power Compiler number.

The Synopsys Reference Manual on Power Compiler [18] gives the basic power calculation theory and a description of the terms used in its tools.

We used Power Compiler in two modes.

The first mode was to use Power Compiler as a complete solution for power estimation. In this approach, we generated input switching activity from our vectors and specified it to Power Compiler. Power Compiler propagated the switching activity based on switching probability and then calculated power. In this method, it used some assignment method for sequential cells, and we went ahead with that because our aim was to verify the default switching activity propagation algorithm of Power Compiler.

The second mode was to use Power Compiler just as a power calculation engine. In this approach, we generated switching activity at all the nodes using the methodology defined in Chapter 3 and used only the power calculation engine. As mentioned earlier, the power calculation engine is quite accurate, so our aim was to use the resulting power estimates to evaluate the switching activity determination accuracy of other methods.
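The probability-based propagation referred to in the first mode can be illustrated with the textbook signal-probability rules for simple gates. This is a minimal sketch that assumes all signals are spatially independent and uncorrelated from cycle to cycle; it is not Power Compiler's actual algorithm, and the function names and the example circuit are hypothetical.

```python
from math import prod

def and_probability(*input_probs):
    """P(output = 1) for an AND gate with independent inputs."""
    return prod(input_probs)

def or_probability(*input_probs):
    """P(output = 1) for an OR gate with independent inputs."""
    return 1.0 - prod(1.0 - p for p in input_probs)

def not_probability(input_prob):
    """P(output = 1) for an inverter."""
    return 1.0 - input_prob

def toggle_probability(static_prob):
    """Probability that a node changes value between two consecutive
    cycles, assuming the two cycles are statistically independent."""
    return 2.0 * static_prob * (1.0 - static_prob)

# Hypothetical circuit: y = a AND b, z = NOT(y OR c)
p_a = p_b = p_c = 0.5
p_y = and_probability(p_a, p_b)                   # 0.25
p_z = not_probability(or_probability(p_y, p_c))   # 0.375
activity_y = toggle_probability(p_y)              # 0.375
```

Sequential cells break the independence assumption because a flop output depends on its own earlier state, which is consistent with the difficulty noted above for sequential cells.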

3.3.2 Power Mill (or Nano Sim) [4][68]

Power Mill is a Synopsys tool (currently known as Nano Sim) with a fast SPICE engine at its core. It has been found to correlate well with SPICE for two of the single cell circuits and one small design. Power Mill is a dynamic simulation based tool and hence requires patterns for simulation.

We used Power Mill to calculate average and peak power. The main reason was the runtime advantage of Power Mill compared to SPICE. It should be noted here that Power Mill is capable of taking a SPICE netlist as input, so any switching between Power Mill and SPICE is transparent, if needed.
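To make the two reported quantities concrete, the sketch below derives average and peak power from a sampled supply-current waveform such as a transient simulation might produce. It assumes a constant supply voltage and piecewise-linear current between sample points; the function name and the sample data are hypothetical and do not reflect any Power Mill output format.

```python
def average_and_peak_power(times, currents, vdd):
    """Return (average_power, peak_power) for a supply-current waveform.

    times    : strictly increasing sample times in seconds
    currents : supply current at each sample time in amperes
    vdd      : constant supply voltage in volts
    """
    if len(times) != len(currents) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    # Trapezoidal integration of current gives the delivered charge.
    charge = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        charge += 0.5 * (currents[k] + currents[k - 1]) * dt
    average_power = vdd * charge / (times[-1] - times[0])
    peak_power = vdd * max(currents)
    return average_power, peak_power

# Hypothetical triangular current pulse on a 1.2 V supply.
avg_p, peak_p = average_and_peak_power([0.0, 1.0, 2.0], [0.0, 2.0, 0.0], 1.2)
```

The peak value found this way depends on how finely the waveform is sampled, which is one reason peak power needs a dynamic, pattern-driven simulation.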

3.3.3 Prime Power [66]

Prime Power is another offering in the Synopsys power portfolio. It is a dynamic, vector based solution. The key difference from Power Mill is that Power Mill is a SPICE based tool whereas Prime Power is a logic simulation based tool. In other words, Power Mill is tuned more for accuracy and analog kinds of designs, whereas Prime Power is tuned to digital, and specifically ASIC, kinds of designs with reasonably good accuracy. Prime Power has a PLI interface with leading industry simulators, e.g. VCS, Modelsim, Verilog etc. While doing logic verification with these simulators, if we instantiate one call/command, the PLI dumps binary files. These binary files can be used in Prime Power to do power estimation. It should be noted that Prime Power can also do peak power analysis.

We used Prime Power for both average and peak power analysis. The simulator interface used was VCS.

3.3.4 Other Tools

This project used VTRAN for converting vectors to SPICE stimulus. VTRAN is one of the Synopsys offerings and is a generic translator of vectors from one format to another. It supports all major industry formats as well as the internal formats of many prominent ASIC/EDA vendors.

VCS was used for logic simulation. There is no specific reason for using this simulator except that it is a Synopsys offering and so works with Prime Power without major hurdles.

A few TI internal programs were used to set up an automated flow. They are listed below.

1. genFuncTDL: an internal utility to generate random vectors with a specified clock rate.

2. SimOut: a test constraint validation environment.

3. SDFAligner: translates SDF from one simulator's format to another simulator's compatible format.

4. SigProbGen: converts vectors to input switching activity and calculates probability.

5. DREPGEN: generates data compatible with TFC.

6. A translator from ASCII benchmark data to Verilog netlist and SPICE netlist.

3.4 Validation Flow

The validation flow diagram, data management and color convention are shown in Figure 3.3. Some of the key steps are described below.



[Figure 3.3: validation flow diagram. Labels recoverable from the diagram: ISCAS89 circuits, translators (Verilog, SPICE), Verilog netlist, SPICE netlist, DC scripts, GENFUNC TDL, random TDL, SIGPROBGEN, switching activity file, DREPGEN, DREP file + data, user frequency file, TFC, VTRAN, VTRAN cmd, PWL file, SMOUT, test bench, SDF, VCS_PIF, CFG, CMD, full VCD, PIF, PrimePower, Power Mill, power estimation, comparison and report.]

Figure 3.3 Power Estimation Validation Flow



White: Third party tools
Green: Automatically generated data or written translator
Grey: TI tools
Default: Standard inputs/outputs
Blue: Final output
Ellipse: Data file(s)
Rhombus: Process block(s)

Figure 3.4 Legends for Validation Flow



3.4.1 Netlist Setup

The standard industry benchmark circuits ISCAS89 are used for the validation. The circuits' complexity ranges from 14 gates to 22000 gates. The detailed statistics of the circuits are given in Table 3.2. [71]

To make the validation complete, two single cell circuits are added for 'micro' level validation. The ISCAS89 benchmark circuits were mapped to 130nm technology for analysis. Note that no optimization or synthesis was used while mapping the circuits to 130nm technology; however, a predetermined set of cells was used. They are:



- 2, 3, 4 input AND/NAND gates
- 2, 3, 4 input OR and NOR gates
- Buffers and inverters
- 2, 3 input ex-or and ex-nor gates
- Flops

3.4.2 Vector Generation

Random vectors were generated for all the ISCAS89 circuits. The number of vectors was based on circuit complexity and number of gates, and varies from 4 vectors to approximately 38000 vectors. The same set of vectors is used for logic simulation and SPICE simulation as well as for derivation of switching activity and static probabilities for input pins.
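The derivation of static probability and switching activity for input pins from a vector set can be sketched as follows. This is only an illustration under simple assumptions (one vector per clock cycle, uniformly random bits); the internal utilities used in this work are not shown here, and the function names are hypothetical.

```python
import random

def generate_vectors(num_inputs, num_vectors, seed=0):
    """Generate reproducible random 0/1 vectors, one list per clock cycle."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(num_inputs)]
            for _ in range(num_vectors)]

def input_statistics(vectors):
    """Return a (static_probability, toggle_rate) tuple for each input pin.

    static_probability : fraction of cycles in which the pin is 1
    toggle_rate        : fraction of cycle boundaries at which the pin changes
    """
    num_vectors = len(vectors)
    stats = []
    for pin in range(len(vectors[0])):
        column = [vector[pin] for vector in vectors]
        static_prob = sum(column) / num_vectors
        toggles = sum(1 for a, b in zip(column, column[1:]) if a != b)
        toggle_rate = toggles / (num_vectors - 1) if num_vectors > 1 else 0.0
        stats.append((static_prob, toggle_rate))
    return stats

# Small hand-written vector set: pin 0 alternates, pin 1 is held at 1.
example_stats = input_statistics([[0, 1], [1, 1], [0, 1], [1, 1]])
```

With very few vectors these statistics are coarse, which is one reason for scaling the vector count with circuit size.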



3.4.3 Interconnect Setup

All the circuits are estimated as synthesized Verilog netlists, and hence parasitic information was not available. To make the comparison more realistic, no-load modes were used in Power Compiler and in SPICE simulation. The logic simulation was based on SDF generated from Synopsys.

3.5 Validation and Results

The complete data from the different tools are shown in Table 3.5. Table 3.2 describes the circuits used for benchmarking. Table 3.3 compares run time between the dynamic method and the modified toggle computation method for some of the big design blocks. Table 3.4 shows power estimation for the clock network vs. total power estimation. All the power data is dynamic power in uW.



- The power numbers mainly reflect the cell internal power and the switching power due to gate input capacitances only, as no interconnects were assumed.
- All the experiments are done at the nominal operating point, i.e. nominal process, 25 C temperature and 1.2 V (nominal voltage).
- Clock network power is about 50% of total dynamic power, but this is not true in all cases.
- The run time reduction from the static approach is more than 1000 times.
- Prime Power reported power is optimistic compared to PowerMill in many cases. This is not what we expected and we are looking into it.
- TFC is within 30% of PowerMill reported power. However, there are certain exceptions where it reports 30% optimistic power or >50% pessimistic power.
- Power Compiler is >50% pessimistic in most of the cases.
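The optimistic/pessimistic comparisons above can be expressed as a signed percentage deviation from a reference tool. The sketch below assumes that "pessimistic" means the estimate is higher than the reference and "optimistic" means it is lower; the function names, the 30% default tolerance and the sample numbers are illustrative and are not data from the tables in this chapter.

```python
def percent_deviation(estimate_uw, reference_uw):
    """Signed deviation of an estimate from the reference, in percent."""
    return 100.0 * (estimate_uw - reference_uw) / reference_uw

def classify_estimate(estimate_uw, reference_uw, tolerance_pct=30.0):
    """Label an estimate relative to the reference power number."""
    deviation = percent_deviation(estimate_uw, reference_uw)
    if abs(deviation) <= tolerance_pct:
        return "within tolerance"
    # Assumption: over-estimation is pessimistic, under-estimation optimistic.
    return "pessimistic" if deviation > 0 else "optimistic"

# Hypothetical power numbers in uW against a 100 uW reference.
labels = [classify_estimate(p, 100.0) for p in (120.0, 160.0, 60.0)]
```

Keeping the sign of the deviation, rather than only its magnitude, is what allows a tool's bias to be reported as consistently optimistic or pessimistic.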





Design Name   IN    OUT   Flops   Boolean (gates+inv)
s111          8     1     0       8
s1196         14    14    18      388+141
s1238         14    14    18      428+80
s13207        31    121   669     2573+5378
s13207_1      62    152   638     2573+5378
s1423         17    5     74      490+167
s1488         8     19    6       550+103
s1494         8     19    6       558+89
s15850        14    87    597     3448+6324
s15850_1      77    150   534     3448+6324
s208_1        10    1     8       66+38
s27           4     1     3       8+2
s298          3     6     14      75+44
s344          9     11    15      101+59
s349          9     11    15      104+57
s35932        35    320   1728    12204+3861
s382          3     6     21      99+59
s38417        28    106   1636    8709+13470
s38584        12    278   1452    11448+7805
s38584_1      38    304   1426    11448+7805
s386          7     7     6       118+41
s4            2     1     1       0
s400          3     6     21      106+58
s420_1        18    1     16      140+78
s444          3     6     21      119+62
s5            2     1     0       1+0
s510          19    7     6       179+32
s526          3     6     21      141+52
s526n         3     6     21      140+54
s5378         35    49    179     1004+1775
s641          35    24    19      107+272
s713          35    23    19      139+254
s820          18    19    5       256+33
s832          18    19    5       262+25
s838_1        34    1     32      288+158
s9234         19    22    228     2027+3570
s9234_1       36    39    211     2027+3570
s953          16    23    29      311+84


Table

3