
Construction of Realistic Gate Sizing Benchmarks With Known Optimal Solutions
Andrew B. Kahng, Seokhyeong Kang
VLSI CAD Laboratory, UC San Diego
International Symposium on Physical Design, March 27th, 2012

Outline
Background and Motivation
Benchmark Generation
Experimental Framework and Results
Conclusions and Ongoing Work

Gate Sizing in VLSI Design
Gate sizing
– Essential for power, delay and area optimization
– Tunable parameters: gate width, gate length and threshold voltage
– The sizing problem appears in all phases of the RTL-to-GDS flow
Common heuristics/algorithms
– LP, Lagrangian relaxation, convex optimization, DP, sensitivity-based gradient descent, ...
1. Which heuristic is better?
2. How suboptimal is a given sizing solution?
→ A systematic and quantitative comparison is required

Suboptimality of Sizing Heuristics
Eyecharts*
– Built from three basic topologies (chain, mesh, star), optimally sized with DP
– Allow suboptimalities to be evaluated
– Not realistic: Eyechart circuits have a different topology from real designs
– Large depth (650 stages) and small Rent parameter (0.17)
More realistic benchmarks are required, along with an automated generation flow
*Gupta et al., "Eyecharts: Constructive Benchmarking of Gate Sizing Heuristics", DAC 2010.
[Figure: chain, mesh and star topologies]

Our Work: Realistic Benchmark Generation w/ Known Optimal Solutions
1. Propose benchmark circuits with known optimal solutions
2. The benchmarks resemble real designs
– Gate count, path depth, Rent parameter and net degree
3. Assess suboptimality of standard gate sizing approaches
Automated benchmark generation flow:
real design → extract characteristic parameters → netlist generator
(construct chains → find optimal solution → connect chains keeping the optimal solution)
→ benchmark circuit w/ known optimal solution

Outline
Background and Motivation
Benchmark Considerations and Generation
Experimental Framework and Results
Conclusions and Ongoing Work

Benchmark Considerations
Realism vs. tractability to analysis – opposing goals
To construct realistic benchmarks: use design characteristic parameters
– # primary ports, path depth, fanin/fanout distribution
To enable known optimal solutions
– Library simplification as in Gupta et al. 2010: slew-independent library
[Figure: fanin/fanout distribution of the JPEG Encoder design]
Fanin distribution – 25%: 1-input, 60%: 2-input, 15%: ≥3-input
Path depth: 72; avg. net degree: 1.84; Rent parameter: 0.72

Benchmark Generation
Input parameters
1. timing budget T
2. depth of data path K
3. number of primary ports N
4. fanin/fanout distributions fid(i), fod(o)
Constraints
– T should be larger than the min. delay of a K-stage chain
– Σ_{i=1..I} i·fid(i) = Σ_{o=1..O} o·fod(o) (total fanins must equal total fanouts)
Generation flow
1. construct N chains with depth K
2. attach connection cells (C)
3. connect chains
→ netlist with N*K + C cells
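The distribution constraint above can be sanity-checked before generation. A minimal sketch; the dictionary-based `fid`/`fod` representation and the example values are illustrative assumptions, not the authors' implementation:

```python
def distributions_consistent(fid, fod, tol=1e-6):
    """Check the slide's constraint that the expected number of fanins
    per gate equals the expected number of fanouts per gate:
    sum_i i*fid(i) == sum_o o*fod(o)."""
    fanins = sum(i * p for i, p in fid.items())
    fanouts = sum(o * p for o, p in fod.items())
    return abs(fanins - fanouts) <= tol

# Hypothetical distributions: fraction of gates with i fanins / o fanouts.
fid = {1: 0.25, 2: 0.60, 3: 0.15}   # JPEG-like fanin distribution
fod = {1: 0.40, 2: 0.30, 3: 0.30}   # chosen so expected fanouts match (1.90)
```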

Benchmark Generation: Construct Chains
1. Construct N chains, each with depth K (N*K cells)
2. Assign gate instances according to fid(i)
3. Assign # fanouts to output ports according to fod(o)
Assignment strategy: arranged or random
[Figure: N chains of K stages, from gate(1,1) to gate(N,K)]

Benchmark Generation: Construct Chains (cont.)
[Figure: arranged vs. random fanin/fanout assignment within the chains]
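The random assignment strategy might be sketched as below; the data layout (each stage recorded as a fanin/fanout pair) is an illustrative assumption:

```python
import random

def build_chains(n_chains, depth, fid, fod, seed=0):
    """Construct N chains of K stages; each stage draws a fanin count
    from fid and a fanout count from fod (random assignment strategy).
    Returns chains[n][k] = (fanin, fanout)."""
    rng = random.Random(seed)
    fi_vals, fi_wts = zip(*sorted(fid.items()))
    fo_vals, fo_wts = zip(*sorted(fod.items()))
    chains = []
    for _ in range(n_chains):
        chain = []
        for _ in range(depth):
            fanin = rng.choices(fi_vals, weights=fi_wts)[0]
            fanout = rng.choices(fo_vals, weights=fo_wts)[0]
            chain.append((fanin, fanout))
        chains.append(chain)
    return chains
```

The arranged strategy would instead sort the drawn values per chain (larger fanins toward later stages, larger fanouts toward earlier stages), which the slides report improves connectivity.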

Benchmark Generation: Find Optimal Solution with DP
1. Attach connection cells to all open fanouts
→ allows chains to be connected while keeping the optimal solution
2. Perform dynamic programming with timing budget T
→ the optimal solution is achievable w/ a slew-independent library
[Figure: connection cells attached to the open fanouts of chains 1..N]

Benchmark Generation: Solving a Chain Optimally (Example)
Example: a 3-stage inverter chain (INV1, INV2, INV3) with delay budget Dmax = 8.
Slew-independent example library:
size   | input cap | leakage power | delay (load 3) | delay (load 6)
Size 1 | 3         | 5             | 3              | 4
Size 2 | 6         | 10            | 1              | 2
[Figure: per-stage DP tables mapping each timing budget to the minimum cumulative power and the corresponding gate size]
Optimized chain: size 2, size 1, size 1
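The chain DP can be sketched as a memoized recursion over (stage, remaining budget, current size). The library values mirror the slide's example; the backward formulation and the assumed output load of 3 on the last inverter are our illustrative choices:

```python
from functools import lru_cache

# Toy slew-independent library from the slide's example.
CAP = {1: 3, 2: 6}               # input capacitance of each size
LEAK = {1: 5, 2: 10}             # leakage power of each size
DELAY = {(1, 3): 3, (1, 6): 4,   # DELAY[(size, load)]
         (2, 3): 1, (2, 6): 2}
SIZES = (1, 2)

def size_chain(n_stages, d_max, out_load=3):
    """Minimize total leakage of an n-stage chain subject to a total
    delay budget d_max; each gate's load is the next gate's input cap."""
    INF = float("inf")

    @lru_cache(maxsize=None)
    def best(stage, budget, size):
        # Min power of stages stage..n-1, given this stage's size and budget.
        if stage == n_stages - 1:            # last stage drives out_load
            d = DELAY[(size, out_load)]
            return (LEAK[size], ()) if d <= budget else (INF, ())
        result = (INF, ())
        for nxt in SIZES:                    # choose the next stage's size
            d = DELAY[(size, CAP[nxt])]
            if d > budget:
                continue
            power, tail = best(stage + 1, budget - d, nxt)
            if LEAK[size] + power < result[0]:
                result = (LEAK[size] + power, (nxt,) + tail)
        return result

    candidates = []
    for s in SIZES:                          # try each size for stage 0
        power, tail = best(0, d_max, s)
        candidates.append((power, (s,) + tail))
    return min(candidates, key=lambda c: c[0])
```

For Dmax = 8 this finds a solution with total leakage power 20; the slide's (size 2, size 1, size 1) is one of several tied optima at that power, so the returned size vector may differ.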

Benchmark Generation: Connect Chains
1. Run STA and find the arrival time of each gate
2. Connect each connection cell to an open fanin port
→ connect only if timing constraints are satisfied
→ connection cells do not change the optimal chain solution
3. Tie unconnected ports to logic high or low
[Figure: connection cells linking chains 1..N; remaining open ports tied to VDD]
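The timing-safe connection rule could look like the following sketch. The slack test (connect only when the driver's arrival time plus the connection-cell delay does not exceed the sink port's required time) is our reading of the slide, and all names are illustrative:

```python
def connect_chains(conn_cells, open_fanins, arrival, required, cell_delay):
    """Greedily wire connection cells to open fanin ports without
    violating timing: a connection (cell -> port) is legal only if the
    signal through the cell arrives before the port's required time.
    arrival/required map node names to times from a prior STA run."""
    connections = []
    free_ports = list(open_fanins)
    for cell in conn_cells:
        for port in free_ports:
            if arrival[cell] + cell_delay <= required[port]:
                connections.append((cell, port))
                free_ports.remove(port)
                break
    # Ports that remain open would be tied to logic high/low (step 3).
    return connections, free_ports
```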

Benchmark Generation: Generated Netlist
Generated output:
– benchmark circuit of N*K + C cells w/ known optimal solution
[Figure: schematic of a generated netlist (N = 10, K = 20); chains are connected to each other in various topologies]

Outline
Background and Motivation
Benchmark Generation
Experimental Framework and Results
Conclusions and Ongoing Work

Experimental Setup
Delay and power model (library)
– LP: linear increase in power – gate sizing context
– EP: exponential increase in power – Vt or gate-length context
Heuristics compared
– Two commercial tools (BlazeMO, Cadence Encounter)
– UCLA sizing tool
– UCSD sensitivity-based leakage optimizer
Realistic benchmarks: six open-source designs
Suboptimality calculation:
Suboptimality = (power_heuristic − power_opt) / power_opt
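The suboptimality metric is simple enough to state directly in code; a one-line helper (illustrative naming):

```python
def suboptimality(power_heuristic, power_opt):
    """Relative power overhead of a heuristic solution over the known
    optimum: (power_heuristic - power_opt) / power_opt."""
    return (power_heuristic - power_opt) / power_opt
```

For example, a heuristic burning 125 units of leakage against a 100-unit optimum has 25% suboptimality.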

Generated Benchmark – Complexity
Complexity (suboptimality) of generated benchmarks
Chain-only vs. connected-chain topologies
[Figure: suboptimality (0–20%) of a commercial tool and a greedy optimizer on chain-only vs. connected netlists, across libraries, N and K]
Chain-only: avg. 2.1%
Connected-chain: avg. 12.8%

Generated Benchmark – Connectivity
Problem complexity and circuit connectivity
1. Arranged assignment: improves connectivity (larger fanin – later stage; larger fanout – earlier stage)
2. Random assignment: improves diversity of topology

arranged | random | unconnected | subopt.
100%     | 0%     | 0.00%       | 2.60%
75%      | 25%    | 0.00%       | 6.80%
50%      | 50%    | 0.25%       | 10.30%
25%      | 75%    | 0.75%       | 11.20%
0%       | 100%   | 17.00%      | 7.70%

Suboptimality w.r.t. Parameters
For different numbers of chains:
[Figure: suboptimality (8–14%) and runtime of Comm, Greedy and SensOpt vs. number of chains (40–640)]
For different numbers of stages:
[Figure: suboptimality (8–14%) and runtime of Comm, Greedy and SensOpt vs. number of stages (20–100)]
The total # of paths increases significantly with N and K

Suboptimality w.r.t. Parameters (2)
For different average net degrees:
[Figure: suboptimality (0–120%) and runtime of Comm, Greedy and SensOpt vs. average net degree (1.2–2.4)]
For different delay constraints:
[Figure: suboptimality (0–25%) and runtime of Comm, Greedy and SensOpt vs. timing constraint (0.4–1.1 ns)]

Generated Realistic Benchmarks
Target benchmarks
– SASC, SPI, AES, JPEG, MPEG (from OpenCores)
– EXU (from OpenSPARC T1)
Characteristic parameters of real and generated benchmarks:

design | data depth | #instances | Rent param. (real) | net degree (real) | Rent param. (gen.) | net degree (gen.)
SASC   | 20         | 624        | 0.858              | 2.06              | 0.865              | 2.06
SPI    | 33         | 1092       | 0.880              | 1.81              | 0.877              | 1.80
EXU    | 31         | 25560      | 0.858              | 1.91              | 0.814              | 1.90
AES    | 23         | 23622      | 0.810              | 1.89              | 0.820              | 1.88
JPEG   | 72         | 141165     | 0.721              | 1.84              | 0.831              | 1.84
MPEG   | 33         | 578034     | 0.848              | 1.59              | 0.848              | 1.60

Suboptimality of Heuristics
Suboptimality w.r.t. known optimal solutions for the generated realistic benchmarks
Vt swap context (EP library) – up to 52.2%, avg. 16.3%
Gate sizing context (LP library) – up to 43.7%, avg. 25.5%
[Figure: suboptimality (0–60%) of Comm1, Comm2, Greedy and SensOpt on eyechart, SASC, SPI, AES, EXU, JPEG and MPEG, with EP and LP libraries]
* Greedy results for MPEG are missing

23

Comparison w/ Real Designs
Suboptimality
versus one specific heuristic (
SensOpt
)
Real designs and real
delay/leakage library
(
TSMC
65nm
) case
Actual
suboptimaltiy
will be greater !
10.00%
0.00%
10.00%
20.00%
30.00%
40.00%
50.00%
SASC
SPI
AES
EXU
JPEG
MPEG
Comm1
Comm2
Greedy
10.00%
0.00%
10.00%
20.00%
30.00%
40.00%
SASC
SPI
AES
EXU
JPEG
MPEG
Comm1
Comm2
Greedy
Suboptimality
from
our benchmarks
Discrepancy: simplified delay model, reduced library set, ...

Conclusions
A new benchmark generation technique for gate sizing
→ constructs realistic circuits with known optimal solutions
Our benchmarks enable systematic and quantitative study of common sizing heuristics
Common sizing methods are suboptimal on realistic benchmarks by up to 52.2% (Vt assignment) and 43.7% (sizing)
http://vlsicad.ucsd.edu/SIZING/

Ongoing Work
Analyze discrepancies between real and artificial benchmarks
Handle more realistic delay models
– Use a realistic delay library in the context of realistic benchmarks with tight upper bounds
Alternate approach for netlist generation
– (1) cut nets in a real design and find the optimal solution; (2) reconnect the nets keeping the optimal solution

Thank you