- 1 -

UC San Diego / VLSI CAD Laboratory

Construction of Realistic Gate Sizing Benchmarks With Known Optimal Solutions

Andrew B. Kahng, Seokhyeong Kang

VLSI CAD Laboratory, UC San Diego

International Symposium on Physical Design
March 27th, 2012


- 2 -

Outline

• Background and Motivation
• Benchmark Generation
• Experimental Framework and Results
• Conclusions and Ongoing Work


- 3 -

Gate Sizing in VLSI Design

• Gate sizing
  - Essential for power, delay and area optimization
  - Tunable parameters: gate width, gate length and threshold voltage
  - Sizing problem seen in all phases of the RTL-to-GDS flow
• Common heuristics/algorithms
  - LP, Lagrangian relaxation, convex optimization, DP, sensitivity-based gradient descent, ...

1. Which heuristic is better?
2. How suboptimal is a given sizing solution?

-> Systematic and quantitative comparison is required


- 4 -

Suboptimality of Sizing Heuristics

• Eyechart*
  - Built from three basic topologies, optimally sized with DP
  - Allows suboptimalities to be evaluated
• Non-realistic: Eyechart circuits have a different topology from real designs
  - Large depth (650 stages) and small Rent parameter (0.17)
• More realistic benchmarks are required, along with an automated generation flow

[Figure: the three basic Eyechart topologies: chain, mesh, star]

*Gupta et al., "Eyecharts: Constructive Benchmarking of Gate Sizing Heuristics", DAC 2010.

- 5 -

Our Work: Realistic Benchmark Generation w/ Known Optimal Solution

1. Propose benchmark circuits with known optimal solutions
2. The benchmarks resemble real designs
   - Gate count, path depth, Rent parameter and net degree
3. Assess suboptimality of standard gate sizing approaches

[Figure: automated benchmark generation flow. Real design -> extract parameters -> characteristic parameters -> netlist generator (construct chains, find optimal solution, connect chains keeping the optimal solution) -> benchmark circuit w/ known optimal solution]

- 6 -

Outline

• Background and Motivation
• Benchmark Considerations and Generation
• Experimental Framework and Results
• Conclusions and Ongoing Work


- 7 -

Benchmark Considerations

• Realism vs. tractability to analysis: opposing goals
• To construct realistic benchmarks: use design characteristic parameters
  - # primary ports, path depth, fanin/fanout distribution
• To enable known optimal solutions
  - Library simplification as in Gupta et al. 2010: slew-independent library

[Figure: fanin and fanout distributions of the JPEG Encoder design; x-axis 1 to 6, y-axis 0 to 0.6]

Design: JPEG Encoder
  Fanin distribution: 25% 1-input, 60% 2-input, 15% >3-input
  Path depth: 72
  Avg. net degree: 1.84
  Rent parameter: 0.72
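
As a rough illustration of how such characteristic parameters could be extracted, the sketch below computes a fanin distribution and a path depth from a tiny netlist. The netlist format (gate name mapped to the names of its drivers), the `NETLIST` contents and both function names are assumptions made for this example; the slides do not describe the actual extraction tooling.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical toy netlist: gate name -> names of its drivers.
# Names that are not keys ("a", "b", "c") are treated as primary inputs.
NETLIST = {
    "g1": ["a"],
    "g2": ["a", "b"],
    "g3": ["g1", "g2"],
    "g4": ["g2", "g3", "c"],
}

def fanin_distribution(netlist):
    """Fraction of gates having each fanin count."""
    counts = Counter(len(drivers) for drivers in netlist.values())
    total = len(netlist)
    return {k: counts[k] / total for k in sorted(counts)}

def path_depth(netlist):
    """Largest number of gates on any input-to-output path."""
    @lru_cache(maxsize=None)
    def depth(node):
        if node not in netlist:  # primary input
            return 0
        return 1 + max(depth(d) for d in netlist[node])
    return max(depth(g) for g in netlist)

print(fanin_distribution(NETLIST))
print(path_depth(NETLIST))
```

A fanout distribution could be obtained the same way by counting, for every driver, how many gates list it as an input.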

- 8 -

Benchmark Generation

• Input parameters
  1. timing budget T
  2. depth of data path K
  3. number of primary ports N
  4. fanin, fanout distribution fid(i), fod(o)

• Constraints
  - T should be larger than the min. delay of a K-stage chain
  - $\sum_{i=1}^{I} i \cdot fid(i) = \sum_{o=1}^{O} o \cdot fod(o)$

• Generation flow
  1. construct N chains with depth K
  2. attach connection cells (C)
  3. connect chains
  -> netlist with N*K + C cells
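
The balance constraint above says that the average number of input pins per gate must equal the average number of fanouts per output. A minimal sketch of that check follows; the distributions `FID` and `FOD` are illustrative values (the fanin numbers loosely follow the JPEG example, with the last bucket treated as 3-input, which is an assumption), and the function names are hypothetical.

```python
import math

# Illustrative distributions: count -> fraction of gates (assumed values).
FID = {1: 0.25, 2: 0.60, 3: 0.15}
FOD = {1: 0.40, 2: 0.40, 3: 0.10, 4: 0.10}

def pins_balanced(fid, fod, tol=1e-9):
    """True if sum_i i*fid(i) equals sum_o o*fod(o) within a tolerance."""
    fanin_pins = sum(i * frac for i, frac in fid.items())
    fanout_pins = sum(o * frac for o, frac in fod.items())
    return math.isclose(fanin_pins, fanout_pins, abs_tol=tol)

def total_cells(n_chains, depth, n_connection_cells):
    """Size of the generated netlist: N*K chain cells plus C connection cells."""
    return n_chains * depth + n_connection_cells

print(pins_balanced(FID, FOD))
print(total_cells(10, 20, 35))
```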


- 9 -

Benchmark Generation: Construct Chains

1. Construct N chains each with depth K (N*K cells)
2. Assign gate instances according to fid(i)
3. Assign # fanouts to output ports according to fod(o)
   - Assignment strategy: arranged and random

[Figure: chains 1 to N laid out over stages 1 to K, from gate (1,1) to gate (N,K)]
- 10 -

Benchmark Generation: Construct Chains (cont.)

[Figure: fanin and fanout counts assigned to the chain gates under arranged assignment and under random assignment]
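
The two assignment strategies can be sketched as follows. This is only an illustration under stated assumptions: counts are drawn per chain from the given distribution with Python's `random.choices`, "arranged" is modeled as sorting the counts along the chain, and the function name, its parameters and the example distribution are all hypothetical.

```python
import random

def assign_counts(dist, n_chains, depth, strategy="arranged",
                  larger_later=True, seed=0):
    """Draw one fanin (or fanout) count per gate for N chains of depth K.

    dist maps a count to its probability. With strategy="arranged" the
    counts of each chain are sorted so that larger values sit at later
    stages (larger_later=True) or at earlier stages (larger_later=False).
    With strategy="random" the sampled order is kept.
    """
    rng = random.Random(seed)
    values = list(dist)
    weights = [dist[v] for v in values]
    chains = []
    for _ in range(n_chains):
        counts = rng.choices(values, weights=weights, k=depth)
        if strategy == "arranged":
            counts.sort(reverse=not larger_later)
        chains.append(counts)
    return chains

fid = {1: 0.25, 2: 0.60, 3: 0.15}  # assumed example distribution
print(assign_counts(fid, n_chains=2, depth=8, strategy="arranged"))
print(assign_counts(fid, n_chains=2, depth=8, strategy="random"))
```

Calling it with `larger_later=False` would model a fanout assignment that places larger fanouts at earlier stages.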

- 11 -

Benchmark Generation: Find Optimal Solution with DP

1. Attach connection cells to all open fanouts
   - to connect chains while keeping the optimal solution
2. Perform dynamic programming with timing budget T
   - the optimal solution is achievable w/ a slew-independent lib.

[Figure: chains 1 to N before and after connection cells are attached to the open fanouts]
- 12 -

Benchmark Generation: Solving a Chain Optimally (Example)

Chain: INV1 -> INV2 -> INV3 -> output load = 6, D_max = 8

Library
size     input cap   leakage power   delay (load 3)   delay (load 6)
Size 1   3           5               3                4
Size 2   6           10              1                2

Stage 1, Load = 3
Budget   Power   Size
1        10      2
2        10      2
3        5       1
4        5       1
5        5       1
6        5       1
7        5       1
8        5       1

Stage 1, Load = 6
Budget   Power   Size
2        10      2
3        10      2
4        5       1
5        5       1
6        5       1
7        5       1
8        5       1

Stage 2, Load = 3
Budget   Power   Size
3        20      2
4        15      1
5        15      2
6        10      1
7        10      1
8        10      1

Stage 2, Load = 6
Budget   Power   Size
4        20      2
5        15      1
6        15      2
7        10      1
8        10      1

Stage 3, Load = 6
Budget   Power   Size
8        20      1
8        25      2

OPTIMIZED CHAIN: INV1 size 2, INV2 size 1, INV3 size 1
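
The example above can be reproduced with a small dynamic program. The sketch below is a minimal recursive formulation written for this note, not the authors' implementation: it uses the two-size library from the slide, assumes that every load is either 3 or 6 (the only loads the library table covers), and keeps the first candidate on ties.

```python
from functools import lru_cache

# Library from the example: size -> input cap, leakage power, delay per load.
LIB = {
    1: {"cap": 3, "power": 5, "delay": {3: 3, 6: 4}},
    2: {"cap": 6, "power": 10, "delay": {3: 1, 6: 2}},
}

def size_chain(num_stages, out_load, budget):
    """Return (min total power, sizes from first to last stage) or None."""

    @lru_cache(maxsize=None)
    def best(stage, load, remaining):
        # Cheapest sizing of stages 1..stage when the last of them drives
        # `load` and the whole prefix may use at most `remaining` delay.
        if stage == 0:
            return (0, ())
        result = None
        for size, cell in LIB.items():
            left = remaining - cell["delay"][load]
            if left < 0:
                continue  # this size alone already exceeds the budget
            sub = best(stage - 1, cell["cap"], left)
            if sub is None:
                continue
            cand = (sub[0] + cell["power"], sub[1] + (size,))
            if result is None or cand[0] < result[0]:
                result = cand
        return result

    return best(num_stages, out_load, budget)

print(size_chain(3, 6, 8))  # -> (20, (2, 1, 1))
```

The result matches the optimized chain on the slide: total power 20 with sizes 2, 1, 1 for INV1, INV2 and INV3.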

- 13 -

Benchmark Generation: Connect Chains

1. Run STA and find the arrival time for each gate
2. Connect each connection cell to an open fanin port
   - connect only if timing constraints are satisfied
   - connection cells do not change the optimal chain solution
3. Tie unconnected ports to logic high or low

[Figure: a connection cell c attached to a gate g, annotated with timing quantities such as w_{c,g}, a_c and a_g; chains 1 to N with unconnected ports tied to VDD]
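
One way to picture the timing check in step 2 is sketched below. The exact condition used by the generator is not recoverable from the slide, so the rule here is an assumption: a connection cell may drive an open fanin of a gate only if its signal arrives no later than the chain input that already determines that gate's timing. The function names and the connection-cell delay are hypothetical; the chain delays come from the optimized chain of the DP example.

```python
from itertools import accumulate

def arrival_times(stage_delays):
    """Arrival time at each gate output of one chain (inputs arrive at 0)."""
    return list(accumulate(stage_delays))

def legal_targets(src_arrival, cell_delay, dst_arrivals):
    """Stages of the destination chain that a connection cell may feed.

    Assumed rule: the side input (src_arrival + cell_delay) must not be
    later than the arrival at the chain input of the target gate, so the
    arrival times computed for the destination chain stay unchanged.
    Stage indices are 0-based; stage 0 is fed by a primary input and is
    not considered here.
    """
    c_arrival = src_arrival + cell_delay
    targets = []
    for stage in range(1, len(dst_arrivals)):
        input_arrival = dst_arrivals[stage - 1]
        if c_arrival <= input_arrival:
            targets.append(stage)
    return targets

# Delays of the optimized example chain (sizes 2, 1, 1): 1, 3, 4.
arrivals = arrival_times([1, 3, 4])
print(arrivals)
print(legal_targets(src_arrival=arrivals[0], cell_delay=2, dst_arrivals=arrivals))
```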
- 14 -

Benchmark Generation: Generated Netlist

• Generated output:
  - benchmark circuit of N*K + C cells w/ optimal solution

[Figure: schematic of a generated netlist (N = 10, K = 20)]

• Chains are connected to each other -> various topologies

- 15 -

Outline

• Background and Motivation
• Benchmark Generation
• Experimental Framework and Results
• Conclusions and Ongoing Work


- 16 -

Experimental Setup

• Delay and power model (library)
  - LP: linear increase in power -> gate sizing context
  - EP: exponential increase in power -> Vt or gate-length context
• Heuristics compared
  - Two commercial tools (BlazeMO, Cadence Encounter)
  - UCLA sizing tool
  - UCSD sensitivity-based leakage optimizer
• Realistic benchmarks: six open-source designs
• Suboptimality calculation

$$\text{Suboptimality} = \frac{power_{heuristic} - power_{opt}}{power_{opt}}$$
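
The suboptimality metric is a plain relative excess over the known optimum. A minimal sketch, with a hypothetical function name and input values borrowed from the chain example (a sizing with power 25 against the optimal power 20):

```python
def suboptimality(power_heuristic, power_opt):
    """Relative excess power of a heuristic solution over the optimum."""
    if power_opt <= 0:
        raise ValueError("optimal power must be positive")
    return (power_heuristic - power_opt) / power_opt

print(f"{suboptimality(25, 20):.1%}")  # -> 25.0%
```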

- 17 -

Generated Benchmark - Complexity

• Complexity (suboptimality) of generated benchmark
  - Chain-only vs. connected-chain topologies

[Figure: two bar charts (commercial tool, greedy) of suboptimality, 0% to 20%, for chain-only and connected benchmarks; benchmarks are labeled [library]-[N]-[k]]

Chain-only: avg. 2.1%
Connected-chain: avg. 12.8%

- 18 -

Generated Benchmark - Connectivity

• Problem complexity and circuit connectivity
  1. Arranged assignment: improve connectivity
     (larger fanin -> later stage, larger fanout -> earlier stage)
  2. Random assignment: improve diversity of topology

arranged   random   unconnected   Subopt.
100%       0%       0.00%         2.60%
75%        25%      0.00%         6.80%
50%        50%      0.25%         10.30%
25%        75%      0.75%         11.20%
0%         100%     17.00%        7.70%

- 19 -

Suboptimality w.r.t. Parameters

• For different number of chains
[Figure: suboptimality (8% to 14%) and runtime (min, log scale 1 to 10000) vs. number of chains (40, 80, 160, 320, 640) for Comm, Greedy and SensOpt]

• For different number of stages
[Figure: suboptimality (8% to 14%) and runtime (min, log scale 1 to 1000) vs. number of stages (20, 40, 60, 80, 100) for Comm, Greedy and SensOpt]

Total # paths increases significantly w.r.t. N and K

- 20 -

Suboptimality w.r.t. Parameters (2)

• For different average net degrees
[Figure: suboptimality (0% to 120%) and runtime (min, log scale 0.1 to 1000) vs. average net degree (1.2, 1.6, 2, 2.4) for Comm, Greedy and SensOpt]

• For different delay constraints
[Figure: suboptimality (0% to 25%) and runtime (min, log scale 0.1 to 100) vs. timing constraint (0.4 ns to 1.1 ns) for Comm, Greedy and SensOpt]
- 21 -

Generated Realistic Benchmarks

• Target benchmarks
  - SASC, SPI, AES, JPEG, MPEG (from OpenCores)
  - EXU (from OpenSPARC T1)
• Characteristic parameters of real and generated benchmarks

                             real designs                generated
design   data depth   #instance   Rent param.   net degree   Rent param.   net degree
SASC     20           624         0.858         2.06         0.865         2.06
SPI      33           1092        0.880         1.81         0.877         1.80
EXU      31           25560       0.858         1.91         0.814         1.90
AES      23           23622       0.810         1.89         0.820         1.88
JPEG     72           141165      0.721         1.84         0.831         1.84
MPEG     33           578034      0.848         1.59         0.848         1.60
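
To see how closely the generated netlists track the real designs, the Rent parameters in the table can be compared directly. The sketch below only rearranges the table values; the dictionary layout and function name are choices made for this example.

```python
# Rent parameter per design: (real design, generated benchmark), from the table.
RENT = {
    "SASC": (0.858, 0.865),
    "SPI": (0.880, 0.877),
    "EXU": (0.858, 0.814),
    "AES": (0.810, 0.820),
    "JPEG": (0.721, 0.831),
    "MPEG": (0.848, 0.848),
}

def rent_gaps(rent):
    """Absolute difference between real and generated Rent parameter."""
    return {name: abs(real - gen) for name, (real, gen) in rent.items()}

gaps = rent_gaps(RENT)
worst = max(gaps, key=gaps.get)
for name, gap in gaps.items():
    print(f"{name:5s} {gap:.3f}")
print("largest gap:", worst)
```

With these numbers the largest mismatch is for JPEG (0.110), while the net degrees in the table agree to within 0.01 for every design.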

- 22 -

Suboptimality of Heuristics

• Suboptimality w.r.t. known optimal solutions for generated realistic benchmarks

[Figure: two bar charts of suboptimality (0% to 60%) of Comm1, Comm2, Greedy and SensOpt on eyechart, SASC, SPI, AES, EXU, JPEG and MPEG]

  Vt swap context (with EP library): up to 52.2%, avg. 16.3%
  Gate sizing context (with LP library): up to 43.7%, avg. 25.5%

* Greedy results for MPEG are missing

- 23 -

Comparison w/ Real Designs

• Suboptimality versus one specific heuristic (SensOpt)

[Figure: two bar charts of the suboptimality of Comm1, Comm2 and Greedy relative to SensOpt on SASC, SPI, AES, EXU, JPEG and MPEG. One panel: real designs and real delay/leakage library (TSMC 65nm) case. Other panel: suboptimality from our benchmarks]

Actual suboptimality will be greater!

• Discrepancy: simplified delay model, reduced library set, ...


- 24 -

Conclusions

• A new benchmark generation technique for gate sizing
  -> construct realistic circuits with known optimal solutions
• Our benchmarks enable systematic and quantitative study of common sizing heuristics
• Common sizing methods are suboptimal for realistic benchmarks by up to 52.2% (Vt assignment) and 43.7% (sizing)
• http://vlsicad.ucsd.edu/SIZING/



- 25 -

Ongoing Work

• Analyze discrepancies between real and artificial benchmarks
• Handle more realistic delay model
• Use realistic delay library in the context of realistic benchmarks with tight upper bounds
• Alternate approach for netlist generation
  - (1) cutting nets in a real design and finding the optimal solution
  - (2) reconnecting the nets while keeping the optimal solution


- 26 -

Thank you