VLSI Scaling for Architects

Mark Horowitz
Computer Systems Laboratory
Stanford University
horowitz@stanford.edu
The Buzz is VLSI Wires are Bad
A lot of talk about VLSI wires being a problem:
Delay
Noise coupling
What are the characteristics of chip wires?
How do they compare to scaled gates?
And what does it really mean?
To CAD developers
To architects
A very popular figure
How Will Scaling Change Computer Design?
To answer this question:
First look at what changes when technology scales
Surprisingly, fewer changes than you might think
Components get faster (both wires and gates)
Mostly, it allows one to build more complex devices
Then look at how computing devices use silicon technology
How architects and circuit designers use the transistors
What are the looming problems with scaling?
What can be done to help?
Let's start by looking at scaling CMOS technology
Predicting the Future (without making a fool of yourself)
Is very difficult
The only guarantee is:
The future will happen, and you will be wrong
Two approaches
Think about limitations
SIA 1994 Roadmap
Limited oxide thickness, small clock frequency growth, etc.
Industry hit points above the curve
Project from current trends
SIA 1997 Roadmap
Allow miracles to occur, continue trends
Project clock rates higher than physically possible
So use a range of technology scalings
Better chance of covering the correct answer
Technology Scaling
In scaling there are really two issues
Devices
Can we build smaller devices?
What will their performance be?
Wires
Try to avoid the wet noodle effect
There is concern about our ability to scale both of these components
Device Scaling Limits
Limitations to device scaling have been around since I started
I started working in 3µm nMOS, 22 years ago (actually bipolar)
Worries were
Short channel effect
Punchthrough (drain control of current rather than gate)
Hot electrons
Parasitic resistances
Now worries are a little different
Oxide tunnel currents
Punchthrough
Parameter control
Parasitic resistances
Transistor Scaling
People are building very short channel devices
Shown are I-V curves for 15nm L pMOS
And a short channel nMOS
The structure is strange
FinFET
But you can make them work
Logic Gate Speed
How does the speed of a gate depend on technology?
Use a fanout-of-4 (FO4) inverter metric
Measure the delay of an inverter with Cout/Cin = 4
Divide the delay of a circuit by the delay of an FO4 inverter
Get the delay of the circuit measured in FO4 inverter delays
Metric is pretty stable over process, temperature, and voltage
[Figure: FO4 test circuit, a chain of inverters with relative sizes 1, 4, 16]
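A minimal sketch of the normalization described above, assuming the circuit delay and the FO4 inverter delay were measured in the same process, voltage and temperature. The delay values are hypothetical, chosen only to show the arithmetic.

```python
def in_fo4(delay_ps: float, fo4_ps: float) -> float:
    """Express a circuit delay as a number of FO4 inverter delays."""
    if fo4_ps <= 0:
        raise ValueError("FO4 delay must be positive")
    return delay_ps / fo4_ps


# Hypothetical numbers: the same adder design built in two processes.
# The raw delays differ by 30%, but in FO4 units they are identical,
# so the speedup came from the technology, not from the design.
old_process = in_fo4(delay_ps=700.0, fo4_ps=100.0)
new_process = in_fo4(delay_ps=490.0, fo4_ps=70.0)
print(f"old: {old_process:.1f} FO4, new: {new_process:.1f} FO4")
```

Both cases print 7.0 FO4, which is the point of the metric: it separates design quality from process speed.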
Delay Tracking
[Figure: circuit delays in inverter delays (log scale) versus power supply (V) and versus process corner (TT, FF, SS, FS, SF); curves labelled 1.4, 2, 8.4 and 12]
Using the FO4 Metric
Is a very useful way to estimate/compare circuits/logic
Can compare two designs and normalize out technology
Can set a lower bound on the speed of a circuit block
If you know the fanout of a gate, you can estimate its delay
Complete theory is called logical effort
Total Fanout = Electrical fanout * Logical Effort of gate
Use an adder as an example
Fastest 64-bit adders run in around 7 FO4 delays
Dynamic adders
Haven't changed very much recently (HP at ISSCC in '96)
Can find a bound on the delay
Need to have a fanout of 64 for Cin
L.E. has a 2-input XOR and some carry logic
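The bound can be sketched with the usual logical-effort shortcut: the path effort is F = H × G (electrical fanout times the product of logical efforts), each stage is sized for an effort of about 4, so the path needs about log4(F) stages, each costing roughly one FO4 delay. Parasitic delay is ignored, so the result is optimistic. The logical effort of 4 for the XOR and the total of 2 for the carry logic are illustrative assumptions, not figures from the slide.

```python
import math


def fo4_lower_bound(electrical_fanout: float, logical_efforts: list[float]) -> float:
    """Rough lower bound on path delay, in FO4 delays.

    electrical_fanout: Cout/Cin of the whole path (H).
    logical_efforts: logical effort of each gate on the path (their product is G).
    Assumes every stage runs at an effort of 4 (about one FO4) and ignores parasitics.
    """
    path_effort = electrical_fanout * math.prod(logical_efforts)
    return math.log(path_effort, 4)


# Carry-in must drive a fanout of 64; assume an XOR with logical effort 4
# and carry logic with total logical effort 2 (hypothetical values).
bound = fo4_lower_bound(64, [4.0, 2.0])
print(f"lower bound: {bound:.1f} FO4")
```

Under these assumptions the bound is about 4.5 FO4, below the roughly 7 FO4 of the fastest dynamic adders, as a bound that ignores parasitic delay should be.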
Memory Performance
Can use FO4 delays to look at memory access time too.
Delay grows as log(Size) for an optimized design.
Wire delay is important at larger memories, but not a dominant factor.
Partition the array and use fewer, thicker wires.
Notice access times of 10-20 FO4
[Figure: access delay (FO4) versus memory size, 64Kb to 16Mb: total delay with and without wire resistance, plus decoder delay and output path delay without wire resistance]
FO4 Inverter Delay Under Scaling
Device performance will scale
FO4 delay has been linear with technology
Approximately 0.36 ns/µm × Ldrawn at TT (0.5 ns/µm under worst-case conditions)
Easy to predict gate performance
We can measure them
Labs have built 0.04µm devices
Key issue is voltage scaling
Need to scale Vdd for power
Hard since Vth does not scale
Gate speed improvement might slow down
Vth issues and gate oxide limitations
[Figure: FO4 delay (ns) versus feature size (µm), with the linear fit 0.36 × Ldrawn]
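The rule of thumb above turns directly into a delay estimate per technology node. This sketch simply applies the two slopes quoted on the slide; extrapolating below the measured range assumes supply-voltage scaling continues, which the slide itself flags as uncertain.

```python
# Rule of thumb from the slide: FO4 delay ~ 0.36 ns per um of drawn gate length
# at the typical (TT) corner, ~0.5 ns/um under worst-case conditions.
NS_PER_UM = {"TT": 0.36, "worst": 0.5}


def fo4_ns(l_drawn_um: float, corner: str = "TT") -> float:
    """Estimated FO4 inverter delay in nanoseconds for a given drawn length."""
    return NS_PER_UM[corner] * l_drawn_um


for l_drawn in (0.25, 0.18, 0.13, 0.10, 0.07):
    typ_ps = fo4_ns(l_drawn) * 1000
    worst_ps = fo4_ns(l_drawn, "worst") * 1000
    print(f"{l_drawn:.2f} um: {typ_ps:.0f} ps typical, {worst_ps:.0f} ps worst case")
```

At 0.25 µm this gives 90 ps typical and 125 ps worst case.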
Circuit Power
Is very much tied to voltage scaling
If the power supply scales with technology (by the linear scaling factor $\alpha < 1$)
For a fixed-complexity circuit
Power scales down as $\alpha^3$ if you run at the same frequency
Power scales down as $\alpha^2$ if you run it $1/\alpha$ times faster
Power scaling is a problem because
Frequency has been scaling faster than $1/\alpha$
Complexity of the machine has been growing
This will continue to be an issue in future chips
Remember: scaling the technology makes a chip lower power!
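The α³ and α² results follow from P = C · Vdd² · f. The sketch below assumes, as the slide does, that capacitance per gate and Vdd both scale by α; the frequency and complexity multipliers in the last case are hypothetical, chosen to show why growing clock rates and gate counts undo the savings.

```python
def relative_power(alpha: float, freq_factor: float = 1.0,
                   complexity_factor: float = 1.0) -> float:
    """Power after one scaling step, relative to before, from P = C * Vdd^2 * f.

    alpha: linear scaling factor (e.g. 0.7 per generation).
    freq_factor: new clock frequency / old clock frequency.
    complexity_factor: new number of gates / old number of gates.
    Assumes capacitance per gate and Vdd both scale by alpha.
    """
    capacitance = alpha * complexity_factor
    vdd = alpha
    return capacitance * vdd ** 2 * freq_factor


a = 0.7
print(relative_power(a))                      # same frequency: alpha^3
print(relative_power(a, freq_factor=1 / a))   # 1/alpha times faster: alpha^2
print(relative_power(a, freq_factor=2.0, complexity_factor=2.0))
```

With α = 0.7 the first two cases give about 0.34 and 0.49. In the third, a hypothetical generation where frequency and gate count both double, power rises to about 1.37 times the original even though the supply scaled.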
Wire Scaling
More uncertainty than transistor scaling
Many options with complex trade-offs
For each metal layer
Need to set H, TT, TB, ε1, ε2, and the conductivity of the metal
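[Figure: wire cross-section showing width W, height H, spacings SL and SR to the neighboring wires, dielectric thicknesses TT and TB to the layers above and below, permittivities ε1 and ε2, and the top, bottom and middle (metal) ILD layers]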
Wire Capacitance
Capacitance per micron is roughly constant
Simple approximation = fringe (0.07 fF/µm today) + 4 parallel plates
Depends only on the ratio of the parameters
Coupling becomes a large issue with increased H/S ratio
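$C/\mu\mathrm{m} \approx C_{\mathrm{fringe}} + \varepsilon_1 \frac{W}{T_T} + \varepsilon_1 \frac{W}{T_B} + \varepsilon_2 \frac{H}{S_R} + \varepsilon_2 \frac{H}{S_L}$
[Figure: the four parallel-plate terms on the wire cross-section, with the top ILD, bottom ILD, and metal/ILD middle layer]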
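The fringe-plus-four-plates approximation can be evaluated directly. In this sketch the geometry (0.4 µm wide, 0.8 µm tall, 0.4 µm spacing, 0.8 µm dielectric above and below) and the SiO2 permittivity of 3.9 are assumptions for illustration; the fringe term is held at the slide's 0.07 fF/µm. Shrinking every dimension by the same factor leaves the result unchanged, which is the "depends only on the ratio" point.

```python
from dataclasses import dataclass

EPS0_FF_PER_UM = 8.854e-3  # vacuum permittivity, fF/um


@dataclass(frozen=True)
class WireGeometry:
    w: float      # width (um)
    h: float      # height (um)
    s_l: float    # spacing to the left neighbor (um)
    s_r: float    # spacing to the right neighbor (um)
    t_top: float  # dielectric thickness to the layer above (um)
    t_bot: float  # dielectric thickness to the layer below (um)
    k1: float = 3.9  # relative permittivity of the top/bottom ILD (SiO2 assumed)
    k2: float = 3.9  # relative permittivity between adjacent wires (SiO2 assumed)

    def scaled(self, a: float) -> "WireGeometry":
        """Shrink every dimension by the same factor a."""
        return WireGeometry(self.w * a, self.h * a, self.s_l * a, self.s_r * a,
                            self.t_top * a, self.t_bot * a, self.k1, self.k2)


def cap_per_um(g: WireGeometry, fringe_ff_per_um: float = 0.07) -> tuple[float, float]:
    """Return (total, coupling) capacitance in fF/um: fringe + 4 parallel plates."""
    vertical = EPS0_FF_PER_UM * g.k1 * (g.w / g.t_top + g.w / g.t_bot)
    coupling = EPS0_FF_PER_UM * g.k2 * (g.h / g.s_l + g.h / g.s_r)
    return fringe_ff_per_um + vertical + coupling, coupling


wire = WireGeometry(w=0.4, h=0.8, s_l=0.4, s_r=0.4, t_top=0.8, t_bot=0.8)
total, coupling = cap_per_um(wire)
total_shrunk, _ = cap_per_um(wire.scaled(0.7))
print(f"{total:.3f} fF/um, {coupling / total:.0%} to neighbors; "
      f"after 0.7x shrink: {total_shrunk:.3f} fF/um")
```

With these numbers the wire has about 0.24 fF/µm, more than half of it to its neighbors, so raising H/S further pushes the coupling share up.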
Wire Resistance
Resistance is simpler
$R/\mu\mathrm{m} = \rho/(wh)$
Scales up as the technology shrinks
Main reason that wire height has not scaled much
Trade-offs between height, width and wire pitch
[Figure: a W1 × H1 wire with resistance R compared with two shrunk wires, one with resistance 2R and one with 1.4R]
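One reading of the 2R and 1.4R labels is a 0.7x shrink: scaling both width and height by 0.7 roughly doubles the resistance per micron, while shrinking only the width and holding the height costs about 1.4x. The sketch checks that arithmetic with R/µm = ρ/(wh); the copper resistivity and the 0.4 × 0.8 µm cross-section are assumptions for illustration.

```python
RHO_CU_OHM_UM = 0.017  # bulk copper, ~1.7 micro-ohm*cm = 0.017 ohm*um (assumed metal)


def r_per_um(width_um: float, height_um: float, rho_ohm_um: float = RHO_CU_OHM_UM) -> float:
    """Wire resistance per micron of length: rho / (w * h)."""
    return rho_ohm_um / (width_um * height_um)


w, h, shrink = 0.4, 0.8, 0.7
base = r_per_um(w, h)
both = r_per_um(w * shrink, h * shrink) / base   # width and height both shrink
width_only = r_per_um(w * shrink, h) / base      # height held constant
print(f"base {base * 1000:.0f} ohm/mm, both shrink x{both:.2f}, width only x{width_only:.2f}")
```

The two ratios are 1/0.49 ≈ 2.04 and 1/0.7 ≈ 1.43, which is why holding wire height is attractive.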
Wire Layers
Not all wiring layers have the same characteristics
Today there are three types of levels
M1
Finest pitch, highest resistance, local interconnection in a cell
M2-4?
Normal routing level, most of the wires
M5+
Thick coarse metal, used for global wires
When scaling forces thinner metal, create a new top layer
Noise Issues
Two main noise sources for wires
Capacitive coupling
Inductive coupling
Capacitive coupling is mostly a nearest-neighbor issue
High-aspect-ratio wires make this worse
Real push for low-k dielectric between wires
Inductive coupling
Is much more complex to analyze
Depends on where the return currents flow
Reduce these problems by design constraints
Gnd returns in buses, power and gnd planes (e.g. 21264)
Scaling Global Wires
R gets quite a bit worse with scaling; C is basically constant
[Figure: resistance (kΩ) and capacitance (pF) of a 1 mm semi-global wire versus technology Ldrawn (0.25 to 0.035 µm), for aggressive and conservative scaling]
Scaling Module Wires
R is basically constant, and C falls linearly with scaling
[Figure: resistance (kΩ) and capacitance (pF) of a semi-global wire of scaled length versus technology Ldrawn (0.25 to 0.035 µm), for aggressive and conservative scaling]
Module Wires
These wires scale fairly well:
Scaled wire delays stay pretty constant relative to gates
Not a very big change
[Figure: wire delay/gate delay for a semi-global wire of scaled length versus technology Ldrawn (0.25 to 0.035 µm), for aggressive and conservative scaling]
Is There a Module-Level Wire Problem?
This first cut seems to imply that scaled wires aren't a problem
Delays of these wires are scaling (mostly) with gate speed
Long wires get worse, but pretty slowly
So the job a designer (or CAD tool) sees stays the same, right?
So what are we missing?
Why are people working so hard on developing new tools?
Die complexity: what if the number of modules doubles?
Implications of Complexity Growth
What can we do?
Larger design teams?
Longer design times?
Better tools that have fewer exceptions per module
[Figure: module and exception counts for three dies: 19 modules with 49 exceptions, 19 modules with 24 exceptions, and 9 modules with 22 exceptions]
Keeping Design Time Reasonable
Suppose we want the total CAD tool exceptions to stay constant
At a module level, we need 1/2 as many exceptions
The required threshold for exceptions will grow
The delay of those lines increases dramatically
So CAD tools need to handle increasingly long wires
Is this important?
THIS is important!
[Figure: wire length (in gate pitches) versus technology Ldrawn (0.25 to 0.035 µm) for 50K-, 75K- and 100K-gate modules, with the current trend]
Global Wire Scaling
Now we examine global wire delay relative to gate delay
Fixed-length wires, relative to gates, worsen by 2x per generation
This is a big problem
[Figure: wire delay/gate delay for a 1 mm semi-global wire versus technology Ldrawn (0.25 to 0.035 µm), for aggressive and conservative scaling]
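The module-wire and global-wire trends can be reproduced with a deliberately crude model. It assumes that per generation, with linear factor α, the FO4 delay scales by α, capacitance per unit length stays constant, and resistance per unit length grows by 1/α (the width shrinks while the height is roughly held). These per-length assumptions are a simplification for illustration, not the data behind the plots; wire delay is taken as proportional to R·C.

```python
def wire_to_gate_growth(alpha: float, generations: int, fixed_length: bool) -> float:
    """Factor by which (wire RC delay / FO4 delay) changes after n generations.

    fixed_length=True models a global wire (e.g. always 1 mm long);
    fixed_length=False models a module wire whose length shrinks by alpha.
    """
    length = 1.0 if fixed_length else alpha
    r = length / alpha        # R per unit length grows by 1/alpha (assumption)
    c = length                # C per unit length is constant (assumption)
    gate = alpha              # FO4 delay tracks feature size
    per_generation = (r * c) / gate
    return per_generation ** generations


for n in (1, 2, 3):
    g = wire_to_gate_growth(0.7, n, fixed_length=True)
    m = wire_to_gate_growth(0.7, n, fixed_length=False)
    print(f"after {n} generation(s): global wire x{g:.1f}, module wire x{m:.1f}")
```

Under these assumptions a module wire keeps R flat and C falling, so its delay tracks the gates, while a fixed-length wire sees R grow with C flat and loses about 2x (1/α²) relative to gates every generation.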
Designer Responses
Wire engineering: use wider or thicker wires
Making wires wider will improve performance
Resistance $R = k/W$; capacitance $C = C_0 + bW$
$RC = kC_0/W + kb$; since $b \ll C_0$, increasing $W$ helps quite a bit
[Figure: delay (ns) of a 1 cm wire versus wire width in lambdas, for M3 and M5]
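The RC expression above can be tabulated to see where widening stops paying off. The constants k, C0 and b are normalized, hypothetical values with b much smaller than C0, as the slide assumes.

```python
def rc_vs_width(width: float, k: float = 1.0, c0: float = 1.0, b: float = 0.1) -> float:
    """Normalized wire RC for R = k/W and C = C0 + b*W, i.e. RC = k*C0/W + k*b."""
    resistance = k / width
    capacitance = c0 + b * width
    return resistance * capacitance


for w in (1, 2, 4, 8, 16):
    print(f"W = {w:2d}: RC = {rc_vs_width(w):.3f}")
```

RC falls almost as 1/W while the k·C0/W term dominates, then flattens toward the floor k·b, so each further doubling of width buys less.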
Designer Responses
Circuit solution: use repeaters
Break the wire into segments
Delay becomes linear with length
Signal velocity $= k\,(\mathrm{FO4} \cdot R_w C_w)^{-1/2}$, with $R_w$ and $C_w$ the wire resistance and capacitance per unit length
[Figure: RC model of a wire driving Cload, split into π segments with resistance Rw/2 and capacitances Cw/4, with repeaters between segments]
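A minimal sketch of why repeaters make delay linear in length, using a simplified model that is an assumption here: each stage costs a fixed repeater delay t_rep plus the 0.38·r·c·l² delay of its distributed RC segment, and repeater sizing is not optimized. The total n·t_rep + 0.38·r·c·L²/n is minimized near n* = L·sqrt(0.38·r·c/t_rep), giving about 2L·sqrt(0.38·r·c·t_rep), so the velocity goes as (t_rep·r·c)^(-1/2), matching the slide's form when t_rep is a fixed multiple of FO4. The wire and repeater values are hypothetical.

```python
import math

ELMORE_50 = 0.38  # 50% delay of a distributed RC line ~ 0.38 * R * C


def segmented_delay_ps(length_mm: float, n: int, r_ohm_per_mm: float,
                       c_pf_per_mm: float, t_rep_ps: float) -> float:
    """Delay of a wire cut into n equal segments, each driven by a repeater.

    Units: ohm * pF = ps. n = 1 is the unrepeated wire with its driver.
    """
    seg = length_mm / n
    return n * (t_rep_ps + ELMORE_50 * r_ohm_per_mm * c_pf_per_mm * seg ** 2)


def best_repeated_delay_ps(length_mm: float, r_ohm_per_mm: float,
                           c_pf_per_mm: float, t_rep_ps: float) -> float:
    """Try the integers around the continuous optimum n* = L*sqrt(0.38 r c / t_rep)."""
    n_opt = length_mm * math.sqrt(ELMORE_50 * r_ohm_per_mm * c_pf_per_mm / t_rep_ps)
    candidates = {max(1, math.floor(n_opt)), max(1, math.ceil(n_opt))}
    return min(segmented_delay_ps(length_mm, n, r_ohm_per_mm, c_pf_per_mm, t_rep_ps)
               for n in candidates)


# Hypothetical wire and repeater: 100 ohm/mm, 0.2 pF/mm, 150 ps per repeater stage.
for length in (5, 10, 20):
    plain = segmented_delay_ps(length, 1, 100, 0.2, 150)
    repeated = best_repeated_delay_ps(length, 100, 0.2, 150)
    print(f"{length:2d} mm: unrepeated {plain:.0f} ps, repeated {repeated:.0f} ps")
```

With these numbers repeaters do not help at 5 mm, but by 10 and 20 mm they win clearly, and the repeated delay roughly doubles when the length doubles while the unrepeated delay roughly quadruples.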
When to Add Repeaters
Repeaters have overhead and can introduce logic complexity
Add them when they reduce the delay: wire RC > 8 FO4
Add them sooner to reduce noise coupling issues
Plot for 0.25µm minimum-width wires:
[Figure: delay (FO4) versus distance (mm) for M3 and M5 wires, with and without repeaters]
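The "wire RC > 8 FO4" rule gives a length threshold. This sketch reads "wire RC" as the product of the wire's total resistance and capacitance, r·c·L², and solves for L. The FO4 value comes from the 0.36 ns/µm rule at 0.25 µm; the per-millimetre r and c are hypothetical, not read off the slide's plot.

```python
import math


def repeater_threshold_mm(fo4_ps: float, r_ohm_per_mm: float, c_pf_per_mm: float,
                          fo4_limit: float = 8.0) -> float:
    """Wire length at which the wire's own RC (r*c*L^2) reaches fo4_limit FO4 delays."""
    rc_ps_per_mm2 = r_ohm_per_mm * c_pf_per_mm  # ohm * pF = ps
    return math.sqrt(fo4_limit * fo4_ps / rc_ps_per_mm2)


# 0.25 um technology: FO4 ~ 0.36 ns/um * 0.25 um = 90 ps (typical corner).
# Wire values are hypothetical.
print(f"{repeater_threshold_mm(90, 100, 0.2):.1f} mm")
```

This gives about 6 mm. Because the threshold goes as sqrt(FO4 / (r·c)), faster gates and more resistive wires both pull it in, so later generations need repeaters on shorter wires.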
Signal Velocity for Repeated Wires
Under SIA scaling, pretty constant over many generations
Under conservative scaling, slow change at sub-0.1µm technologies
Makes wire delay increase slowly
[Figure: delay of repeated wires (ps/mm) versus feature size (µm), 0.05 to 0.25 µm]
A Different View
Wires are not as bad as they are painted in the media
If you take a chip and scale it
Keeping the same exact design
The ratio of the wire delay to gate delay will change slowly
Currently they both scale by roughly the scaling factor
Wires get a little slower each generation
Not a big issue
Key is that the logical span of the wire remains constant
The problem is if you make the chip more complex
Wires in general need to connect up more gates
Logical span increases
Get slower
Circuit tricks (repeaters) help some, but not enough
The World is Growing
The problem associated with wires is really due to complexity
Diagram shows the logical span you reach in a cycle
It also shows the logical span of a chip
Old view: a chip looks small to a wire
New view: a chip looks really big to a wire
[Figure: the distance a signal can go in 1 cycle compared with the logical chip size]
How is this going to affect chip design?
Computer Performance
How is scaling going to affect computer design?
Use the Hennessy-Patterson formula
Delay = Number of Instructions * Cycles/Inst * Clock Period
Performance = Clock Frequency/CPI
Look at how these factors have been improving, and why
[Figure: SpecInt95 performance versus year (Jan-85 to Jan-97, marking the 80386, 80486, Pentium and Pentium II), and ISPEC versus year (1993 to 1999), both on log scales]
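The formula above is easy to apply numerically. The program size, CPI and clock rate below are hypothetical, as is the example generation in which the clock doubles and IPC (1/CPI) improves 1.5x.

```python
def execution_time_s(instructions: float, cpi: float, clock_hz: float) -> float:
    """Hennessy-Patterson: instructions * cycles/instruction * clock period."""
    return instructions * cpi / clock_hz


def speedup(clock_ratio: float, ipc_ratio: float = 1.0) -> float:
    """Performance ratio for a fixed program: (f_new / f_old) * (IPC_new / IPC_old)."""
    return clock_ratio * ipc_ratio


# Hypothetical program: 1e9 instructions at CPI 1.25 on a 500 MHz clock.
print(execution_time_s(1e9, 1.25, 500e6))
# A hypothetical generation where the clock doubles and IPC improves 1.5x.
print(speedup(clock_ratio=2.0, ipc_ratio=1.5))
```

The first line prints 2.5 (seconds) and the second 3.0: with the instruction count fixed, clock and IPC gains multiply.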
Computer Architects Job
Convert transistors to performance
Use transistors to
Exploit parallelism
Or create it
(
speculate
)
Processor
g
enerations
Simple machine
Reuse hardware
Pipelined
Separate hardware for each sta
g
e
Super-scalar
Multiple port mems, function units
Out-of-order
Me
g
a-ports, complex schedulin
g
Speculation
Each desi
g
n has more lo
g
ic to
accomplish same task
(
but faster
)
Architecture Scaling
Plot of IPC
Compiler + IPC: 1.5x/generation
Used all known tricks
OOO is an old idea
Uses lots of wires
What next?
Wider machines
Threads
Speculation
Guess answers to create parallelism
Have high wire costs
[Figure: SpecInt95/MHz versus year, Dec-83 to Dec-98, marking the 80386, 80486, Pentium and Pentium II]
Architecture Scaling Issues
Wire scaling is an issue for architecture
Need to support an old program model
Modest number of global resources
Registers, memory port
To execute in parallel
Need to find the parallelism in the instruction stream
Many instruction decoders, needing to communicate
Need to predict the instruction stream well
Large memory for prediction tables
Need multiple functional units
Need large, multi-ported central register files
This will not scale:
Machines are already starting to use internal clustering
Clock Frequency
Most of the performance comes from clock scaling
Clock frequency doubles each generation
Two factors contribute: technology (1.4x/gen) and circuit design
[Figure: clock frequency (MHz, log scale) versus year, Dec-83 to Dec-98, marking the 80386, 80486, Pentium and Pentium II]
Gates Per Clock
Clock speed has been scaling faster than the base technology
Number of FO4 delays in a cycle has been falling
Number of gates per cycle decreases 1.4x each generation
Caused by:
Faster circuit families (dynamic logic)
Better optimization
Better micro-architecture
Better adder/memory architectures
All this generally requires more transistors
[Figure: FO4 inverter delays per cycle (log scale) versus year, Dec-83 to Dec-98, marking the 80386, 80486, Pentium and Pentium II]
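The doubling of clock frequency per generation can be split into its two factors: technology (FO4 delay shrinks by 0.7, i.e. about 1.4x faster) and about 1.4x fewer FO4 delays per cycle. This sketch assumes the typical-corner 0.36 ns/µm FO4 rule and a 16 FO4 cycle at 0.25 µm as the starting point.

```python
def clock_mhz(fo4_per_cycle: float, l_drawn_um: float, ns_per_um: float = 0.36) -> float:
    """Clock frequency implied by a cycle of fo4_per_cycle FO4 delays.

    Uses the FO4 ~ 0.36 ns/um * Ldrawn rule of thumb (typical corner).
    """
    fo4_ns = ns_per_um * l_drawn_um
    return 1000.0 / (fo4_per_cycle * fo4_ns)


f_now = clock_mhz(16, 0.25)                  # 16 FO4 per cycle at 0.25 um
f_next = clock_mhz(16 / 1.4, 0.25 * 0.7)     # one generation later
print(f"{f_now:.0f} MHz -> {f_next:.0f} MHz (x{f_next / f_now:.2f})")
```

The two 1.4x factors multiply to 2x (roughly 0.7 GHz to 1.4 GHz here). Continuing the same 1.4x cut in FO4 per cycle from 16 FO4 reaches about 8 FO4 after two generations (16 / 1.4² ≈ 8.2).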
Gates Per Clock Limits
Current state-of-the-art (SOA) machines are at 16 FO4 gates per cycle
Historical low values (Cray) were at this level
Overhead for short-tick machines grows rapidly
Power
Increases clock power per logic function
Latency
Flops are already 10-20% of the cycle today
Logic reach grows smaller
What fits in a cycle (how many bits/gates) decreases
Difficult to generate a clock at less than 8 FO4 gates
Continued scaling of gates/clock will be hard
Guessing the slope will change soon
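The overhead argument can be made concrete by fitting a fixed amount of logic into shorter cycles. This sketch assumes the flop/clocking overhead is a fixed 2.5 FO4 per cycle (about 15% of a 16 FO4 cycle, inside the slide's 10-20% range) and a hypothetical block of 64 FO4 of logic.

```python
import math


def pipeline_cost(logic_fo4: float, cycle_fo4: float, overhead_fo4: float) -> tuple[int, float]:
    """Stages and latency (in FO4) needed to fit logic_fo4 of logic into cycles.

    overhead_fo4 is the flop/clocking overhead per cycle, assumed fixed in FO4.
    """
    useful = cycle_fo4 - overhead_fo4
    if useful <= 0:
        raise ValueError("cycle is shorter than the clocking overhead")
    stages = math.ceil(logic_fo4 / useful)
    return stages, stages * cycle_fo4


# Hypothetical: 64 FO4 of logic, 2.5 FO4 of flop overhead per cycle.
for cycle in (16, 12, 8):
    stages, latency = pipeline_cost(64, cycle, 2.5)
    print(f"{cycle:2d} FO4 cycle: {stages:2d} stages, latency {latency} FO4, "
          f"overhead {2.5 / cycle:.0%} of cycle")
```

Halving the cycle from 16 to 8 FO4 takes the same logic from 5 to 12 stages, so there are more clocked elements (and more clock power) per function, and latency grows from 80 to 96 FO4.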
Performance Scaling
Remember processor performance plots used to have two lines
Microprocessors and mainframes
Mainframes had maxed out and improved at the technology rate
What will happen with microprocessors?
[Figure: processor performance trend lines for uP and mainframe]
Will Processor Performance Slow Down?
Yes and no
Uniprocessor performance growth will slow down
Latest jump is getting to the 16ish FO4 cycles
People will change the benchmarks to fix this problem
More data-parallel applications
Multimedia/streaming applications
More threaded applications
Explicitly parallel machines scale quite nicely
VLSI Scaling Summary
Scaling allows people to build more complex machines
That run faster too
It does not, to first order, change the difficulty of module design
Module wires will get worse, but only slowly
You don't need to rethink the wires in your adder or memory
Or even your super-scalar processor core
It does let you design more modules
Continued scaling of uniprocessor performance is getting hard
Machines using global resources run into wire limitations
Faster clock ticks (in FO4) are getting very hard
Power and logical reach issues in going much under 16 FO4s
Machines will have to become more explicitly parallel