ppt - Zettaflops.org

A. Silver

1

Superconductor Technologies for Extreme Computing

Arnold Silver


Workshop on Frontiers of Extreme Computing

Monday, October 24, 2005

Santa Cruz, CA



Outline


Introduction


Single Flux Quantum (SFQ) Technology


State-of-the-Art


Prospects


Quantum Computing


Summary



Notional Diagram of a Superconductor Processor


Superconductor processors communicate with local cryogenic RAM and with
the cryogenic switch network.


Cryogenic RAM communicates via wideband I/O with ambient electronics.

Diagram labels: Superconductor Processors; Cryogenic RAM; High Speed Cryogenic Switch Network; Wideband I/O; Ambient Electronics; 4 Kelvin.

Introduction


Early Technology Limited


Early superconductor logic was voltage-latching

Voltage state data

AC power required

Speed limited by RC load and reset time (~GHz)

Single Flux Quantum (SFQ) is the latest generation.


Current/Flux state data


SFQ pulses transfer data


DC powered


Higher speed (~100 GHz)


Incremental progress on DoD contracts.


Small annual budgets


Focus on small circuit demos


Minimal infrastructure investment



SFQ Features


Quantum-mechanical devices

An “electronics technology”

High speed and ultra-low on-chip power dissipation

Fastest, lowest power digital logic

≥ 100 GHz clock expected

~ nW/gate/GHz expected

Wideband communication on-chip and inter-chip

Superconducting transmission lines

Low-loss

Low-dispersion

Impedance matched

60 GHz data transfer demonstrated with negligible cross-talk

Comparison of a 12 GFLOPS SFQ and CMOS chip

40 kgate SFQ chip: 50 GHz clock; 2 mW, plus 0.8 W cooling power

2 Mgate CMOS chip: 1 GHz clock; 80 W, also requires cooling
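The comparison above can be turned into a rough efficiency figure. The sketch below only uses the numbers quoted on the slide; the variable and function names are illustrative, and the CMOS cooling power is left out because the slide does not quantify it.

```python
# Figures quoted on the slide for two chips that each deliver 12 GFLOPS.
GFLOPS = 12.0
SFQ_CHIP_W = 2e-3      # 2 mW dissipated on the SFQ chip
SFQ_COOLING_W = 0.8    # cooling power quoted for the SFQ chip
CMOS_CHIP_W = 80.0     # CMOS chip power; its cooling power is not given on the slide


def gflops_per_watt(power_w: float) -> float:
    """Throughput per watt for a 12 GFLOPS chip drawing power_w watts."""
    return GFLOPS / power_w


# Total SFQ power includes the cryocooler overhead stated on the slide.
sfq_total_w = SFQ_CHIP_W + SFQ_COOLING_W
power_ratio = CMOS_CHIP_W / sfq_total_w

print(f"SFQ total power: {sfq_total_w:.3f} W")
print(f"SFQ:  {gflops_per_watt(sfq_total_w):.1f} GFLOPS/W (cooling included)")
print(f"CMOS: {gflops_per_watt(CMOS_CHIP_W):.2f} GFLOPS/W (cooling excluded)")
print(f"CMOS / SFQ power ratio: {power_ratio:.1f}")
```

Under these assumptions the cooling overhead, not the on-chip dissipation, dominates the SFQ power budget.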


Some Issues Need To Be Addressed


Present disadvantages


Low chip density and production maturity


Inadequate cryogenic RAM


Cryogenic cooling


Cryogenic-ambient I/O

Density and maturity will increase with better VLSI

Promising candidates for cryogenic RAM

Hybrid superconductor-CMOS

Hybrid superconductor-MRAM

SFQ RAM

Cryogenics is an enabler for low power

Options for wideband I/O exist


Technology Overview


Basic technology


Josephson tunnel junctions and SQUIDs


SFQ logic gates


SFQ transmitters-receivers

Cryogenic memory

Superconducting films produce microstrip and stripline transmission lines

Zero-resistance at dc (no ohmic loss)

Low-loss, low-dispersion at MMW frequencies

Impedance-matched

Wideband

Enabling technologies

Advanced VLSI foundry

Superconducting multi-chip modules


Wideband I/O technologies


Optical fiber


Electrical ribbon cable


Cryogenic LNAs

SFQ Technology


Comparison of SFQ-CMOS Functions

Function | CMOS | SFQ
Basic Switch | Transistor | Josephson tunnel junction (a 2-terminal device)
Data Format | Voltage level | Identical picosecond (current) pulses
Speed Test | Ring oscillator | Asynchronous flip-flop, static divider; 770 GHz achieved; 1,000 GHz expected
Data Transfer | Voltage data bus; RC delay with power dissipation | “Ballistic” transfer at ~ 100 µm/ps in nearly lossless and dispersion-free passive transmission lines (PTL)
Clock Distribution | Voltage clock bus | Clock pulse regeneration and ballistic transfer at ~ 100 µm/ps in nearly lossless and dispersion-free PTLs
Logic Switch | Complementary transistor pair | Two-junction comparator
Bit Storage | Charge on a capacitor | Current in a lossless inductor
Fan-In, Fan-Out | Large | Small
Power | Volt levels | Millivolt levels
Power Distribution | Ohmic power bus | Lossless superconducting wiring
Noise | ≥ 300 K thermal noise | 4 K thermal noise that enables low power operation


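Josephson Tunnel Junction

Figure labels: Insulator (~1 nm); phase $\theta$; Magnetic field.

$J = J_C \sin\theta$

$V = \dfrac{h}{2e}\, f, \qquad f = \dfrac{1}{2\pi}\,\dfrac{d\theta}{dt}$

$\dfrac{h}{2e} = \Phi_0 = 2.07\ \text{mV-ps}$

I-V curves are shown for $\beta_c > 1$ and $\beta_c < 1$, with the current axis marked at $I_C$.

Damping Parameter

$\beta_c = \dfrac{2\pi}{\Phi_0}\, I_C R_d \cdot R_d C$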


SQUIDs Are Basic SFQ Elements


Combine flux quantization with the non-linear Josephson effects

Store flux quantum or transmit SFQ pulse

$\dfrac{2\pi L\, i_{circ}}{\Phi_0} + \sum_{\text{junctions}} \theta_{JJ} = 2\pi k; \quad k = \text{integer}$

Double JJ (dc) SQUID

Figure labels: JJ; JJ; Inductor; Input Flux; $\Phi_0$.


SFQ Is A Current Based Technology


When (Input + $I_{bias}$) exceeds the JJ critical current $I_c$, the JJ “flips”, producing an SFQ pulse.

Area of the pulse is $\Phi_0$ = 2.067 mV-ps

Pulse width shrinks as $J_C$ increases

SFQ logic is based on counting single flux quanta

SFQ pulses propagate along impedance-matched passive transmission line (PTL) at the speed of light in the line (~ c/3).

Multiple pulses can propagate in PTL simultaneously in both directions.

Figure labels: Input; $I_{bias}$; JJ; output pulse ~1 mV high, ~2 ps wide.
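The fixed pulse area ties pulse height to pulse width, and the ~c/3 line velocity sets the PTL delay. The sketch below illustrates both relations; it assumes a rectangular pulse shape (an idealization, not something the slide states), and the function names are made up for illustration.

```python
PHI0_MV_PS = 2.067           # flux quantum in mV*ps, as quoted on the slide
C_UM_PER_PS = 299.792458     # speed of light in vacuum, micrometres per picosecond


def mean_pulse_height_mv(width_ps: float) -> float:
    """Average height of an SFQ pulse of the given width.

    Assumes a rectangular pulse, so height * width equals the flux quantum.
    """
    return PHI0_MV_PS / width_ps


def ptl_delay_ps(length_um: float, velocity_fraction: float = 1.0 / 3.0) -> float:
    """Time of flight along a PTL whose signal velocity is a fraction of c."""
    return length_um / (C_UM_PER_PS * velocity_fraction)


print(f"2 ps pulse: about {mean_pulse_height_mv(2.0):.2f} mV")
print(f"1 ps pulse: about {mean_pulse_height_mv(1.0):.2f} mV")
print(f"1 mm of PTL at c/3: about {ptl_delay_ps(1000.0):.1f} ps")
```

The first two results line up with the ~1 mV, ~2 ps and ~2 mV, ~1 ps pulse annotations in the deck.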


SFQ Gates

Data Latch (DFF)

SFQ pulse is stored in a larger-inductance loop

Clock pulse reads out stored SFQ

If no data is stored, clock pulse escapes through the top junction

Figure labels: Clock; Data

“OR” Gate (merger)

Pulses from both inputs propagate to the output

“AND” Gate

Two pulses arriving “simultaneously” switch output junction

DFF in each input produces clocked AND gate

PTLs transmit clock and data signals

Average number of junctions per gate is 10
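The gate behavior described above can be mimicked with a purely logical model in which a pulse is a boolean event. This is a behavioral sketch only: it ignores junction dynamics and timing, and the class and function names are hypothetical.

```python
class DFF:
    """Data latch: holds at most one SFQ pulse until a clock pulse reads it out."""

    def __init__(self) -> None:
        self.stored = False

    def data(self) -> None:
        # A data pulse stores one flux quantum in the loop.
        self.stored = True

    def clock(self) -> bool:
        # A clock pulse reads out the stored pulse and resets the loop.
        # With nothing stored, the clock pulse escapes and there is no output.
        out = self.stored
        self.stored = False
        return out


def merger(a: bool, b: bool) -> bool:
    """OR gate (merger): a pulse on either input propagates to the output."""
    return a or b


class ClockedAnd:
    """AND gate with a DFF on each input, read out by a shared clock pulse."""

    def __init__(self) -> None:
        self.a = DFF()
        self.b = DFF()

    def clock(self) -> bool:
        # Read both latches before combining so that both are reset.
        out_a = self.a.clock()
        out_b = self.b.clock()
        return out_a and out_b


gate = ClockedAnd()
gate.a.data()
gate.b.data()
print("AND output with both inputs pulsed:", gate.clock())
print("AND output on the next clock:", gate.clock())
```

Reading both latches before the `and` is deliberate: a short-circuit evaluation would leave a stale pulse in the second latch.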


SFQ Is The Fastest Digital Technology

Toggle Flip-Flop

Static Frequency Divider

Benchmark of SFQ circuit performance

Maximum frequency scales with $J_C$

Measured dc to 446 GHz static divider

770 GHz demonstrated in experiment

Picosecond SFQ pulses can encode terabits per second.

Figure labels: SFQ pulse ~1 ps wide, ~2 mV high.

Plot: Static Divider Speed (GHz, 100 to 1000) versus $J_C$ (kA/cm², 1 to 300); data series: NGST-Nb, NGST-NbN, HYPRES, SUNY.


SFQ Is The Lowest Power Digital Technology


One SFQ pulse dissipates $I_C \Phi_0$ in shunt resistor

For $I_C$ = 100 µA: 2 x 10^-19 Joule (~ 1 eV)

~ 5 junctions switch in single logic operation

1 nW/gate/GHz

100 nW/gate at 100 GHz

Static power dissipation in bias resistors: $I^2 R$

For $I_C$ = 100 µA biased at 0.7 $I_C$

Typical $V_{bias}$ = 2 mV (to maximize bias margin)

140 nW/JJ, 1400 nW/gate is 23 X the dynamic power

Figure labels: $V_{bias}$; $I_{bias}$; Data.

Voltage-biased SFQ gates will eliminate bias resistors and static power dissipation

Self-clocked complementary logic

Incorporates clock distribution circuitry

$V_{bias} = \Phi_0 F_{Clock}$
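The power figures on this slide follow from a few multiplications. The sketch below reproduces them from the quoted parameters; the 10 junctions per gate comes from the SFQ Gates slide, the static power is taken as bias current times bias voltage, and all names are illustrative.

```python
PHI0_WB = 2.067e-15        # flux quantum in V*s (2.067 mV-ps)
EV_J = 1.602176634e-19     # one electron-volt in joules
I_C_A = 100e-6             # junction critical current, 100 uA
JJ_SWITCHING_PER_OP = 5    # junctions that switch in one logic operation
JJ_PER_GATE = 10           # average junctions per gate
BIAS_FRACTION = 0.7        # bias current as a fraction of I_C
V_BIAS_V = 2e-3            # typical bias voltage, 2 mV

# Dynamic (switching) dissipation.
energy_per_pulse_j = I_C_A * PHI0_WB
dynamic_w_per_gate_per_ghz = JJ_SWITCHING_PER_OP * energy_per_pulse_j * 1e9

# Static dissipation of the resistive bias network.
static_w_per_jj = BIAS_FRACTION * I_C_A * V_BIAS_V
static_w_per_gate = JJ_PER_GATE * static_w_per_jj


def voltage_bias_v(clock_hz: float) -> float:
    """Bias voltage for the voltage-biased scheme, V_bias = Phi_0 * F_clock."""
    return PHI0_WB * clock_hz


print(f"Energy per SFQ pulse: {energy_per_pulse_j:.2e} J "
      f"({energy_per_pulse_j / EV_J:.2f} eV)")
print(f"Dynamic power: {dynamic_w_per_gate_per_ghz * 1e9:.2f} nW/gate/GHz")
print(f"Static power: {static_w_per_jj * 1e9:.0f} nW/JJ, "
      f"{static_w_per_gate * 1e9:.0f} nW/gate")
print(f"V_bias at 50 GHz: {voltage_bias_v(50e9) * 1e6:.0f} uV")
```

With these numbers the static-to-dynamic ratio depends on the clock rate; the slide's factor of 23 appears to correspond to a clock near 60 GHz, although the slide does not state the frequency it assumed.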


SFQ Digital ICs Have Been Developed


First SFQ circuit (~ 1977) was a dc to SFQ converter integrated with toggle flip-flops to form a binary counter.

Extensive development of SFQ logic did not occur until after 1990.

Advanced SFQ logic was developed on HTMT FLUX.

Architecture

Design tools

LSI fabrication

Logic

High data-rate on-chip communications

Inter-chip communications

Vector registers

Microprocessor logic chip

State-of-the-Art


Superconductor IC Fabrication Is Simpler Than CMOS


Oxidized silicon wafers (100-mm)

1. Deposit films (Nb trilayer, Nb wires, resistors, and oxide)

2. Mask (g-line, i-line photolithography or e-beam)

3. Etch (dry etch, typical gases are SF6, CHF3 + O2, CF4)

4. Repeated 14 to 15 times

No implants, diffusions, high temperature steps

Trilayer deposition forms Josephson tunnel junction

All layers are deposited in-situ

Al is passively oxidized in-situ at room temperature

1 µm minimum feature, 2.6 µm wire pitch

Throughput limited by deposition tools

Cross-section labels: Silicon Wafer; Ground Plane; Wire 1; Wire 2; Wire 3; Oxide; Junction Anodization; Josephson Junction (150 nm Nb Base Electrode, 8 nm Al, 2 nm Al oxide Tunnel Barrier, 100 nm Nb Counter Electrode).

Legend: Nb2O5; SiO2; MoNx 5 Ω/sq. Resistor; Mo/Al 0.15 Ω/sq. Resistor.


Cadence-based SFQ Design Flow (NGST) Is Similar to Semiconductor Design

Flow diagram labels: Schematic; Layout; DRC; LVS; VHDL; RSFQ Gate Library; Logic Synthesis & Verification; Symbol; Structure; Netlist; PCells; LMeter; Malt; WRSpice; Gate; Generic.


Complex Chips Have Been Reported

Function | Complexity | Speed | Cell Library | Organizations
FLUX-1. 8-bit µP prototype. 25 30-bit dual-op instructions. | 63 K junctions. 10.3 mm x 10.6 mm. | Designed for 20 GHz. Not tested. | Yes. Incorporates drivers/receivers for PTL. | Northrop Grumman, Stony Brook, JPL
CORE1α10. 8-bit bit-serial µP. 7 8-bit instructions. | 7 K junctions. 3.4 mm x 3.2 mm. | 21 GHz local clock. 1 GHz system clock. Fully functional. | Yes. Gates connected by JTLs and/or PTLs | ISTEC-SRL, Nagoya U., Yokohama National U.
MAC and Prefilter for programmable pass-band A/D converter. | 6 K–11 K junctions. 5 mm x 5 mm. | 20 GHz design | Yes. Gates connected by parameterized JTLs and/or PTLs | Northrop Grumman
A/D converter | 6 K junctions. | 19.6 GHz. | ? | Hypres
Digital receiver | 12 K junctions. | 12 GHz. | ? | Hypres
FIFO buffer memory | 4 Kbit. 2.6 mm x 2.5 mm | 32 bits tested at 40 GHz. | No | Northrop Grumman
X-bar switch | 128 x 128 switch. 32 x 32 module. | 2.5 Gbps. | No | NSA, Northrop Grumman
SFQ X-bar switch | 32 x 32 module. | 40 Gbps. | No | Northrop Grumman


FLUX-1 Microprocessor Chip

Objective: demonstrate a 5 Kgate SFQ chip operating at 20 GHz

8-bit microprocessor design

1-cm chip

8 - 20 Gb/s transmitters, receivers

FLUX-1 chip redesigned, fabricated, partially tested

1.75 µm, 4 kA/cm² junction Nb technology

20 GHz internal clock

5 GByte/sec inter-chip data transfer limited by architecture

Scan path diagnostics included

63 K junctions, 5 Kgate equivalent

Power dissipation ~ 9 mW @ 4.5 K

40 GOPS peak computational capability (8-bits @ 20-GHz clock)

Fabricated in TRW 4 kA/cm² process in 2002

Chip photo labels: 8 - 20 Gb/s receivers; 8 - 20 Gb/s transmitters


60 GHz Interconnect Demonstrated


MCM Nb stripline wiring is low loss, wideband

High density, low impedance solder bump arrays

Ultra-low power driver-receiver enables high data rate communications

SFQ data format enables multiple bits in transmission line simultaneously, increases throughput

Demonstrated to 60 Gb/s through 2 solder bumps, 4 Ω resistor, and 4 Ω transmission lines on chip and MCM

Timing errors produced BER floor above 30 Gb/s

Figure: Chip-to-MCM Pad Optimization. S12 (dB, -3 to 0) versus Frequency (GHz, 0 to 200); 100 µm pad, 100 µm space; pad labels s and g.

Figure: Passive MCM carrying chip 1 and chip 2 with active circuitry on chip; chip-side microstrip, MCM-side microstrip, and microstrip interconnect.

Figure: Measured Bit-error Rate. PRN Bit-error Rate (1 down to 1e-12) versus Receiver Bias Current (µA, -20 to 140); curves labeled 60, 50, 40, 30, 20, 10.



SFQ Faces Challenges of 100+ GHz Technologies

Low power

Low fan-out, need “pulse splitting”:

JTL provides current amplification

Amplified pulse can drive two JTLs

All connections are point-to-point

Fast, large RAM is hard to make

High speed

No global clock

Clock and data pulses are considered to be the same

Need to consider asynchronous/delay insensitive/self-timed/micropipelined

On-chip latencies can reach many clock cycles

10 ps clock period in PTL corresponds to 2 mm length

Pulse splitting adds latency

On the cutting edge

No truly automated place-and-route yet

Off-the-shelf CAD tools need to be heavily customized

Efficient gate library approach has to be refined

Requirement for wideband I/O to ambient RAM

Pulse splitter figure labels: $I_C$ = 141 µA; $I_C$ = 100 µA; $I_C$ = 100 µA

Prospects


Improved Chip Performance Feasible


Improve parameters by orders-of-magnitude

+ Increase junction and gate density

+ Increase clock frequency

+ Increase junction speed to 1,000 GHz by increasing $J_C$ ≥ 100 kA/cm²

+ Increase chip yield

Reduce power dissipation to SFQ switching dissipation level

Reduce bias current

Establish foundry following CMOS practice

Lithography at 250-180 nm; 90-60 nm

$J_C$ > 20 kA/cm²; ≥ 100 kA/cm²

Add superconducting layers 7-9; > 20

Vertically separate power and data transmission from gates

Achieve ≥ 1M junctions/cm² (≥ 10^5 gates); 100-250M junctions/cm² (10-25M gates)

Increase clock to 50 GHz; ≥ 100 GHz

Improve CAD tools and methods

May need to improve physical models for junctions with higher $J_C$

Shorten development time


Density Is Increased by Adding Wiring Layers

Figure: Fully-Planarized, 6-Metal Process (Proposed by ISTEC-SRL, Japan, Nagasawa et al., 2003)

Figure: IBM 90-nm Server-Class CMOS process

More metal layers are essential to increase chip density

Vertically isolate power and communications lines from active devices

Superconducting ground planes are excellent shields

Full planarization and competitive lithography


SFQ Technology Projections

Technology Projections | Before 2004 | 2010 | Beyond 2010
Technology Node | 1 µm | 250 - 180 nm | 90 nm or better
Current Density | 8 kA/cm² | 50 kA/cm² | > 100 kA/cm²
Superconducting Layers | 4 | 7 - 8 | ~ 20
New Process Elements | NA | Full Planarization | Alternate barriers; Additional junction trilayers; Vertical resistors and inductors
Power | $I_C V_{bias}$ | Reduced Bias Voltage | CMOS-like; Reduced $I_C$

Projected Chip Characteristics | Before 2004 | 2010 | Beyond 2010
Junction Density | 60 k/cm² | 2 - 5 M/cm² | 100 - 250 M/cm²
Clock Frequency | < 20 GHz | 50 - 100 GHz | 100 - 250 GHz
Power | 0.2 µW/Junction | 8 nW/GHz/Junction | 0.4 nW/GHz/Junction

Increased Clock Frequency

Increased Density

Process Improvement

Smaller junction with higher $J_C$

Smaller line pitch

Greater vertical integration

Benefits

Faster circuits

Larger signals

More gates/cm²

Reduced on-chip latency

Potential Disadvantages

Possibly larger spreads

Increased system latency

Potentially lower yield

Latency is measured in clock ticks


Gate Access Within Clock Period Is Important


Clock radius ($R_{CL}$) is maximum distance data can travel within a clock period.

$N_{CL}$ is number of gates within a clock radius.

Clock radius is limited by time-of-flight and the clock frequency.

Increasing gate density is essential to increasing effectiveness.

Figure labels: $R_{CL}$; $N_{CL}$


Density Is Key To Gate Access

Clock (GHz) | 25 | 50 | 100 | 200 | 250
Clock Radius (mm) | 4 | 2 | 1 | 0.5 | 0.4
Clock Area (mm²) | 50 | 12.6 | 3.14 | 0.79 | 0.5

Number of Gates Within Clock Radius ($N_{CL}$)

Density (JJs/cm²) | Density (Gates/mm²) | 25 GHz | 50 GHz | 100 GHz | 200 GHz | 250 GHz
5 K | 5 | 250 | 63 | 16 | 4 | 2.5
60 K | 60 | 3 K | 750 | 190 | 47 | 30
1 M | 1 K | 50 K | 13 K | 3.1 K | 790 | 500
5 M | 5 K | 250 K | 63 K | 16 K | 4 K | 2.5 K
30 M | 30 K | 1.5 M | 380 K | 94 K | 24 K | 15 K
100 M | 100 K | 5 M | 1.3 M | 310 K | 79 K | 50 K
250 M | 250 K | 12.5 M | 3.1 M | 790 K | 200 K | 130 K

Clock radius assumed to be 1/2 of time-of-flight.
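The table entries can be regenerated from three relations. In the sketch below the radius rule R_CL (mm) = 100 / clock (GHz) is inferred from the tabulated radii rather than stated in the deck, the 10 junctions per gate comes from the SFQ Gates slide, and the function names are illustrative.

```python
import math

JJ_PER_GATE = 10  # average junctions per gate


def clock_radius_mm(clock_ghz: float) -> float:
    """Clock radius in mm; 100 / f(GHz) matches the tabulated radii (inferred)."""
    return 100.0 / clock_ghz


def clock_area_mm2(clock_ghz: float) -> float:
    """Area of the circle reachable within one clock radius."""
    return math.pi * clock_radius_mm(clock_ghz) ** 2


def gates_per_mm2(jj_per_cm2: float) -> float:
    """Convert junction density (per cm^2) to gate density (per mm^2)."""
    return jj_per_cm2 / 100.0 / JJ_PER_GATE


def gates_within_clock_radius(jj_per_cm2: float, clock_ghz: float) -> float:
    """N_CL: gate density times clock area."""
    return gates_per_mm2(jj_per_cm2) * clock_area_mm2(clock_ghz)


for density in (5e3, 60e3, 1e6, 5e6, 30e6, 100e6, 250e6):
    row = [gates_within_clock_radius(density, f) for f in (25, 50, 100, 200, 250)]
    print(f"{density:>12,.0f} JJ/cm^2: " + ", ".join(f"{n:,.1f}" for n in row))
```

The printed values agree with the table to its two-significant-figure rounding, which shows that every doubling of clock frequency cuts N_CL by a factor of four at fixed density.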


High
-
End SFQ Computing Engine

2005


Not feasible

~ 100 chips per processor

0.5 M processor chips, ~ 10^9 gates

2010

~ 10 chips per processor

40 K processor chips, ~ 10^9 gates

After 2010

~ 10 to 20 processors per chip

400 processor chips, including embedded memory


Applications to Quantum Computing


Quantum computing is being investigated using superconducting qubits.

Flux-based superconducting qubits are physically similar to SFQ devices.

SFQ circuits are best candidates to control/read superconducting qubits at millikelvin temperatures.

SFQ and Quantum Computing



SFQ needs major engineering development in chip technology if it is going to be a player in high-end computing.

The engineering requirements are understood and a development plan defined.

Prospects are exciting and achievable.

Summary
