


21st Century Computer Architecture

A community white paper

http://cra.org/ccc/docs/init/21stcenturyarchitecturewhitepaper.pdf

Technion, Haifa Israel, June 2013



Information & Commun. Tech’s Impact


Semiconductor Technology’s Challenges


Computer Architecture’s Future


Example: Bypassing Paged Virtual Memory


White Paper Participants

“*” contributed prose; “**” effort coordinator

Thanks to CCC, Erwin Gianchandani & Ed Lazowska for guidance and Jim Larus & Jeannette Wing for feedback

Sarita Adve, U Illinois *
David H. Albonesi, Cornell U
David Brooks, Harvard U
Luis Ceze, U Washington *
Sandhya Dwarkadas, U Rochester
Joel Emer, Intel/MIT
Babak Falsafi, EPFL
Antonio Gonzalez, Intel/UPC
Mark D. Hill, U Wisconsin *,**
Mary Jane Irwin, Penn State U *
David Kaeli, Northeastern U *
Stephen W. Keckler, NVIDIA/U Texas
Christos Kozyrakis, Stanford U
Alvin Lebeck, Duke U
Milo Martin, U Pennsylvania
José F. Martínez, Cornell U
Margaret Martonosi, Princeton U *
Kunle Olukotun, Stanford U
Mark Oskin, U Washington
Li-Shiuan Peh, M.I.T.
Milos Prvulovic, Georgia Tech
Steven K. Reinhardt, AMD
Michael Schulte, AMD/U Wisconsin
Simha Sethumadhavan, Columbia U
Guri Sohi, U Wisconsin
Daniel Sorin, Duke U
Josep Torrellas, U Illinois *
Thomas F. Wenisch, U Michigan *
David Wood, U Wisconsin *
Katherine Yelick, UC Berkeley/LBNL *

20th Century ICT Set Up

Information & Communication Technology (ICT) Has Changed Our World
o <long list omitted>

Required innovations in algorithms, applications, programming languages, … , & system software

Key (invisible) enablers (cost-)performance gains
o Semiconductor technology (“Moore’s Law”)
o Computer architecture (~80x per Danowitz et al.)

Enablers: Technology + Architecture

[Figure: gains attributed to Technology and to Architecture. Source: Danowitz et al., CACM 04/2012, Figure 1]

21st Century Promise

ICT Promises Much More
o Data-centric personalized health care
o Computation-driven scientific discovery
o Human network analysis
o Much more: known & unknown

Characterized by
o Big Data
o Always Online
o Secure/Private

o



Whither enablers of future (cost-)performance gains?

Technology’s Challenges 1/2

Late 20th Century                          The New Reality
Moore’s Law: 2× transistors/chip           Transistor count still 2× BUT…
Dennard Scaling: ~constant power/chip      Gone. Can’t repeatedly double power/chip

Classic CMOS Dennard Scaling: the Science behind Moore’s Law

National Research Council (NRC), Computer Science and Telecommunications Board (CSTB.org)

Scaling:
  Voltage: $V/a$
  Oxide: $t_{OX}/a$

Results:
  Power/ckt: $1/a^2$
  Power Density: ~Constant

(Finding 2)

Source: Future of Computing Performance: Game Over or Next Level?, National Academy Press, 2011

Post-classic CMOS Dennard Scaling

National Research Council (NRC), Computer Science and Telecommunications Board (CSTB.org)

Scaling:
  Voltage: $V$ (no longer $V/a$)
  Oxide: $t_{OX}/a$

Results:
  Power/ckt: $1$ (no longer $1/a^2$)
  Power Density: $a^2$ (no longer ~constant)

Post Dennard CMOS Scaling Rule

TODO: Chips w/ higher power (no), smaller ( ), dark silicon ( ), or other (?)
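
The two scaling slides can be worked through with the standard dynamic-power relation. This is a sketch under assumptions the slides do not state: switching power per circuit is $P = C V^2 f$, capacitance scales as $C/a$, frequency as $a f$, and circuit area as $A/a^2$, where $a > 1$ is the linear scaling factor per generation.

```latex
\begin{align*}
\text{Classic:}\quad
  P' &= \frac{C}{a}\left(\frac{V}{a}\right)^{2}(a f) = \frac{C V^{2} f}{a^{2}} = \frac{P}{a^{2}},
  & \frac{P'}{A'} &= \frac{P/a^{2}}{A/a^{2}} = \frac{P}{A} \\[4pt]
\text{Post-Dennard:}\quad
  P' &= \frac{C}{a}\,V^{2}\,(a f) = C V^{2} f = P,
  & \frac{P'}{A'} &= \frac{P}{A/a^{2}} = a^{2}\,\frac{P}{A}
\end{align*}
```

Under these assumptions, once voltage stops scaling, power per circuit stays flat while the number of circuits per unit area grows by $a^2$, so power density grows by $a^2$. That is the pressure behind the options in the TODO line: higher-power chips, smaller chips, or dark silicon.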

Technology’s Challenges 2/2

Late 20th Century                            The New Reality
Moore’s Law: 2× transistors/chip             Transistor count still 2× BUT…
Dennard Scaling: ~constant power/chip        Gone. Can’t repeatedly double power/chip
Modest (hidden) transistor unreliability     Increasing transistor unreliability can’t be hidden
Focus on computation over communication      Communication (energy) more expensive than computation
1-time costs amortized via mass market       One-time cost much worse & want specialized platforms

How should architects step up as technology falters?

21st Century Comp Architecture

20th Century                                   21st Century
Single-chip in generic computer                Architecture as Infrastructure: Spanning sensors to clouds
                                               Performance plus security, privacy, availability, programmability, …
Performance via invisible instr.-level         Energy First
parallelism                                    Parallelism, Specialization, Cross-layer design
Predictable technologies:                      New technologies (non-volatile memory, near-threshold, 3D, photonics, …)
CMOS, DRAM, & disks                            Rethink: memory & storage, reliability, communication

Cross-Cutting: Break current layers with new interfaces
What Research Exactly?

Research areas in white paper (& backup slides)
1. Architecture as Infrastructure: Spanning Sensors to Clouds
2. Energy First
3. Technology Impacts on Architecture
4. Cross-Cutting Issues & Interfaces

Much more research developed by future PIs!

E.g.: Efficient Virtual Memory for Big Memory Servers
o Basu, Gandhi, Chang, Hill, & Swift [ISCA 2013]
o Big Memory: graph500, memcached, databases
o Self-manage most memory (e.g., bufferpool)

10/5/12

Execution Time Overhead: TLB Misses

[Figure: execution time overhead from TLB misses; lower is better]

1. Significant waste
2. Larger memory?
3. Byte-addr NVM?

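Hardware: Direct Segment

[Figure: translation of a virtual address (VA) to a physical address (PA) via Conventional Paging or via the Direct Segment (BASE, LIMIT, OFFSET registers); paths labeled 1 and 2]

Why Direct Segment?

Matches Big Memory Workload needs

NO Paging => NO TLB Miss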
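
A minimal software sketch of the lookup in the figure, assuming the semantics described in the cited ISCA 2013 paper: a virtual address inside [BASE, LIMIT) is translated by adding OFFSET and bypasses paging, and any other address falls back to conventional paging. The class name, the dictionary page table, and the 4 KiB page size are illustrative assumptions, not the hardware design.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

PAGE_SIZE = 4096  # assumed page size for the fallback paging path


@dataclass
class DirectSegmentMMU:
    """Toy model (hypothetical names): direct-segment registers plus a page table."""
    base: int    # first virtual address covered by the direct segment
    limit: int   # first virtual address past the direct segment
    offset: int  # added to a virtual address to form the physical address
    page_table: Dict[int, int] = field(default_factory=dict)  # virtual page -> frame

    def translate(self, va: int) -> Tuple[int, str]:
        """Return (physical address, name of the path taken)."""
        if self.base <= va < self.limit:
            # Direct-segment hit: no page-table walk is needed.
            return va + self.offset, "direct segment"
        # Every other address uses conventional paging.
        vpn, page_off = divmod(va, PAGE_SIZE)
        if vpn not in self.page_table:
            raise KeyError(f"page fault at VA {va:#x}")
        return self.page_table[vpn] * PAGE_SIZE + page_off, "paging"


mmu = DirectSegmentMMU(base=0x10000000, limit=0x20000000, offset=0x40000000,
                       page_table={0x1: 0x7})
print(mmu.translate(0x10000010))  # inside the segment: VA + OFFSET
print(mmu.translate(0x1234))      # outside: virtual page 0x1 maps to frame 0x7
```

In this model, only addresses outside the segment ever consult the page table, which is the sense in which the slide says no paging means no TLB miss.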

Execution Time Overhead: TLB Misses

92-100% TLB “misses” to direct segment

Requires: Both small SW + small HW changes





Back Up Slides

Detailed research areas in white paper
1. Architecture as Infrastructure: Spanning Sensors to Clouds
2. Energy First
3. Technology Impacts on Architecture
4. Cross-Cutting Issues & Interfaces
http://cra.org/ccc/docs/init/21stcenturyarchitecturewhitepaper.pdf

Findings on National Academy “Game Over” Study

Glimpse at DARPA/ISAT Workshop “Advancing Computer Systems without Technology Progress”

1. Architecture as Infrastructure: Spanning Sensors to Clouds

Beyond a chip in a generic computer

To pillar of 21st century societal infrastructure.
o Computation in context (sensor, mobile, …, data center)
o Systems often large & distributed
o Communication issues can dominate computation
o Goals beyond performance (battery life, form factor)

Opportunities (not exhaustive)
o Reliable sensors harvesting (intermittent) energy
o Smart phones to Star Trek’s medical “tricorder”
o Cloud infrastructure suitable for both “Big Data” streams & low-latency quality-of-service with stragglers
o Analysis & design tools that scale

2. Energy First

Beyond single-core performance computer

To (cost-)performance per watt/joule

Energy across the layers
o Circuit/technology (near-threshold CMOS, 3D stacking)
o Architecture (reducing unnecessary data movement)
o Software (communication-reducing algorithms)

Parallelism to save energy
o Vast (fine-grained) homogeneous & heterogeneous
o Improved SW stack
o Applications focus (beyond graphics processing units)

Specialization for performance & energy efficiency
o Abstractions for specialization (reducing 1-time cost)
o Energy-efficient memory hierarchies
o Reconfigurable logic structures

3. Technology Impacts on Architecture

Beyond CMOS, DRAM, & Disks of last 3+ decades to

Using replacement circuit technologies
o Sub/near-threshold CMOS, QWFETs, TFETs, and QCAs

Non-volatile storage
o Beyond flash memory to STT-RAM, PCRAM, & memristor

3D die stacking & interposers
o logic, cache, small main memory

Photonic interconnects
o Inter- & even intra-chip

Design automation
o from circuit-design w/ new technologies to
o pre-RTL functional, performance, power, area modeling of heterogeneous chips & systems

4. Cross-Cutting Issues & Interfaces

Beyond performance w/ stable interfaces to

New design goals (for pillar of societal infrastructure)
o Verifiability (bugs kill)
o Reliability (“dependability” computing base?)
o Security/Privacy (w/ non-volatile memory?)
o Programmability (time to correct-performant solution)

Better Interfaces
o High-level information (quality of service, provenance)
o Parallelism ((in)dependence, (lack of) side-effects)
o Orchestrating communication ((recursive) locality)
o Security/Reliability (fine-grain protection)

Executive summary (Added to National Academy Slides)

Highlights of National Academy Findings
(F1) Computer hardware has transitioned to multicore
(F2) Dennard scaling of CMOS has broken down
(F3) Parallelism and locality must be exploited by software
(F4) Chip power will soon limit multicore scaling

Eight recommendations from algorithms to education

We know all of this at some level, BUT:
Are we all acting on this knowledge or hoping for business as usual?
Thinking beyond next paper to where future value will be created?

Questions Asked but Not Answered Embedded in NA Talk

Briefly Close with Kübler-Ross Stages of Grief:
Denial
Acceptance

Source: Future of Computing Performance: Game Over or Next Level?, National Academy Press, 2011
Mark Hill talk (http://www.cs.wisc.edu/~markhill/NRCgameover_wisconsin_2011_05.pptx)

The Graph

[Figure: System Capability (log scale) by decade, from the 80s through the 50s, with a “Fallow Period” marked]

Source: Advancing Computer Systems without Technology Progress, ISAT Outbrief (http://www.cs.wisc.edu/~markhill/papers/isat2012_ACSWTP.pdf), Mark D. Hill and Christos Kozyrakis, DARPA/ISAT Workshop, March 26-27, 2012.

Approved for Public Release, Distribution Unlimited

The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.


Surprise 1 of 2

Can Harvest in the “Fallow” Period!

2 decades of Moore’s Law-like perf./energy gains

Wring out inefficiencies used to harvest Moore’s Law

HW/SW Specialization/Co-design (3-100x)
Reduce SW Bloat (2-1000x)
Approximate Computing (2-500x)
---------------------------------------------------
~1000x = 2 decades of Moore’s Law!
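
The arithmetic behind the last line can be checked with a short sketch, assuming Moore's Law means one doubling every two years. The doubling period is an assumption here (the slide does not state one), and the function name is illustrative.

```python
def moores_law_gain(years: float, doubling_period_years: float = 2.0) -> float:
    """Cumulative gain after `years`, doubling once per `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


gain = moores_law_gain(20)  # two decades at the assumed two-year doubling period
print(f"{gain:.0f}x")       # ten doublings, i.e. 2**10
```

Ten doublings give 1024x, which is consistent with the slide's ~1000x.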



“Surprise” 2 of 2

Systems must exploit LOCALITY-AWARE parallelism

Parallelism Necessary, but not Sufficient

As communication’s energy costs dominate

Shouldn’t be a surprise, but many are in denial

Both surprises hard, requiring “vertical cut” thru SW/HW