Chip Multi-Threading (CMT) Era


Nov 18, 2013 (3 years and 11 months ago)


Creative Commons Attribution-Share 3.0 United States License

www.opensparc.net


David Weaver
Principal Engineer, UltraSPARC Architecture
Principal Evangelist, OpenSPARC
Microelectronics
Sun Microsystems
Uniprocessor Performance (SPECint)

[Figure: Performance (vs. VAX-11/780), log scale from 1 to 10000, plotted by year from 1978 to 2006]

VAX: 25%/year, 1978 to 1986
RISC + x86: 52%/year, 1986 to 2002
RISC + x86: ??%/year, 2002 to present

From Hennessy and Patterson’s Computer Architecture: A Quantitative Approach, 4th edition, 2006

“Sea change” in chip design has arrived: multiple processor cores per chip, and/or multiple virtual processors per core
[Figure annotation: 3X]
Source: David Patterson presentation at MultiCore Expo, March 2006
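As a rough worked example of what these annual rates add up to, the sketch below compounds them over the periods quoted on the slide. The function name `cumulative_speedup` is illustrative, and treating each rate as constant over its whole period is an assumption made here only for the arithmetic.

```python
def cumulative_speedup(annual_rate: float, start_year: int, end_year: int) -> float:
    """Compound a constant annual improvement rate from start_year to end_year."""
    years = end_year - start_year
    return (1.0 + annual_rate) ** years


# Rates and periods as quoted on the slide.
vax_era = cumulative_speedup(0.25, 1978, 1986)   # 25%/year for 8 years
risc_era = cumulative_speedup(0.52, 1986, 2002)  # 52%/year for 16 years

print(f"1978 to 1986: about {vax_era:.1f}x")
print(f"1986 to 2002: about {risc_era:.0f}x")
print(f"1978 to 2002: about {vax_era * risc_era:.0f}x")
```

The point of the arithmetic is only that a change in the annual rate compounds quickly, which is why the "??%/year" after 2002 matters.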

“Hitting walls” in Processor Design

Clock frequency
  frequency increases tapering off in new semiconductor processes, leakage and wire load
  high frequencies => power issues

Processor designs for high single-thread performance are becoming highly complex
  expense and/or time-to-market suffer
  verification increasingly difficult
  more complexity => more circuitry => increased power ... for diminishing performance returns

Memory latency (not instruction execution speed) dominating most application times
Memory Bottleneck

[Figure: Relative Performance, log scale from 1 to 10000, plotted by year from 1980 to 2005, with the Gap between the CPU Frequency and DRAM Speeds curves marked]

CPU -- 2x Every 2 Years
DRAM -- 2x Every 6 Years

Source: Sun World Wide Analyst Conference Feb. 25, 2003
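To make the size of the gap concrete, here is a minimal sketch that applies the two doubling periods from the slide. The function names are illustrative, and the assumption that both trends hold steadily over the whole interval is a simplification.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Improvement after `years` if performance doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def cpu_dram_gap(years: float) -> float:
    """Ratio of CPU improvement to DRAM improvement after `years`."""
    cpu = growth_factor(years, 2.0)   # CPU: 2x every 2 years (from the slide)
    dram = growth_factor(years, 6.0)  # DRAM: 2x every 6 years (from the slide)
    return cpu / dram


for years in (6, 12, 25):
    print(f"after {years} years the CPU/DRAM gap is about {cpu_dram_gap(years):.0f}x")
```

Under these two rates the gap itself doubles every 3 years, since 2^(t/2) / 2^(t/6) = 2^(t/3).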

Single Threaded Performance

Single Threading
[Figure: one Thread over Time, alternating Compute (C) and Memory Latency (M) intervals]
Typical Utilization of Processor: 15-25%
Up to 85% Cycles Spent Waiting for Memory

Multi-threaded Performance

Hardware Multi-Threading (HMT)
[Figure: Thread 1 through Thread 4 over Time, each alternating Compute (C) and Memory Latency (M) intervals]
Utilization: Up to 85%*
* based on example of UltraSPARC T1
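The effect of interleaving threads can be sketched with a toy cycle-by-cycle simulation. Everything in it is an assumption made for illustration, not a model of the UltraSPARC T1: one instruction issues per cycle, every thread repeats the same pattern of `compute_cycles` of work followed by `memory_cycles` of stall, memory accesses from different threads overlap freely, and the core switches threads at no cost.

```python
def simulate_utilization(num_threads: int, compute_cycles: int,
                         memory_cycles: int, total_cycles: int = 100_000) -> float:
    """Fraction of cycles in which an idealized core has a thread to run."""
    remaining_compute = [compute_cycles] * num_threads
    remaining_stall = [0] * num_threads  # 0 means the thread is ready
    busy = 0
    next_thread = 0
    for _ in range(total_cycles):
        # Pick the first ready thread in round-robin order.
        chosen = None
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if remaining_stall[t] == 0:
                chosen = t
                break
        # Outstanding memory accesses all make progress in parallel.
        for t in range(num_threads):
            if remaining_stall[t] > 0:
                remaining_stall[t] -= 1
        if chosen is not None:
            busy += 1
            remaining_compute[chosen] -= 1
            if remaining_compute[chosen] == 0:
                # Compute burst finished: the thread now waits on memory.
                remaining_compute[chosen] = compute_cycles
                remaining_stall[chosen] = memory_cycles
            next_thread = (chosen + 1) % num_threads
    return busy / total_cycles


for threads in (1, 2, 4):
    util = simulate_utilization(threads, compute_cycles=1, memory_cycles=4)
    print(f"{threads} thread(s): utilization about {util:.0%}")
```

With one compute cycle per four stall cycles, a single thread keeps this idealized core busy about 20% of the time and four threads about 80%, which is in the same range as the figures quoted on the slides. A real core would fall short of the ideal because of cache contention, shared pipeline resources, and uneven thread behavior.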

Chip Multi-Threading (CMT)

CMP (Chip MultiProcessing, a.k.a. “multicore”): n cores per processor
HMT (Hardware Multithreading): m threads per core
CMT (Chip MultiThreading): n x m threads per processor
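The n x m relationship can be written down directly. In the sketch below, the `ChipConfig` class and `parse_label` helper are hypothetical names introduced for illustration. The label format ("8C 4T") is the one used in the threads-per-processor chart later in this deck, and reading it as "cores, threads per core" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipConfig:
    cores: int             # n: cores per processor (CMP)
    threads_per_core: int  # m: hardware threads per core (HMT)

    @property
    def threads_per_processor(self) -> int:
        """n x m: total hardware threads on the chip (CMT)."""
        return self.cores * self.threads_per_core


def parse_label(label: str) -> ChipConfig:
    """Parse a label such as '8C 4T' (assumed: 8 cores, 4 threads per core)."""
    cores_part, threads_part = label.split()
    return ChipConfig(int(cores_part.rstrip("C")), int(threads_part.rstrip("T")))


for label in ("2C 1T", "2C 2T", "4C 1T", "8C 4T", "8C 8T"):
    config = parse_label(label)
    print(f"{label}: {config.threads_per_processor} threads per processor")
```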
Why CMT Works

Goal: “100% Resource Utilization” (given a fixed maximum die size)

SPARC T1: 4 threads per core
  Increases core die area by ~20%
  Improves performance by 50-100%

[Figure: Relative Performance on thread-rich, memory-bound workloads for Single-Core, Single-Thread; Single-Core, Multi-Thread; and Multi-Core, Multi-Thread designs, with labels for Size of Each Core and Maximum die size]
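One way to read the SPARC T1 numbers is as performance per unit of core area. The short calculation below uses only the two figures from the slide (~20% more area, 50-100% more performance). The function name and the choice of "performance per area" as the metric are assumptions made here.

```python
def performance_per_area(relative_performance: float, relative_area: float) -> float:
    """Performance per unit area, relative to a single-threaded core at 1.0 / 1.0."""
    return relative_performance / relative_area


relative_area = 1.20  # core die area grows by ~20% (from the slide)
for gain in (0.50, 1.00):  # performance improves by 50% to 100% (from the slide)
    ratio = performance_per_area(1.0 + gain, relative_area)
    print(f"+{gain:.0%} performance for +20% area: {ratio:.2f}x performance per area")
```

On these figures, adding threads to a core returns more performance than the area it costs, which is the argument for spending a fixed die budget this way on thread-rich, memory-bound workloads.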
CMT Effect on Efficiency – an example
Source: Computer Architecture, 4th edition, John Hennessy & David Patterson
Major shift in processor design

FROM
  single-thread performance
  ever-increasing clock rate
  IPC (e.g. superscalar, out-of-order) and ILP
  (high power consumption)
  cross-CPU communication through bus/memory
  running a single OS

TO
  multi-threaded performance
  high thread count (TLP)
  high throughput
  high efficiency (performance/power)
  high inter-CPU (strand) bandwidth
  virtualization and multiple guest OSs
The CMT Wave Has Begun

Every manufacturer is designing multi-core (CMP) and/or chip multi-threaded (CMT) processors
  Sun (CMT)
  IBM (CMT)
  Intel (CMP)
  AMD (CMP)
  ...
  even embedded processor manufacturers

The Tidal Wave of CMT is Building

Threads per Processor (chip)

                  2003  2004  2005  2006  2007
IBM Power            2     2     4     4     4
Sun UltraSPARC       1     2    32    32    64
Intel x86            1     1     2     2     4
AMD x64              1     1     2     2     4

Bar labels in the chart (C = cores, T = threads per core): 2C 1T, 2C 2T, 4C 1T, 8C 4T, 8C 8T
But... is Software Ready for CMT?

[Table: ratings (Low, Medium, High, Large) for Instruction-level Parallelism, Thread-level Parallelism, Instruction/Data Working Set, and Data Sharing]
Operating Systems Playing “Catch up”

A tiny handful of Operating Systems* scale well to hundreds of threads
  generally, those previously used for 100+ processor SMPs

Most only scale up to a few (4-8) threads
  generally, those previously targeted at desktop systems

* including Solaris
Opportunities for Compilers

Improving auto-parallelization
  to automatically fork threads to take advantage of CMT

Need more work on both
  totally automatic parallelization
  parallelization with directives (e.g. OpenMP)
Applications Playing “Catch up”

Application software is generally waaaay behind the CMT curve

Good news: many Java apps are inherently multi-threaded
Mediocre news: smarter compilers will help many apps
Bad news:
  some apps require rewriting to perform well in the CMT age
  most programmers aren’t used to thinking in terms of executing concurrent threads
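As a small illustration of what "rewriting" can mean, the sketch below expresses the same independent work first as a sequential loop and then as tasks handed to a pool of threads. The `handle_request` function and the request list are hypothetical stand-ins. Python is used only for brevity: in standard CPython the global interpreter lock generally keeps pure-Python computation from running on several cores at once, so this shows the programming model rather than a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> str:
    """Hypothetical unit of independent work (stands in for serving one request)."""
    return f"response-{request_id}"


def serve_sequentially(request_ids):
    """Single-threaded version: one request at a time."""
    return [handle_request(r) for r in request_ids]


def serve_concurrently(request_ids, num_threads: int = 4):
    """Multi-threaded version: requests are spread over a pool of worker threads."""
    # Executor.map returns results in input order, so the output matches
    # the sequential version even though the work may run out of order.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(handle_request, request_ids))


requests = list(range(8))
print(serve_concurrently(requests) == serve_sequentially(requests))
```

The rewrite is easy here only because the requests share no state. Once threads share data, the programmer has to reason about ordering and synchronization, which is the unfamiliar part the slide refers to.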
Academic Curricula Opportunities

Train students in software implications of CMT on
  operating system design
  compiler/tools design
  application design

Train processor architects on real-world tradeoffs
  performance/complexity vs. power consumption
  performance vs. time to market!
    additional performance only worthwhile if it can be implemented quickly enough
    1 month delay trades away ~5% of performance
  Verification takes twice the time/effort/$ of design
    so make the design easier to verify
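The time-to-market rule of thumb can be turned into a quick break-even check. The sketch below takes the "~5% per month" figure from the slide. Treating the penalty as compounding month over month, and the names `MONTHLY_PENALTY` and `net_gain`, are assumptions made here for illustration.

```python
MONTHLY_PENALTY = 0.05  # "~5% of performance" lost per month of delay (from the slide)


def net_gain(feature_gain: float, delay_months: float) -> float:
    """Net relative performance change of a feature once its schedule delay is charged.

    Assumes the monthly penalty compounds; the slide does not say how it accumulates.
    """
    return (1.0 + feature_gain) * (1.0 - MONTHLY_PENALTY) ** delay_months - 1.0


# A hypothetical feature worth +10% performance, at different schedule costs.
for delay in (1, 2, 3):
    print(f"+10% feature, {delay} month(s) late: net {net_gain(0.10, delay):+.1%}")
```

Under this reading, a feature worth 10% is still a net win if it costs one month, but not if it costs two.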