Multicore Processors (5)

Dezső Sima



Spring 2008

(Ver. 2.1)



Dezső Sima, 2008




10.3 IBM’s MC processors

10.3.1 POWER line

10.3.2 Cell BE

10.3 IBM’s MC processors



10.3.1 POWER line

POWER4    180 nm   10/2001
POWER4+   130 nm   11/2002
POWER5    130 nm   5/2004
POWER5+   90 nm    10/2005
POWER6    65 nm    5/2007

Figure: The evolution of IBM’s major RISC lines (timeline 1988 to 2007). Commercial computing (OS/400): AS/400 line and e-Server iSeries, from IMPI/48 (scalar CISC) to PowerPC AS/64 (A10, A30, A50, Northstar, Pulsar, SStar). Technical computing (AIX): RS/6000 and e-Server pSeries, with POWER/32 (POWER, PSC, POWER2, P2SC), PowerPC/32 (601, 604, 604e) and PowerPC/64 (Power3, Power3-II). Followed by POWER4, POWER4+, POWER5, POWER5+ and POWER6 (PowerPC/64 ext., PowerPC AS/64 ext.).

10.3.1 Evolution of IBM’s major RISC lines

Figure : POWER4 chip logical view [3.6]

10.3.1 POWER4 (1)

Figure labels: Built-In Self-Test, Service Processor, Power-On Reset, Core Interface Unit (crossbar), Non-Cacheable Unit, Multi-Chip Module

Figure: Logical view of the L3 controller [3.5]

10.3.1 POWER4 (2)

Figure: The memory controller of the POWER4 [3.5]

10.3.1 POWER4 (3)

Figure: I/O controller of the POWER4 [3.5]


Figure label: Fabric Controller

10.3.1 POWER4 (4)

Figure: POWER4 chip [3.11]

10.3.1 POWER4 (5)

10.3.1 POWER4 (6)

Table: Main features of IBM’s dual-core POWER line

POWER line              POWER4
Dual/Quad-Core          DC
Introduced              10/2001
Technology              180 nm
Die size                412 mm²
Nr. of transistors      174 million
fc [GHz]                1.3
L2 implementation       On-chip
L2 size/allocation      1.44 MB/shared
L3 impl.                Tags on-chip, data off-chip
L3 size                 32 MB
Mem. contr.             Off-chip
TDP [W]                 115/125
Packaging               SCM¹/MCM²
Dual threaded
Power management

¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management

10.3.1 POWER4+ (1)

Figure: New features of the POWER4+ [3.3]

10.3.1 POWER4+ (2)

Table: Main features of IBM’s dual-core POWER line

POWER line              POWER4                        POWER4+
Dual/Quad-Core          DC                            DC
Introduced              10/2001                       11/2002
Technology              180 nm                        130 nm
Die size                412 mm²                       380 mm²
Nr. of transistors      174 million                   184 million
fc [GHz]                1.3                           1.7
L2 implementation       On-chip                       On-chip
L2 size/allocation      1.44 MB/shared                1.5 MB/shared
L3 impl.                Tags on-chip, data off-chip   Tags on-chip
L3 size                 32 MB                         32 MB
Mem. contr.             Off-chip                      On-chip
TDP [W]                 115/125                       70
Packaging               SCM¹/MCM²                     SCM¹/MCM²
Dual threaded
Power management

¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management

Figure 5.14: Contrasting POWER4 and POWER5 system structures [3.1]

10.3.1 POWER5 (1)

(Exclusive L3)

Figure: Block diagram of the POWER5 (1) [3.1]

10.3.1 POWER5 (2)

Figure: Block diagram of the POWER5 (2) [3.12]

10.3.1 POWER5 (3)

10.3.1 POWER5 (4)

Figure: Floorplan of the POWER5 [3.13]

POWER4: 180 nm, 412 mm²
POWER5: 130 nm, 389 mm² (~3 % enlarged)

10.3.1 POWER5 (6)

Figure: Contrasting the floor plans of the POWER4 and POWER5 dies [3.11], [3.13]

Figure: Packaging alternatives of the POWER4/5 processors

Source: Partridge R. and Ghatpande S., „IBM Introduces POWER5+ and Quad-Core Modules in System p5,” Tech Trends Monthly, Nov./Dec. 2005

POWER5+ Dual-Core Module

10.3.1 POWER5 (7)

POWER4 MCM Photo

32-way System Showing 4 MCMs and L3 Cache

Figure: Quad-Chip POWER4 module (MCM) and a 32-way POWER4 system [3.7]

10.3.1 POWER5 (8)

Figure: Photos of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCM) of the POWER5 [3.7]

10.3.1 POWER5 (10)

Figure: The Multi-chip module of the POWER5 [3.10]

10.3.1 POWER5 (11)

10.3.1 POWER5 (12)

Table: Main features of IBM’s dual-core POWER line

POWER line              POWER4                        POWER4+         POWER5
Dual/Quad-Core          DC                            DC              DC
Introduced              10/2001                       11/2002         5/2004
Technology              180 nm                        130 nm          130 nm
Die size                412 mm²                       380 mm²         389 mm²
Nr. of transistors      174 million                   184 million     276 million
fc [GHz]                1.3                           1.7             1.65/1.9
L2 implementation       On-chip                       On-chip         On-chip
L2 size/allocation      1.44 MB/shared                1.5 MB/shared   1.9 MB/shared
L3 impl.                Tags on-chip, data off-chip   Tags on-chip    Tags on-chip
L3 size                 32 MB                         32 MB           36 MB
Mem. contr.             Off-chip                      On-chip         On-chip
TDP [W]                 115/125                       70              80 (est)
Packaging               SCM¹/MCM²                     SCM¹/MCM²       DCM³/MCM²
Dual threaded
Power management                                                      DPM⁶

¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management

Source: Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp. 2006, http://www.redbooks.ibm.com/redpapers/pdfs/redp4150.pdf

Figure: Block diagram of the POWER5+

10.3.1 POWER5+ (1)

Figure: Interpretation of Dual-Chip Modules (DCMs) and Multi-Chip Modules (MCM) of the POWER5 [3.7]


10.3.1 POWER5 (9)

Figure: Dual-Core Modules (DCMs) and Quad-Core Modules (QCM) of the POWER5+ [3.14]

10.3.1 POWER5+ (2)

10.3.1 POWER5+ (3)

Table: Main features of IBM’s dual-core POWER line

POWER line              POWER4                        POWER4+         POWER5          POWER5+
Dual/Quad-Core          DC                            DC              DC              DC
Introduced              10/2001                       11/2002         5/2004          10/2005
Technology              180 nm                        130 nm          130 nm          90 nm
Die size                412 mm²                       380 mm²         389 mm²         230 mm²
Nr. of transistors      174 million                   184 million     276 million     276 million
fc [GHz]                1.3                           1.7             1.65/1.9        1.92
L2 implementation       On-chip                       On-chip         On-chip         On-chip
L2 size/allocation      1.44 MB/shared                1.5 MB/shared   1.9 MB/shared   1.9 MB/shared
L3 impl.                Tags on-chip, data off-chip   Tags on-chip    Tags on-chip    Tags on-chip
L3 size                 32 MB                         32 MB           36 MB           36 MB
Mem. contr.             Off-chip                      On-chip         On-chip         On-chip
TDP [W]                 115/125                       70              80 (est)        70
Packaging               SCM¹/MCM²                     SCM¹/MCM²       DCM³/MCM²       DCM⁴/QCM⁵
Dual threaded
Power management                                                      DPM⁶            DPM⁶

¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management

POWER6’s main features [3.15b]

10.3.1 POWER6 (1)



• Ultra-high frequency (4.7 GHz), dual-core, dual-threaded (SMT) design
• 13 FO4 design
• Private 4 MB L2 caches
• Partially integrated 32 MB L3 victim cache
• Minimization of excessive circuitry to reduce dissipation (modest speculation and out-of-order execution, no renaming)
• Many functions of decoding and instruction grouping are pushed into predecoding (4 stages). (Added L2 latency causes a loss of 0.5 % per stage, whereas each stage added after the I-cache access results in a loss of about 1 % per stage.)
• Increased dispatch and completion bandwidth (to 7 instructions per thread)
• The L2 cache, the SMP interconnect and parts of the memory and I/O subsystem operate at 0.5 fc, the L3 operates at one quarter of fc, and the memory controller at up to 3.2 GHz. (In the POWER5 the L2 operates at fc, the remaining components at 0.5 fc.)
• Since the L2 operates at 0.5 fc, the width of the load and store interfaces was doubled.
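The clock ratios and per-stage losses in the list above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 4.7 GHz core clock, the 0.5 and 0.25 ratios and the 0.5 % / 1 % per-stage figures come from the list, while the 32-byte interface width and the assumption that losses add up linearly across stages are hypothetical choices made for the example.

```python
CORE_CLOCK_GHZ = 4.7  # POWER6 core clock fc, from the feature list


def domain_clock_ghz(core_clock_ghz: float, ratio: float) -> float:
    """Clock of a domain that runs at a fixed fraction of the core clock."""
    return core_clock_ghz * ratio


def peak_bandwidth_gb_s(width_bytes: int, clock_ghz: float) -> float:
    """Peak bandwidth of an interface that moves width_bytes per clock."""
    return width_bytes * clock_ghz


def linear_loss_percent(added_stages: int, loss_per_stage_percent: float) -> float:
    """Performance loss, assuming (hypothetically) that per-stage losses add up."""
    return added_stages * loss_per_stage_percent


l2_clock = domain_clock_ghz(CORE_CLOCK_GHZ, 0.5)   # L2 runs at 0.5 fc
l3_clock = domain_clock_ghz(CORE_CLOCK_GHZ, 0.25)  # L3 runs at one quarter of fc

# Hypothetical interface width, used only to show why the width was doubled:
# half the clock with twice the width gives the same peak bandwidth.
HYPOTHETICAL_WIDTH_BYTES = 32
full_speed = peak_bandwidth_gb_s(HYPOTHETICAL_WIDTH_BYTES, CORE_CLOCK_GHZ)
half_speed_doubled = peak_bandwidth_gb_s(2 * HYPOTHETICAL_WIDTH_BYTES, l2_clock)

# Four stages placed in predecoding versus four stages after the I-cache access.
predecode_loss = linear_loss_percent(4, 0.5)
post_icache_loss = linear_loss_percent(4, 1.0)

print(f"L2 clock: {l2_clock:.3f} GHz, L3 clock: {l3_clock:.3f} GHz")
print(f"Peak bandwidth: {full_speed:.1f} vs {half_speed_doubled:.1f} GB/s")
print(f"Loss: {predecode_loss:.1f} % (predecode) vs {post_icache_loss:.1f} % (after I-cache)")
```

Under these assumptions, moving four stages into predecoding costs about half as much as adding them after the I-cache access, which is the trade-off the list describes.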


10.3.1 POWER6 (2)

At its introduction, POWER6 (in the IBM System p570) had the highest figures for SPECint2006, SPECfp2006, SPECjbb2005 (Java performance) and TPC-C (transaction performance).

POWER6

POWER5+

Figure: Contrasting the block diagrams of the POWER5 and POWER6 processors [3.15a]

Hardware support of decimal arithmetic

10.3.1 POWER6 (3)

Figure: Comparing the POWER5 and POWER6 processors [3.15b]

10.3.1 POWER6 (4)

Table: Throughput comparison POWER6 vs POWER5 [3.15b]

10.3.1 POWER6 (5)

10.3.1 POWER6 (6)

[3.15b]

Figure: The internal pipelines of the POWER6 and the POWER5 [3.15b]

10.3.1 POWER6 (7)

Figure: First level nodal topology of the POWER6 vs POWER5 [3.15b]

10.3.1 POWER6 (8)

Figure: Second level topology of the POWER5 vs POWER6 [3.15b]

10.3.1 POWER6 (9)

Table: POWER6 processor functional signal I/O-pin comparison for various system types [3.15b]

10.3.1 POWER6 (10)

10.3.1 POWER6 (11)

Figure: Micrograph of the POWER6 [3.15b]

10.3.1 POWER6 (12)

Table: Main features of IBM’s dual-core POWER line

POWER line              POWER4                        POWER4+         POWER5          POWER5+         POWER6
Dual/Quad-Core          DC                            DC              DC              DC              DC
Introduced              10/2001                       11/2002         5/2004          10/2005         5/2007
Technology              180 nm                        130 nm          130 nm          90 nm           65 nm
Die size                412 mm²                       380 mm²         389 mm²         230 mm²         341 mm²
Nr. of transistors      174 million                   184 million     276 million     276 million     790 million
fc [GHz]                1.3                           1.7             1.65/1.9        1.92            4.7
L2 implementation       On-chip                       On-chip         On-chip         On-chip         On-chip
L2 size/allocation      1.44 MB/shared                1.5 MB/shared   1.9 MB/shared   1.9 MB/shared   2*4 MB/private
L3 impl.                Tags on-chip, data off-chip   Tags on-chip    Tags on-chip    Tags on-chip    Tags on-chip
L3 size                 32 MB                         32 MB           36 MB           36 MB           32 MB
Mem. contr.             Off-chip                      On-chip         On-chip         On-chip         On-chip
TDP [W]                 115/125                       70              80 (est)        70              ~100
Packaging               SCM¹/MCM²                     SCM¹/MCM²       DCM³/MCM²       DCM⁴/QCM⁵       n.a.
Dual threaded
Power management                                                      DPM⁶            DPM⁶            n.a.

¹ SCM: Single Chip Module
² MCM: Multi Chip Module
³ DCM: Dual Chip Module
⁴ DCM: Dual Core Module
⁵ QCM: Quad Core Module
⁶ DPM: Dynamic Power Management
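One way to read the table is to relate transistor count to die size for each generation. The following sketch computes transistor density from the table's own figures; the variable and function names are made up for this example, and the result is only as accurate as the rounded values in the table.

```python
# (processor, transistors in millions, die size in mm^2), copied from the table
POWER_LINE = [
    ("POWER4", 174, 412),
    ("POWER4+", 184, 380),
    ("POWER5", 276, 389),
    ("POWER5+", 276, 230),
    ("POWER6", 790, 341),
]


def density_millions_per_mm2(transistors_millions: float, die_mm2: float) -> float:
    """Average transistor density over the whole die."""
    return transistors_millions / die_mm2


densities = {
    name: density_millions_per_mm2(transistors, die)
    for name, transistors, die in POWER_LINE
}

for name, density in densities.items():
    print(f"{name:8s} {density:.2f} million transistors per mm^2")
```

The density rises with every technology step, and the POWER5 to POWER5+ transition shows the effect of a pure shrink: the transistor count stays at 276 million while the die area drops from 389 mm² to 230 mm².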

10.3 IBM’s MC processors

10.3.2 Cell BE

Cell BE   90 nm   2/2006

Figure: The history and development cost of the Cell BE [3.17], [3.22]

10.3.2 Cell BE (1)

AUC: Atomic Update Cache
BIC: Bus Interface Controller
EIB: Element Interface Bus
LS: Local Store of 256 KB
MFC: Memory Flow Controller
MIC: Memory Interface Controller
PPE: Power Processing Element
PXU: POWER Execution Unit
SMF: Synergistic Memory Flow Unit
SPU: Synergistic Processor Unit
SXU: Synergistic Execution Unit
XDR: Rambus DRAM

Figure: Block diagram of the Cell BE [3.19]

10.3.2 Cell BE (2)


Design parameters of the Cell BE:

• PPE: dual-threaded
• > 200 GFLOPS (SP)
• > 20 GFLOPS (DP)
• > 25 GB/s memory BW
• > 75 GB/s I/O BW
• > 300 GB/s EIB BW
• fc > 4 GHz (lab)

Figure: Main design parameters of the Cell BE [3.28]

10.3.2 Cell BE (3)
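The single-precision figure among the design parameters can be related to the SPE organization given elsewhere in this section (8 SPEs, 128-bit SIMD on 32-bit elements, 3.2 GHz). The sketch below is a back-of-the-envelope estimate, not a figure from the cited sources: it assumes that each SIMD lane completes one fused multiply-add per cycle, counted as two floating-point operations, and it counts only the SPEs and ignores the PPE.

```python
def peak_sp_gflops(
    n_spe: int,
    simd_bits: int,
    element_bits: int,
    flops_per_lane_per_cycle: int,
    clock_ghz: float,
) -> float:
    """Peak single-precision throughput of the SPE array in GFLOPS."""
    lanes = simd_bits // element_bits  # 128-bit register / 32-bit floats = 4 lanes
    return n_spe * lanes * flops_per_lane_per_cycle * clock_ghz


# Assumption: one fused multiply-add per lane per cycle, counted as 2 operations.
FMA_FLOPS = 2

at_3_2_ghz = peak_sp_gflops(8, 128, 32, FMA_FLOPS, 3.2)
at_4_0_ghz = peak_sp_gflops(8, 128, 32, FMA_FLOPS, 4.0)

print(f"Peak SP at 3.2 GHz: {at_3_2_ghz:.1f} GFLOPS")
print(f"Peak SP at 4.0 GHz: {at_4_0_ghz:.1f} GFLOPS")
```

With these assumptions the estimate at 3.2 GHz lands just above 200 GFLOPS, which is consistent with the "> 200 GFLOPS (SP)" design parameter.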

Figure : Cell SPE architecture [3.16]

10.3.2 Cell BE (4)

Figure: Block diagram of the SPE [3.19]

10.3.2 Cell BE (5)

Figure: Pipeline stages of the Cell BE [3.19]

10.3.2 Cell BE (6)

Figure: Floor plan of a single SPE [3.19]

10.3.2 Cell BE (7)

Principle of operation of the Element Interface Bus (EIB) [3.23]

10.3.2 Cell BE (8)

Figure: The Element Interface Bus (EIB) [3.19]

10.3.2 Cell BE (9)

Figure: The Synergistic Memory Flow unit (SMF) [3.19]

10.3.2 Cell BE (10)

Figure: PPE block diagram [3.28]

Figure: Floor plan of the Cell BE processor [3.19]

235 mm², 241 million transistors

10.3.2 Cell BE (11)

10.3.2 Cell BE (12)

Table: Main features of IBM’s Cell BE

Series                    Cell BE
Implementation            Heterogeneous, 1xPPE, 8*SPE
Architecture              PowerPC 2.02
Cores                     PPE: 64-bit RISC
                          SPE: Dual-issue 32-bit SIMD with 128-bit capability
Introduction              9/2006 (in the QS20 BladeCenter)
Technology                90 nm
Die size                  221 mm²
Nr. of transistors        234 million
fc [GHz]                  3.0/3.2
L2                        PPE: 512 KB
                          SPE: 256 KB Local Store (128*128 bit)
L3
Memory bandwidth          25 GB/s
TDP [W]                   95 W @ 3 GHz
Multithreading            PPE: 2-way
                          SPE:
I/O bandwidth             Up to 75 GB/s
Interconnection network   Ring based
Memory controller         On-chip

Source: Brochard L., „A Cell History,” Cell Workshop, April 2006, http://www.irisa.fr/orap/Constructeurs/Cell/Cell%20Short%20Intro%20Luigi.pdf

Figure: Cell BE Blade Roadmap

10.3.2 Cell BE (13)

Source: Hofstee H. P., „Real-time Supercomputing and Technology for Games and Entertainment,” 2006, http://www.cercs.gatech.edu/docs/SC06_Cell_111606.pdf

Figure: Roadmap of the Cell BE

10.3.2 Cell BE (14)


10.3 Literature (1)

POWER4, POWER4+

[3.1] Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006, http://www.llnl.gov/computing/tutorials/ibm_sp/

[3.2] DeMone P., „Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004, http://h21007.www2.hp.com/dspp/files/unprotected/Itanium/sizingsuperheavys.pdf

[3.3] Grassl C., „New IBM Components for HPCx,” Dec. 2003, http://www.hpcx.ac.uk/about/events/annual2003/Grassl.pdf

[3.4] Krewell K., „IBM’s POWER4 Unveiling Continues,” Microprocessor Report, Nov. 20, 2000, pp. 1-4

[3.5] Tendler J. M., Dodson S., Fields S., Le H., Sinharoy B., „Power4 System Microarchitecture,” IBM Server, Technical White Paper, October 2001, http://www-03.ibm.coom/servers/eserver/pseries/hardware/whitepapers/power4.pdf

[3.6] Tendler J. M., Dodson S., Fields S., Le H., Sinharoy B., „Power4 System Microarchitecture,” IBM J. Res. & Dev., Vol. 46, No. 1, Jan. 2002, pp. 5-25, http://www.research.ibm.com/journal/rd/461/tendler.pdf

POWER5, POWER5+

[3.7] Barney B., „IBM POWER Systems Overview,” Livermore Computing, 2006, http://www.llnl.gov/computing/tutorials/ibm_sp/

[3.8] DeMone P., „Sizing Up the Super Heavyweights,” Real World Technologies, Sept. 2004, http://h21007.www2.hp.com/dspp/files/unprotected/Itanium/sizingsuperheavys.pdf

[3.9] Grassl C., „New IBM Components for HPCx,” Dec. 2003, http://www.hpcx.ac.uk/about/events/annual2003/Grassl.pdf

[3.10] Kalla R., „IBM’s POWER5 Microprocessor Design and Methodology,” 2003, www-csl.csres.utexas.edu/users/billmark/teach/cs352-05-spring/lectures/Lecture22-RonKallaIBM.pdf

[3.11] Kalla R., Sinharoy B., Tendler J., „Simultaneous Multi-threading Implementation in Power5: IBM’s Next Generation POWER Microprocessor,” 2003, http://www.hotchips.org/archives/hc15/3_Tue/11.ibm.pdf

[3.12] Krewell K., „POWER5 Tops on Bandwidth,” Microprocessor Report, Dec. 2003, http://studies.ac.upc.edu/ETSETB/SEGPAR/microprocessors/power5%20(2)%20(mpr).pdf

[3.13] Sinharoy B., Kalla R. N., Tendler J. M., Eickemeyer R. J., Joyner J. B., „POWER5 system microarchitecture,” IBM J. R&D, Vol. 49, No. 4/5, 2005, pp. 505-521

10.3 Literature (2)

POWER5, POWER5+ (cont.)

[3.14] Vetter S. et al., „IBM System p5 Quad-Core Module Based on POWER5+ Technology,” Redbooks paper, IBM Corp. 2006, http://www.redbooks.ibm.com/redpapers/pdfs/redp4150.pdf

POWER6

[3.15a] Kanter D., „IBM Previews the Power6,” Oct. 2006, dkanter@realwordtech.com

[3.15b] Le H. Q. et al., „IBM POWER6 microarchitecture,” IBM J. R&D, Vol. 51, No. 6, 2007, pp. 639-662

Cell BE

[3.16] Blachford N., „Cell Architecture Explained Version 2,” http://www.blachford.info/computer/Cell/Cell1_v2.html

[3.17] Brochard L., „A Cell History,” Cell Workshop, April 2006, http://www.irisa.fr/orap/Constructeurs/Cell/Cell%20Short%20Intro%20Luigi.pdf

[3.18] Day M. and Hofstee P., „Hardware and Software Architectures for the Cell Broadband Engine processor,” CODES, Sept. 2006, http://www.casesconference.org/cases2005/pdf/Cell-tutorial.pdf

[3.19] Gschwind M., „Chip Multiprocessing and the Cell BE,” ACM Computing Frontiers, 2006, http://beatys1.mscd.edu/compfront//2006/cf06-gschwind.pdf

10.3 Literature (3)

Cell BE (cont.)

[3.20] Gschwind M., Hofstee H. P., Flachs B. K., Hopkins M., Watanabe Y., Yamazaki T., „Synergistic Processing in Cell's Multicore Architecture,” IEEE Micro, Vol. 26, No. 2, 2006, pp. 10-24

[3.21] Hofstee H. P., „Real-time Supercomputing and Technology for Games and Entertainment,” 2006, http://www.cercs.gatech.edu/docs/SC06_Cell_111606.pdf

[3.22] Hofstee H. P., „Cell today and tomorrow,” 2005, http://www.stanford.edu/class/ee380/Abstracts/Cell_060222.pdf

[3.23] Keable C., „And we also have hardware...,” 17th Machine Evaluation Workshop, Dec. 2006, http://www.cse.clrc.ac.uk/disco/mew17/talks/Keable_IBM_MEW17.pdf

[3.24] Krolak D., „Unleashing the Cell Broadband Engine Processor,” MPR Fall Proc. Forum, Nov. 2005, http://www-128.ibm.com/developerworks/power/library/pa-fpfeib/?ca=dgr-lnxwCellConnects

[3.25] Krewell K., „Cell Moves Into The Limelight,” Microprocessor Report, Febr. 14, 2005, pp. 1-9

[3.26] Solie D., „Technology Trends Presentation,” Power Symposium, Aug. 2006, http://www-03.ibm.com/procurement/proweb.nsf/objectdocswebview/file14+-+darryl+solie+-+ibm+power+symposium+presentation/$file/14+-+darryl+solie-ibm-power+symposium+presentation+v2.pdf

[3.27] -, „Cell Broadband Engine processor based systems,” White Paper, IBM Corp., 2006

[3.28] -, „Cell Architecture,” Course Code L1T1H1-10, 2006, http://www.power.org/resources/devcorner/cellcorner/CellTraining_Track1/CourseCode_L1T1H1-10_CellArchitecture.pdf