Nehalem & Core i7 – i5 – i3 Architecture

Nov 26, 2013 (3 years and 8 months ago)

Presented by: Nasser Hadjloo

http://Hajloo.wordpress.com


Design Considerations


Instruction-level parallelism.


Use of cache hierarchies and their management.


Higher clock speeds.


The Front Side Bus (FSB).


Multi-threading.


Power consumption and heating issues.


Etc.



Intel Architectures: NetBurst


NetBurst Architecture

Features of NetBurst Architecture


Hyper-Threading:


A single processor appears as two logical processors.


Each logical processor has its own set of registers and its own APIC (Advanced Programmable Interrupt Controller).


Increases resource utilization and improves performance.


Rapid Execution Engine:


Arithmetic Logic Units (ALUs) run at twice the processor frequency.


Basic integer operations execute in half a processor clock tick.


Provides higher throughput and reduced execution latency.


NetBurst Microarchitecture


Design Considerations


Deeper pipeline (20 stages): a higher branch misprediction penalty, but greater clock speeds and performance.


Techniques to hide penalties, such as parallel execution, buffering, and speculation.


Executes instructions dynamically and out of order.


Performance of a particular code sequence may vary
depending on the state the machine was in when
that code sequence was entered.
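
The trade-off between pipeline depth and clock speed can be illustrated with a rough back-of-the-envelope model. This is only a sketch: the function names, the branch fraction, the misprediction rate and the clock speeds below are assumptions chosen for illustration, not measurements of any NetBurst or Core processor. The penalty is taken to be roughly the pipeline depth in cycles.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once misprediction stalls are added.

    Each mispredicted branch is assumed to cost `penalty_cycles`
    (roughly the pipeline depth) of wasted work.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def time_per_instruction_ns(cpi, clock_ghz):
    """Average time per instruction in nanoseconds."""
    return cpi / clock_ghz


# Hypothetical workload: 20% branches, 5% of them mispredicted.
deep = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=20)     # deep pipeline
shallow = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=14)  # shorter pipeline

# The deep pipeline only wins if its clock advantage outweighs the extra stalls.
print(time_per_instruction_ns(deep, clock_ghz=3.8))
print(time_per_instruction_ns(shallow, clock_ghz=3.0))
```

With these made-up numbers the deeper pipeline still comes out ahead, but the margin shrinks as the misprediction rate grows, which is why the slide lists speculation and better prediction as penalty-hiding techniques.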

Modifications in NetBurst


The Northwood design combined an increased cache size, a smaller 130 nm fabrication process, and Hyper-Threading technology.


Prescott had a heavily improved branch predictor, introduced the SSE3 SIMD instructions, and implemented Intel 64, Intel's branding for its compatible implementation of x86-64, the 64-bit version of the x86 architecture.


The dual-core Pentium D first put two Prescott cores on a single die (Smithfield); later came Presler, which consists of two Cedar Mill cores on two separate dies.


But this had problems…

Heading to Core

Core Microarchitecture

Design Considerations of Core


L2 control unit (super-queue) = L2 controller (snoop requests) + bus control unit (data and I/O requests to and from the external bus).


The prefetching unit is extended to handle hardware prefetching separately for each core.


The shared L2 cache in the Core 2 Duo eliminates on-chip L2-level cache coherence; coherence only has to be maintained between the L1s of the two cores.


Although the Core 2 Duo benefits from on-chip access to the other core's L1 cache, its performance is limited.




Features of Core Architecture


Multiple cores and hardware virtualization.


14-stage pipeline (shorter than NetBurst's).


Dual-core design with linked L1 caches and a shared L2 cache.


Macro-fusion: two program instructions can be executed as one micro-operation.


Intel Intelligent Power Capability: manages the run-time power consumption of the processor's execution cores.


Includes advanced power gating capability: ultra fine-grained control systems that turn on individual processor logic subsystems only when they are needed.
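
As a toy illustration of the macro-fusion idea, the sketch below walks a list of instruction mnemonics and merges a compare or test that is immediately followed by a conditional jump into one fused entry. It assumes that compare-plus-conditional-jump pairs are the fusible case; the function names and the simple string representation are hypothetical, and real decoders apply additional restrictions that are not modelled here.

```python
FUSIBLE_FIRST = {"cmp", "test"}  # assumed first halves of a fusible pair


def is_conditional_jump(mnemonic):
    """True for jcc-style mnemonics such as 'jne' or 'jl', but not plain 'jmp'."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def fuse(instructions):
    """Return the instruction stream with fusible adjacent pairs merged."""
    fused = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (current in FUSIBLE_FIRST and following is not None
                and is_conditional_jump(following)):
            fused.append(current + "+" + following)  # one fused entry
            i += 2
        else:
            fused.append(current)
            i += 1
    return fused


stream = ["mov", "cmp", "jne", "add", "test", "jmp"]
print(fuse(stream))  # six instructions become five entries
```

The point of the sketch is only the bookkeeping: fewer entries flow through the rest of the pipeline for the same program.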

Modifications in Core



The Allendale core, with 2 MB of L2 cache, offers a smaller die size and therefore greater yields.


Merom, the first mobile version of the Core 2, puts more emphasis on low power consumption to extend notebook battery life.


Kentsfield was the first Intel desktop quad-core CPU. It comprises two separate silicon dies (each equivalent to a single Core 2 Duo) on one multi-chip module.


The Penryn design added new instructions, including SSE4.1.


Problem…

Problem with quad core

Heading to Nehalem

Introduction


Core i7: new Intel CPU brand name for the business and high-end consumer markets


Core i5: processors intended for the mainstream consumer market


Core i3: processors intended for the entry-level consumer market

Features of Nehalem


Integrated Memory Controller


Quick Path Interconnect


Advanced Configuration and Power States


Improvements to the pipeline (L2 branch predictor, renamed Return Stack Buffer, L2 TLB, etc.)


Hyper-Threading


SSE4.2 instructions


Three-level cache hierarchy

Core i7 History


It started with Bloomfield in 2008


In 2009 the Lynnfield and Clarksfield models arrived


Prior to 2010 all models were quad-core


In 2010 the Arrandale (dual-core) models arrived


In 2010 the Gulftown (Extreme) models arrived, with six hyper-threaded cores

Bloomfield


All model names start with Core i7-9xx and use Socket 1366


Includes single-processor servers sold as Xeon 35xx


Replaced Yorkfield processors


Uses a different socket from the other Core i-series CPUs, including the other 45 nm ones


On-die memory controller (uncore clock)


Uses a single QPI link instead of the FSB


Supports the SSE4.1 and SSE4.2 instruction sets


Bloomfield


32 KB L1 instruction and 32 KB L1 data cache
per core


256 KB L2 cache (combined instruction and
data) per core


8 MB L3 (combined instruction and data)
"inclusive", shared by all cores


"Turbo Boost" technology allows all active cores
to intelligently clock themselves up in steps of
133 MHz over the design clock rate as long as
the CPU's predetermined thermal and electrical
requirements are still met
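
The Turbo Boost stepping described above amounts to simple arithmetic, sketched below. The function name and parameters are hypothetical, the 3200 MHz base clock is an assumed value for illustration, and how many steps a given chip may take under which conditions is decided by the processor itself and is not modelled here.

```python
def turbo_frequency_mhz(base_mhz, steps, step_mhz=133, within_limits=True):
    """Clock after `steps` Turbo Boost steps of `step_mhz` each.

    If the thermal and electrical limits are not met (`within_limits`
    is False), the core stays at its design clock rate.
    """
    if not within_limits or steps <= 0:
        return base_mhz
    return base_mhz + steps * step_mhz


# Assumed 3200 MHz design clock, two 133 MHz steps:
print(turbo_frequency_mhz(3200, 2))                       # 3466
print(turbo_frequency_mhz(3200, 2, within_limits=False))  # 3200
```

Two steps above an assumed 3.2 GHz base land at roughly 3.46 GHz, which matches the Turbo Boost figure quoted for the Core i7-965 Extreme Edition later in the deck.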

Lynnfield


Used in Core i5


There is no QPI; the processor connects directly to a southbridge over a 2.5 GT/s Direct Media Interface and to other devices over PCI Express links in its Socket 1156


Core i7 processors based on Lynnfield have Hyper-Threading, which is disabled in Lynnfield-based Core i5 processors

Lynnfield


Core i5-7xx, Core i7-8xx or Xeon X34xx


Replaced the Penryn-based Yorkfield processor


45 nm


Socket 1156, as opposed to Socket 1366


Integrates the Direct Media Interface and PCI Express links, previously provided by a dedicated northbridge chip (called the memory controller hub or I/O hub)

Clarksfield


Is the mobile version of Lynnfield and
available under the Core i7 Mobile brand


Quad core, 45 nm


integrated PCI Express and DMI links


Core i7 7xxQM (6MB), Core i7 8xxQM (8MB),
Core i7 9xxXM Extreme Edition (8MB)


Replaced Penryn-QC




Arrandale


Second family of mobile CPUs, which contains all Core i7 6xx [UE, LE, E] (4 MB)


Core i5 5xx [UM, M, E] (3 MB), Core i5 4xxM (3 MB)


Core i3 3xxM, Celeron U3xxx (unreleased), P4xxx (2 MB)


Integrated graphics processing unit, but only two processor cores


32 nm, dual-core


E-series processors are embedded versions with support for PCIe bifurcation and ECC memory

Clarkdale


Desktop version of Arrandale, 32 nm


Dual-core only; Core i3 and Core i5


All support Intel's Hyper-Threading (HT)


Integrated graphics as well as PCI Express and DMI links


The Clarkdale processor package contains two dies: the actual 32 nm processor with the I/O connections and the 45 nm graphics controller with the memory interface


Successor of Wolfdale (45 nm)

Clarkdale


Used in Intel Core, Pentium and Celeron


The Core i5 versions generally have all
features enabled


Only the Core i5-661 model lacks Intel VT-d and TXT, like the Core i3, which also does not support Turbo Boost or the AES new instructions


Pentium and Celeron versions do not have SMT and use a reduced amount of third-level cache

Gulftown or Westmere-EP


The Extreme Edition version of the Core i7, featuring 6 cores on a 32 nm process (originally rumored to be called Core i9)


Gulftown is the first six-core dual-socket processor from Intel


Hyper-Threading (for a total of 12 logical threads), 12 MB of cache, Turbo Boost and the Intel QuickPath connection bus


Uses the Westmere microarchitecture, a 32 nm shrink of Nehalem

Gulftown


Up to 50% higher performance than the Bloomfield Core i7-975


Includes Core i7 9xx and Core i7 9xxx [12 MB], Xeon 36xx, Xeon 56xx


Socket 1366

Specification

Nehalem Architecture

Design Considerations


Hyper-Threading is reintroduced to cater to the increasing number of thread-based applications.


All cores are placed on a single die to reduce latencies.


The QuickPath Interconnect also helps achieve this.


Private L1 and L2 caches for each core and a large shared L3 cache improve performance.
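
One way to see why an extra cache level can help is the usual average memory access time (AMAT) calculation, sketched below. Every latency and hit rate in the example is a made-up placeholder rather than a Nehalem measurement, and the function name is hypothetical.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    `levels` is a list of (hit_latency_cycles, hit_rate) tuples ordered
    from L1 outward; accesses that miss every level go to main memory.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * latency
        reach *= 1.0 - hit_rate
    return total + reach * memory_latency


# Hypothetical numbers: the same L1 and L2, with and without a shared L3.
two_level = amat([(4, 0.90), (10, 0.80)], memory_latency=200)
three_level = amat([(4, 0.90), (10, 0.80), (40, 0.50)], memory_latency=200)
print(two_level, three_level)
```

With these placeholder values the third level catches half of the accesses that would otherwise go to memory, so the average access time drops even though the L3 itself is slower than the L2.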

Looking forward to Sandy Bridge

What can we expect……


Sandy Bridge will have an architecture optimized for 32-nanometer transistors


The Sandy Bridge microarchitecture is also said to focus on the connections of the processor core, such as vertical interconnects and multilevel dies


Increase in FLOPs by using AVX (Advanced Vector Extensions)


Haswell, the successor to Sandy Bridge, will be built on a 22 nm process.


The tick-tock model works just fine!

Intel Processor Trends

                          NetBurst              Core                              Nehalem
Cache hierarchy           Two-level             Two-level                         Three-level
Second-level cache size   256 KB to 2 MB        1 MB to 12 MB                     >1 MB
Third-level cache size    -                     -                                 8 MB
Front side bus (MHz)      400, 533, 800, 1066   533, 667, 800, 1066, 1333, 1600   (QPI = 6.4 GT/s)
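
The bus figures in the trends table can be turned into peak bandwidth numbers with a little arithmetic. The sketch below assumes a 64-bit (8-byte) front side bus whose listed speeds are effective transfer rates, and a QPI link that carries 2 bytes of payload per transfer in each direction; the function names are hypothetical and the results are theoretical peaks, not measured throughput.

```python
def fsb_bandwidth_gb_s(transfers_mt_s, bus_width_bytes=8):
    """Peak FSB bandwidth in GB/s; the bus is shared by all traffic."""
    return transfers_mt_s * bus_width_bytes / 1000


def qpi_bandwidth_gb_s(rate_gt_s, payload_bytes=2, directions=2):
    """Peak QPI link bandwidth in GB/s, summed over both directions."""
    return rate_gt_s * payload_bytes * directions


for speed in (400, 800, 1333, 1600):
    print(speed, "MT/s FSB:", fsb_bandwidth_gb_s(speed), "GB/s")
print("6.4 GT/s QPI:", qpi_bandwidth_gb_s(6.4), "GB/s")
```

Under these assumptions a single QPI link peaks at about twice the fastest FSB in the table, and on Nehalem memory traffic no longer crosses that link at all because the memory controller is on the die.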

Intel Processor Trends

SPEC 2000 benchmark

2003: Pentium 4 processor with Hyper-Threading Technology (3.0 GHz)
Primary cache: 12K micro-ops I + 8 KB D, on chip
Secondary cache: 512 KB (I+D), on chip
Memory: 512 MB

2004: Intel Pentium 4 processor 570J (3.80 GHz)
Primary cache: 12K micro-ops I + 16 KB D, on chip
Secondary cache: 1 MB (I+D), on chip
Memory: 1 GB

2005: Intel(R) Pentium(R) 4 processor (3.73 GHz)
Primary cache: 12K micro-ops I + 16 KB D, on chip
Secondary cache: 2 MB (I+D), on chip
Memory: 1 GB

2006: Intel(R) Core(TM) 2 Extreme processor X6800 (2.93 GHz, 1066 MHz bus)
Primary cache: 32 KB I + 32 KB D per core, on chip
Secondary cache: 4 MB (I+D) per chip, on chip (shared)
Memory: 2 GB

SPEC 2006 benchmark

2006: Intel Core 2 Duo E6700 (2.67 GHz, 1066 MHz bus)
Primary cache: 32 KB I + 32 KB D on chip per core
Secondary cache: 4 MB I+D on chip per chip
Memory: 2 GB

2007: Intel Core 2 Extreme QX9650 (3.00 GHz, 1333 MHz FSB)
Primary cache: 32 KB I + 32 KB D on chip per core
Secondary cache: 12 MB I+D on chip per chip, 6 MB shared / 2 cores
Memory: 4 GB

2008: Intel Xeon X5270 (3.5 GHz)
Primary cache: 32 KB I + 32 KB D on chip per core
Secondary cache: 6 MB I+D on chip per chip
Memory: 16 GB

2009: Intel Core i7-965 Extreme Edition (Intel Turbo Boost Technology up to 3.46 GHz)
Primary cache: 32 KB I + 32 KB D on chip per core
Secondary cache: 256 KB I+D on chip per core
L3 cache: 8 MB I+D on chip per chip
Memory: 12 GB



Our Views


Focus needs to be on a more scalable and robust architecture.


Implementing 3-D integration.


How about a 128-bit processor?


The speed of light problem.


The end of Moore’s Law?
