Title of the research paper: Performance analysis of Multicore Systems



Research Area:

Multicore Systems

Authors:

Lakhvinder Singh & Harmeet Kaur

Faculty mentor:

Dayanand.J

Name of the Institution:

GURU NANAK DEV ENGG COLLEGE, BIDAR


Abstract:


One constant in computing is that the world's hunger for faster performance is never satisfied. Every new performance advance in processors leads to another level of greater performance demands from businesses and consumers. Today these performance demands are not just for speed, but also for smaller, more powerful mobile devices, longer battery life, quieter desktop PCs, and, in the enterprise, better price/performance per watt and lower cooling costs. People want improvements in productivity, security, multitasking, data protection, game performance, and many other capabilities. There is also a growing demand for more convenient form factors for the home, the data center, and on the go.

Through advances in silicon technology, micro-architecture, software, and platform technologies, Intel is on a fast-paced trajectory to continuously deliver new generations of multi-core processors with the superior performance and energy efficiency necessary to meet these demands for years to come.

In mid-2006, we reached new levels of energy-efficient performance with our Intel® Core™2 Duo processors and Dual-Core Intel® Xeon® processor 5100 series, both produced with our latest 65-nanometer (nm) silicon technology and micro-architecture.

Now we are delivering the world's first mainstream quad-core processors for both desktops and mainstream servers: Intel® Core™2 Quad processors, Intel® Core™2 Extreme quad-core processors, and others.

This paper explains the advantages and challenges of multi-core processing and the direction in which Intel is taking multi-core processors in the future. We discuss many of the benefits you will see as we continue to increase processor performance, energy efficiency, and capabilities.



Background:

For years, Intel customers came to expect a doubling of performance every 18-24 months in accordance with Moore's Law. Most of these performance gains came from dramatic increases in frequency (from 5 MHz to 3 GHz in the years from 1983 to 2002) and through process technology advancements. Improvements also came from increases in instructions per cycle (IPC). By 2002, however, increasing power densities and the resultant heat began to reveal some limitations in using predominantly frequency as a way of improving performance. So, while Moore's Law, frequency increases, and IPC improvements continue to play an important role in performance increases, new thinking is also required.
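The growth figures above can be checked with a back-of-envelope calculation. This sketch assumes a 24-month doubling period (the upper end of the 18-24 month range quoted above); the 5 MHz and 3 GHz endpoints are taken from the text.

```python
import math

def predicted_doublings(years, period_months=24):
    """Doublings expected over `years` at one doubling per `period_months`."""
    return years * 12 / period_months

growth = 3000 / 5                          # 5 MHz -> 3 GHz is a 600x increase
observed = math.log2(growth)               # doublings actually observed
predicted = predicted_doublings(2002 - 1983)

print(f"observed : {observed:.1f} doublings in 19 years")
print(f"predicted: {predicted:.1f} doublings (24-month Moore's Law)")
```

The two numbers come out close (roughly 9.2 observed versus 9.5 predicted), which is consistent with the claim that frequency scaling tracked Moore's Law over that period.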

The best example of this new thinking is multi-core processors.

By putting multiple execution cores into a single processor (as well as continuing to increase clock frequency), Intel is able to provide even greater multiples of processing power.

Using multi-core processors, Intel can dramatically increase a computer's capabilities and computing resources, providing better responsiveness, improving multithreaded throughput, and delivering the advantages of parallel computing to properly threaded mainstream applications.


While manufacturing technology continues to improve, reducing the size of single gates, physical limits of semiconductor-based microelectronics have become a major design concern. Some effects of these physical limitations can cause significant heat dissipation and data synchronization problems. The demand for more capable microprocessors causes CPU designers to use various methods of increasing performance. Some instruction-level parallelism (ILP) methods like superscalar pipelining are suitable for many applications, but are inefficient for others that tend to contain difficult-to-predict code. Many applications are better suited to thread-level parallelism (TLP) methods, and using multiple independent CPUs is one common method of increasing a system's overall TLP. A combination of increased available space due to refined manufacturing processes and the demand for increased TLP is the logic behind the creation of multi-core CPUs.




Problem Statement:

How can the performance of multi-core systems be increased?


Methodology:





• Performance of a processor can be increased by increasing clock speed and bus speed.

• To increase the speed of a processor, we need a large cache memory.

• We need more transistors to improve the performance of a processor.

According to Moore's Law, "The number of transistors that can be integrated on a single chip keeps increasing exponentially," and a processor is considered better when it achieves higher speed using the minimum number of transistors.

A FUNDAMENTAL THEOREM OF MULTI-CORE PROCESSORS:

"A MULTI-CORE PROCESSOR takes advantage of a fundamental relationship between power and frequency."

By incorporating multiple cores, each core is able to run at a lower frequency, dividing among them the power normally given to a single core.
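The power/frequency relationship can be illustrated with a minimal sketch. It assumes the standard dynamic-power model P = C * V^2 * f, with supply voltage scaling roughly linearly with frequency (so power grows roughly as the cube of frequency); the constants and the 80% frequency figure are arbitrary choices for illustration, not Intel data.

```python
def dynamic_power(freq_ghz, capacitance=1.0, volts_per_ghz=0.4):
    """Dynamic power under P = C * V^2 * f, with V proportional to f."""
    v = volts_per_ghz * freq_ghz
    return capacitance * v * v * freq_ghz

single_core = dynamic_power(3.0)             # one core at 3.0 GHz
dual_core   = 2 * dynamic_power(3.0 * 0.8)   # two cores, each at 80% frequency

print(f"one core  @ 3.0 GHz: {single_core:.2f} (arbitrary units)")
print(f"two cores @ 2.4 GHz: {dual_core:.2f} (arbitrary units)")
# Total power is nearly unchanged (2 * 0.8^3 ~= 1.02) while aggregate
# clock throughput rises from 3.0 GHz to 4.8 GHz.
```

Under this cubic model, two cores running 20% slower draw about the same total power as one fast core while offering 60% more aggregate cycles, which is the trade the text describes.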






Multi-Threading

Processor designers have found that since most microprocessors spend a significant amount of time idly waiting for memory, software parallelism can be leveraged to hide memory latency. Since memory stalls typically take on the order of 100 processor cycles, a processor pipeline is idle for a significant amount of time.

Table 1 shows the amount of time spent waiting for memory in some typical applications on 2 GHz processors. For example, we can see that for a workload such as a Web server, there are sufficient memory stalls that the average number of machine cycles is 1.5-2.5 per instruction, resulting in the pipeline waiting for memory up to 50% of the time.
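The stall percentage follows directly from the CPI figures. This back-of-envelope sketch assumes an ideal CPI of 1.0 (our assumption, not stated in the text); the 1.5-2.5 measured range is the one quoted above.

```python
def stall_fraction(measured_cpi, ideal_cpi=1.0):
    """Fraction of cycles the pipeline is stalled rather than executing."""
    return (measured_cpi - ideal_cpi) / measured_cpi

for cpi in (1.5, 2.0, 2.5):
    print(f"CPI {cpi}: pipeline stalled {stall_fraction(cpi):.0%} of the time")
```

At a measured CPI of 2.0 the pipeline is stalled exactly half the time, matching the "up to 50%" figure for the middle of the quoted range.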









In Figure 3, we can see that less than 50% of the processor's pipeline is actually being used to process instructions; the remainder is spent waiting for memory.

By providing additional sets of registers per processor pipeline, multiple software jobs can be multiplexed onto the pipeline, a technique known as simultaneous multi-threading (SMT). Threads are switched onto the pipeline when another blocks or waits on memory, thus allowing the pipeline to be utilized potentially to its maximum.

Figure 4 shows an example with four threads per core. In each core, when a memory stall occurs, the pipeline switches to another thread, making good use of the pipeline while the previous memory stall is fulfilled. The tradeoff is latency for bandwidth: with enough threads, we can completely hide memory latency, provided there is enough memory bandwidth for the added requests. Successful SMT systems typically allow for very high memory bandwidth from DRAM, as part of their balanced architecture.
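The latency-hiding effect can be sketched with a toy steady-state model: each thread alternates a burst of compute cycles with a ~100-cycle memory stall, and stalled threads yield the pipeline to runnable ones. The 25-cycle burst length and zero switch cost are invented for illustration; this is not a model of any specific processor.

```python
def pipeline_utilization(threads, compute=25, stall=100):
    """Fraction of cycles the pipeline issues instructions, steady state.

    Each thread alternates `compute` busy cycles with a `stall`-cycle
    memory wait; one thread's stall can overlap other threads' compute.
    Utilization saturates at 100% once enough threads are available.
    """
    period = compute + stall                  # one thread's activity cycle
    busy = min(threads * compute, period)     # compute work available per period
    return busy / period

print(f"1 thread : {pipeline_utilization(1):.0%} utilized")
print(f"4 threads: {pipeline_utilization(4):.0%} utilized")
print(f"5 threads: {pipeline_utilization(5):.0%} utilized (latency fully hidden)")
```

With these parameters one thread keeps the pipeline only 20% busy, four threads (as in Figure 4) reach 80%, and a fifth fully hides the stall latency, assuming memory bandwidth can absorb the extra in-flight requests.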






SMT has a high return on performance in relation to additional transistor count. For example, a 50% performance gain may be realized by adding just 10% more transistors with an SMT approach, in contrast to making the pipeline more complex, which typically affords a 10% performance gain for a 100% increase in transistors. Also, implementing multi-core alone doesn't yield optimal performance; the best design is typically a balance of multi-core and SMT.
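Stated numerically, the comparison above amounts to a gain-per-added-transistor ratio. The percentages are the ones quoted in the text; the ratio itself is our simple way of framing them.

```python
# 50% speedup for 10% more transistors, versus
# 10% speedup for 100% more transistors.
smt_ratio      = 0.50 / 0.10
pipeline_ratio = 0.10 / 1.00

print(f"SMT:             {smt_ratio:.1f} units of gain per unit of added transistors")
print(f"deeper pipeline: {pipeline_ratio:.1f} units of gain per unit of added transistors")
# Under these figures SMT is roughly 50x more transistor-efficient.
```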


Key Results:

Best Energy-Efficient Performance Processor Transistors

• Intel Second Generation Strained Silicon Technology increases transistor performance 10 to 15 percent without increasing leakage.

• Compared to 90 nm transistor technology, Intel's enhanced energy-efficient performance 65 nm transistors provide over 20% improvement in transistor switching speed and over 30% reduction in transistor switching power.




Discussion:

This fundamental relationship between power and frequency can be effectively used to multiply the number of cores from two to four, and then eight and more, to deliver continuous increases in performance without increasing power usage. To do this, though, there are many advancements that must be made that are only achievable by a company like Intel. These include:

• Continuous advances in silicon process technology, from 65 nm to 45 nm and to 32 nm, to increase transistor density. In addition, Intel is committed to continuing to deliver superior energy-efficient performance transistors.

• Enhancing the performance of each core and optimizing it for multi-core through the introduction of new advanced micro-architectures about every two years.

• Improving the memory subsystem and optimizing data access in ways that ensure data can be used as fast as possible among all cores. This minimizes latency and improves efficiency and speed.

• Optimizing the interconnect fabric that connects the cores to improve performance between cores and memory units.




Scope for future work (if any):

Network-on-chip (NoC):

Network-on-chip (NoC) has emerged as a new paradigm for designing multi-core systems. NoC will help in designing future multi-core systems where large numbers of Intellectual Property (IP) cores are connected to the communication fabric (a router-based network) using network interfaces. The network is used for packet-switched on-chip communication among cores. It supports a high degree of reusability and scalability. In this work, a scalable network based on the Mesh-of-Tree (MoT) topology has been presented. The MoT interconnection network has the advantage of having a small diameter as well as a large bisection width, and it has a nice recursive structure. These characteristics make it more powerful than other interconnection networks like meshes and binary trees. A generic NoC simulator is designed for performance evaluation in terms of network throughput, latency, and power of different topologies under different traffic situations.
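The diameter versus bisection-width trade-off mentioned above can be made concrete for the two baselines. The mesh and binary-tree formulas below are the standard ones; MoT is attractive precisely because it combines a tree-like O(log n) diameter with a mesh-like O(n) bisection width, avoiding both weaknesses shown here.

```python
import math

def mesh_metrics(n):
    """n x n 2D mesh: long worst-case path, but a wide bisection."""
    return {"diameter": 2 * (n - 1), "bisection_width": n}

def binary_tree_metrics(leaves):
    """Complete binary tree: short paths, but a single-link root bottleneck."""
    depth = int(math.log2(leaves))
    return {"diameter": 2 * depth, "bisection_width": 1}

# Compare the two baselines over 64 cores.
print("8x8 mesh      :", mesh_metrics(8))
print("64-leaf tree  :", binary_tree_metrics(64))
```

For 64 cores the mesh needs up to 14 hops but splits cleanly across 8 links, while the tree needs only 12 hops but funnels all cross-traffic through a single link; MoT keeps the short hop count without the bottleneck.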





80-core processor

We can build an 80-core processor with a performance of 1 teraflop. It would utilize an input power of 78.35 W, and its clock speed would be 3.13 GHz. When the cores are not needed, the processor would need only 6.5 W of power, so it saves power. This would serve as the near future of the CPU industry.
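The energy-efficiency implied by these figures is easy to work out; the numbers below are exactly the ones quoted above.

```python
peak_flops   = 1e12      # 1 teraflop
active_watts = 78.35     # input power at full load
idle_watts   = 6.5       # power when cores are not needed

gflops_per_watt = peak_flops / active_watts / 1e9
print(f"{gflops_per_watt:.1f} GFLOPS per watt at full load")
print(f"idle power is {idle_watts / active_watts:.0%} of active power")
```

That works out to roughly 12.8 GFLOPS per watt, with idle power dropping to about 8% of the active figure.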



Conclusion:

The proximity of multiple CPU cores on the same die allows the cache coherency circuitry to operate at a much higher clock rate than is possible if the signals have to travel off-chip. Combining equivalent CPUs on a single die significantly improves the performance of cache snoop (alternatively: bus snooping) operations. Put simply, this means that signals between different CPUs travel shorter distances, and therefore those signals degrade less. These higher-quality signals allow more data to be sent in a given time period, since individual signals can be shorter and do not need to be repeated as often.






Acknowledgements:

The satisfaction and euphoria that accompany the successful completion of any task would be incomplete without the mention of the people who made it possible, whose constant guidance and encouragement crown all efforts with success.

We consider it our privilege to express our gratitude and respect to all those who guided, inspired, and helped us in the completion of the project; the expression in the project belongs to those listed below.

We are deeply indebted to Prof. Dayanand J. for having consented to be our project guide and for providing invaluable suggestions during the course of the project work.

We are deeply thankful to Prof. S. Arvind, head of the Department of Computer Science and Engineering, GNDEC, for providing us the necessary facilities to complete the project successfully.

We would like to express our deep sense of gratitude to our principal, Dr. V. D. Mytri, for his continuous effort in creating a competitive environment in our minds and encouraging us to bring out the best in us.

Lakhvinder Singh
Harmeet Kaur