Debunking the 100X GPU vs CPU Myth: An Evaluation of Throughput Computing on CPU and GPU

Victor W. Lee, et al.
Intel Corporation

ISCA ’10, June 19-23, 2010, Saint-Malo, France

Mythbusters view on the topic

CPU vs GPU:
http://videosift.com/video/MythBusters-CPU-vs-GPU-or-Paintball-Cannons-are-Cool

Full movie:
http://www.nvidia.com/object/nvision08_gpu_v_cpu.html


The Initial Claim

Over the past 4 years NVIDIA has made a great many claims regarding how porting various types of applications to run on GPUs instead of CPUs can tremendously improve performance, by anywhere from 10x to 500x.

But it actually began much earlier (SIGGRAPH 2004):
http://pl887.pairlitesite.com/talks/2004-08-08-GP2-CPU-vs-GPU-BillMark.pdf


Intel’s Response?

Intel, unsurprisingly, sees the situation differently, but has remained relatively quiet on the issue, possibly because Larrabee was going to be positioned as a discrete GPU.


Intel’s Response? (cont)

The recent announcement that Larrabee has been repurposed as an HPC/scientific computing solution may therefore be partially responsible for Intel ramping up an offensive against NVIDIA's claims regarding GPU computing.

At the International Symposium on Computer Architecture (ISCA) this June, a team from Intel presented a whitepaper purporting to investigate the real-world performance delta between CPUs and GPUs.


But before that….

December 16, 2009

One month after ISCA’s final papers were due.

The Federal Trade Commission filed an antitrust-related lawsuit against Intel Wednesday, accusing the chip maker of deliberately attempting to hurt its competition and, ultimately, consumers.

The Federal Trade Commission's complaint against Intel for alleged anticompetitive practices has a new twist: graphics chips.

2009 was expensive for Intel

The European Commission fined Intel nearly 1.5 billion USD,

the US Federal Trade Commission sued Intel on anti-trust grounds, and

Intel settled with AMD for another 1.25 billion USD.

If nothing else it was an expensive year, and while the settlement with AMD was a significant milestone for the company, it was not the end of Intel's troubles.

Finally, the settlement(s)

The EU fine is still under appeal ($1.45B).

8/4/2010: Intel settles with the FTC.

Then there is the whole Dell issue….

So back to the paper: what did Intel say?

Throughput Computing

Kernels

What is a kernel?

Kernels selected:
SGEMM, MC, Conv, FFT, SAXPY, LBM, Solv, SpMV, GJK, Sort, RC, Search, Hist, Bilat
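
What a throughput kernel looks like, as a minimal sketch (illustrative CUDA, not the paper's far more heavily tuned benchmark code): SAXPY, one of the simpler kernels in the list above, computes y = a*x + y over large arrays. It is a huge number of independent element-wise operations, so performance is set by how fast data can be streamed through the processor rather than by the speed of any single thread.

// Minimal SAXPY in CUDA; a sketch, not the paper's implementation.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    // One array element per thread; tens of thousands of threads run concurrently.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host data.
    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Device data.
    float *dx, *dy;
    cudaMalloc((void **)&dx, bytes);
    cudaMalloc((void **)&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // 256 threads per block, enough blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", hy[0]);   // expect 5.0 (3*1 + 2)

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}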

The Hardware selected

CPU:
3.2GHz Core i7-960, 6GB RAM

GPU:
1.3GHz eVGA GeForce GTX280 w/ 1GB

Optimizations:

CPU:
multithreading,
cache blocking, and
reorganization of memory accesses for SIMDification

GPU:
minimizing global synchronization, and
using local shared buffers
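
A minimal sketch of the GPU-side optimizations just named, using a 256-bin byte histogram as a stand-in (illustrative CUDA, not one of the paper's kernels): each block accumulates counts in a local shared-memory buffer and synchronizes only within the block, so global memory is updated once per bin per block rather than once per input element.

// Block-local shared buffers instead of per-element global updates; a sketch.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define NUM_BINS 256

__global__ void histogram(const unsigned char *in, int n, unsigned int *bins)
{
    __shared__ unsigned int local[NUM_BINS];        // block-local buffer

    // Cooperatively zero the shared buffer.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        local[b] = 0;
    __syncthreads();                                // block-level sync only

    // Grid-stride loop: each thread walks many input elements.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&local[in[i]], 1u);               // cheap on-chip atomic
    __syncthreads();

    // One global update per bin per block, not one per element.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&bins[b], local[b]);
}

int main()
{
    const int n = 1 << 22;
    unsigned char *h_in = (unsigned char *)malloc(n);
    for (int i = 0; i < n; ++i) h_in[i] = (unsigned char)(rand() & 0xFF);

    unsigned char *d_in;
    unsigned int *d_bins;
    cudaMalloc((void **)&d_in, n);
    cudaMalloc((void **)&d_bins, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(d_in, h_in, n, cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, NUM_BINS * sizeof(unsigned int));

    histogram<<<64, 256>>>(d_in, n, d_bins);

    unsigned int h_bins[NUM_BINS];
    cudaMemcpy(h_bins, d_bins, sizeof(h_bins), cudaMemcpyDeviceToHost);
    printf("bin[0] = %u\n", h_bins[0]);

    cudaFree(d_in); cudaFree(d_bins);
    free(h_in);
    return 0;
}

The CPU-side items (multithreading, cache blocking, SIMDification) are the usual x86 counterparts and are not shown here.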

This even made Slashdot

Hardware: Intel, NVIDIA Take Shots At CPU vs. GPU Performance


And PCWorld

Intel: 2-year-old Nvidia GPU Outperforms 3.2GHz Core i7


Intel researchers have published the results of a performance comparison between their latest quad-core Core i7 processor and a two-year-old Nvidia graphics card, and found that the Intel processor can't match the graphics chip's parallel processing performance.

http://www.pcworld.com/article/199758/intel_2yearold_nvidia_gpu_outperforms_32ghz_core_i7.html


From the paper's abstract:

In the past few years there have been many studies claiming GPUs deliver substantial speedups ...over multi-core CPUs... [W]e perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7 960 processor narrows to only 2.5x on average.

Do you have a problem with this statement?


Intel's own paper indirectly raises a question when it notes:

The previously reported LBM number on GPUs claims 114X speedup over CPUs. However, we found that with careful multithreading, reorganization of memory access patterns, and SIMD optimizations, the performance on both CPUs and GPUs is limited by memory bandwidth and the gap is reduced to only 5X.
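
A back-of-the-envelope check makes that 5X plausible. Using the peak memory-bandwidth figures commonly quoted for these two parts, roughly 141 GB/s for the GTX280 and roughly 32 GB/s for the Core i7-960 platform, the hardware itself caps the gap for a bandwidth-bound kernel at about

\[
\frac{\mathrm{BW}_{\text{GTX280}}}{\mathrm{BW}_{\text{Core i7-960}}}
\approx \frac{141\ \mathrm{GB/s}}{32\ \mathrm{GB/s}}
\approx 4.4\times ,
\]

which is right in line with the reported 5X.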

What is important about the context?

The International Symposium on Computer Architecture (ISCA) in Saint-Malo, France, interestingly enough, is the same event where NVIDIA’s Chief Scientist Bill Dally received the prestigious 2010 Eckert-Mauchly Award for his pioneering work in architecture for parallel computing.

NVIDIA Blog Response:

It’s a rare day in the world of technology when a company you compete with stands up at an important conference and declares that your technology is *only* up to 14 times faster than theirs.

http://blogs.nvidia.com/ntersect/2010/06/gpus-are-only-up-to-14-times-faster-than-cpus-says-intel.html


NVIDIA Blog Response: (cont)

The real myth here is that multi-core CPUs are easy for any developer to use and see performance improvements.

Undergraduate students learning parallel programming at M.I.T. disputed this when they looked at the performance increase they could get from different processor types and compared this with the amount of time they needed to spend in re-writing their code.

According to them, for the same investment of time as coding for a CPU, they could get more than 35x the performance from a GPU.

Despite substantial investments in parallel computing tools and libraries, efficient multi-core optimization remains in the realm of experts like those Intel recruited for its analysis.

In contrast, the CUDA parallel computing architecture from NVIDIA is a little over 3 years old and already hundreds of consumer, professional and scientific applications are seeing speedups ranging from 10 to 100x using NVIDIA GPUs.

Questions

Where did the 2.5x, 5x, and 14x come from?

How big were the problems that Intel used for comparisons? [compare w/ cache size; see the worked example after this list]

How were they selected?

What optimizations were done?
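
Why the problem-size question matters (a rough illustration; these are not the paper's actual input sizes): the Core i7-960 has an 8 MB shared L3 cache, so a kernel whose working set fits in that cache largely avoids DRAM bandwidth and latency costs, while the GTX280 has no comparably large on-chip cache. For a single-precision matrix,

\[
1024 \times 1024 \times 4\ \mathrm{B} = 4\ \mathrm{MB}\ (\text{fits in the 8 MB L3}),
\qquad
4096 \times 4096 \times 4\ \mathrm{B} = 64\ \mathrm{MB}\ (\text{does not}),
\]

so the choice of input size alone can swing a CPU-vs-GPU comparison substantially.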



Fermi cards were almost certainly unavailable when Intel commenced its project, but it's still worth noting that some of the GF100's architectural advances partially address (or at least alleviate) certain performance-limiting handicaps Intel points to when comparing Nehalem to a GT200 processor.

Bottom Line

Parallelization is hard, whether you're working with a quad-core x86 CPU or a 240-core GPU; each architecture has strengths and weaknesses that make it better or worse at handling certain kinds of workloads.

Other Reading

On the Limits of GPU Acceleration
http://www.usenix.org/event/hotpar10/tech/full_papers/Vuduc.pdf