Debunking the 100X GPU vs
CPU Myth: An Evaluation of
Throughput Computing on
CPU and GPU
Victor W. Lee,
et al.
Intel Corporation
ISCA ’10, June 19-23, 2010, Saint-Malo, France
Mythbusters’ view on the topic
CPU vs GPU
http://videosift.com/video/MythBusters-CPU-vs-GPU-or-Paintball-Cannons-are-Cool
Full movie:
http://www.nvidia.com/object/nvision08_gpu_v_cpu.html
The Initial Claim
Over the past four years, NVIDIA has made a great
many claims about how porting various types of
applications from CPUs to GPUs can tremendously
improve performance, by anywhere from 10x to 500x.
But it actually began much earlier (SIGGRAPH
2004)
http://pl887.pairlitesite.com/talks/2004-08-08-GP2-CPU-vs-GPU-BillMark.pdf
Intel’s Response?
Intel, unsurprisingly, sees the situation
differently, but has remained relatively quiet on
the issue, possibly because Larrabee was going
to be positioned as a discrete GPU.
Intel’s Response? (cont)
The recent announcement that Larrabee has been
repurposed as an HPC/scientific computing solution
may therefore be partially responsible for Intel ramping
up an offensive against NVIDIA's claims regarding
GPU computing.
At the International Symposium On Computer
Architecture (ISCA) this June, a team from Intel
presented a whitepaper purporting to investigate the
real-world performance delta between CPUs and GPUs.
But before that….
December 16, 2009
One month after ISCA’s final papers were due.
The Federal Trade Commission filed an
antitrust-related lawsuit against Intel,
accusing the chip maker of deliberately attempting
to hurt its competition and, ultimately, consumers.
The
Federal Trade Commission's complaint
against
Intel for alleged anticompetitive practices has a new
twist: graphics chips.
2009 was expensive for Intel
The European Commission fined Intel
nearly 1.5 billion USD, the US Federal Trade
Commission sued Intel on anti-trust grounds, and
Intel settled with AMD for another 1.25 billion USD.
If nothing else it was an expensive year, and while
Intel settling with AMD was a significant milestone
for the company it was not the end of their troubles.
Finally the settlement(s)
The EU Fine is still
under appeal ($1.45B)
8/4/2010: Intel settles with the FTC
Then there is the whole Dell issue….
So back to the paper,
What did Intel Say?
Throughput Computing
Kernels
What is a kernel? (a small, compute-intensive routine that dominates an application’s run time)
Kernels selected:
SGEMM, MC, Conv, FFT, SAXPY, LBM, Solv,
SpMV, GJK, Sort, RC, Search, Hist, Bilat
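As a concrete illustration (code is mine, not from the paper, which did not publish source): SAXPY, the simplest kernel on the list, is a single scaled vector addition. One independent multiply-add per element means it is bound by memory bandwidth rather than arithmetic.

```python
# SAXPY: y <- a*x + y, elementwise. Pure-Python reference version;
# the real kernel would run vectorized on the CPU or as a CUDA kernel.
def saxpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 10.0, 10.0]))  # [12.0, 14.0, 16.0]
```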
The Hardware selected
CPU: 3.2GHz Core i7-960, 6GB RAM
GPU: 1.3GHz eVGA GeForce GTX280 w/ 1GB
Optimizations:
CPU: multithreading, cache blocking, and
reorganization of memory accesses for SIMDification
GPU: minimizing global synchronization, and
using local shared buffers
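A minimal sketch of one of these CPU optimizations, cache blocking (loop tiling), as it might apply to a matrix-multiply kernel like SGEMM. The tile size and the pure-Python form are illustrative assumptions, not the paper's implementation:

```python
# Blocked C += A*B: compute one BxB tile at a time so each tile of A
# and B is reused while it is still cache-resident, instead of being
# re-fetched from DRAM on every pass. A real implementation picks B so
# three tiles fit in L1/L2; B=2 here is just for a tiny demo.
def matmul_blocked(A, M, n, B=2):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, B):
        for kk in range(0, n, B):
            for jj in range(0, n, B):
                for i in range(ii, min(ii + B, n)):
                    for k in range(kk, min(kk + B, n)):
                        aik = A[i][k]
                        for j in range(jj, min(jj + B, n)):
                            C[i][j] += aik * M[k][j]
    return C

# Sanity check: identity times an all-twos matrix gives it back unchanged.
n = 4
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
twos = [[2.0] * n for _ in range(n)]
assert matmul_blocked(I, twos, n) == twos
```

The point of the tiling is invisible at this scale; it only pays off when the matrices are much larger than the cache, which is exactly the regime the paper's questions about problem size vs. cache size are probing.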
This even made Slashdot
Hardware:
Intel, NVIDIA Take Shots At
CPU vs. GPU Performance
And PCWorld
Intel: 2-year-old Nvidia GPU Outperforms
3.2GHz Core i7
Intel researchers have published the results of a
performance comparison between their latest quad-core
Core i7 processor and a two-year-old Nvidia
graphics card, and found that the Intel processor
can't match the graphics chip's parallel processing
performance.
http://www.pcworld.com/article/199758/intel_2yearold_nvidia_gpu_outperforms_32ghz_core_i7.html
From the paper's abstract:
In the past few years there have been many
studies claiming GPUs deliver substantial
speedups ...over multi-core CPUs...[W]e perform
a rigorous performance analysis and find that
after applying optimizations appropriate for
both CPUs and GPUs the performance gap
between an Nvidia GTX280 processor and the
Intel Core i7 960 processor narrows to only 2.5x
on average.
Do you have a problem with this statement?
Intel's own paper indirectly raises a question
when it notes:
The previously reported LBM number on GPUs
claims 114X speedup over CPUs. However, we
found that with careful multithreading,
reorganization of memory access patterns, and
SIMD optimizations, the performance on both
CPUs and GPUs is limited by memory bandwidth
and the gap is reduced to only 5X.
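One way to sanity-check that 5X figure (my arithmetic, using the two parts' published peak bandwidth specs, not data from the paper): once a kernel like LBM is bandwidth-bound on both sides, the best possible speedup is roughly the ratio of peak memory bandwidths.

```python
# Back-of-the-envelope ceiling on bandwidth-bound speedup.
# Published peak memory bandwidths for the two parts compared:
gpu_bw_gbs = 141.7   # GeForce GTX280, GB/s
cpu_bw_gbs = 25.6    # Core i7-960 (3-channel DDR3-1066), GB/s

ceiling = gpu_bw_gbs / cpu_bw_gbs
print(f"bandwidth-bound speedup ceiling: {ceiling:.1f}x")  # ~5.5x
```

That the measured 5X gap sits just under this ceiling is consistent with the paper's claim that both implementations are hitting the memory wall.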
What is important about the context?
The International Symposium on Computer
Architecture (ISCA) in Saint-Malo, France,
interestingly enough, is the same event where
NVIDIA’s Chief Scientist Bill Dally received the
prestigious 2010 Eckert-Mauchly Award for his
pioneering work in architecture for parallel
computing.
NVIDIA Blog Response:
It’s a rare day in the world of technology when a
company you compete with stands up at an important
conference and declares that your technology is *only*
up to 14 times faster than theirs.
http://blogs.nvidia.com/ntersect/2010/06/gpus-are-only-up-to-14-times-faster-than-cpus-says-intel.html
NVIDIA Blog Response: (cont)
The real myth here is that multi-core CPUs are
easy for any developer to use and see
performance improvements.
Undergraduate students learning parallel
programming at M.I.T. disputed this when they
looked at the performance increase they could
get from different processor types and
compared this with the amount of time they
needed to spend in re-writing their code.
According to them, for the same investment of
time as coding for a CPU, they could get more
than 35x the performance from a GPU.
Despite substantial investments in parallel
computing tools and libraries, efficient multi-core
optimization remains in the realm of
experts like those Intel recruited for its analysis.
In contrast, the CUDA parallel computing
architecture from NVIDIA is a little over 3 years
old, and already hundreds of consumer,
professional, and scientific applications are
seeing speedups ranging from 10 to 100x using
NVIDIA GPUs.
Questions
Where did the 2.5x, 5x, and 14x come from?
How big were the problems that Intel used for
comparisons? [compare w/ cache size]
How were they selected?
What optimizations were done?
Fermi cards were almost certainly unavailable
when Intel commenced its project, but it's still
worth noting that some of the GF100's
architectural advances partially address (or at
least alleviate) certain performance-limiting
handicaps Intel points to when comparing
Nehalem to a GT200 processor.
Bottom Line
Parallelization is hard, whether you're working
with a quad-core x86 CPU or a 240-core GPU;
each architecture has strengths and weaknesses
that make it better or worse at handling certain
kinds of workloads.
Other Reading
On the Limits of GPU Acceleration
http://www.usenix.org/event/hotpar10/tech/full_papers/Vuduc.pdf