Future of GPU/CPU Computing and Programming: Report


CD

July 25, 2012



Contents

1. Abstract
2. GPU Background
2.a Original Purpose and History of GPU
2.b Why Develop Programs for GPU
2.c Architecture Difference Between CPU and GPU
2.d When to Develop Programs for GPU
3. Who is Already Developing Programs for the GPU
4. Future of CPU/GPU Computing
5. Hurdles of Hybrid CPU + GPU Computing
5.a Hardware
5.b Parallel Programming Concepts
5.c CPU versus GPU Battle
5.d Complexity of Current GPU Program APIs and Architecture
5.e GPU Program Portability
5.f Hybrid/Heterogeneous Computing
6. Solutions to Hybrid CPU + GPU Computing
6.a Hardware
6.d Parallel Programming Concepts
6.e CPU versus GPU Battle
6.f Complexity of Current GPU Program APIs and Architecture
6.g GPU Program Portability
7. Conclusions
7.a Future Computing Challenges Tree
7.b Possible Future and Solutions Summary Table
7.c Breakdown of Parallelism
8. Final Summary Conclusion
9. References
Appendix A
Appendix B: data-parallel versus task-parallel
Appendix C: API defined



1. Abstract

Computing technology is one of the most dynamic industries in the world today. Computing power continuously gets smaller, faster and cheaper. This technology continues to seep into every aspect of our lives, and demand for processing power continues to grow at a nearly exponential rate.

Figure 1: Exponential Growth of Computing [32]


It is difficult, even for people who are already immersed in the computing industry, to keep up with computing technology; even more so for someone just entering the field or whose job sits on the periphery of computer science.

So many questions arise when one starts to ponder the computers that exist today. How are computers going to keep up with the global demand for processing power? How much smaller can computers be made … won’t there be a point where they cannot get any smaller?


Several advances in computer science and technology are allowing computers to keep stride with global demand. New and revolutionary computer processors are starting to take the stage in the twenty-first century. We are also learning how to take advantage of processing power that has been available yet sitting idle for years; mainly the parallel processing power that comes from the GPU.


This paper focuses on the future of the computing power that exists within the boundaries of the CPU and GPU hardware; the brain of computer technology. There are so many hurdles to conquer in order to achieve next-generation processors that it can often get confusing and overwhelming for anyone trying to determine which challenges need to be met next.


We take a look at possible next-generation processors and supporting hardware. We also take a look at how languages, computing models and architectures are rapidly evolving to support heterogeneous (CPU + GPU) processing. The tree shown in Figure 2 summarizes all of the topics discussed in this report. It is these great advances in hardware and language development that will allow computing technology to keep pace with global demand.


Figure 2: Future Challenges of Processing and Computing Resources


This paper has three appendices to ensure the reader is up to speed on basic concepts not covered in the main body of the document, so that all readers can absorb as much of the material as possible.



2. GPU Background

We first take a look at the GPU (graphics processing unit). Many people are not as familiar with the GPU as they are with the CPU. It is imperative that one fully understand the potential held within the GPU before delving into the future of heterogeneous computing.


2.a Original Purpose and History of GPU

GPU stands for Graphics Processing Unit. The GPU was initially designed to “accelerate memory-intensive work of texture mapping and rendering polygons which would then be displayed on the user’s computer screen” [1][2]. The GPU, like the CPU, consists of many transistors. Unlike the CPU, most of the GPU’s transistors are set aside for ALUs (Arithmetic Logic Units), the part of the processor devoted to performing arithmetic and logical operations.

Recently, technology has come along which allows the average programmer to tap into GPU resources. In 2006 Nvidia released CUDA 1.0, which allowed programmers to take advantage of the computing power of Nvidia GPUs. This evolution of GPU programming has continued to add flexibility to GPU usage. With this new, now relatively easy-to-access computing capability, many engineers and scientists are starting to seriously think about using the GPU for non-graphical calculations and number crunching.


Figure 3: Texture Mapping [3]


2.b Why Develop Programs for GPU

We now understand that we (programmers) have recently been given the capability to use the GPU to do much of the work we previously tasked the CPU with doing. The next logical question one would ask oneself is: why do we need to start using the GPU? I am happy writing programs which mainly use the CPU. Why fix something that is not broken?

The main advantage of relying on both the GPU and CPU for processing resources is simple: speed. As graphics, animation and user interfaces become more complex, software becomes more and more compute intensive. The GPU is ideal for compute-intensive and graphical tasks. Unfortunately, if we continue to use only the CPU for processing power, it will always be running at its maximum capacity. This in turn causes all processes and software to slow to a crawl; the user’s computing experience quickly becomes extremely time consuming and frustrating.


Another reason to start thinking about GPU programming is simply because the GPU exists. A large majority of the time the GPU sits idle while the CPU does all the work. GPUs are more efficient than CPUs in certain areas of computing, especially parallel computing. If the resources already exist, why not tap into this goldmine of opportunity?

Lastly, research is becoming exceedingly compute intensive. The amount of computing power currently available to engineers and scientists allows them to take advantage of more accurate and all-encompassing integral and differential equations. It also allows them to use finer mesh geometry, creating models which more closely mimic the actual physical world as we know it. However, more advanced engineering tools require yet greater computational power. These kinds of compute-intensive tasks are ideal for the GPU.

After hearing such awe-inspiring things about the GPU, many people start to question why we even need a CPU at all. Since the GPU is so much faster than the CPU, does this mean the GPU will one day completely replace the CPU? This is the next topic we discuss.


2.c Architecture Difference Between CPU and GPU

Many people (and companies) have gone to extreme measures to determine which processor is better and faster: the GPU or the CPU. Unfortunately this is a very unfair comparison, since the two processor types were designed to do entirely different tasks. As shown in Figure 4, the GPU architecture is vastly different from the CPU architecture. The GPU was designed for the sole purpose of simultaneous (parallel) data processing and rapid number crunching. The CPU, on the other hand, excels at getting a single task completed at speeds much faster than a GPU could, and it is capable of a much larger variety of tasks than the GPU.

These two processors will never be able to fully take the place of one another. Adding CPU capabilities to a GPU forces the GPU to sacrifice some of its powerful graphical and computing capabilities; the same problem occurs when adding GPU features to a CPU. In summary, the GPU is a supplement to, not a replacement for, the CPU. Our goal as programmers should be to make wise decisions about when to take advantage of the GPU. It is our job to ensure the GPU and CPU work together as efficiently as possible.



Figure 4: Architecture Differences Between GPU and CPU
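
To make the architectural contrast concrete, the sketch below shows the same element-wise vector addition written serially for the CPU and as a CUDA kernel for the GPU. The names (vecAddCPU, vecAddGPU, n) are hypothetical and the code is only a minimal illustration, not taken from this report: the CPU walks through the data one element at a time, while the GPU spreads the elements across thousands of lightweight ALU threads.

Listing 1: Serial CPU loop versus parallel GPU kernel (illustrative sketch)

    // CPU version: one core steps through the array serially.
    void vecAddCPU(const float* a, const float* b, float* c, int n) {
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

    // GPU version (CUDA): each of the many lightweight threads handles one element.
    __global__ void vecAddGPU(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

Launching the kernel with, for example, vecAddGPU<<<(n + 255) / 256, 256>>>(dA, dB, dC, n) processes all of the elements at once, which is exactly the kind of simultaneous data processing the GPU architecture was built for.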


If you accept everything you have read thus far, the next question you are most likely asking yourself is: how do I know when to take advantage of the GPU and when to keep my program on the CPU? This is the next topic we delve into.


2.d When to Develop Programs for GPU

Converting a program so that it is capable of taking advantage of the GPU is not a simple or cheap task. For this reason it is important to determine which code would be most efficient on the CPU and which would be more efficient if processed by the GPU.

GPUs excel at graphics rendering, parallel processing and most types of computationally intensive tasks. Several examples of computationally intensive tasks that are ideal for GPU processing are shown in the table below. One important note is that GPUs can process data-parallel computations with high arithmetic intensity much more efficiently than task-parallel processes. To understand more about task-parallel versus data-parallel processing, please see Appendix B.


Computationally Intensive Tasks, Ideal for GPU Processing:

Scientific and engineering computing problems
Physical simulations
Simple structured-grid PDE methods in computational finance
Matrix algebra
Global illumination (such as ray tracing, photon mapping and radiosity)
Image and volume processing
Non-grid streams
Medical imaging
XML parsing
Photography
Grid computing

Table 1 [10]
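
Matrix algebra from Table 1 is a good illustration of what “data-parallel with high arithmetic intensity” means in practice. The CUDA sketch below (the kernel name matMul and the row-major layout are assumptions made for illustration, not taken from this report) computes one element of a matrix product per thread: every thread runs the same instructions on different data, and each performs many arithmetic operations for every value it loads from memory.

Listing 2: A data-parallel, arithmetically intense kernel (illustrative sketch)

    // Naive dense matrix multiply C = A * B for square n x n matrices.
    // Each GPU thread computes one output element.
    __global__ void matMul(const float* A, const float* B, float* C, int n) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n && col < n) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)        // n multiply-adds per output element
                sum += A[row * n + k] * B[k * n + col];
            C[row * n + col] = sum;
        }
    }

A task-parallel workload, by contrast, would give each thread a different job rather than the same instructions over different data, which maps far less naturally onto the GPU.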


Understanding the wide range of possibilities of GPU computing brings us to a different question: have any industries seriously started taking advantage of GPU capabilities? The answer is in the next section.


3. Who is Already Developing Programs for the GPU

Figure 5 [9]

Although all of the GPU's complications have yet to be worked out, academia and industry have already started jumping on the GPU bandwagon. Figure 5 shows industries starting to take an interest in GPU computing. Interestingly, many of the industries taking an interest in the GPU currently rely heavily on HPC (see Figure 6). HPC (high-performance computing) is a large market which currently revolves mainly around CPU computing. This competition for market share has inflamed passionate battles between many CPU and GPU companies.


Figure 6 [10]

This fierce competition between the CPU and GPU brings many to deeply question the true future of GPU and CPU hardware.


4. Future of CPU/GPU Computing

Heterogeneous computing is most likely our future, for two very good reasons.

First, throughout the 1990s and early 2000s, hardware technology advances allowed performance to increase without the immediate need for change or fundamental restructuring. Recently, hardware advances have slowed due to the quantum wall as well as the thermal and power walls.

Secondly, since we already have the resources available, it is in our best interest to attempt to utilize them. Today's computers waste a lot of processor time: the CPU sits idle while the GPU does its task, and the GPU sits idle while the CPU burns itself out trying to do practically everything [11].

Due to the reasons mentioned above, as well as the global demand for more computing power, perhaps the most logical path into the future is that of heterogeneous (hybrid) computing. This would be a future where tasks are split between the CPU and GPU accordingly. In the end, the GPU provides a low-cost platform for accelerating high-performance computations [13]. Together the GPU and CPU could take us to new levels of computing power and capability, enabling us to create software we once could only dream about.


5. Hurdles of Hybrid CPU + GPU Computing

Unfortunately, this heterogeneous future of computing does not come without what seem to be almost insurmountable hurdles. The rest of the paper discusses the six main challenges we must face and conquer before arriving at our destination of heterogeneous computing.


5.a Hardware

There are two main hardware concerns when it comes to processors. The first is that processors are getting so small that they are reaching physical boundaries which keep them from progressing further. The second is that there is a substantial data bottleneck when transferring information between the GPU and CPU. This bottleneck greatly degrades the overall performance gains achieved by taking advantage of the GPU.
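
The bottleneck is easiest to see in the copy-compute-copy pattern that programs for today's discrete GPUs typically follow. The sketch below is only illustrative (the kernel process and the buffer sizes are made up), but both cudaMemcpy calls cross the comparatively slow PCI Express bus, so for small or simple workloads the transfers can cost more time than the kernel saves.

Listing 3: The copy-compute-copy pattern behind the transfer bottleneck (illustrative sketch)

    #include <cuda_runtime.h>
    #include <vector>

    // Hypothetical kernel standing in for any useful GPU work.
    __global__ void process(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = 2.0f * in[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> h_in(n, 1.0f), h_out(n);
        size_t bytes = n * sizeof(float);

        float *d_in, *d_out;
        cudaMalloc(&d_in, bytes);
        cudaMalloc(&d_out, bytes);

        // Both copies cross the PCIe bus; for light workloads they can
        // take longer than the kernel they surround.
        cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
        process<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }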


5.b Parallel Programming Concepts

It is natural for humans to think serially. Our entire lives we are constantly running through a sequence of events which we catalog in chronological (sequential) order.

Multi-processor chip hardware, on the other hand, requires dauntingly complex software that breaks up computing chores into simultaneously processed chunks of code [21]. This hurdle alone leaves many programmers with a massive headache at the end of each day.
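
As a small taste of what “breaking computing chores into simultaneously processed chunks” looks like, the sketch below sums an array by giving every GPU thread its own interleaved slice of the indices (a grid-stride loop) and then combining the partial results with an atomic add. The kernel name chunkedSum is invented for illustration. Even this tiny example forces the programmer to reason about work distribution and concurrent updates, which is exactly the mental shift away from sequential thinking that this section describes.

Listing 4: Splitting one chore into simultaneously processed chunks (illustrative sketch)

    // Each thread sums every (gridDim.x * blockDim.x)-th element, so the
    // whole index range is carved into interleaved chunks of work.
    __global__ void chunkedSum(const float* data, float* total, int n) {
        float local = 0.0f;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += gridDim.x * blockDim.x)
            local += data[i];
        // Concurrent threads must combine their partial sums safely.
        atomicAdd(total, local);
    }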


5.c CPU versus GPU Battle

Not only are there technical challenges, there are also social and economic challenges which have to be tackled. There are currently two distinct camps: those on the CPU side and those on the GPU side. Quite fierce bickering is going on between them, which at times seems to bring heterogeneous progress to a halt.


5.d Complexity of Current GPU Program APIs and Architecture

Using “current GPU programs to optimize an algorithm for a specific GPU is a time-consuming task which requires thorough knowledge of both the algorithm as well as the hardware” [13]. GPU parallel computing architectures are still fairly new; the oldest one, CUDA, is only six years old (2006, see Figure 7). For this reason these frameworks are undergoing constant evolution and improvement. Although many of the GPU frameworks are fairly mature, they still have a long way to go before the standard programmer can really take hold of these new tools and use them to their maximum capacity.

This intermingling of hardware and software development forces software developers to take their focus away from software development in order to write code which requires deep knowledge of the intricate details of the GPU hardware. This lack of separation between hardware and software programming leads to software products failing to meet their true maximum potential.


Figure 7: Timeline
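
A small example of how this hardware knowledge leaks into application code: a developer tuning a kernel today typically has to query the device and pick launch parameters by hand. The sketch below uses the real CUDA runtime call cudaGetDeviceProperties, but the tuning decision at the end is invented purely to illustrate the kind of per-GPU reasoning described in [13].

Listing 5: Per-device tuning that currently falls on the programmer (illustrative sketch)

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   // properties of GPU 0

        // Block size, shared-memory usage and grid size all depend on
        // numbers that change from one GPU model to the next.
        printf("Device: %s (compute capability %d.%d)\n",
               prop.name, prop.major, prop.minor);
        printf("Multiprocessors: %d, max threads per block: %d\n",
               prop.multiProcessorCount, prop.maxThreadsPerBlock);

        int threadsPerBlock = (prop.maxThreadsPerBlock >= 256) ? 256
                                                               : prop.maxThreadsPerBlock;
        printf("Chosen block size for this GPU: %d\n", threadsPerBlock);
        return 0;
    }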


5.e GPU Program Portability

Another major hurdle which has to be tackled in order for GPU programming to succeed is program portability. Currently, much GPU code lacks portability because code written for one GPU may not run as efficiently (or at all) when ported to non-native GPU hardware [13]. Much GPU code cannot even be efficiently ported to different generations and/or models of the same GPU brand. This is a devastating setback for GPU program development.
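
The generation-to-generation problem shows up even within a single vendor's toolchain. The fragment below uses the real __CUDA_ARCH__ macro to compile different code paths for different compute capabilities; the particular choices are invented for illustration, but they mirror how real kernels end up hard-wired to specific hardware generations.

Listing 6: Code tied to a particular GPU generation (illustrative sketch)

    // Illustrative device function with per-generation code paths.
    __device__ float blend(float a, float b) {
    #if __CUDA_ARCH__ >= 200
        // Path assumed to suit compute capability 2.x and newer GPUs.
        return fmaf(a, b, 1.0f);
    #else
        // Fallback for older GPU generations.
        return a * b + 1.0f;
    #endif
    }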


5.f Hybrid/Heterogeneous Computing

Lastly, it will be imperative to understand how we can optimize the communication between the CPU and GPU.


6. Solutions to Hybrid CPU + GPU Computing

After seeing all of the hurdles which we must overcome in order to arrive at the future of CPU/GPU computing, one might begin to wonder if there is any hope. Fortunately, many people have put in countless hours to ensure our computing power continues to keep up with demand.


6.a Hardware

Physical boundaries are not easy to overcome. If the basic concepts of physics say you can’t do something, then you won’t be able to do it; that is, unless you completely alter the understanding of physics in your pursuit of resources. This is exactly what happened in February of this year (2012), when researchers at the University of New South Wales, Purdue University and the University of Melbourne constructed the first controllable transistor engineered from a single phosphorus atom (see Figure 8) [15]. One of the most fundamental concepts of quantum physics, the Heisenberg Uncertainty Principle, had led engineers and physicists alike to believe that a transistor could not be created from a single atom. By creating this single-atom transistor, researchers at these three universities made great strides in physics and computing [34].



Figure 8 [15]

Another technology is also making great strides in the computer world: optics. Both IBM and Intel are investing a great deal of time and money into photonic data transfer technologies [17][18]. Both companies see a future in which all copper circuitry within circuit boards is replaced with optical multimode waveguides. They not only intend for this technology to infiltrate chip manufacturing but also hope that it enables new methods for transferring data between processors. Although most optical processing devices are still in the R&D or demo stage, optical backplanes and OPST (optical packet switch and transport) technology have recently started to trickle slowly into the marketplace [35].



Figure 9 [18]

Great strides have been made in computing hardware which should help CPUs and GPUs keep stride with global demand for the next two to three decades. Not only do the new technologies allow chip manufacturers to make hardware more compact with lower thermal loads, they also allow data to travel orders of magnitude faster. This progress allows chip computing power to continue to rise and helps reduce the CPU-to-GPU communication overhead, hopefully bringing transfer times down to negligible levels.


6.d Parallel Programming Concepts

Parallel programming has always been a difficult concept for humans to work through. Although the concept has been around since the 1950s, it didn’t become an everyday reality in software development until the 2000s. Parallel processing really started gaining momentum in the software world with the emergence of dual-core (2005) and quad-core CPUs as well as programmable GPUs.

For this reason parallel programming still has a long road to maturity compared to many other areas of computing. With the emergence of parallel programming, software development and hardware began to be heavily intertwined, allowing for little hardware abstraction for the software developer.

Recently, consortiums, governments and universities have worked hard in an attempt to untangle the spaghetti strings of parallel code which now entangle themselves in everyday programs.



Figure 10: Current scenario and hopeful future for parallel programming


Three research projects which attempt to remove the complexity from parallel programming are TripleP, PPmodel and MARPLE. There are possibly dozens more research projects which have improved the progress of parallel programming; these three are among the most recent developments in parallel processing.

TripleP uses “synthesis at compile time to generate parallel binaries from declarative programs. It abstracts the execution order of the program away from the developer and allows for explicit parallelism without requiring architecture-specific annotations and structures” [22]. In short, it strives to determine the best way to generate parallel code.

PPmodel attempts to help separate the sequential and parallel parts of a program into blocks without having to modify the code [34]. This bit of software excels at identifying parallel hotspots in programs.

Lastly, the MARPLE research was carried out in the hope of helping businesses automatically migrate their legacy software systems to a data-parallel platform such as an Nvidia CUDA GPU [25].


There is a massive amount of great work going on in the world of parallel programming. It will take many years of hard work and new ways of thinking to allow parallel programming to settle into a centralized location outside the realm of everyday software development. Granted, the dredges of parallel programming can never be completely removed from software and high-level programming. Having said that, there is still much that can be done to help quicken the pace of software development. Continued effort in creating new programming models and structures, as well as efficient parallel analyzers, are just a few ways to improve the world of parallel programming.


6.e CPU versus GPU Battle

Although this is one of the hurdles we must overcome in order to progress technologically, it is one of the smaller hurdles to tackle.

Global market demands will encourage the progress of technology as they have in the past. Mergers, such as AMD and ATI (2006), also help alleviate the bickering; AMD and ATI now work together to make CPUs and GPUs that communicate efficiently with one another.

There are also several partially non-biased middlemen getting into the mix. Vendors such as IBM help to facilitate the creation of heterogeneous systems. Government is also getting involved in attempting to alleviate tensions between CPU and GPU makers. DARPA joined up with Nvidia and Intel in 2010 to work on an exascale computing project [30]. Just this July (2012), Nvidia, Intel, AMD and Whamcloud started working with the Department of Energy on the FastForward exascale computing program [26].

In conclusion, we should never expect or hope for separate companies to play ‘friendly’. There will always be lawsuits and fighting. The main concern for us is to make sure that their bickering does not infringe on the overall progress of computing technology, but instead encourages its growth.


6.f Complexity of Current GPU Program APIs and Architecture

Most of the complexities of current GPU programming revolve around other hurdles such as parallel programming and GPU program portability. GPU APIs are fairly complex and at times difficult to understand because of the parallel programming involved. Many of the solutions for easing the process of parallel programming would also make GPU programming much less daunting. Another downfall of GPU APIs and wrappers is that they are not as efficient when brought into higher-level languages. Although CUDA runs quite efficiently when incorporated into a C++ program, languages such as Java and Fortran have not been nearly as successful at achieving maximum efficiency from GPUs.

The main solutions to GPU program complexity will be found by conquering hurdles such as parallel programming and GPU program portability. Until then, GPU programming will continue to be a point of frustration for many software developers.




6.g GPU Program Portability

Another hurdle for heterogeneous computing is the lack of portability of GPU programs between different GPU makes and models. Several groups have worked hard to eliminate this roadblock.

One framework called OpenCL attempts to close this gap. OpenCL was initially created by Apple in 2008 and was then handed over to Khronos for further development. Khronos is a consortium of companies which was “founded in January 2000 by a number of leading media-centric companies including 3Dlabs, ATI, Discreet, Evans and Sutherland, Intel, NVIDIA, SGI and Sun Microsystems. This group is dedicated to creating open standard APIs to enable the authoring and playback of rich media on a wide variety of platforms and devices” [36].


“OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs and GPUs as well as other processors” [37]. It can be implemented on a number of platforms, including cell phones. When GPU hardware is not present, it also has the capability to fall back on the CPU [28]. OpenCL also supports synchronization over multiple devices and is fairly easy to learn compared to other GPU frameworks. OpenCL runs over several GPU makes, such as Nvidia, ATI and Ivy Bridge.

Another player recently entered the field: Microsoft. In 2009 Microsoft came out with DirectCompute, an API (application programming interface) which was also designed to execute across heterogeneous platforms. It was followed in 2011 by Microsoft’s C++ AMP library, which builds on DirectCompute [38].

One research project, called Sponge, also looks at making GPU code more portable. Unlike OpenCL and C++ AMP, it focuses mainly on enabling CUDA code to be ported to any Nvidia GPU model without losing code efficiency [13]. It attempts to abstract away the hardware details. Sponge was designed to take care of the GPU-to-CPU and CPU-to-GPU communication, as well as to determine which parts of the code are better suited for the GPU and which are best processed by the CPU [13]. The results showed that Sponge’s programming model allowed a performance increase of 3.2x over the baseline benchmarks.


All in all, tremendous effort has been put forth to ensure universal program compatibility when it comes to GPU hardware. It would be extremely burdensome for someone who wanted to buy your software to also have to purchase the correct GPU hardware to go along with it.


7. Conclusions

7.a Future Computing Challenges Tree

The ‘Future Computing Challenges’ tree graphically shows all of the hurdles which we have discussed in this paper. These are the challenges which we must overcome in order to progress the future of computing and processing power.


Figure 11: Future Challenges of Processing and Computing Resources


7.b Possible Future and Solutions Summary Table


Figure 12: Possible Future Solution Summary Table






7.c Breakdown of Parallelism


Figure 13: Sample Model for Parallel Programming


Figure 14: Separation Between Various Types of Parallel Computing


8. Final Summary Conclusion

Heterogeneous computing might be the best path to take to ensure that computer processing power continues to meet the stringent demands of the global market. Research and development in every field, from space and science to medicine and the environment, depends heavily on future processor capabilities. There is an information overload which is already starting to place a strain on today’s computers [31]. Taking advantage of hybrid systems will allow us to stretch our current computing resources further.



9. References

[1] “Graphics Processing Unit”. Wikipedia. 16 July 2012. Accessed: 20 July 2012. URL: http://en.wikipedia.org/wiki/Graphics_processing_unit

[2] “Texture Mapping”. Wikipedia. 2 July 2012. Accessed: 20 July 2012. URL: http://en.wikipedia.org/wiki/Texture_mapping

[3] Wolfe, Rosalee. “Teaching Texture Mapping Visually”. DePaul University. 30 May 1999. Accessed: 20 July 2012. URL: http://www.siggraph.org/education/materials/HyperGraph/mapping/r_wolfe/r_wolfe_mapping_1.htm

[4] Lee, V; Kim, C; Chhugani, J; Deisher, M; Kim, D; Nguyen, A; Satish, N; Smelyanskiy, M; Chennupaty, S; Hammarlund, P; Singhal, R; Dubey, P. “Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU”. ISCA 2010. ACM 2010. Accessed: 20 July 2012.

[5] Keane, A. “GPUs are Only up to 14 Times Faster than CPUs”. Nvidia. 20 June 2010. Accessed: 20 July 2012. URL: http://blogs.nvidia.com/2010/06/gpus-are-only-up-to-14-times-faster-than-cpus-says-intel/

[6] “Feasibility of GPU as CPU”. StackOverflow. 26 August 2008. Accessed: 20 July 2012. URL: http://stackoverflow.com/questions/28147/feasability-of-gpu-as-a-cpu

[7] “Introduction to GPU Computing”. Accelereyes. Accessed: 20 July 2012.

[12] URL: http://wiki.accelereyes.com/wiki/index.php/Introduction_to_GPU_Computing

[8] “Choose the Right Threading Model (Task-Parallel or Data-Parallel Threading)”. 26 January 2009. Accessed: 20 July 2012. URL: http://software.intel.com/en-us/articles/choose-the-right-threading-model-task-parallel-or-data-parallel-threading/

[9] “The Future of Massively Parallel and GPU Computing”. Nvidia. Accessed: 20 July 2012. URL: www.greatlakesconsortium.org/events/GPUMulticore/kirk.pdf

[10] Blaise, B. “Introduction to Parallel Computing”. Lawrence Livermore National Laboratory. 16 July 2012. Accessed: 20 July 2012. URL: https://computing.llnl.gov/tutorials/parallel_comp/

[11] Ro, W; Lee, C; Gaudiot, J. “Cooperative Heterogeneous Computing for Parallel Processing on CPU/GPU Hybrids”. University of California. Accessed: 20 July 2012.

[13] Hormati, A; Samadi, M; Woh, M; Mudge, T; Mahlke, S. “Sponge: Portable Stream Programming on Graphics Engines”. ASPLOS 2011. ACM 2011. Accessed: 20 July 2012.

[14] Berta, M; Christandl, M; Colbeck, R; Renes, J; Renner, R. “The uncertainty principle in the presence of quantum memory”. Nature Physics. 25 July 2010. Accessed: 20 July 2012. URL: http://www.nature.com/nphys/journal/vaop/ncurrent/full/nphys1734.html

[15] “Single-Atom Transistor Is End of Moore’s Law; May Be Beginning of Quantum Computing”. ScienceDaily. 19 February 2012. Accessed: 20 July 2012. URL: http://www.sciencedaily.com/releases/2012/02/120219191244.htm

[16] “Photon-Transistors for the Supercomputers of the Future”. ScienceDaily. 29 August 2007. Accessed: 20 July 2012. URL: http://www.sciencedaily.com/releases/2007/08/070826162731.htm

[17] “Intel Milestone Confirms Light Beams Can Replace Electronic Signals for Future Computers”. Intel. 27 July 2010. Accessed: 20 July 2012. URL: http://www.intel.com/pressroom/archive/releases/2010/20100727comp_sm.htm

[18] Kash, J; Kuchta, D; Doany, F; Schow, C; Libsch, F; Budd, R; Taira, Y; Nakagawa, S; Offrein, B; Taubenblatt, M. “Optical PCB Overview”. IBM Research. November 2009. Accessed: 20 July 2012.

[19] URL: http://news.cnet.com/8301-13924_3-20112553-64/ibm-intel-group-to-invest-$4.4-billion-in-chip-tech/

[20] Buck, I. “The History of CUDA”. Conference for High Performance Computing. 2008. Accessed: 20 July 2012. URL: http://www.youtube.com/watch?v=Cmh1EHXjJsk

[21] Markoff, J. “Faster Chips are Leaving Programmers in Their Dust”. Published 2007. Accessed: 20 July 2012.

[22] Zaraket, F; Noureddine, M; Sabra, M; Jaber, A. “Portable Parallel Programs using Architecture-aware Libraries”. SAC, 26 March 2012. ACM 2012. Accessed: 20 July 2012.

[23] “Parallel Bars: Parallel programming, once an obscure niche, is the focus of increasing interest as ‘multicore’ chips proliferate in ordinary PCs”. Technology Quarterly. Q2 2011. Accessed: 20 July 2012. URL: http://www.economist.com/node/18750706

[24] Jacob, F; Gray, J; Sun, Y; Bangalore, P. “A Platform-Independent Tool for Modeling Parallel Programs”. ACM SE, 24 March 2011. ACM 2011. Accessed: 20 July 2012.

[25] Sarkar, S; Maltouf, M. “Identifying Hotspots in a Program for Data Parallel Architecture”. ISEC 2012. ACM 2012. Accessed: 20 July 2012.

[26] Bohn, D. “Nvidia and Intel join AMD in Department of Energy’s FastForward exascale computing project”. The Verge. 14 July 2012. Accessed: 20 July 2012. URL: http://www.theverge.com/2012/7/14/3157985/nvidia-intel-amd-department-of-energy-fastforward

[27] Enderle, R. “How Nvidia’s Kepler chips could end PCs and tablets as we know them”. Digital Trends. 19 May 2012. Accessed: 20 July 2012. URL: http://www.digitaltrends.com/computing/how-nvidias-kepler-chips-could-end-pcs-and-tablets-as-we-know-them/

[28] Ghorpade, J; Parande, J; Kulkarni, M; Bawaskar, A. “GPGPU Processing in CUDA Architecture”. ACIJ. January 2012. Accessed: 20 July 2012.

[29] p91-song.pdf

[30] Montalbano, E. “DARPA Taps Intel, Nvidia For Extreme Scale Computing”. InformationWeek Government. 11 August 2010. Accessed: 20 July 2012. URL: http://www.informationweek.com/news/government/enterprise-architecture/226700040

[31] “Data, Data Everywhere”. The Economist. 25 February 2010. Accessed: 20 July 2012. URL: http://www.economist.com/node/15557443

[32] Kurzweil, R. “The Human Machine Merger: Why We Will Spend Most of Our Time in Virtual Reality in the Twenty-First Century”. Kurzweil. 28 August 2001. Accessed: 20 July 2012. URL: http://www.kurzweilai.net/the-human-machine-merger-why-we-will-spend-most-of-our-time-in-virtual-reality-in-the-twenty-first-century

[33] “Transistor”. TechTerms. 7 October 2011. Accessed: 20 July 2012. URL: http://www.techterms.com/definition/transistor

[34] “Uncertainty Principle”. Wikipedia. Accessed: 20 July 2012. URL: http://en.wikipedia.org/wiki/Uncertainty_principle

[35] Staff, L. “Intune Platform Built on ATCA Technology from Schroff”. Lightwave. 26 June 2012. Accessed: 20 July 2012. URL: http://www.lightwaveonline.com/articles/2012/06/intune-platform-built-on-atca-technology-from-schroff.html

[36] “About the Khronos Group”. Khronos. Accessed: 20 July 2012. URL: http://www.khronos.org/about/

[37] “OpenCL”. Wikipedia. 19 July 2012. Accessed: 20 July 2012. URL: http://en.wikipedia.org/wiki/OpenCL

[38] “C++ AMP”. Accessed: 20 July 2012. URL: http://msdn.microsoft.com/en-us/library/hh265137(v=vs.110).aspx



Appendix A: Separation between hardware programmers, programming language developers and high-level language programmers (software developers). See Video: Side Note

Appendix B: data-parallel versus task-parallel. See the background section of the Videos.

Appendix C: API defined


Glossary

Transistor: “a basic electrical component that alters the flow of electrical current. Transistors are building blocks of integrated circuits, such as CPUs (and GPUs)” [33]. Transistors hold (store) binary data (1’s and 0’s).