Partial list of References for Parallel Programming

May 2, 2011

(Callele, Neufeld et al.; Foster; Harris; http://en.wikipedia.org/wiki/Larrabee_(microarchitecture); Fortune and Wyllie 1978; Landau 1986; Gustafson 1988; Blelloch 1990; Blelloch 1993; Shi 1996; Campbell 1997; Xavier and Iyengar 1998; Pollack 1999; Cormen, Leiserson et al. 2001; Kumar 2002; Wired.com 2002; Grama, Karypis et al. 2003; Harris 2003; Macedonia 2003; Ranjan and Pontelli 2003; Wilkinson and Allen 2004; Mattson, Sanders et al. 2005; Sutter 2005; Harish and Narayanan 2007; Harris, Sengupta et al. 2007; Marowka 2007; Roger, Assarsson et al. 2007; Sengupta, Harris et al. 2007; Boyer, Skadron et al. 2008; Dotsenko, Govindaraju et al. 2008; Harris 2008; Lindholm, Nickolls et al. 2008; Nickolls, Buck et al. 2008; Ross 2008; Ryoo, Rodrigues et al. 2008; Sengupta, Harris et al. 2008; Stratton, Stone et al. 2008; Vineet and Narayanan 2008; Casanova, Legrand et al. 2009; Harish, Vineet et al. 2009; Harris 2009; Larus 2009; Lin and Snyder 2009; Martín, Torres et al. 2009; NVIDIA 2009; NVIDIA 2009; Rehman, Kothapalli et al. 2009; Sun and Ma 2009; Cecilia, García et al. 2010; Collange, Daumas et al. 2010; Hachman 2010; Kirk and Hwu 2010; Nickolls and Dally 2010; NVIDIA 2010; Sanders and Kandrot 2010; Shilov 2010; Sodan, Machina et al. 2010; Vishkin 2010; Wayner 2010; Wikipedia.org 2010; Wikipedia.org 2010; Wong, Papadopoulou et al. 2010)


Blelloch, G. (1990). Prefix Sums and Their Applications. Computer Science Technical Report Collection. Pittsburgh, PA 15213-3890, Carnegie Mellon University: 25.

Blelloch, G. (1993). Prefix sums and their applications. Chapter 1 in Synthesis of Parallel Algorithms, J. H. Reif (ed.), Morgan Kaufmann Publishers Inc., San Mateo, California.

Boyer, M., K. Skadron, et al. (2008). Automated Dynamic Analysis of CUDA Programs. 2008 International Symposium on Code Generation and Optimization (CGO), Third Workshop on Software Tools for MultiCore Systems. Boston, MA.

Callele, D., E. Neufeld, et al. "Sorting on a GPU."

Campbell, D. K. G. (1997). A survey of models of parallel computation, University of York.

Casanova, H., A. Legrand, et al. (2009). Parallel Algorithms.

Cecilia, J., J. García, et al. (2010). Implementing P Systems Parallelism by Means of GPUs. 10th International Workshop on Membrane Computing, Curtea de Arges, Romania, Springer.

Collange, S., M. Daumas, et al. (2010). Barra: A Parallel Functional Simulator for GPGPU. 18th Annual IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, IEEE.

Cormen, T., C. Leiserson, et al. (2001). Introduction to Algorithms, MIT Press and McGraw-Hill Book Company.

Dotsenko, Y., N. Govindaraju, et al. (2008). Fast scan algorithms on graphics processors, ACM.

Fortune, S. and J. Wyllie (1978). Parallelism in random access machines. STOC '78 Proceedings of the tenth annual ACM symposium on Theory of computing, Association for Computing Machinery, Inc., One Astor Plaza, 1515 Broadway, New York, NY, 10036-5701, USA.

Foster, I. "Designing and Building Parallel Programs." from http://www.mcs.anl.gov/~itf/dbpp/.

Grama, A., G. Karypis, et al. (2003). Introduction to Parallel Computing. Essex, England, Pearson Education Limited.

Gustafson, J. (1988). "Reevaluating Amdahl's law." Communications of the ACM 31(5): 532-533.

Hachman, M. (2010). "Nvidia's CUDA Technology Not Just for Geeks." from http://www.pcmag.com/article2/0,2817,2358831,00.asp.

Harish, P. and P. Narayanan (2007). Accelerating large graph algorithms on the GPU using CUDA. High Performance Computing – HiPC 2007, Goa, India, Springer Berlin / Heidelberg: 197-208.

Harish, P., V. Vineet, et al. (2009). Large Graph Algorithms for Massively Multithreaded Architectures. Hyderabad, India.

Harris, M. Optimizing Parallel Reduction in CUDA, NVIDIA Developer Technology.

Harris, M. (2003). Real-Time Cloud Simulation and Rendering, University of North Carolina.

Harris, M. (2008). Optimizing parallel reduction in CUDA.

Harris, M. (2009). Parallel prefix sum (scan) with CUDA, NVIDIA.

Harris, M., S. Sengupta, et al. (2007). "Parallel prefix sum (scan) with CUDA." GPU Gems 3 (39): 851-876.

http://en.wikipedia.org/wiki/Larrabee_(microarchitecture). "Larrabee." from http://en.wikipedia.org/wiki/Larrabee_(microarchitecture).

Kirk, D. and W. Hwu (2010). Programming massively parallel processors: A Hands-on approach. Burlington, MA, Morgan Kaufmann Publishers.

Kumar, V. (2002). Introduction to parallel computing, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

Landau, G. (1986). Efficient parallel and serial approximate string matching, Citeseer.

Larus, J. (2009). "Spending Moore's dividend." Communications of the ACM 52(5): 62-69.

Lin, Y. and L. Snyder (2009). Principles of parallel programming. Boston, MA, Pearson/Addison Wesley.

Lindholm, E., J. Nickolls, et al. (2008). "NVIDIA Tesla: A Unified Graphics and Computing Architecture." IEEE Micro: 39-55.

Macedonia, M. (2003). "The GPU Enters Computing's Mainstream." Computer: 106-108.

Marowka, A. (2007). "Parallel computing on any desktop." Communications of the ACM 50(9): 74-78.

Martín, P., R. Torres, et al. (2009). CUDA Solutions for the SSSP Problem, Springer-Verlag.

Mattson, T., B. Sanders, et al. (2005). Patterns for parallel programming, Addison-Wesley Professional.

Nickolls, J., I. Buck, et al. (2008). "Scalable parallel programming with CUDA." Queue 6(2): 40-53.

Nickolls, J. and W. Dally (2010). "The GPU Computing Era." IEEE Micro 30(2): 56-69.

NVIDIA (2009). NVIDIA's Next Generation CUDA™ Compute Architecture: Fermi. Whitepaper, NVIDIA.

NVIDIA (2009). Parallel Thread Execution, ISA Version 1.4. Santa Clara, CA 95050.

NVIDIA (2010). NVIDIA CUDA™ Programming Guide Version 3.0.

Pollack, F. J. (1999). New microarchitecture challenges in the coming generations of CMOS process technologies (keynote address) (abstract only). Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture. Haifa, Israel, IEEE Computer Society: 2.

Ranjan, D. and E. Pontelli (2003). "The level-ancestor problem on pure pointer machines." Inf. Process. Lett. 85(5): 275-283.

Rehman, M., K. Kothapalli, et al. (2009). Fast and scalable list ranking on the GPU, ACM.

Roger, D., U. Assarsson, et al. (2007). Efficient Stream Reduction on the GPU. Workshop on General Purpose Processing on Graphics Processing Units. Boston, United States.

Ross, P. (2008). "Why CPU frequency stalled." IEEE Spectrum 45(4): 72.

Ryoo, S., C. Rodrigues, et al. (2008). Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, ACM.

Sanders, J. and E. Kandrot (2010). CUDA by Example: An Introduction to General-Purpose GPU Programming. Upper Saddle River, NJ, Addison-Wesley.

Sengupta, S., M. Harris, et al. (2008). Efficient parallel scan algorithms for GPUs. NVIDIA Technical Report NVR-2008-003, Dec. 2008, NVIDIA.

Sengupta, S., M. Harris, et al. (2007). Scan primitives for GPU computing, Eurographics Association.

Shi, Y. (1996). Reevaluating Amdahl's law and Gustafson's law. Philadelphia, PA, Temple University.

Shilov, A. (2010). "Nvidia Unveils Three Years Roadmap: Kepler and Maxwell Architectures Incoming." from http://www.xbitlabs.com/news/video/display/20100921165317_Nvidia_Unveils_Three_Years_Roadmap_Kepler_and_Maxwell_Architectures_Incoming.html.

Sodan, A., J. Machina, et al. (2010). "Parallelism via Multithreaded and Multicore CPUs." Computer PP(99): 1-1.

Stratton, J., S. Stone, et al. (2008). "MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs." Languages and Compilers for Parallel Computing: 16-30.

Sun, W. and Z. Ma (2009). Count Sort for GPU Computing. 2009 15th International Conference on Parallel and Distributed Systems, IEEE Computer Society.

Sutter, H. (2005). "The free lunch is over: A fundamental turn toward concurrency in software." Dr. Dobb’s Journal 30(3): 202-210.

Vineet, V. and P. Narayanan (2008). "CUDA cuts: Fast graph cuts on the GPU."

Vishkin, U. (2010). Thinking in parallel: Some basic data-parallel algorithms and techniques.

Wayner, P. (2010). "7 programming languages on the rise." Retrieved November 8, 2010, from http://www.infoworld.com/d/developer-world/7-programming-languages-the-rise-620.

Wikipedia.org. (2010). "Asymmetric multiprocessing." Retrieved May 15, 2010, from http://en.wikipedia.org/wiki/Asymmetric_multiprocessing.

Wikipedia.org. (2010). "Tianhe-I." Retrieved November 8, 2010, from http://en.wikipedia.org/wiki/Tianhe-I.

Wilkinson, B. and M. Allen (2004). Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers (2nd Edition), Prentice-Hall, Inc.

Wired.com. (2002). "Nvidia: Meet Nvidia CEO Jen-Hsun Huang, the man who plans to make the CPU obsolete." Retrieved May 8, 2010, from http://www.wired.com/wired/archive/10.07/Nvidia.html?pg=1&topic=&topic_set=.

Wong, H., M. Papadopoulou, et al. (2010). Demystifying GPU microarchitecture through microbenchmarking. IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2010, IEEE.

Xavier, C. and S. Iyengar (1998). Introduction to parallel algorithms, Wiley-Interscience.