
An initiative of the Department of Science and Technology

Managed by the Meraka Institute of the CSIR and the University of Cape Town

15 Lower Hope Road ∙ Rosebank ∙ 7700 ∙ South Africa

Tel: +27 21 658 2740 ∙ Direct: +27 21 658 2756


Advanced Computer Engineering Laboratory



www.chpc.ac.za


Parallel Computer Architecture and Programming


Dates: 7-11 September 2009

Venue: CHPC, 15 Lower Hope St, Rosebank, 7700.



Course Description

This course covers the principles and tradeoffs in the design and programming of parallel computers.
Topics include the varieties of parallelism in current hardware (e.g., fast networks, multicore,
accelerators such as GPUs), the importance of locality, implicit vs. explicit parallelism, shared
memory, cache coherence, synchronization mechanisms (locking, atomicity, transactions, barriers),
and parallel programming models (threads, data parallel/streaming, transactions, and nested
parallelism). Recent research results from the Pervasive Parallelism Lab will also be presented.
A significant parallel programming assignment will be given as homework.

The course will include practical tasks and a larger project to be completed after the course.
Tutoring will be provided by the CHPC ACELab staff.


Course Prerequisites

Working knowledge of C/C++

Suggested Text

John L. Hennessy and David A. Patterson

Computer Architecture: A Quantitative Approach, 4th Edition

Morgan Kaufmann


Research papers


Instructor

Kunle Olukotun

Professor

Department of Electrical Engineering and Computer Science

Stanford University

Director

Pervasive Parallelism Laboratory

Contact Information

Office: Gates Hall 3A, Room 302

Phone: (650) 725-3713

Fax: (650) 725-6949

Email: kunle@stanford.edu




Address:



Department of Electrical Engineering


Stanford University


Gates Hall 3A, Room 302


Stanford, CA 94305-9030 USA


Assistant:



Darlene Hadding



Administrative Associate for Professor Kunle Olukotun


Gates 4A-408, M/C 9040


Phone: (650) 723-1430


Fax: (650) 725-6949


darlene@csl.stanford.edu


Short Bio

Kunle Olukotun is a Professor of Electrical Engineering and Computer Science at Stanford
University, where he has been on the faculty since 1992. Olukotun has been a researcher in and
proponent of chip multiprocessor technology since the mid-1990s. He is well known for
leading the Stanford Hydra research project, which developed one of the first chip multiprocessors
with support for thread-level speculation (TLS). Olukotun founded Afara Websystems to develop
high-throughput, low-power server systems with chip multiprocessor technology. Afara was
acquired by Sun Microsystems; the Afara microprocessor technology, called Niagara, is the basis
of systems that have become some of Sun's fastest-ramping products ever. Olukotun is actively
involved in research in computer architecture, parallel programming environments, and scalable
parallel systems. He currently directs the Pervasive Parallelism Lab (PPL), which seeks to
proliferate the use of parallelism in all application areas.

Olukotun is an ACM Fellow and an IEEE Fellow. He has authored many papers on CMP design and
parallel software and recently completed a book on CMP architecture. Olukotun received his Ph.D.
in Computer Engineering from the University of Michigan.



Tentative Course Schedule

Date         Lecture  Subject                            Reading
Sept 7 AM    1        Introduction, course overview      4.1, [1][6]
Sept 7 PM    2        Parallel Programming
Sept 8 AM    3        Parallel Algorithms
Sept 8 PM    4        Performance Evaluation             [2][3]
Sept 9 AM    5        Symmetric Shared Memory I          4.2, 4.3
Sept 9 PM    6        Symmetric Shared Memory II
Sept 10 AM   7        Synchronization and Consistency    4.5, 4.6, [4][5]
Sept 10 PM   8        CMPs, GPUs                         [8][9], 4.8
Sept 11 AM   9        Beyond Shared Memory               [10][11][13]
Sept 11 PM   10       Pervasive Parallelism Lab (PPL)


Research Papers

Parallel Applications

[1] H. Sutter and J. Larus, “Software and the concurrency revolution,” ACM Queue, vol. 3, no. 7, September 2005.

[2] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, “The SPLASH-2 Programs: Characterization and Methodological Considerations,” Proc. 22nd International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 1995.

[3] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC Benchmark Suite: Characterization and Architectural Implications,” Proc. 17th International Conference on Parallel Architectures and Compilation Techniques, October 2008.


Locking and Memory Consistency

[4] S. V. Adve and K. Gharachorloo, “Shared memory consistency models: a tutorial,” IEEE Computer, vol. 29, no. 12, pp. 66-76, Dec. 1996.

[5] M. D. Hill, “Multiprocessors should support simple memory consistency models,” IEEE Computer, vol. 31, no. 8, pp. 28-34, Aug. 1998.


Chip-Multiprocessors (CMPs)

[6] K. Olukotun and L. Hammond, “The future of microprocessors,” ACM Queue, vol. 3, no. 7, pp. 26-34, September 2005.



[7] L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese, “Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing,” Proc. 27th Annual International Symposium on Computer Architecture (ISCA'00), Vancouver, British Columbia, Canada, pp. 282-293, June 2000.

[8] K. Olukotun, L. Hammond, and J. Laudon, Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency, Chapter 2, Morgan & Claypool, 2007.


Thread-Level Speculation (TLS)

[9] M. J. Garzaran, M. Prvulovic, J. M. Llaberia, V. Vinals, L. Rauchwerger, and J. Torrellas, “Tradeoffs in buffering memory state for thread-level speculation in multiprocessors,” Proc. 9th International Symposium on High-Performance Computer Architecture (HPCA), February 2003.

[10] L. Hammond, B. Hubbert, M. Siu, M. Prabhu, M. Chen, and K. Olukotun, “The Stanford Hydra CMP,” IEEE Micro, vol. 20, no. 2, pp. 71-83, March-April 2000.


Transactional Memory

[11] B. D. Carlstrom, J. Chung, A. McDonald, H. Chafi, C. Kozyrakis, and K. Olukotun, “The Atomos transactional programming language,” Proc. ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, Ottawa, Canada, June 10-16, 2006.

[12] A. McDonald, J. Chung, H. Chafi, C. Cao Minh, B. D. Carlstrom, C. Kozyrakis, and K. Olukotun, “Characterization of TCC on chip-multiprocessors,” Proc. 14th International Conference on Parallel Architecture and Compilation Techniques (PACT 2005), St. Louis, MO, Sept. 17-21, 2005.

[13] K. Moore, J. Bobba, M. Moravan, M. Hill, and D. Wood, “LogTM: log-based transactional memory,” International Symposium on High Performance Computer Architecture (HPCA), February 2006.

[14] B. Saha, A. Adl-Tabatabai, R. Hudson, C. Cao Minh, and B. Hertzberg, “McRT-STM: A high-performance software transactional memory system for a multicore runtime,” Proc. Symposium on Principles and Practice of Parallel Programming, June 2006.