Introduction to Grid and Cloud Computing


Nov 12, 2013



The Term Project demands in-depth research and investigative reporting. All reported content, figures, and tables must be original.



Students choose from ten topics. Disjoint groups of students work on different topics.


You have only one month to report the work, first through a proposal and then through a complete written report at the end of the semester, which you must also present.


The proposal, which will become your report at the end, should follow the IEEE conference paper format and run to about 10 pages, including original figures and tables plus a reference list of 10-15 papers. (Template link: http://www.ieee.org/conferences_events/conferences/publishing/templates.html)

Introduction to Grid and Cloud Computing: Term Project Specification


How to Write a Good Technical Paper in 10 Pages?

1. Title (< 8 words): must hit the hot topic and be short, clear, and eye-catching. Authors and affiliations follow in 1-2 lines after the title.

2. Abstract (50 to 100 words): must state the research objectives, summarize the findings, and highlight the innovative contributions.

3. Introduction (1 page, including the title and abstract): must motivate readers to read the rest of the paper and give them the necessary background.

4. Problem Statement and Formulation (2 pages): state the problem being solved and the basic assumptions, and formulate the problem with technical specifications.

5. Architecture, algorithms, solution methods, protocols, analytical results, illustrated examples, etc. (2 pages)

6. Experimental Setting (1 page): computer simulators, benchmarks, and datasets used.

7. Experimental Results (2 pages): plotted figures or tabulations, plus their interpretation and performance analysis.

8. Related Work and Conclusions (1 page)

9. References (1 page): list of 15 relevant papers.




Candidate Project Topics:

Topic No.  Topic Title / Assignment
1          Use of XEN to create virtual machines, conduct some VM experiments, and report the measured performance
2          Exploring Amazon EC2, S3, MapReduce, a virtual cluster, or a private cloud for HPC scientific applications
3          Parallelization of a novel application idea using MPI or OpenMP, with analysis of the performance improvements
4          Using Hadoop or node.js for a distributed Web application
5          Integration of Globus Online, using the CLI or REST API, into an application that needs data transfer capabilities
6          Stork - Globus Online comparison through different metrics
7          Application of a scientific problem with a workflow in the Condor scheduler
8          Development of a client/server application that makes performance improvements on a high-performance data transfer protocol (GridFTP, UDT)
9          MPI-Hadoop comparison
10         A survey on parallel file system comparison


Topic 1: Use of XEN for virtual machine (VM) creation and resource management through some VM application experiments

You are asked to install the XEN hypervisor on a local computer or on your own notebook.

Create Domain 0 (the control VM) and some user domains (VM applications) for selected benchmarks.

Collect the performance results and discuss the lessons learned from the XEN application experiments.
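One way to collect comparable performance results is to run the same small timing harness in Domain 0 and in each user domain. The sketch below is only an illustration: the workload `cpu_benchmark` and the helper `time_runs` are hypothetical names, and a real study would replace the workload with the selected benchmarks.

```python
import statistics
import time


def cpu_benchmark(n: int = 200_000) -> int:
    """Hypothetical CPU-bound workload: sum of squares of 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_runs(workload, repeats: int = 5) -> dict:
    """Run the workload several times and summarize wall-clock times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return {
        "runs": repeats,
        "mean_s": statistics.mean(samples),
        # Standard deviation needs at least two samples.
        "stdev_s": statistics.stdev(samples) if repeats > 1 else 0.0,
        "min_s": min(samples),
    }


if __name__ == "__main__":
    # Run this unchanged in Domain 0 and in each user domain, then compare.
    print(time_runs(cpu_benchmark))
```

Repeating each measurement and reporting the spread, not just a single time, makes it easier to tell virtualization overhead apart from run-to-run noise.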



Prof. Kai Hwang

Suggested References for Topic 1:

1. M. Rosenblum, “Recent Advances in Virtual Machines and Operating Systems”, Keynote Address, ACM ASPLOS 2006.

2. J. Smith and R. Nair, Virtual Machines: Versatile Platforms for Systems and Processes, Morgan Kaufmann, 2005.

3. B. Sotomayor, R. Montero, and I. Foster, “Virtual Infrastructure Management in Private and Hybrid Clouds”, IEEE Internet Computing, Sept. 2009.

4. P. Barham, et al., “XEN and the Art of Virtualization”, Proc. of the 19th ACM Symp. on OS Principles (SOSP19), ACM Press, 2003.

5. A. Menon, et al., “Diagnosing Performance Overheads in the XEN Virtual Machine Environment”, Proc. of the 1st Int’l Conf. on Virtual Execution Environments, 2005.


Topic 2: Exploring the use of Amazon EC2, S3, MapReduce, a virtual cluster, or a private cloud in HPC scientific applications

This project requires the use of available AWS virtual clusters (EC2, S3 instances), the MapReduce cluster, or the private cloud offered on the AWS platform. A cluster of 64 to 120 nodes is desired.

You need to perform benchmark experiments on these VM clusters, measure the performance, analyze the performance attributes, and identify the performance bottlenecks.

Select some well-known high-performance scientific benchmark programs to carry out your experiments, or write your own test program, such as one for large-scale matrix multiplication.
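As a starting point for a self-written test program, the following is a minimal single-node sketch of a timed matrix multiplication. The function names (`make_matrix`, `matmul`, `benchmark`) are illustrative, the pure-Python kernel is far slower than an optimized library, and a real experiment would distribute the work across the cluster nodes.

```python
import random
import time


def make_matrix(n: int, seed: int) -> list:
    """Build an n x n matrix of pseudo-random floats (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def matmul(a: list, b: list) -> list:
    """Plain triple-loop product, written with a transposed copy of b."""
    b_columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a]


def benchmark(n: int) -> tuple:
    """Return (elapsed seconds, approximate MFLOP/s) for one n x n product."""
    a, b = make_matrix(n, seed=1), make_matrix(n, seed=2)
    start = time.perf_counter()
    matmul(a, b)
    elapsed = time.perf_counter() - start
    flops = 2 * n ** 3  # roughly n^3 multiplications and n^3 additions
    return elapsed, flops / elapsed / 1e6


if __name__ == "__main__":
    for size in (50, 100, 200):
        seconds, mflops = benchmark(size)
        print(f"n={size}: {seconds:.3f} s, {mflops:.1f} MFLOP/s")
```

Running the same problem sizes on different instance types or node counts gives the raw data needed to look for performance bottlenecks.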


Key References for Topic 2:

1. K. Hwang, G. Fox, and J. Dongarra, Distributed and Cloud Computing, Chapters 2, 4, 6, Morgan Kaufmann, Oct. 2011.

2. K. Hwang and Z. Xu, Scalable Parallel Computing, McGraw-Hill, Chapters 2 and 12, 1998.

3. E. Walker, “Benchmarking Amazon EC2 for High-Performance Scientific Computing”, ;login:, vol. 33, no. 5, pp. 18-23, 2008.

4. D. Kirk and W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, Morgan Kaufmann, 2010.


Topic 3: Parallelization of a novel application idea using MPI or OpenMP, with analysis of the performance improvements

You are asked to find a computationally intensive application and parallelize it using MPI or OpenMP.

Conduct a thorough performance analysis using multiple machines (or a multicore computer if multiple machines are not available).

Test your code by running it in an SMP environment (a single computer with multiple cores) and a DSM environment (multiple computers connected via a LAN).

Prepare different test cases by varying the machine architecture, problem size, etc.
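The performance analysis usually rests on two standard quantities: speedup S(p) = T(1) / T(p) and efficiency E(p) = S(p) / p, where T(p) is the run time on p processes. The sketch below computes both from measured times and adds Amdahl's law as a reference bound. The helper names are illustrative, and the timings in the example are placeholders, not measured results.

```python
def speedup_table(times_by_workers: dict) -> list:
    """Return (workers, speedup, efficiency) rows.

    times_by_workers maps a worker count to a run time in seconds and
    must contain the single-worker baseline under key 1.
    """
    baseline = times_by_workers[1]
    rows = []
    for workers in sorted(times_by_workers):
        speedup = baseline / times_by_workers[workers]
        rows.append((workers, speedup, speedup / workers))
    return rows


def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Amdahl's law: upper bound on speedup for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


if __name__ == "__main__":
    # Placeholder timings for illustration only; substitute your measurements.
    placeholder_times = {1: 8.0, 2: 4.0, 4: 2.5}
    for workers, speedup, efficiency in speedup_table(placeholder_times):
        bound = amdahl_speedup(0.1, workers)
        print(f"p={workers}: S={speedup:.2f}, E={efficiency:.2f}, "
              f"Amdahl bound (10% serial)={bound:.2f}")
```

Tabulating these values for each machine architecture and problem size makes the test cases directly comparable.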


Topic 4: Using Hadoop or node.js for a distributed Web application

Design a web application that serves thousands of users.

Each user asks for a computationally intensive service.

Distribute the load of the service provided by the application across multiple machines at the back end, using technologies like Hadoop or node.js.

Analyze the performance of your application as the number of users increases.
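To study how performance changes with the number of users, a simple load generator can issue concurrent requests and record latency and throughput at each load level. The sketch below is an assumption-laden stand-in: `service` is a hypothetical local placeholder for the real computationally intensive request, and in an actual project each call would be an HTTP request to the deployed application.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def service(request_id: int) -> int:
    """Hypothetical placeholder for the computationally intensive service."""
    return sum(i * i for i in range(20_000)) + request_id


def timed_request(request_id: int) -> float:
    """Issue one request and return its latency in seconds."""
    start = time.perf_counter()
    service(request_id)
    return time.perf_counter() - start


def run_load(users: int) -> dict:
    """Simulate `users` concurrent users, one request each."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        start = time.perf_counter()
        latencies = list(pool.map(timed_request, range(users)))
        wall = time.perf_counter() - start
    return {
        "users": users,
        "mean_latency_s": sum(latencies) / users,
        "throughput_rps": users / wall,
    }


if __name__ == "__main__":
    for users in (1, 10, 50, 100):
        print(run_load(users))
```

Plotting mean latency and throughput against the number of users shows where the back end saturates.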


Topic 5: Integration of Globus Online, using the CLI or REST API, into an application that needs data transfer capabilities

You are asked to design an application, or use an existing one, that needs data transfer capabilities.

Your application will integrate Globus Online to provide the data transfer capability and will also monitor the jobs.

The CLI could be used in a complex job that needs data transfers between nodes before execution starts.

The REST API could be used for any type of application.




Topic 6: Stork - Globus Online comparison through different metrics

You are asked to install two GridFTP servers on two machines and integrate them with Globus Online.

Then install the Stork scheduler on one of the machines.

Design data transfer test cases and make a full comparison of the two tools.

Some of the performance metrics could be dataset characteristics, ease of use (Stork does not have an interface, so compare it with the GO CLI), individual transfer speed, and throughput.

Use Stork features such as concurrent transfers and optimization.


Topic 7: Application of a scientific problem with a workflow in the Condor scheduler

Find a scientific problem that has complex computational and data transfer needs.

Design a workflow for the solution of the problem.

Apply the workflow using the Condor scheduler.
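A workflow of this kind is naturally modeled as a directed acyclic graph of jobs and their dependencies. The sketch below does not call Condor at all; it only uses Python's standard `graphlib` module (Python 3.9 or later) to show which jobs could run in parallel at each stage. The job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "fetch_data": set(),
    "preprocess": {"fetch_data"},
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "merge_results": {"simulate_a", "simulate_b"},
}


def execution_stages(dag: dict) -> list:
    """Group jobs into stages; jobs within a stage have no mutual dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


if __name__ == "__main__":
    for number, jobs in enumerate(execution_stages(WORKFLOW), start=1):
        print(f"stage {number}: {', '.join(jobs)}")
```

Sketching the graph this way before writing any scheduler input helps check that the design has no cycles and exposes how much parallelism the workflow offers.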


Topic 8: Development of a client/server application that makes performance improvements on a high-performance data transfer protocol (GridFTP, UDT)

Using the GridFTP or UDT APIs, design a client/server model that optimizes the data transfers.

Example for UDT: use the same connection for multiple file transfers, or apply a threaded server/client model to perform concurrent file transfers over multiple sockets.

Example for GridFTP: use the Java or C APIs to dynamically change the number of parallel streams or the concurrency level for a directory transfer.

Test your implementation to see whether it yields any improvements.
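The idea of reusing one connection for several files can be illustrated with length-prefixed framing over an ordinary stream socket. The sketch below uses Python's standard `socket` module with a local socket pair, not the UDT or GridFTP APIs, and the framing layout (name length, name, payload length, payload) is an assumption made for this example.

```python
import socket
import struct
import threading


def send_file(sock: socket.socket, name: str, data: bytes) -> None:
    """Frame one file as: 4-byte name length, name, 8-byte size, payload."""
    encoded = name.encode("utf-8")
    sock.sendall(struct.pack("!I", len(encoded)) + encoded)
    sock.sendall(struct.pack("!Q", len(data)) + data)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes, looping over partial reads."""
    buffer = bytearray()
    while len(buffer) < count:
        chunk = sock.recv(count - len(buffer))
        if not chunk:
            raise ConnectionError("connection closed mid-frame")
        buffer.extend(chunk)
    return bytes(buffer)


def recv_files(sock: socket.socket, expected: int) -> dict:
    """Receive `expected` framed files from a single connection."""
    files = {}
    for _ in range(expected):
        (name_len,) = struct.unpack("!I", recv_exact(sock, 4))
        name = recv_exact(sock, name_len).decode("utf-8")
        (size,) = struct.unpack("!Q", recv_exact(sock, 8))
        files[name] = recv_exact(sock, size)
    return files


def transfer_over_one_connection(files: dict) -> dict:
    """Send all files through one connected socket pair; return what arrived."""
    client, server = socket.socketpair()
    received = {}

    def serve() -> None:
        received.update(recv_files(server, len(files)))

    receiver = threading.Thread(target=serve)
    receiver.start()
    for name, data in files.items():
        send_file(client, name, data)
    receiver.join()
    client.close()
    server.close()
    return received


if __name__ == "__main__":
    sample = {"a.dat": b"x" * 100_000, "b.dat": b"hello"}
    print(transfer_over_one_connection(sample) == sample)
```

Timing this against a variant that opens a new connection per file is one way to test whether connection reuse yields an improvement.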


Topic 9: MPI-Hadoop comparison

Find an application or algorithm that can be parallelized but does not need any communication between the parallel processes.

Implement it using both Hadoop and MPI.

Compare their performance.
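A Monte Carlo estimate of pi is one example of such an application: each task works on its own samples and only the final counts are combined. The sketch below shows that structure in plain Python. It uses neither Hadoop nor MPI; the thread pool is only a stand-in for MPI ranks or Hadoop map tasks and gives no real speedup for CPU-bound work in CPython. The function names are illustrative.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(task: tuple) -> int:
    """Independent task: count random points that fall inside the unit circle.

    task is (seed, samples); a private seeded generator keeps tasks
    independent and the result reproducible.
    """
    seed, samples = task
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(tasks: list, workers: int = 4) -> float:
    """Map the tasks to workers, then reduce by summing the hit counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(count_hits, tasks))
    total = sum(samples for _, samples in tasks)
    return 4.0 * hits / total


if __name__ == "__main__":
    work = [(seed, 50_000) for seed in range(8)]
    print(estimate_pi(work))
```

In the comparison, `count_hits` corresponds to the per-rank or per-mapper work, and the final sum corresponds to an MPI reduction or a Hadoop reduce step.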


Topic 10: A survey report on parallel and distributed file systems

You are asked to write an extensive report on popular, currently available parallel and distributed file systems (GPFS, Lustre, HDFS, PVFS, WheelFS, GFS, AFS).

Research performance comparison metrics for these file systems.

Open-source file systems could be installed; using performance benchmarking tools, conduct test cases in which you measure the read/write speeds.

Write a paper presenting a multidimensional comparison study and provide test case results for selected sample file systems.
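For a first look at read/write speeds, a small script can write and re-read a file on a mounted file system and report MB/s. The sketch below is a rough illustration, not a substitute for a dedicated benchmarking tool: `measure_io` is a hypothetical helper, and the read figure may reflect the operating system's page cache, not the file system itself.

```python
import os
import tempfile
import time


def measure_io(directory: str, size_mb: int = 16, block_kb: int = 1024) -> dict:
    """Write then read a temporary file in `directory`; return MB/s figures."""
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())  # push data to storage before stopping the clock
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as handle:
            while handle.read(block_kb * 1024):
                pass
        read_seconds = time.perf_counter() - start
    finally:
        os.remove(path)
    written_mb = blocks * block_kb / 1024
    return {
        "write_MBps": written_mb / write_seconds,
        "read_MBps": written_mb / read_seconds,
    }


if __name__ == "__main__":
    # Point the directory at a mount of the file system under test.
    with tempfile.TemporaryDirectory() as scratch:
        print(measure_io(scratch))
```

Running it with several file and block sizes on each installed file system provides one dimension of the comparison study.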