IRP Presentation - Senior Design

clumpfrustratedΒιοτεχνολογία

2 Οκτ 2013 (πριν από 3 χρόνια και 10 μήνες)

95 εμφανίσεις

Group May 09
-
06

Bryan McCoy

Kinit

Patel

Tyson Williams

Advisor/Client: Zhao Zhang

What is Bioinformatics?


Genetic sequencing


Massive amounts of data


Many simple operations


Perfect for distributed
computing

Problem


Current solutions are not realistically feasible


Too expensive


Super computers


High powered servers


Too slow


Some inputs can takes several days



Need for high speed, low cost solutions

Our Solution


Cell Processor


Based on Phase 1


Cluster of PlayStation 3s


MPI


Message Passing Interface

IBM Cell Broadband Engine


1 Power Processing Element (PPE)


8 Synergistic Processing Elements (SPEs)


Only 6 SPEs are accessible on a PlayStation 3


4 high speed rings for processor communication

DNAPenny


Compares DNA strands
from different species


Score indicates
evolution similarities
between two species


Branch and bound
search algorithm

Functional requirements


FR1. Ported applications shall run on
the Cell B.E.


FR2. The results returned shall be the
same as the original program.


FR3. The applications shall return their
runtime.


FR4. The applications shall execute in
parallel on multiple Cell B.E.s.

Non
-
Functional Requirements


NF1. The Cells shall all run on the Linux
OS.


NF2. The resulting runtimes of the
ported applications shall be faster than
on the original applications.


NF3. The ported application shall be
coded in the C language.

Market Survey


Results of the survey point to a huge speed
up of computationally intensive programs.


Dr.
Gaurav

Khanna

at the University of
Massachusetts Dartmouth used cluster of 8
PS3s to replace a supercomputer.


Universitat

Pompeu
Fabra
, in Barcelona,
deployed in 2007 a BOINC system called
PS3GRID for collaborative biological
computing.


Risk Assessment


Slow network speed


Software support


Limited RAM


Hardware Failure


Lower quality entertainment hardware


Limited prior experience


Software development schedule

Resource Requirements


3 PlayStation 3s


High performance network switch


Cell programming books


Front node (desktop computer)


Time

Software Environment


Use Fedora 9 OS as it
is currently supported
by the Cell SDK 3.1.


Uses the command
line for user interface.


Use the IBM XLC
compiler and/or the
current GCC compiler.


Hardware Environment


3 PlayStation 3s


High speed Crossbar switch


Private network


Front Node (desktop computer)


Proxy server


Network File Store (NFS)

I/O


Input


Inputs are DNA
sequences stored in a
text file.


Text is a
CustalW

alignment organized
in
Phylip

format, a
standard format for
biological
applications.


Output


Outputs are


The parsimony score


The best trees


The execution time


The score and best trees
are output to the screen
and to text files.


The execution time is
output to a CSV (Comma
Separated Value) file.

Work Breakdown
Structure

Port Apps to
Cluster PS3s

Problem Definition

Research Cell/B.E

Research
Bioperf

Suite

Research
Distributed Parallel
Algorithms

Research
Previously Done
Work

End Product Design

Design
Requirements

Design Process

Design Documents

Considerations and
Selections

Decide Which Linux
to Install

Decide which
applications to port

End Product
Implementation

Hardware
Implementation

Prototyping
Implementation

Software
Implementation

End Product
Testing


Ensure
Correctness of
Output Results

Benchmarking

Final
Documentation and
Demonstration

Create Final Report

Create Project
Poster

Prepare for
Presentation

Work Schedule


Gant chart


Deliverables


Source Code


Compiled Executable


Runtime Comparisons


Final Report


Poster


Final Presentation

Costs


Time


Approximately 555
man hours total.


Freely donated.


Total cost $0.


Equipment


3 PlayStation 3s


Provided by client


Crossbar router


Provided by client


Standard desktop computer


Provided by department


Total cost $0.

Development: Initial Overview


Use MPI to distribute the program to the
multiple PlayStations.


Each PlayStation would search one
branch of the tree.


1 function (supplement) took 90% of the
runtime


Phase 1 ported this function to the SPEs

Development: Difficulties


Found a bug in supplement.


The bug
did not

affect results but
did

affect
runtime.


We contacted the original developer, Dr.
Felsenstein

at the University of Washington,
who fixed the bug.


The fix
significantly improved

runtime.


However, the fix
negated

all work done by
Phase 1 as supplement no longer took a
significant amount of runtime.

Development: Reworking


After the bug fix, no
single function took a
significant amount of
runtime.


We decided to distribute
branches of the tree
search to different
processors.

Development: Results


Completed our goals



Divided work among 3 PlayStation 3s.


Produced faster code that comparable
sequential environment.


Due to time constraints, we were not
able to port the code to the SPEs.


Testing


Used script to test multiple inputs.


Averaged the runtimes.


Used several different code revisions
and machines to provide comparisons.


Projected the speedup that could be
attained if code was ported to SPEs.

Results: Actual


Our current code is
20.76 times faster
than the it was at the
beginning of the
semester.


Surpassed our
original projections,
which assumed the
use of the SPEs.




Code revision

Runtime
(sec)

X Speedup

X

Speedup

(compared
to desktop)

#

of
available
cores

Original
(Core2)

1861.66

0.116

1

With

Bug Fixes
(Core 2)

218.8

1

1

Original

(1 PPE)

4953.51

1

0.044

1

With

Bug Fixes
(1 PPE)

662.70

7.47

0.330

1

MPI with Bug
Fixes (3 PPEs)

238.57


20.76

0.917

3

MPI with Bug

Fixes
(3 PPEs,
18 SPEs)

(Projected)

34.08

145.35

6.420

21

Original
Projections

334.82

14.79

0.653

21

Results: MPI


The speedup for
MPI was 2.78.


Excellent speedup
for 3 nodes.




Code revision

Runtime
(sec)

X Speedup

X

Speedup

(compared
to desktop)

#

of
available
cores

Original
(Core2)

1861.66

0.116

1

With

Bug Fixes
(Core 2)

218.8

1

1

Original

(1 PPE)

4953.51

1

0.044

1

With

Bug
Fixes (1 PPE)

662.70

7.47

0.330

1

MPI with Bug
Fixes (3 PPEs)

238.57


20.76

0.917

3

MPI with Bug

Fixes
(3 PPEs,
18 SPEs)

(Projected)

34.08

145.35

6.420

21

Original
Projections

334.82

14.79

0.653

21

Results: Comparison


Our final code came
close to a high
powered desktop.


Core 2 Quad at
2.66 GHz


Our projected
results indicate a
speedup of 6.4.

Code revision

Runtime
(sec)

X Speedup

X

Speedup

(compared
to desktop)

#

of
available
cores

Original
(Core2)

1861.66

0.116

1

With

Bug
Fixes (Core 2)

218.8

1

1

Original

(1 PPE)

4953.51

1

0.044

1

With

Bug Fixes
(1 PPE)

662.70

7.47

0.330

1

MPI with Bug
Fixes (3 PPEs)

238.57


20.76

0.917

3

MPI with Bug

Fixes
(3 PPEs,
18 SPEs)

(Projected)

34.08

145.35

6.420

21

Original
Projections

334.82

14.79

0.653

21

Results: Projected


Using all SPEs, the
speedup should be
145.35


Assuming SPEs run as fast
as the PPEs


Before SPE
vectorization

Code revision

Runtime
(sec)

X Speedup

X

Speedup

(compared
to desktop)

#

of
available
cores

Original
(Core2)

1861.66

0.116

1

With

Bug Fixes
(Core 2)

218.8

1

1

Original

(1 PPE)

4953.51

1

0.044

1

With

Bug Fixes
(1 PPE)

662.70

7.47

0.330

1

MPI with Bug
Fixes (3 PPEs)

238.57


20.76

0.917

3

MPI with Bug

Fixes
(3 PPEs,
18 SPEs)

(Projected)

34.08

145.35

6.420

21

Original
Projections

334.82

14.79

0.653

21

Conclusions


Achieved our goal of using MPI to get
runtime improvement.


Contributed a major fix to a widely
used application.


Surpassed our initial runtime goal.


Projected results show an even larger
runtime improvement still possible.

Acknowledgements


May08
-
24 group (phase I)


Kyle
Byerly



Shannon McCormick


Matt
Rohlf


Bryan
Venteicher


DNAPenny

Author


Dr.
Felsenstein


Advisor


Zhao Zhang


Environment Help


Steve
Nystrom



Questions?