Cloud Computing for Education and Research


Dec 13, 2013



Customized cloud platform for computing on your terms!


Nirav Merchant (nirav@email.arizona.edu)

Topic Coverage


Introduction to cloud computing concepts


Challenges and unique features of cloud
computing


iPlant Atmosphere overview


Designing customized infrastructure for research,
course work and training material


Using Atmosphere (hands on) for collaborative
data analysis


Explore these resources on your own and ask questions!


Cloud Computing


Not a singular technology component


Not a black box or alien technology


Not an “elixir of scalability”, a “panacea for Big Data”, etc.


It cannot keep growing and scaling without
planning (and architecting your application)


An unfortunate victim of marketing hype


Further complicated by jargon and TLAs (three-letter acronyms): private cloud, community cloud, hybrid cloud …



What is cloud computing?

http://geekandpoke.typepad.com/geekandpoke/2009/03/let-the-clouds-make-your-life-easier.html

Cloud Computing


Amazingly flexible technology


It’s a platform that comprises many uniquely flexible components (more later)


Allows us to create “purpose-built appliances”


Allows us to finally “script our infrastructure”



Allows mixing and matching of components that
you need to do your science


Opens up many new avenues and approaches for teaching topics that usually require complex (pre-configured) software tools and data

Often overheard

I do my analysis using the “cloud”

It’s the close equivalent of saying:


I do my research using “science”


Cloud Computing Zen


Don’t get frustrated…


This is cutting (bleeding) edge technology


There will be plenty of WTF#$@
moments


Be patient…


Instructions and infrastructure keep changing (software versions)


Be flexible…


There will be unanticipated issues along the way


Be constructive…


Use the wiki and forums, and share knowledge


Make everyone’s experience better


Be creative…


There is more than one way to do it (TIMTOWTDI)

iPlant URLs you should know


Wiki.iplantcollaborative.org


ask.iplantcollaborative.org


www.iplantcollaborative.org



Impromptu survey


How many of you use the command line?


How many of you are Windows, Mac, or Linux users?


How many of you use HPC (or know what HPC is)?


What resources do you use to teach computing-based workshops/training/courses?

Atmosphere: motivation


Standalone GUI-based applications are frequently required for analysis


GUI apps are not easy to transform into web apps


Need to handle complex software dependencies (e.g. a specific BioPerl version and R modules)


Users need full control of their software stack (occasional sudo access)


Need to share desktop/applications for collaborative analysis (remote collaborators)


Availability of next-gen MapReduce-based algorithms (currently we have limited support)



“As a Service” models

SaaS: Software as a Service (e.g. clustering/assembly is a service)

IaaS: Infrastructure as a Service (get computer time with a credit card and a Web interface, like EC2)

PaaS: Platform as a Service, i.e. IaaS plus core software capabilities on which you build SaaS (e.g. Hadoop/MapReduce is a platform)

Cyberinfrastructure is “Research as a Service”

http://salsahpc.indiana.edu

[Diagram labels: More Pain, More Flexibility, Productivity]

But where do I start?


Searching for “cloud computing” related terms is not very helpful (you will most likely be bombarded by commercials and advertisements in the first few hits!)


NIST (National Institute of Standards and Technology): Cloud Computing Synopsis and Recommendations (Special Publication 800-146, May 2012)

http://www.nist.gov/customcf/get_pdf.cfm?pub_id=911075

What it is

Challenges of existing platforms


Amazon Web Services (AWS): http://aws.amazon.com/


Flexible and scalable


High level of expertise required for configuration


Fairly challenging for biologists to master all the steps


Limited lifecycle management (cost, time)

Steps to get started!

What is Atmosphere?


Self-service cloud infrastructure


Designed to make the underlying cloud infrastructure easy for novice users


Built on open-source Eucalyptus (OpenStack)


Fully integrated with iPlant authentication, storage and HPC capabilities


Enables users to build custom images/appliances and share them with the community


Cross-platform desktop access to GUI applications in the cloud (using VNC)


Start and stop your analysis without losing state, much like hibernating your laptop


Profile your application usage pattern


Provides easy web-based access to remote resources (compute + data + software)





Who is this tutorial designed for?


Users wanting to launch configured images in Atmosphere (like an app store)


Software/tool developers, for application distribution


Prototyping/testing new software/modules (testing software dependencies, conflicts)


Tailored software training setups (custom workshops, laboratory courses, etc.)


Distributing tasks in the “cloud”


Collaborating and sharing screens/applications


Extending compute capabilities of existing applications, i.e. utilizing the iPlant API



Cloud terms and jargon you should know


Virtual Machine (aka VM)


Image (aka VM image)


Instance (running VM)


IP address


Amazon EC2 (Elastic Compute Cloud)


Amazon EBS (Elastic Block Store)


Amazon S3 (Simple Storage Service)





API-compatible implementation of Amazon EC2/S3 interfaces


Virtualizes the execution environment for applications and services


Up to 12-core / 48 GB instances


Access to Cloud Storage + EBS


Run servers, CloudBurst, and desktop use cases. Big data and the desktop are co-local again!

>60 hosted applications in Atmosphere today, including users from USDA, Forest Service, database providers, etc. (30 more for postdocs and grad students for training classes)

The iPlant Collaborative

Project Atmosphere™: Custom Cloud Computing

Atmosphere: Collaboration

[Diagram labels: iPlant Data Store, Lifecycle]

Working together


How often do you wish you could show your desktop to the person on the phone/Skype?


Let them navigate the application for you?


Let them continue your work while you are away?


Have them give you a judgment call or review details?


Very doable if you


Buy screen-sharing software


Log into a different application

Distributing Tasks (scaling)


You have a large collection of tasks (aka BoT: Bag of Tasks), e.g. many FASTA sequences


You build an “appliance” and now want to distribute the work among many appliances


It works well for one, but how do you feed many?


You REALLY want to add more appliances to finish faster
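Before anything can be distributed, the bag of tasks has to exist. As a minimal sketch, assuming your sequences start out in a single multi-FASTA file, the Python below deals the records round-robin into chunk files. The in-10.fasta, in-20.fasta, ... naming only mirrors the Makeflow example in this tutorial; the function names and the chunking scheme are illustrative assumptions, not part of Makeflow or Atmosphere.

```python
from pathlib import Path


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header, seq_lines = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)  # a sequence may span several lines
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_records(records, n_chunks: int) -> list[list[tuple[str, str]]]:
    """Deal records round-robin into n_chunks roughly equal groups."""
    chunks: list[list[tuple[str, str]]] = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return chunks


def write_chunks(chunks, outdir: Path, step: int = 10) -> list[Path]:
    """Write chunk k as in-<k*step>.fasta (hypothetical naming: in-10, in-20, ...)."""
    paths = []
    for k, chunk in enumerate(chunks, start=1):
        path = outdir / f"in-{k * step}.fasta"
        path.write_text("".join(f">{h}\n{s}\n" for h, s in chunk))
        paths.append(path)
    return paths
```

For example, write_chunks(split_records(read_fasta(Path("all.fasta").read_text()), 3), Path(".")) would write three chunk files, where all.fasta is a hypothetical input name. Round-robin keeps chunk sizes within one record of each other, which is usually enough when sequences are of similar length.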


Makeflow to the Rescue


Developed by Doug Thain’s Cooperative Computing Lab at Notre Dame

http://www3.nd.edu/~ccl/


Simple way to distribute and manage your workflow/analysis among many computing platforms (appliances)


Keeps track of progress, deals with failures, and restarts where it left off (completed tasks are not repeated)
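The idea behind restarting without repeating completed tasks can be illustrated with the classic Make-style rule: skip a task whose output already exists and is newer than its inputs. This is a simplified Python sketch of that idea, not Makeflow’s own recovery code; the Rule class and the function names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Rule:
    """Hypothetical rule: one target built from its sources by a command."""
    target: str
    sources: list[str]
    command: str


def is_up_to_date(rule: Rule, workdir: Path) -> bool:
    """True if the target exists and is no older than every source file."""
    target = workdir / rule.target
    if not target.exists():
        return False
    built_at = target.stat().st_mtime
    for src in rule.sources:
        src_path = workdir / src
        # A missing or newer source means the target cannot be trusted.
        if not src_path.exists() or src_path.stat().st_mtime > built_at:
            return False
    return True


def pending_rules(rules: list[Rule], workdir: Path) -> list[Rule]:
    """Rules that still need to run; completed ones are skipped on restart."""
    return [r for r in rules if not is_up_to_date(r, workdir)]
```

Re-running after a crash then only dispatches pending_rules(...), so finished alignments are left alone.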



Why another workflow system?


Emphasis on simplicity


Very easy to integrate with cloud and HPC resources


Does not support complex workflows, but handles dependencies between tasks very elegantly


Lightweight and portable


Even works on a local machine and makes full use of multiple cores!


Can run certain tasks locally (important for data-intensive apps)


The workflow system is VERY extensible using various scripting languages (if you choose)

How does it work?

[Diagram: your complex task (needs software X, Y, Z) and DATA!; someone built you a script/program in an Atmosphere image/appliance; how do you spread the work (“?”) across many Atmosphere images/appliances?]

Makeflow instructions:

out-10-align.fasta: in-10.fasta align.exe
	./align.exe -p 10 -i in-10.fasta -o out-10-align.fasta

out-20-align.fasta: in-20.fasta align.exe
	./align.exe -p 10 -i in-20.fasta -o out-20-align.fasta

out-30-align.fasta: in-30.fasta align.exe
	./align.exe -p 10 -i in-30.fasta -o out-30-align.fasta
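Writing one rule per chunk by hand does not scale past a handful of files. A small generator script is one way to produce the rules; this Python sketch emits rules in the same shape as the example rules: target, colon, inputs, then a tab-indented command. align.exe and its -p/-i/-o options stand for whatever program your appliance runs; the ./ prefix assumes the executable is shipped into each task’s working directory rather than installed on the PATH.

```python
def makeflow_rule(k: int, threads: int = 10) -> str:
    """One rule: target, colon, inputs; then a tab-indented command line."""
    inp = f"in-{k}.fasta"
    out = f"out-{k}-align.fasta"
    # align.exe is listed as an input so it is available wherever the task runs.
    return f"{out}: {inp} align.exe\n\t./align.exe -p {threads} -i {inp} -o {out}\n"


def makeflow_file(chunk_ids) -> str:
    """Join one rule per chunk id, separated by blank lines."""
    return "\n".join(makeflow_rule(k) for k in chunk_ids)


if __name__ == "__main__":
    # Redirect this output to a file (e.g. a hypothetical align.makeflow).
    print(makeflow_file([10, 20, 30]), end="")
```

Generating the file keeps the rule text identical across hundreds of chunks, so a typo like a mismatched input name cannot creep into one rule.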




Running it


Take the Makeflow file (previous slide)


Run makeflow -f <filename>


Launch workers


Profit

What happens?

[Diagram: Makeflow instructions + your program (align.exe) + data are handed to workers running in Atmosphere images/appliances; tasks such as in-10.fasta and in-20.fasta go out, and out-10-align.fasta and out-20-align.fasta come back]
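The pattern in the diagram is a bag of tasks served to a pool of workers: each idle worker takes the next task until none remain, so adding workers shortens the run. The Python sketch below simulates that locally, with threads standing in for Atmosphere worker instances; fake_align is a hypothetical placeholder for running align.exe, and nothing here uses Makeflow’s actual master/worker protocol.

```python
import queue
import threading


def fake_align(task: str) -> str:
    """Hypothetical stand-in for running align.exe: maps in-N.fasta to out-N-align.fasta."""
    return task.replace("in-", "out-").replace(".fasta", "-align.fasta")


def run_bag_of_tasks(tasks: list[str], n_workers: int) -> dict[str, str]:
    """Let n_workers threads pull tasks from a shared queue until it is empty."""
    todo = queue.Queue()
    for t in tasks:
        todo.put(t)
    results: dict[str, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                task = todo.get_nowait()
            except queue.Empty:
                return  # bag is empty: this worker is done
            out = fake_align(task)
            with lock:  # protect the shared results dict
                results[task] = out

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

Because workers pull work instead of being assigned a fixed share, a slow worker simply completes fewer tasks, and a newly added worker starts helping as soon as it connects.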

When not to use the cloud!


When you need “bare metal” performance:


CPU speed


Network


Data I/O


Your application can use MPI across a large number of compute nodes (> 2)


When applications need large memory (> 64 GB)
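These rules of thumb can be written down as a quick checklist. A minimal Python sketch, assuming you describe your workload with the three hypothetical fields below; the thresholds are the ones on the slide (MPI across more than 2 nodes, more than 64 GB of memory) and are rough guidance rather than hard limits.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    needs_bare_metal: bool  # CPU speed, network or data I/O is critical
    mpi_nodes: int          # nodes an MPI job would span (0 if no MPI)
    memory_gb: float        # peak memory needed on one machine


def cloud_warnings(w: Workload) -> list[str]:
    """Return the slide's 'when not to use the cloud' reasons that apply."""
    reasons = []
    if w.needs_bare_metal:
        reasons.append("needs bare-metal CPU, network or data I/O performance")
    if w.mpi_nodes > 2:
        reasons.append("MPI across more than 2 compute nodes")
    if w.memory_gb > 64:
        reasons.append("needs more than 64 GB of memory")
    return reasons
```

An empty list means none of the slide’s warning signs apply; it does not guarantee the workload will run well in the cloud.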


Users of Atmosphere for teaching


Workshops:


Frontiers and Techniques in Plant Sciences, CSHL 2011, 2012


Genotyping by Sequencing, Cornell Computational Biology


Graduate/undergraduate course work:


BCB 660, Volker Brendel and Amy Toth, Fall 2011, Iowa State University


ISTA 420/520, Nirav Merchant & Eric Lyons, Fall 2012, Univ. of Arizona


Intro. Bioinformatics, Anne Lorraine, Fall 2012, Univ. of North Carolina


Popular community-contributed images:


PhytoMorph (Nate Miller, U. Wisconsin)


Twig2Genome (Haibao Tang, JCVI)


Julin Maloof, UC Davis*



Recap on key concepts


Purpose-built appliances


Scriptable infrastructure


Scaling multiple self-contained tasks


Collaborative analysis

Discussion


What would you want to build with your custom infrastructure?

Courses Using Atmosphere

Asian Wild Rice Distribution

The Research


Genetic studies documented geographic subdivision of Asian wild rice (Oryza rufipogon), the progenitor of cultivated Asian rice.


Cause unknown.


Use species distribution modeling (SDM) to examine environmental factors associated with the spatial and temporal distribution of O. rufipogon.


Compare the estimated distribution during the Last Glacial Maximum (LGM) to genetic data.


Problem


Analysis requires large datasets

Results


Present distribution of O. rufipogon (Fig. A).


Projected paleodistribution at the LGM was separated into disconnected east and west ranges (Fig. B).


Consistent with the current geographic pattern of genetic variation, with two genetic groups that intergrade (Fig. D).


Annual precipitation contributes most to SDM estimates.


SDM projections for the year 2080 indicate an increasing probability of presence and range expansion (Fig. C).


Indicates global warming is less of a threat to this endangered species than other human-mediated factors.

Scalable science


325 records of O. rufipogon sample locations from two sources.


iPlant enabled Huang and Schaal to successfully pursue this research.

(A) Present, (B) Last Glacial Maximum, (C) Future 2080, (D) Genetic variation.


iPlant Workshop at BSA, July 2011


Pu Huang (Washington U.) attended.


Learned about Atmosphere, iPlant’s cloud computing platform.


P. Huang and B.A. Schaal, Am. J. Botany 99(11), 2012.

Hands-On Lab

Atmosphere Login


Visit http://www.iplantcollaborative.org/


Next, click on the Atmosphere Login image (it should be about mid-page)


Click the Login button and enter your iPlant username and password

Atmosphere Intro screen

Getting familiar with the UI

1. Search for NGS Viewers v3 08/20/2012 (an image) and select the purple icon.

2. Give it a name and select the instance size (choose m1.small).

By selecting different sizes you will notice the project resources change.

3. When ready, press the Launch Instance button.

Understanding Instance Metrics


After an instance has launched, you can view information about it.


Resource Usage Metrics


My Resource Usage, at the top of the screen, shows how much of your quota in CPUs and GB of memory is being used by your running instances.


Instance Details


The Instance Details tab displays important information about the instance, including the ID assigned to the instance when it was launched, the name of the image it is using, its unique EMI ID, the instance size, the date you launched it, and the IP address, which you will need when logging in to the instance.


Instance Metrics


Instance Metrics allow you to drill down into the resources used by the running instance.
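The My Resource Usage figures are sums over your running instances compared against your quota. The Python sketch below shows that arithmetic so you can check whether one more instance would fit before launching it. The size table and the quota values are hypothetical placeholders, not Atmosphere’s real instance sizes or allocations; read the actual values from the launch dialog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    cpus: int
    mem_gb: int


# Hypothetical size table: placeholders, not Atmosphere's real instance sizes.
SIZES = {"small": Size(1, 2), "medium": Size(2, 4), "large": Size(4, 8)}


def usage(running: list[str], cpu_quota: int, mem_quota_gb: int) -> tuple[float, float]:
    """Percent of the CPU and memory quota used by the named instance sizes."""
    cpus = sum(SIZES[s].cpus for s in running)
    mem = sum(SIZES[s].mem_gb for s in running)
    return 100 * cpus / cpu_quota, 100 * mem / mem_quota_gb


def fits(running: list[str], new_size: str, cpu_quota: int, mem_quota_gb: int) -> bool:
    """True if launching one more instance of new_size stays within both quotas."""
    cpu_pct, mem_pct = usage(running + [new_size], cpu_quota, mem_quota_gb)
    return cpu_pct <= 100 and mem_pct <= 100
```

If fits(...) is False, that is the point at which the Requesting More Resources form becomes relevant.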

Logging into an Instance


Via SSH: if the Shell tab is disabled, you can log into your instance via SSH from your operating system’s terminal.


In your terminal window type:


$ ssh your_iplant_username@instance_ip_address


For example, mine would look like:


$ ssh amercer@128.196.142.48


Enter your iPlant password and you should be logged into your instance

Terminating an Instance


Click the instance to terminate in the My Instances list.


Either


Click the Terminate Instance icon in your My Instances list, or


Click the Terminate Instance button on the Data tab.


Click OK on the warning message.


Requesting More Resources


Enter the amount of resources you are requesting.


Enter the justification for the request.


Click the Request Resources button (right side
of page).


Your request will be reviewed and you will receive
a response within 2 working days.

Reporting an Instance Problem


Select the instance you are having problems with.


Click Report Instance.


Fill out the Instance Error form.


When finished, press the Report this Instance button.

Dealing with technical challenges

(Firewall issues)

Logging in via VNC


Airport VNC runs a built-in Java VNC viewer from a web browser within the Atmosphere Airport interface and requires Java. This is the more common option.


Select the VNC tab


If prompted, allow the Java applet to run


In the VNC Server field, enter the IP address for your instance, appending :1 after the IP address (it should already be auto-populated). Press Connect.


Enter your username and password


You have now successfully logged in via VNC.
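The :1 appended to the IP address is a VNC display number, not a port. By the usual VNC convention, display :N listens on TCP port 5900 + N, so :1 corresponds to port 5901, which is typically the port that has to be open in any firewall between you and the instance. A small Python sketch of that mapping (IPv4 addresses only; treating a bare address as display :0 is an assumption of this sketch):

```python
def vnc_endpoint(server: str) -> tuple[str, int]:
    """Map a VNC server string such as '128.196.142.48:1' to (host, TCP port)."""
    host, sep, display = server.rpartition(":")
    if not sep:
        # No ':N' suffix: assume display :0 (port 5900) in this sketch.
        return server, 5900
    # VNC convention: display N listens on port 5900 + N.
    return host, 5900 + int(display)
```

For example, vnc_endpoint("128.196.142.48:1") gives ("128.196.142.48", 5901).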

Terminating a VNC session


You can terminate a VNC Viewer session either from the VNC tab in Airport or from the VNC Viewer application window.


To terminate the session from Airport, click the 'X' in the My Instances list or on the VNC tab.

Hands-on exercise


Launching an instance (one per team)


Connecting to it (VNC and SSH) using the web browser and VNC client software


Launching an application (Flapjack/Tablet)


Installing a new application (optional)


Collaborating with other users (sharing your session)


Terminating the instance when you are done