FutureGrid Training, Education and Outreach


https://portal.futuregrid.org


Bloomington, Indiana

January 17 2010


Presented by Renato Figueiredo

renato@acis.ufl.edu



Associate Professor

University of Florida


Overview


Traditional ways of delivering hands-on training and education in parallel/distributed computing have non-trivial dependencies on the environment


Difficult to replicate same environment on different resources (e.g.
HPC clusters, desktops)


Difficult to cope with changes in the environment (e.g. software
upgrades)


Virtualization technologies remove key software dependencies through a layer of indirection



Overview


FutureGrid enables new approaches to education and training, and new opportunities to engage in outreach


Cloud, virtualization and dynamic provisioning


The environment can adapt to the user, rather than expecting the user to adapt to the environment


Focus of FutureGrid TEO is on leveraging the unique
capabilities of the infrastructure and its software to:


Reduce barriers to entry and engage new users


Use encapsulated environments (“appliances”) as the primary delivery mechanism for education/training modules


Promote reuse, replication, and sharing


Summary of activities (1)


Focus activities in the first year


Infrastructure supporting TEO activities


Documentation, integration of educational materials,
input/recommendations for portal and computing
infrastructure


Development of hands-on tutorials tailored to FutureGrid technologies and resources


Development, integration, testing of educational virtual
appliances


Summary of activities (2)


Focus activities in the first year


Education activities


Working with early adopters in class environments


Understand requirements, opportunities, challenges


Outreach activities


Demonstrations and presentations highlighting FutureGrid’s unique capabilities at conferences and workshops


Engaging with minority serving institutions


TEO Infrastructure: guiding principles


Fidelity: TEO activities should use full-fledged, executable software: education/training modules


Learn using the proper tools


Reproducibility:
Creators of content should be able to
install, configure, and test their modules once, and
be assured of the same functional behavior
regardless of where the module is deployed


Incentive to invest effort in developing, testing and
documenting new modules


TEO Infrastructure: guiding principles


Deployability: Students and users should be able to deploy modules simply, and on a variety of resources


Reduce barriers to entry; avoid dependencies on a particular infrastructure


Community-oriented: Modules should be simple to share, discover, reuse, and expand


Create conditions for “viral” growth


Towards this vision in FutureGrid


Executable modules


Virtual appliances


Deployable on FutureGrid resources


Deployable on other cloud platforms, as well as
virtualized desktops


Community sharing


Web 2.0 portal,
appliance image repositories


An aggregation hub for executable modules and
documentation



Educational appliances


A flexible, extensible platform for hands-on, lab-oriented education on FutureGrid


Need to support clustering of resources


Virtual machines + social/virtual networking to
create sandboxed modules


Virtual “Grid” appliances: self-contained, pre-packaged execution environments


Group VPNs
: simple management of virtual clusters by
students and educators


Virtual appliance example


Linux, Java, Hadoop, configuration scripts


Figure: the Hadoop image is copied and instantiated on a virtualization layer, once per worker (a Hadoop worker, another Hadoop worker, and so on).


Virtual Networking


A single appliance encapsulates software and
configuration


Cluster/Grid/Cloud computing


Middleware expects a collection of machines,
typically on a LAN (Local Area Network)


Appliances need to communicate and coordinate
with each other


Each worker needs an IP address and communicates over TCP/IP sockets



Virtual cluster appliances


Virtual appliance + virtual network


Figure: the Hadoop + virtual network image is copied and instantiated as virtual machines connected by a virtual network, once per worker (a Hadoop worker, another Hadoop worker, and so on).


Support for clustering


Network virtualization software on FutureGrid
includes ViNe and GroupVPN


Nimbus has support for contextualization of one-click virtual clusters


Within a LAN, or coupled with ViNe


Grid appliances use a peer-to-peer overlay for discovery and configuration of virtual addresses (DHCP) and cluster middleware


GroupVPN Overview

Figure: Alice, Bob and Carol form a group through a social network Web interface (e.g. XMPP, group site); a social network API and messaging layer/information system handle Alice’s, Bob’s and Carol’s public keys; their appliances join a virtual network with private addresses 10.10.0.2, 10.10.0.3 and 10.10.0.4.


Bootstrapping private links through Web 2.0 interfaces and IP-over-P2P overlay tunneling


Private IP address spaces, DHCP


Appliances perceive a virtual LAN


Deploying virtual clusters


Same image, different VPNs


Figure: the same Hadoop + virtual network image is copied and instantiated as virtual machines, one per worker (a Hadoop worker, another Hadoop worker, and so on); each joins a Group VPN using GroupVPN credentials from the Web site and obtains a virtual IP via DHCP (e.g. 10.10.1.1, 10.10.1.2).
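The “same image, different VPNs” idea can be illustrated with a toy address allocator: each group gets its own private subnet and leases addresses to its members in order, so identical appliances end up on separate virtual LANs. This is a hypothetical, local sketch (the class `GroupAddressPool`, the group names and the 10.10.2.0/24 subnet are assumptions); in the actual system, GroupVPN performs DHCP over its P2P overlay, which is not modeled here.

```python
import ipaddress
from typing import Dict, Iterator


class GroupAddressPool:
    """Toy DHCP-like allocator (hypothetical): one private subnet per group VPN."""

    def __init__(self, subnet: str) -> None:
        self.network = ipaddress.ip_network(subnet)
        # hosts() skips the network and broadcast addresses (.0 and .255 for a /24).
        self._free: Iterator[ipaddress.IPv4Address] = self.network.hosts()
        self.leases: Dict[str, ipaddress.IPv4Address] = {}

    def request(self, node_id: str) -> ipaddress.IPv4Address:
        # A node that asks again keeps its existing lease.
        if node_id not in self.leases:
            try:
                self.leases[node_id] = next(self._free)
            except StopIteration:
                raise RuntimeError(f"address pool {self.network} exhausted") from None
        return self.leases[node_id]


# Same appliance image, two different groups: each group has a separate address space.
pools = {
    "class-a": GroupAddressPool("10.10.1.0/24"),
    "class-b": GroupAddressPool("10.10.2.0/24"),
}
for group, nodes in [("class-a", ["worker1", "worker2"]), ("class-b", ["worker1"])]:
    for node in nodes:
        print(group, node, pools[group].request(node))
```

Both groups may contain a node called worker1 without conflict, because leases are tracked per group; this mirrors the isolation a per-group VPN provides, not its implementation.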

https://portal.futuregrid.org

FutureGrid example


Deploying a Condor virtual appliance cluster on
FutureGrid or desktop resources


Nimbus: cloud-client.sh --run --name grid-appliance-amd64.tar.gz


Eucalyptus: euca-run-instances ami-fd4aa494 --instance-type m1.large -k keypair

VMware Player: double-click Grid-appliance.vmx

Upload GroupVPN configuration file to appliances
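Building a cluster from these commands amounts to launching the same appliance image once per worker and then giving every instance the same GroupVPN configuration. The hypothetical helper below (`LAUNCH_COMMANDS`, `plan_cluster` and `launch` are illustrative names, not part of Nimbus or euca2ools) only assembles the command lines exactly as written on the slide and prints them in a dry run; real invocations may need further options depending on client version and site configuration, and uploading the GroupVPN file is not covered.

```python
import shlex
import subprocess
from typing import List

# Command lines copied from the slide; image names, image ID and key pair are the slide's examples.
LAUNCH_COMMANDS = {
    "nimbus": ["cloud-client.sh", "--run", "--name", "grid-appliance-amd64.tar.gz"],
    "eucalyptus": ["euca-run-instances", "ami-fd4aa494",
                   "--instance-type", "m1.large", "-k", "keypair"],
}


def plan_cluster(platform: str, workers: int) -> List[List[str]]:
    """Return one launch command per worker; every worker uses the same image."""
    if platform not in LAUNCH_COMMANDS:
        raise ValueError(f"unknown platform: {platform}")
    # Copy the list so callers can edit one worker's command without affecting the others.
    return [list(LAUNCH_COMMANDS[platform]) for _ in range(workers)]


def launch(commands: List[List[str]], dry_run: bool = True) -> None:
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # Assumes the cloud client tools and credentials are installed and configured.
            subprocess.run(argv, check=True)


launch(plan_cluster("eucalyptus", workers=2))
```

Keeping the dry run as the default makes it easy to check the planned commands before any instance is started.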



FG appliances: status

Figure: appliance images deployed on FutureGrid resources via Nimbus and Eucalyptus. Components: appliance images (Condor, Hadoop), tutorials; GroupVPN portal, image downloads, bootstrap routers.


Use of FutureGrid in classes


First-year ramp-up of hardware and software


The training and education emphasis has been on use in classes and tutorials with early adopters


Highlights:


Cloud computing class at Indiana University


Distributed Scientific Computing class at Louisiana
State University (LSU)


Big data summer school at IU


Nimbus tutorial at CloudCom conference


Big Data for Science: July 26-30, 2010 NCSA Summer School Workshop

http://salsahpc.indiana.edu/tutorial

300+ students (200 on site from 10 institutes; 100 online)

Sites: University of Arkansas, Indiana University, University of California at Los Angeles, Penn State, Iowa State, Univ. Illinois at Chicago, University of Minnesota, Michigan State, Notre Dame, University of Texas at El Paso, IBM Almaden Research Center, Washington University, San Diego Supercomputer Center, University of Florida, Johns Hopkins

IU MapReduce and UF Virtual Appliance technologies are supported by FutureGrid.

(Slide courtesy of Judy Qiu)


Cloud computing class at IU


Graduate-level “Cloud computing for Data-Intensive Sciences” (Judy Qiu, Fall 2010)


Virtualization technologies and tools


Infrastructure as a service


Parallel programming (MPI, Hadoop)


FutureGrid provided a set of software options that
made it possible for students to work on different
projects along the system stack.



Term Projects, arranged along the system stack (Higher Level Languages, Cloud Platform, Cloud Infrastructure, Hypervisor/Virtualization):

#1 Dryad/DryadLINQ: Matrix Multiplication (Swapnil, Amit, Pradnay)

#2 Dryad/DryadLINQ: PhyloD (Ratul, Adrija, Chengming)

#3 Iterative MapReduce: LDA (Changsi, Yang)

#4 Iterative MapReduce: MemCache (Saliya, Yiming, Jerome)

#5 Iterative MapReduce: Avro (Yuduo, Yuan, patanachai)

#6 Iterative MapReduce: PageRank (Shuo-Huan, Parag)

#7 Cloud Infrastructure: Nimbus, Eucalyptus (Stephen, Sonali, Shakeela)

#8 Cloud Storage: Cloud Storage Survey (Xiaoming, Nixiaogang)

#9 Virtualization: Hypervisor Performance Analysis Project (James, Andrew)

(Slide courtesy of Judy Qiu)


Distributed Scientific Computing class at LSU


FutureGrid supported activities in a new semester-long class offered Fall 2010 at LSU (Gabrielle Allen, Shantenu Jha)


A practical and comprehensive graduate course preparing
students for research involving scientific computing


Module E (Distributed Scientific Computing) taught by Shantenu Jha


Topics where FutureGrid was used:


Introduction to the practice of distributed computing


Cloud computing and the master-worker pattern


Distributed application case studies


Approximately half of a lecture provided an overview of FutureGrid and of the process for obtaining accounts and getting started


As part of the homework assignment associated with lecture E0, each student had to confirm access and successful login to FG-Sierra and FG-India



Distributed Scientific Computing class at LSU


FutureGrid (FG) was used by students to:

(i) compile, deploy and execute basic SAGA commands;

(ii) learn the basics of remote job submission and elementary master-worker based distributed applications (such as MapReduce and computing the Mandelbrot Set) using FG-India and FG-Sierra nodes;

(iii) get hands-on training with IaaS clouds, namely standing up virtual machines using Eucalyptus and deploying software and/or applications from (i) and (ii).



Students also used Eucalyptus on FG-India and FG-Sierra to do their Module E projects, which ranged from:


(a) Clouds as accelerators for Cactus-based applications,


(b) calculating pi using distributed tasks,


(c) extending the calculation of the Mandelbrot Set to “new” backends on FutureGrid (in addition to the “default” remote/ssh backends), and


(d) executing workers concurrently on bare metal and on clouds (i.e., a hybrid Grid-Cloud infrastructure) for master-worker applications.
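Projects (b) and (d) both follow the master-worker pattern: a master splits a computation into independent tasks, workers compute them, and the master combines the partial results. Below is a minimal local sketch, assuming pi is estimated with the midpoint rule for the integral of 4/(1+x^2) over [0, 1] (which equals pi); the thread pool stands in for remote workers such as VMs on FG-India or FG-Sierra, and in CPython a ProcessPoolExecutor would be needed for real CPU parallelism. The function names are illustrative, not taken from the students' projects.

```python
import math
from concurrent.futures import Executor, ThreadPoolExecutor


def partial_sum(start: int, stop: int, n: int) -> float:
    """Worker task: midpoint-rule contribution of intervals start..stop-1 out of n."""
    h = 1.0 / n
    total = 0.0
    for i in range(start, stop):
        x = (i + 0.5) * h  # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return total * h


def master(pool: Executor, n: int = 200_000, tasks: int = 8) -> float:
    """Master: split n intervals into independent tasks, dispatch them, sum the results."""
    bounds = [(k * n // tasks, (k + 1) * n // tasks) for k in range(tasks)]
    futures = [pool.submit(partial_sum, lo, hi, n) for lo, hi in bounds]
    return sum(f.result() for f in futures)


with ThreadPoolExecutor(max_workers=4) as pool:
    estimate = master(pool)
print(f"pi ~ {estimate:.10f}, error {abs(estimate - math.pi):.1e}")
```

Because `master` accepts any `concurrent.futures.Executor`, the same code runs with a thread pool, a process pool, or another executor implementation that dispatches tasks to remote machines.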


Images


IMAGE emi-8D2A13F7 smaddi2-saga-bucket/saga153-ubuntu.manifest.xml smaddi2 available public x86_64 machine eri-5BB61255 eki-78EF12D2

IMAGE emi-DBD61078 ubuntu-0904-saga-1.5.2/image.manifest.xml luckow available public x86_64 machine eri-5BB61255 eki-78EF12D2

IMAGE emi-0E0E165E ajyounge/ubuntu-twister-memcached.img.manifest.xml ajyounge available public x86_64 machine eri-5BB61255 eki-78EF12D2
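The listing above appears to be output of euca-describe-images. As a hedged sketch, a few lines of Python can filter such a listing, for example to find the SAGA images the students used. The field names in the hypothetical `ImageRecord` and the assumption of exactly ten whitespace-separated fields per IMAGE line are inferred from these three records only; other euca2ools versions may print more or fewer columns.

```python
from typing import List, NamedTuple


class ImageRecord(NamedTuple):
    # Field names are assumptions inferred from the IMAGE lines on the slide.
    image_id: str
    manifest: str
    owner: str
    state: str
    visibility: str
    arch: str
    image_type: str
    ramdisk_id: str
    kernel_id: str


def parse_images(listing: str) -> List[ImageRecord]:
    """Parse IMAGE lines with exactly ten fields; skip any other line."""
    records = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 10 and fields[0] == "IMAGE":
            records.append(ImageRecord(*fields[1:]))
    return records


LISTING = """\
IMAGE emi-8D2A13F7 smaddi2-saga-bucket/saga153-ubuntu.manifest.xml smaddi2 available public x86_64 machine eri-5BB61255 eki-78EF12D2
IMAGE emi-DBD61078 ubuntu-0904-saga-1.5.2/image.manifest.xml luckow available public x86_64 machine eri-5BB61255 eki-78EF12D2
IMAGE emi-0E0E165E ajyounge/ubuntu-twister-memcached.img.manifest.xml ajyounge available public x86_64 machine eri-5BB61255 eki-78EF12D2
"""

saga_images = [r.image_id for r in parse_images(LISTING) if "saga" in r.manifest]
print(saga_images)  # the two SAGA images: emi-8D2A13F7 and emi-DBD61078
```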


Nimbus tutorial at CloudCom


Half-day (3-hour) presentation + hands-on activities


30 attendees used their own computers to
instantiate virtual machines on FutureGrid
resources


Template for a self-learning tutorial for new and prospective users



FutureGrid tutorials


Tutorial topic 1: Cloud Provisioning Platforms


Using Nimbus on FutureGrid


Nimbus One-click Cluster Guide


Using the Grid Appliances to run FutureGrid Cloud Clients


Using Eucalyptus on FutureGrid


Tutorial topic 2: Cloud Run-time Platforms


Introduction to Hadoop using the Grid Appliance


Running Hadoop on FG using Eucalyptus (.ppt)


Running Hadoop on Eucalyptus


Tutorial topic 3: Educational Virtual Appliances


Introduction to the Grid Appliance


Creating Grid Appliance Clusters


Building an educational appliance from Ubuntu 10.04


Deploying Grid Appliances using Nimbus


Deploying Grid Appliances using Eucalyptus


Customizing and registering Grid Appliance images using Eucalyptus


MPI Virtual Clusters with the Grid Appliances and MPICH2


Tutorial topic 4: High Performance Computing


Performance Analysis with Vampir


Instrumentation and tracing with VampirTrace


Year-1 Outreach activities


Demonstrations, presentations, booths at
major events


SuperComputing, TeraGrid Conference, OGF
(Open Grid Forum), CloudCom, CCGrid, Grid’5000
meeting, Vampir workshop

1114 CPU cores (457 VMs) distributed over 3 sites in FutureGrid and 3 sites in Grid’5000 (P. Riteau et al., OGF-29 demo, Chicago, IL, June 2010).



Outreach activities


At IU, working with the dean for diversity and education to organize outreach, pursue REU funding to bring MSI students to IU for summer internships, and coordinate education and training workshops


Involvement of students from Historically
Black Colleges and Universities (HBCUs)


The REU supplement for FutureGrid funded 2 HBCU students in summer 2010; we will apply each year


Planned TEO activities


Plan to engage MSIs with which IU has already
established formal collaborative agreements


MSI Cyberinfrastructure Empowerment Coalition (MSI-CIEC). Primary theme: “teach the teachers” at MSIs so that they can incorporate cyberinfrastructure into their research and involve students and staff at their home institutions.


MSI-CIEC’s principal activity: Cyberinfrastructure Days, daylong workshops featuring prominent speakers who discuss the application of cyberinfrastructure to research and education



Planned TEO activities


With Elizabeth City State University


Planning summer school on cloud computing for ADMI
(Association of Computer/Information Sciences and
Engineering Departments at Minority Institutions) faculty
and students


Leverage Indiana University’s STEM Initiative


Provides travel, housing, and support for HBCU students
to intern at Indiana University during the summer



Planned TEO activities


Coordinate Web tutorials and documentation, with emphasis on supporting short tutorials that partners can give at conferences and self-guided learning by new or prospective users


Continuously provide recommendations and guidance for the Web portal and user accounts


Engage with potential early adopters in computer
science and engineering classes


Leverage existing MSI contacts, and use of
FutureGrid in workshops, summer schools, and
internships