The iPlant Collaborative

Oct 21, 2013

www.iplantcollaborative.org

sgoff@iplantcollaborative.org

The iPlant Collaborative

IBP Annual Meeting


June 1st, 2011

Steve Goff

iPlant Collaborative, BIO5 Institute

School of Plant Science

University of Arizona


What is iPlant?

iPlant’s mission is to build the cyberinfrastructure (CI) to support plant biology’s Grand Challenge solutions.

- Phase I: Community Input
- Phase II: Building the CI Foundation
- Next Phase: Enabling Plant Science Discovery
  - Next, integrate workflows and test theories
  - Support tool integration and synthesis activities


NSF Cyberinfrastructure Vision

- High Performance Computing
- Data and Data Analysis
- Virtual Organizations
- Learning and Workforce

Ref: “Cyberinfrastructure Vision for 21st Century Discovery”, NSF Cyberinfrastructure Council, March 2007.


CI for Plant Science: Observations

- Investment in data creation is high
- Sources of data are disparate
- Investment in existing tools is significant
  - Tools shouldn’t be discarded
  - Tools shouldn’t be reproduced, but they lack:
    - Interoperability with other tools
    - Data standards
    - Scalability
    - Consistency of interface access & use
    - Experimental reproducibility


iPlant is a process and a platform (or a set of platforms, depending on your point of view).


Computational & Storage Capability

- Compute: Ranger, Lonestar, Stampede (UT/TeraGrid), Saguaro, Sonora (ASU), Marin, Ice (UA)
  - ~700 teraflops
- Storage: Corral, Ranch (UT), Ocotillo (ASU)
  - >10 petabytes of storage available for the project
- Visualization: Spur, Stallion (UT), Matinee (ASU), UA-Cave
  - Among the world’s largest visualization systems
- Virtualized/Cloud Services: iPlant, TeraGrid, vendor clouds
  - Cloud tech to deliver persistent gateways and user services

Thanks to large-scale NSF investments, iPlant has excellent CI access.


[Figure: iPlant Cyberinfrastructure, linking bench biologists and computational biologists to data and algorithms via APIs, the Discovery Environment, the Data Store, Atmosphere, and a Semantic Web layer]


Overview of Components

- iPlant Discovery Environment: Core Software
- iRODS Integration: Core Services
- Atmosphere Cloud: Core Services
- Semantic Web Tech: SSWAP Team
- iPlant Tool/Workflow API: Core Software & Engagement Teams


[Figure: iPlant software architecture]
- Discovery Environment, DNA Subway, 3rd-party science gateways, user scripts & applications
- Public APIs
- Low-level services: Event, I/O, Data, Apps, Job, Profile, Auth
- Condor, PBS, SGE, LSF, LL, iRODS, MySQL, LDAP, Eucalyptus, Action Folders, Shibboleth, Globus/Unicore, GPIR, MyProxy, XSEDE
- iPlant hardware resources: high-performance computing, databases, storage, cloud systems
- Semantic Web


iRODS: Integrated Rule-Oriented Data System (www.irods.org)

Why iRODS?
- Large data storage in a simple format
- Sharing of large data among iPlant CI resources
- Sharing of large data with colleagues and collaborators
- Processing large data with TACC resources

General information on iRODS: www.irods.org
Access iPlant’s iRODS: irodsweb.iplantcollaborative.org
Documentation: https://pods.iplantcollaborative.org/wiki/display/systems/iRODS


Atmosphere

iPlant’s Cloud Computing Resources
http://atmosphere.iplantcollaborative.org
Tutorial: https://pods.iplantcollaborative.org/wiki/display/atmosphere/Demo+with+picture+walkthrough

Why Atmosphere?
- Use a virtual machine (VM) with preinstalled software
- Create a VM to install complex software
- Create and share an image of a VM (VMI)
- Mount data from iPlant iRODS for use by your VM


Semantic Web

http://www.iplantcollaborative.org/communities/developers/semanticweb

Why Semantic Web Technology?
- Provides a means for web services to communicate and be aware of one another

[Diagram: the Semantic Web links an iPlant consumer to a remote service, a user-created service in Atmosphere to iPlant’s Discovery Environment, and an iPlant service to a remote consumer]
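To make "aware of one another" concrete, here is a toy sketch in plain Python, not SSWAP or any iPlant API: each service is described by hypothetical (subject, predicate, object) triples declaring its input and output data types, and a consumer discovers which services can take another service's output by matching those types. Real Semantic Web services use RDF/OWL ontologies; the service names, type names, and the `hasInput`/`hasOutput` predicates below are illustrative assumptions.

```python
# Toy service registry as (subject, predicate, object) triples.
# All service names, type names, and predicates are hypothetical.
TRIPLES = [
    ("svc:BlastSearch", "hasInput", "type:ProteinSequence"),
    ("svc:BlastSearch", "hasOutput", "type:AlignmentReport"),
    ("svc:ReportSummarizer", "hasInput", "type:AlignmentReport"),
    ("svc:ReportSummarizer", "hasOutput", "type:SummaryTable"),
    ("svc:NameResolver", "hasInput", "type:TaxonName"),
    ("svc:NameResolver", "hasOutput", "type:AcceptedName"),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is in the registry."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def services_accepting(data_type, triples=TRIPLES):
    """Services that declare data_type as an input."""
    return sorted(s for s, p, o in triples if p == "hasInput" and o == data_type)


def downstream_services(service, triples=TRIPLES):
    """Services that can consume at least one declared output of `service`."""
    found = set()
    for out_type in objects(service, "hasOutput", triples):
        found.update(services_accepting(out_type, triples))
    return sorted(found)


print(downstream_services("svc:BlastSearch"))  # ['svc:ReportSummarizer']
```

Because matching happens on declared types rather than hard-coded service names, a newly registered service that accepts `type:AlignmentReport` would be discovered without changing the consumer.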

iPG2P: From Genotype to Phenotype

- Visual Analytics
  R. Grene and G. Abram: Information visualization tools capable of displaying diverse types of data from laboratory, field, and in silico analyses and simulations
- Data Integration
  D. Ware and C. Jordan: Methods for describing and unifying data sets into systems that support iPG2P activities
- Statistical Inference
  D. Kliebenstein and E. Buckler: Platform for using advanced computational approaches to statistically link genotype to phenotype
- Modeling Tools
  J. White, C. Myers, and S. Welch: Framework for the construction, simulation, and analysis of computational models of plants
- Ultra High-Throughput Sequencing
  T. Brutnell and M. Vaughn: HPC resources and applications to process large-volume sequence data

Genome Services: Ultra High-Throughput Sequencing

[Figure: sequencing pipeline running on scalable computing]
- Data sources: NCBI SRA, desktop, Amazon S3, FTP, HTTP
- Data wrangling: quality control, preprocessing, rescaling, barcoding
- Alignment and analysis tools: BWA, TopHat, Cufflinks, SAMtools
- Outputs: SAM alignments, expression levels (RPKM), genome variants (VCF 3.3)
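The expression levels in this pipeline are reported as RPKM (reads per kilobase of transcript per million mapped reads). The standard formula is RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to a gene, N the total number of mapped reads, and L the gene length in base pairs. A minimal Python sketch of that arithmetic follows; it is not the Cufflinks implementation, which reports FPKM estimated from a statistical model of transcript abundance rather than from simple read counts.

```python
def rpkm(gene_reads: int, total_mapped_reads: int, gene_length_bp: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if total_mapped_reads <= 0 or gene_length_bp <= 0:
        raise ValueError("total mapped reads and gene length must be positive")
    # Dividing by length in kilobases (L / 1e3) and by library size in millions (N / 1e6)
    # combines into the single 1e9 factor.
    return gene_reads * 1e9 / (total_mapped_reads * gene_length_bp)


# 500 reads on a 2 kb gene in a library of 10 million mapped reads
print(rpkm(500, 10_000_000, 2_000))  # 25.0
```

The length term makes long and short genes comparable within a sample, and the library-size term makes samples sequenced to different depths comparable.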

Community Use Cases
- Expression studies
- Forward genetic screens
- Association studies

High-Throughput Image Analysis

Scope: Enable image-based plant science research by incorporating image processing algorithms, grid computing, and databasing into an analysis pipeline

Objectives
1. Integrate Phytomorph and BISQUE as PhytoBisque
2. Broaden access to algorithms that benefit the community
3. Automate workflows so that plant biologists need not be computer scientists

[Diagram components: Storage, Authentication, APIs, Compute cluster]

E. Spalding @ U of Wisconsin; B.S. Manjunath and K. Kvilekval @ UCSB

PhytoBisque: Example Use Case

- Given a flatbed scanner image of Arabidopsis seeds, it measures the length, width, and area of each seed and produces a population estimate for each trait
- Seed trait QTL can be mapped when this is applied to mapped populations such as Ler × Cvi
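A minimal sketch of the measurement step, assuming the scanner image has already been thresholded into a binary mask (`#` = seed pixel). It finds 4-connected seed regions, takes the area as the pixel count and the length and width as the longer and shorter sides of each seed's axis-aligned bounding box, then summarizes each trait across the population as a mean and sample standard deviation. The bounding-box simplification and all function names are assumptions for illustration, not PhytoBisque's own algorithms.

```python
from collections import deque
from statistics import mean, stdev


def label_seeds(mask):
    """Return a list of pixel lists, one per 4-connected '#' region, in scan order."""
    rows, cols = len(mask), len(mask[0])
    seen, seeds = set(), []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != "#" or (r, c) in seen:
                continue
            queue, pixels = deque([(r, c)]), []
            seen.add((r, c))
            while queue:  # breadth-first flood fill of one seed
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] == "#" and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            seeds.append(pixels)
    return seeds


def measure(pixels):
    """Length/width from the axis-aligned bounding box; area as pixel count."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height, width = max(ys) - min(ys) + 1, max(xs) - min(xs) + 1
    return {"length": max(height, width), "width": min(height, width), "area": len(pixels)}


def population_estimate(mask):
    """Mean and sample standard deviation of each trait over all detected seeds."""
    traits = [measure(p) for p in label_seeds(mask)]
    summary = {}
    for key in ("length", "width", "area"):
        values = [t[key] for t in traits]
        summary[key] = (mean(values), stdev(values) if len(values) > 1 else 0.0)
    return summary


MASK = [
    "........",
    ".###....",
    ".###..#.",
    "......#.",
    "......#.",
    "........",
]
print(population_estimate(MASK)["area"][0])  # 4.5
```

Real seeds are rarely axis-aligned, so a production pipeline would measure along each seed's principal axes; the per-seed table produced here is the kind of trait data that would then feed QTL mapping.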

A Strategy for Association Studies

- Basic QTL/GWAS analysis
  - R/qtl, QTL Cartographer, et al.
  - Community can integrate these into the CI
- Iterative analyses
  - iPlant workflow management simplifies automation
  - Compare methods!
- Exploratory methods
  - Hand-built R, Python, SAS, C codes
  - Easy integration into iPlant CI via API
  - Adopt common data model
- Scalability challenges: high-density markers, large populations, combinatorial analyses
  - iPlant-authored parallel GLM (etc.) implementations
  - Common data model
  - Utilize workflow framework


- Simplest case*: a few minutes using GLM in desktop TASSEL
- 1000-replicate bootstrap: 75-150 hours/trait
- Runtimes only get larger (days to years) for more complex analyses

* One trait × 40 million markers, with no bootstrapping or epistasis testing
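The numbers on this slide are consistent with each other: if 1000 bootstrap replicates take 75-150 hours, one replicate takes 75 × 60 / 1000 = 4.5 to 150 × 60 / 1000 = 9 minutes, which is the "few minutes" of the simplest case. Because replicates are independent, spreading them over many workers cuts wall-clock time roughly in proportion. The sketch below assumes ideal scheduling with no data-transfer or queueing overhead, so its estimates are lower bounds; the worker count of 128 is a hypothetical example.

```python
import math


def per_replicate_minutes(total_hours: float, replicates: int = 1000) -> float:
    """Minutes for one replicate, given the total single-worker time for all replicates."""
    return total_hours * 60 / replicates


def wall_clock_hours(total_hours: float, workers: int, replicates: int = 1000) -> float:
    """Ideal wall-clock hours when independent replicates run in rounds across workers."""
    rounds = math.ceil(replicates / workers)  # the last round may be only partly full
    return rounds * per_replicate_minutes(total_hours, replicates) / 60


for hours in (75, 150):
    print(hours, per_replicate_minutes(hours), wall_clock_hours(hours, workers=128))
# 75 4.5 0.6
# 150 9.0 1.2
```

With 128 workers, 1000 replicates need 8 rounds, so a 150-hour bootstrap drops to about 1.2 hours per trait in this idealized model.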

Statistical Inference: Scalable GLM

[Figure: genotype × phenotype analysis by ANOVA, scaled to 6 traits of interest, 40 million markers in maize NAM, 1000 replicate analyses, and epistasis testing]
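The genotype × phenotype ANOVA at the core of this scan can be sketched for a single marker: group phenotype values by genotype class and compare between-class to within-class variance, F = [SS_between / (k - 1)] / [SS_within / (n - k)] for k classes and n lines. This is the simplest one-factor case, written in plain Python for illustration; TASSEL's GLM additionally handles covariates such as population structure, and the function names here are assumptions. A genome-wide scan repeats the test for every marker, which is what makes 40 million markers × 6 traits × 1000 replicates expensive.

```python
from collections import defaultdict


def marker_anova_f(genotypes, phenotypes):
    """One-way ANOVA F statistic for one marker: phenotype values grouped by genotype class."""
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same length")
    groups = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    k, n = len(groups), len(phenotypes)
    if k < 2 or n <= k:
        raise ValueError("need at least two genotype classes and more lines than classes")
    grand_mean = sum(phenotypes) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
    if ss_within == 0:
        # No within-class variance: F is infinite if classes differ, undefined otherwise.
        return float("inf") if ss_between > 0 else float("nan")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def scan(markers, phenotypes):
    """Apply the single-marker test to every marker (one row of genotype codes per marker)."""
    return [marker_anova_f(m, phenotypes) for m in markers]


print(marker_anova_f([0, 0, 0, 2, 2, 2], [1, 2, 3, 4, 5, 6]))  # 13.5
```

Each marker's test is independent of the others, which is why the scan parallelizes so naturally across cores, nodes, or GPUs.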

GPU-based QTL Mapping

- Aspects of the problem are highly parallel
- Re-architect data flow and mapping algorithms for GPU architecture
- Interface for C and GPU implementations will be identical

Ali Akoglu and Dave Lowenthal, UArizona

Alignment-based protein searches sped up 6-10x

iPlant Tree of Life (iPToL)

- Large phylogenetic inference: building a tree of life for up to 500,000 green plants
- Tree visualization: scalable visualization for small to large trees
- Data assembly and integration: acquiring, organizing, and processing the data
- Taxonomic intelligence: sorting out different names for the same species
- Tree reconciliation: resolving discordant gene and species trees
- Trait evolution: using the tree to understand how traits evolved
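For the trait-evolution theme, one classic way to ask how a discrete trait changed along a tree is Fitch parsimony, which finds the minimum number of state changes consistent with the states observed at the tips. The sketch below is a generic textbook implementation for rooted binary trees written as nested tuples; it is not the iPToL pipeline, and the example taxa and flower-color states are hypothetical.

```python
def fitch(tree, states):
    """Fitch parsimony for one discrete trait on a rooted binary tree.

    `tree` is a taxon name (leaf) or a (left, right) tuple; `states` maps taxon -> observed state.
    Returns (candidate state set at this node, minimum number of changes in this subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child sets force one change on one of the two child branches.
    return left_set | right_set, left_cost + right_cost + 1


flower = {"A": "red", "B": "red", "C": "white", "D": "white"}
print(fitch((("A", "B"), ("C", "D")), flower)[1])  # 1
print(fitch((("A", "C"), ("B", "D")), flower)[1])  # 2
```

The same tip states imply one change on one topology and two on the other, which is why trait-evolution conclusions depend on having a well-resolved tree.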







Phyloviewer: visualization of large phylogenetic trees


My-Plant

- Social networking for plant biologists
- Organized by clade
- Used to organize the data collection for the “big tree”


Taxonomic Name Resolution Service
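The Taxonomic Name Resolution Service addresses the "different names for the same species" problem listed under iPToL. A heavily simplified sketch of the idea, using Python's standard `difflib` for approximate matching: normalize the submitted name, correct small spelling errors against a list of known names, then map synonyms to the accepted name. The four-entry lexicon and the 0.8 similarity cutoff are assumptions for illustration; the real service draws on full taxonomic databases and its own matching rules.

```python
import difflib

# Hypothetical mini-lexicon: known name -> accepted name.
LEXICON = {
    "Zea mays": "Zea mays",
    "Arabidopsis thaliana": "Arabidopsis thaliana",
    "Solanum lycopersicum": "Solanum lycopersicum",
    "Lycopersicon esculentum": "Solanum lycopersicum",  # older synonym of tomato
}


def resolve(name, lexicon=LEXICON, cutoff=0.8):
    """Return (matched name, accepted name, similarity score), or None if nothing is close."""
    # Collapse whitespace and apply binomial capitalization ("zea  MAYS" -> "Zea mays").
    cleaned = " ".join(name.split()).capitalize()
    if cleaned in lexicon:
        return cleaned, lexicon[cleaned], 1.0
    match = difflib.get_close_matches(cleaned, list(lexicon), n=1, cutoff=cutoff)
    if not match:
        return None
    score = difflib.SequenceMatcher(None, cleaned, match[0]).ratio()
    return match[0], lexicon[match[0]], score


print(resolve("Lycopersicon esculentum"))  # ('Lycopersicon esculentum', 'Solanum lycopersicum', 1.0)
print(resolve("zea  mayz"))                # ('Zea mays', 'Zea mays', 0.875)
```

Returning the similarity score alongside the match lets a user review low-confidence corrections instead of silently accepting them.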


Integration of New Tools without Programming

This part is done!!!

This part is coming soon!

Related Activities

- Integrated Breeding Platform
  - Social networking portal for plant breeders
  - R analysis packages
  - Breeders fieldbook
- 1kp (1,000 plant transcriptomes)
- DOE’s Knowledgebase (KBase)
- Seed projects
- Elixir
- CoGe

Future Workshop Activities

- Small tool/workflow integration meetings
  - 2-3 days each, 10-20 local participants
  - 4-5 meetings starting in June 2011
- Addressing specific biological questions
  - With appropriate test data and available software
- Building on iPlant’s cyberinfrastructure
  - Complementary tools and additional data access
- Preference for broad-use, high-impact tools & workflows
- Can be kept private until published
- Positive results will stimulate additional support



iPlant’s Building Blocks

[Figure: Metadata, Data, Tools, Workflows, Viz]

Executive Team: Steve Goff, Dan Stanzione

Staff: Greg Abram, Victoria Bryan, Rion Dooley, Andy Edmonds, Juan Antonio Raygoza Garay, Karla Gendler, Damian Gessler, Cornel Ghiban, Michael Gonzales, Hariolf Häfele, Matthew Helmke, Natalie Henriques, Uwe Hilgert, Nicole Hopkins, Lisa Howells, Kathleen Kennedy, Mohammed Khalfan, Seung-jin Kim, Adam Kubach, Sangeeta Kuchimanchi, Tina Lee, Andrew Lenards, Sonya Lowry, Jerry Lu, Eric Lyons, Naim Matasci, Sheldon McKay, Dave Micklos, Andy Muir, Martha Narro, Christos Noutos, Dennis Roberts, Bernice Rogowitz, Jerry Schneider, Bruce Schumaker, Edwin Skidmore, Sriram Srinivasan, Mary Margaret Sprinkle, Matthew Vaughn, Liya Wang, Sharon Wei, Jason Williams, Frank Willmore, John Wregglesworth, Weijia Xu

Faculty Advisors: Greg Andrews, Kobus Barnard, Susan Brown, Vicki Chandler, John Hartman, Nirav Merchant, Sudha Ram, Ann Stapleton, Lincoln Stein, Doreen Ware, Sue Wessler, Ramin Yadegari

Students: Storme Briscoe, Steven Gregory, Monica Lent, Bansri Poduval, Pavithra Ravi, Shannon Wermes, Jill Yarmchuk