Bioinformatics in the Department of Computer Science

Lenwood S. Heath

Department of Computer Science

Blacksburg, VA 24061

College of Engineering

Northern Virginia Engineering Showcase

March 5, 2004


Bioinformatics Faculty

Cliff Shaffer

Adrian Sandu

Alexey Onufriev

Lenny Heath

T. M. Murali

Naren Ramakrishnan

Eunice Santos

Layne Watson

Roger Ehrich

Chris North

Joao Setubal, CS and VBI

3/5/2004
Bioinformatics in Computer Science


Relevant Expertise


Algorithms



Heath, Santos, Setubal, Shaffer, Watson


Computational structural biology


Onufriev, Sandu


Computational systems biology


Murali


Data mining


Ramakrishnan


Genomics


Heath, Murali, Ramakrishnan


Human-computer interaction, visualization


North


Image processing


Ehrich, Watson


High performance computing



Sandu, Santos, Watson


Numerical analysis


Onufriev, Watson


Optimization


Watson


Problem solving environments


Ramakrishnan, Shaffer



Selected Collaborations


Virginia Tech: Biochemistry, Biology, Fralin Biotechnology Center, Plant Physiology, Veterinary Medicine, Virginia Bioinformatics Institute (VBI), Wood Science

North Carolina State University: Forest Biotechnology Center

Duke: Biology

University of Illinois: Plant Biology


Selected Funding


NSF IBN 0219322: ITR: Understanding Stress Resistance Mechanisms in Plants: Multimodal Models Integrating Experimental Data, Databases, and the Literature.
L. S. Heath, R. Grene, B. I. Chevone, N. Ramakrishnan, L. T. Watson.
$499,973.

NSF EIA-01903660: A Microarray Experiment Management System.
N. Ramakrishnan, L. S. Heath, L. T. Watson, R. Grene, J. W. Weller (VBI).
$600,000.

DARPA N00014-01-1-0852: Dryophile Genes to Engineer Stasis-Recovery of Human Cells.
M. Potts, L. S. Heath, R. F. Helm, N. Ramakrishnan, T. O. Sitz, F. Bloom, P. Price (Life Technologies), J. Battista (LSU).
$4,532,622.

NSF MCB-0083315: Biocomplexity - Incubation Activity: A Collaborative Problem Solving Environment for Computational Modeling of Eukaryotic Cell Cycle Controls.
J. J. Tyson, L. T. Watson, N. Ramakrishnan, C. A. Shaffer, J. C. Sible.
$99,965.

NIH 1 R01 GM64339-01: Problem Solving Environment for Modeling the Cell Cycle.
J. J. Tyson, J. Sible, K. Chen, L. T. Watson, C. A. Shaffer, N. Ramakrishnan, P. Mendes (VBI).
$211,038.

Air Force Research Laboratory F30602-01-2-0572: The Eukaryotic Cell Cycle as a Test Case for Modeling Cellular Regulation in a Collaborative Problem Solving Environment.
J. J. Tyson, J. C. Sible, K. C. Chen, L. T. Watson, C. A. Shaffer, N. Ramakrishnan.
$1,650,000.


Research Resources

System X


Third fastest computer on the planet (TOP500 list, November 2003)

Laboratory for Advanced Scientific Computing & Applications (LASCA)


Parallel algorithms & math software


Anantham Cluster


Grid computing

Bioinformatics Research LAN


Linux, Mac OS X, Windows


Bioinformatics databases and analysis



JigCell: A PSE for Eukaryotic Cell Cycle Controls

Marc Vass, Nick Allen, Jason Zwolak, Dan Moisa, Clifford A. Shaffer, Layne T. Watson, Naren Ramakrishnan, and John J. Tyson

Departments of Computer Science and Biology


Computational Molecular Biology

DNA: …TACCCGATGGCGAAATGC...

mRNA: …AUGGGCUACCGCUUUACG...

Protein: …Met-Gly-Tyr-Arg-Phe-Thr...

Enzyme: ATP, ADP, -P

Reaction Network: X, Y, Z; E1, E2, E3, E4

Cell Physiology
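The DNA, mRNA, and protein sequences on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming the DNA shown is the template strand and is read in the same left-to-right alignment as the mRNA (strand orientation is ignored). The function names and the partial codon table are illustrative and do not come from the talk.

```python
# Complement pairs used when transcribing a DNA template strand into mRNA.
DNA_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Partial codon table: only the six codons that occur in the slide's example.
CODON_TABLE = {
    "AUG": "Met", "GGC": "Gly", "UAC": "Tyr",
    "CGC": "Arg", "UUU": "Phe", "ACG": "Thr",
}


def transcribe(template_dna):
    """Return the mRNA complementary to a DNA template strand."""
    return "".join(DNA_TO_MRNA[base] for base in template_dna)


def translate(mrna):
    """Translate complete codons of an mRNA into a dash-separated peptide."""
    usable = len(mrna) - len(mrna) % 3  # drop any trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "-".join(CODON_TABLE[codon] for codon in codons)


mrna = transcribe("TACCCGATGGCGAAATGC")
protein = translate(mrna)
print(mrna)     # AUGGGCUACCGCUUUACG
print(protein)  # Met-Gly-Tyr-Arg-Phe-Thr
```

A full implementation would use the complete 64-codon genetic code and handle stop codons; the table here is deliberately limited to the slide's fragment.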


Cell Cycle of Budding Yeast

[Figure: wiring diagram of the budding yeast cell cycle control network. Labeled components: Cln2, Cln3, Bck2, Clb2, Clb5, Sic1, Swi5, MBF, SBF, Mcm1, SCF, APC, Cdc20, Cdh1, PPX, Pds1, Esp1, Net1, Net1P, RENT, Cdc14, Cdc15, Tem1, Bub2, Lte1, Mad2, CDKs. Labeled events: growth, budding, DNA synthesis, unaligned chromosomes, sister chromatid separation.]


JigCell Problem-Solving Environment

Experimental Database

Wiring Diagram

Differential Equations

Parameter Values

Analysis

Simulation

Visualization

Automatic Parameter Estimation
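The path from differential equations and parameter values to simulation and automatic parameter estimation can be illustrated with a toy example. This is only a sketch under stated assumptions: the one-variable model dX/dt = k_s - k_d * X, the forward Euler integrator, the grid search, and all names are hypothetical stand-ins, not JigCell's actual models or algorithms.

```python
def simulate(k_s, k_d, x0=0.0, dt=0.01, steps=1000):
    """Integrate dX/dt = k_s - k_d * X with the forward Euler method."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * (k_s - k_d * x)
        trajectory.append(x)
    return trajectory


def squared_error(trajectory, observations):
    """Sum of squared differences at the observed step indices."""
    return sum((trajectory[i] - value) ** 2 for i, value in observations)


def estimate_k_d(k_s, observations, candidates):
    """Pick the candidate degradation rate with the lowest squared error."""
    return min(
        candidates,
        key=lambda k_d: squared_error(simulate(k_s, k_d), observations),
    )


# Synthetic "experimental" data, generated from the model itself with k_d = 0.5.
true_run = simulate(k_s=1.0, k_d=0.5)
observations = [(i, true_run[i]) for i in (100, 300, 600, 1000)]

best = estimate_k_d(1.0, observations, [0.1, 0.25, 0.5, 1.0, 2.0])
print(best)  # 0.5
```

Real cell cycle models involve many coupled equations and parameters, so practical estimation would rely on a stiff ODE solver and a proper optimizer rather than a grid search.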


Why do these calculations?


Is the model “yeast-shaped”?

Bioinformatics role: the model organizes experimental information.

New science: prediction, insight

JigCell is part of the DARPA BioSPICE suite of software tools for computational cell biology.


Expresso: A Next Generation Software System for Microarray Experiment Management and Data Analysis



Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis

Integration of design, experimentation, and analysis

Data mining; inductive logic programming (ILP)

Closing the loop

Drought stress experiments with pine trees and Arabidopsis


Scenarios for Effects of Abiotic Stress on Gene Expression in Plants


Data Mining with ILP


ILP (inductive logic programming) is a data mining algorithm for inferring relationships or rules.

ILP groups related data and favors relationships that have short descriptions.

ILP can also flexibly incorporate a priori biological knowledge (e.g., categories and alternate classifications).

Hybrid reasoning: Information Integration

“Is there a relationship between genes in a given functional category and genes in a particular expression cluster?”

ILP mines this information in a single step.



Rule Inference in ILP


Infers rules relating gene expression levels to categories, both within a probe pair and across probe pairs, without explicit direction

Example Rule:

[Rule 142] [Pos cover = 69 Neg cover = 3]

level(A,moist_vs_severe,not positive) :- level(A,moist_vs_mild,positive).

Interpretation:

“If the moist versus mild stress comparison was positive for some clone named A, it was negative or unchanged in the moist versus severe comparison for A, with a confidence of 95.8%.”
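The 95.8% figure is consistent with the cover counts reported for Rule 142. The sketch below assumes confidence is defined as positive cover divided by total cover; the slide does not state the formula, and the function name is illustrative.

```python
def rule_confidence(pos_cover, neg_cover):
    """Fraction of covered examples that the rule classifies correctly."""
    return pos_cover / (pos_cover + neg_cover)


# Rule 142 covers 69 positive and 3 negative examples.
confidence = rule_confidence(69, 3)
print(f"{confidence:.1%}")  # 95.8%
```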


ILP in the Expresso Pipeline

Expresso is a next generation software system for microarray experiments that provides a database interface to ILP functionality.




Status of Expresso


Capabilities


Data capture and storage


Statistical analysis


Data mining by ILP


Microarray experiment design


GeneSieve


Expresso-assisted experiment composition

Closing the experimental loop

Successful microarray experiment analysis

Pine, Norway spruce, yeast, Deinococcus radiodurans (an extremophile microorganism), human cell lines

Planned microarray experiment analysis

Potato, Arabidopsis thaliana, tomato, rice, corn


Networks in Bioinformatics




Mathematical Model(s) for Biological Networks

Representation: What biological entities and parameters to represent and at what level of granularity?

Operations and Computations: What manipulations and transformations are supported?

Presentation: How can biologists visualize and explore networks?


Reconciling Networks

Munnik and Meijer, FEBS Letters, 2001

Shinozaki and Yamaguchi-Shinozaki, Current Opinion in Plant Biology, 2000


Multimodal Networks



Nodes and edges have flexible semantics to represent:

- Time
- Uncertainty
- Cellular decision making; process regulation
- Cell topology and compartmentalization
- Rate constants
- Phylogeny

Hierarchical
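One way to picture nodes and edges with flexible semantics, plus hierarchy, is a graph whose nodes and edges carry open-ended attribute dictionaries and whose nodes may expand into nested networks. This is a hypothetical sketch, not the data model used in the project; the class names, attribute keys, and example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    subnetwork: Optional["Network"] = None  # hierarchy: a node may expand into a network


@dataclass
class Edge:
    source: str
    target: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Network:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, name, subnetwork=None, **attributes):
        self.nodes[name] = Node(name, attributes, subnetwork)

    def add_edge(self, source, target, **attributes):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, target, attributes))

    def depth(self):
        """Number of hierarchy levels, counting this network as level 1."""
        nested = [n.subnetwork.depth() for n in self.nodes.values() if n.subnetwork]
        return 1 + max(nested, default=0)


# Inner network: attributes stand in for compartment, rate constant, uncertainty.
inner = Network()
inner.add_node("X", compartment="cytoplasm")
inner.add_node("Y", compartment="nucleus")
inner.add_edge("X", "Y", rate_constant=0.3, confidence=0.8)

# Outer network: one node that expands into the inner network.
outer = Network()
outer.add_node("pathway", subnetwork=inner)
print(outer.depth())  # 2
```

Keeping attributes as free-form dictionaries lets time, uncertainty, or phylogeny annotations be attached without changing the classes, at the cost of no schema checking.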


Using Multimodal Networks



Help biologists find new biological knowledge



Visualize and explore



Generate hypotheses and experiments



Predict regulatory phenomena



Predict responses to stress



Incorporate into Expresso as part of closing the loop


Conclusions


Engaged faculty with the right expertise


Numerous life science collaborations


Federal research funding


First-class computational resources

A variety of cutting-edge bioinformatics research projects


Bioinformatics Education


Courses in Computer Science


Courses in the Life Sciences


Bioinformatics Option


Doctoral Program in Genetics, Bioinformatics, and Computational Biology


Doctoral Program in Genetics, Bioinformatics, and Computational Biology

Multidisciplinary: biology, biochemistry, crop science, plant physiology, computer science, mathematics, statistics, veterinary medicine


Anantham Cluster


Previous cluster specs



200 AMD 1 GHz processors



1 GB RAM per processor



2 TB disk space



2.56 Gb/s Myrinet network

Previous 200 processor cluster