Bioinformatics and Graphical Models - Microsoft Research


2 Oct 2013


Bioinformatics and Graphical Models:


Computation, approximation, and their value

MSR:

Nebojsa Jojic, Vladimir Jojic, Chris Meek, David Heckerman

UW:

Jim Mullins, Mark Jensen, Jerry Learn

Overview


Computational cost of usual algorithms


State of the art


Phylogeny + alignment


Phylogeny + sequence modeling


Approximations and their pitfalls


Recombination


Analogy to other ML domains


Graphical model


Experiments and computational cost


Value of the computation


Potential applications


Drug discovery cycle


Value of time and clinical success


Market size and growth


Discussion


Rational vaccine design

(Jim Mullins et al.)


Rational design


Analysis of sequences to form a model of virus evolution (phylogenies, etc.)


Develop vaccines that target as much variability as possible



Traditional design


Trial and error


Educated guesses

State-of-the-art sequence analysis programs


Example:


Rational AIDS vaccine design


Analysis of the envelope gene from a single patient in one visit


200 sequences with 600 base pairs each


Overnight to align


1-2 hours to 2-3 days to build a tree, depending on how much search you are willing to do


This does not include modeling inter-sequence dependencies or coupling alignment and tree search, and it ignores recombination


The total length of the HIV genome is about 10,000 bases, and the number of samples is limited in practice only by cost

Computational cost of a slightly more detailed analysis


Metropolis search over all trees on 400 sequences of the full genome (10k) would take around 2 years on one machine



Exact search intractable!
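To give a sense of why exact search is out of reach, the sketch below counts the distinct unrooted binary tree topologies on n labeled sequences, which is the standard double factorial (2n-5)!!. The function name is chosen here for illustration, and the count covers topologies only (no branch lengths, alignment, or recombination).

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary tree topologies on n labeled leaves: (2n-5)!!."""
    if n_leaves < 3:
        return 1  # one or two leaves admit a single topology
    count = 1
    # Multiply the odd numbers 3 * 5 * ... * (2n - 5).
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    # Sequence counts mentioned in the slides: 200 (one visit) and 400.
    for n in (4, 10, 200, 400):
        digits = len(str(num_unrooted_binary_trees(n)))
        print(f"{n} sequences: tree count has {digits} decimal digits")
```

Even before scoring a single tree, the search space grows faster than exponentially in the number of sequences, which is why sampling or approximate search is used in practice.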

Approximation


Free energy as a bound on negative log-likelihood
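The bound can be written out in the standard variational form. The symbols are notation assumed here, not taken from the slides: x stands for the observed sequences, h for the hidden variables (for example tree, alignment), and q(h) for the approximating distribution.

```latex
\begin{aligned}
F(q) &= \sum_{h} q(h)\,\log\frac{q(h)}{p(h, x)} \\
     &= -\log p(x) + \mathrm{KL}\big(q(h)\,\|\,p(h \mid x)\big) \\
     &\ge -\log p(x).
\end{aligned}
```

The gap between the free energy and the negative log-likelihood is exactly the KL divergence from q to the true posterior, so the bound is tight only when q can represent p(h | x).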


Computation and approximation of the free energy:


Iterative conditional modes


Mean-field method


Structured variational techniques


(Loopy) belief propagation


Sampling techniques


How tight is the bound?


What does the looseness translate to?
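The tightness question can be illustrated on a toy model. The sketch below is not the model from the talk: it assumes two binary hidden variables and a hypothetical joint table P[a][b] = p(h1=a, h2=b, x) for one fixed observation, runs mean-field coordinate updates, and compares the resulting free energy with the exact negative log-likelihood.

```python
import math


def free_energy(P, q1, q2):
    """F(q) = E_q[log q(h) - log p(h, x)] for a factorized q(h1) q(h2)."""
    F = 0.0
    for a in range(2):
        for b in range(2):
            w = q1[a] * q2[b]
            if w > 0.0:
                F += w * (math.log(q1[a]) + math.log(q2[b]) - math.log(P[a][b]))
    return F


def normalize(log_weights):
    """Turn unnormalized log-weights into a probability vector."""
    m = max(log_weights)
    e = [math.exp(v - m) for v in log_weights]
    s = sum(e)
    return [v / s for v in e]


def mean_field(P, iters=50):
    """Coordinate updates: q1(a) proportional to exp(E_q2[log P[a][b]]), and vice versa."""
    q1, q2 = [0.5, 0.5], [0.6, 0.4]  # asymmetric start to break symmetry
    for _ in range(iters):
        q1 = normalize([sum(q2[b] * math.log(P[a][b]) for b in range(2)) for a in range(2)])
        q2 = normalize([sum(q1[a] * math.log(P[a][b]) for a in range(2)) for b in range(2)])
    return q1, q2


def bound_gap(P):
    """Free energy minus exact negative log-likelihood (always >= 0)."""
    exact_nll = -math.log(sum(sum(row) for row in P))
    q1, q2 = mean_field(P)
    return free_energy(P, q1, q2) - exact_nll


# Hypothetical tables: strongly coupled hidden variables vs. independent ones.
P_coupled = [[0.20, 0.01], [0.01, 0.20]]
P_independent = [[0.08, 0.12], [0.12, 0.18]]

if __name__ == "__main__":
    print("gap, coupled posterior:    ", bound_gap(P_coupled))
    print("gap, independent posterior:", bound_gap(P_independent))
```

When the hidden variables are independent given the data, the factorized q matches the posterior and the gap vanishes; when they are strongly coupled, mean-field settles on one mode and the bound stays loose.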

An example of the approximation issues:

Tightness of the bounds

Variational technique

Exact EM algorithm

Recombination


In HIV, the rate of recombination has recently been estimated to be ¼ of the rate of mutation!


Combinatorial explosion in inference

Similar situations in other domains where graphical models work well



Occlusion in video



Source interaction in audio



Composition of images

“Occlusion” in audio

[Figure: Speaker1 * M + Speaker2 * (1-M) forms the combined signal; outputs: Retrieved Speaker1, Retrieved Speaker2]
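The masking composition in the figure can be sketched in a few lines. This is an illustration under assumptions made here: each list holds hypothetical per-bin energies of a spectrogram, the mask M is binary, and the mask is derived from the known sources, whereas in the real problem the mask is hidden and has to be inferred.

```python
def mix(s1, s2, mask):
    """Combine two sources bin by bin: x = M * s1 + (1 - M) * s2."""
    return [m * a + (1 - m) * b for a, b, m in zip(s1, s2, mask)]


def dominant_mask(s1, s2):
    """Binary mask that is 1 where the first source is at least as strong."""
    return [1 if a >= b else 0 for a, b in zip(s1, s2)]


def retrieve(x, mask):
    """Split the mixture between the speakers according to the mask."""
    r1 = [m * v for v, m in zip(x, mask)]
    r2 = [(1 - m) * v for v, m in zip(x, mask)]
    return r1, r2


if __name__ == "__main__":
    # Hypothetical per-bin energies for two speakers.
    speaker1 = [5.0, 0.5, 4.0, 0.2]
    speaker2 = [0.3, 3.0, 0.1, 6.0]
    M = dominant_mask(speaker1, speaker2)
    x = mix(speaker1, speaker2, M)
    print("mask:", M)
    print("mixture:", x)
    print("retrieved:", retrieve(x, M))
```

The analogy to occlusion in video is that each bin is "owned" by one source, just as each pixel is owned by the front-most object.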

Epitome of an image

[Figure panels: Input image; A set of image patches; Epitome]

Layers from a single photograph

[Figure: diagram with labels e_m, e_s, s_1, s_2, M, x]

Modeling alignment and recombination by learning a library of gene patterns
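The slides do not spell out the model, so the following is only a rough sketch of the general idea under strong assumptions made here: the library is fixed rather than learned, all sequences are already aligned and of equal length, and recombination is represented as a penalized switch between library patterns. The function name and cost parameters are hypothetical. A Viterbi-style dynamic program then finds the cheapest way to explain a sequence as a mosaic of library patterns.

```python
def best_mosaic(query, library, switch_cost=2.0, mismatch_cost=1.0):
    """Explain each site of `query` by copying from one library pattern.

    A mismatch with the copied pattern costs `mismatch_cost` (a stand-in for
    mutation); changing the copied pattern between adjacent sites costs
    `switch_cost` (a stand-in for recombination). Returns (total_cost, path),
    where path[i] is the index of the pattern used at site i.
    """
    n_sites, n_pat = len(query), len(library)
    cost = [mismatch_cost * (library[k][0] != query[0]) for k in range(n_pat)]
    back = []  # back[i - 1][k]: pattern used at site i - 1 if site i uses k
    for i in range(1, n_sites):
        best_prev = min(range(n_pat), key=lambda k: cost[k])
        new_cost, ptr = [], []
        for k in range(n_pat):
            stay = cost[k]
            jump = cost[best_prev] + switch_cost
            prev, base = (k, stay) if stay <= jump else (best_prev, jump)
            new_cost.append(base + mismatch_cost * (library[k][i] != query[i]))
            ptr.append(prev)
        cost = new_cost
        back.append(ptr)
    # Trace the cheapest path back from the last site.
    k = min(range(n_pat), key=lambda j: cost[j])
    total, path = cost[k], [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return total, path


if __name__ == "__main__":
    library = ["AAAAAAAA", "CCCCCCCC"]  # hypothetical gene patterns
    total, path = best_mosaic("AAAACCCC", library)
    print(total, path)
```

Because only the best previous pattern needs to be considered for a switch, each site costs time linear in the library size, which is one way such a model sidesteps the combinatorial explosion of enumerating recombinants explicitly.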

Experimental results


Value of computation

(from Tufts Center)

Growth


Human viruses


West Nile


SARS


Hepatitis C


Polio





Animal viruses


FIV


Pig, chicken and cow viruses


Most bacterial diseases


Parasitic diseases


The first sign of success in rational design might trigger a great increase in the number of diseases tackled


How can MS/MSR be involved?


MS: Architecture, platform, tools


Storage, transmission, computation


E.g., parallelizable computation on a single machine; peer-to-peer networks for parallel computation on multiple machines


MSR:


Helping to speed up the scientific progress that leads to new opportunities for growth


Advising MS on the research direction in the community and the future requirements for the platform