17 - Digital Biology Laboratory - University of Missouri

Oct 2, 2013

CS 7010: Computational Methods in Bioinformatics (course review)


Dong Xu
Computer Science Department
271C Life Sciences Center
1201 East Rollins Road
University of Missouri-Columbia
Columbia, MO 65211-2060
E-mail: xudong@missouri.edu
573-882-7064 (O)
http://digbio.missouri.edu

Technical Definitions

NIH (http://www.bisti.nih.gov/)

Bioinformatics: “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, represent, describe, store, analyze, or visualize such data”.

Computational Biology: “the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems”.

Course Topics


Data interpretation in analytical technologies


Data management and computational infrastructure


Discovery from data mining


Modeling, prediction and design


Theoretical in silico biology



Cover classical/mainstream bioinformatics problems from a computer science perspective


Discovery from Data Mining (I)


Data source


Genomic / protein sequence


Microarray data


Protein interaction



Complicated data


Large-scale, high-dimension


Noisy (false positives and false negatives)




Discovery from Data Mining (II)


Pattern/knowledge discovery from data


many biological data are generated by biological processes which are not well understood

interpretation of such data requires discovery of convoluted relationships hidden in the data

which segment of a DNA sequence represents a gene or a regulatory region

which genes are possibly responsible for a particular disease
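
The question of which segment of a DNA sequence might represent a gene can be illustrated with a toy scan for open reading frames (ORFs). This is only a naive sketch, not the gene-finding method covered in the course: it looks at the forward strand only, treats ATG as the sole start codon, and ignores introns and statistical signals. The function name `find_orfs` and the `min_codons` threshold are assumptions introduced for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of ATG...stop ORFs on the forward strand.

    `end` is exclusive, so dna[start:end] is the ORF including its stop codon.
    `min_codons` counts every codon in the ORF, start and stop included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen in this frame, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    sequence = "GGATGCCCTGACC"
    for start, end in find_orfs(sequence):
        print(start, end, sequence[start:end])
```

Real gene finders combine such signals with statistical models (for example, hidden Markov models), which is why a scan like this produces many false positives on genomic data.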

Discovery from Data Mining (III)

Modeling, Prediction and Design (I)


Modeling and prediction of biological objects/processes


Sequence comparison


Secondary structure prediction


Gene finding


Regulatory sequence identification




Modeling, Prediction and Design (II)

Prediction of outcomes of biological processes

computing will become an integral part of modern biology through an iterative process of model formulation, computational prediction, and experimental validation

From prediction to engineering design

Drug design

Protein structure prediction to protein engineering

Design genetically modified species

Scope of Bioinformatics

data management; data mining; modeling; prediction; theory formulation

engineering aspect

scientific aspect

bioinformatics

an indispensable part of biological science

genes, proteins, protein complexes, pathways, cells, organisms, ecosystem

computer science, biology, statistics, mathematics, physics, chemistry, engineering,…

Bioinformatics Foundations


Technology


Biology/medicine


Computer Science


Statistics



From interdisciplinary field to a distinct discipline

Course Coverage


A general introduction to the field of bioinformatics


problem definitions: from biological problem to computable problem


key computational techniques


A way of thinking: tackling “biological problem” computationally


how to look at a biological problem from a computational point of view


how to formulate a computational problem to address a biological issue


how to collect statistics from biological data


how to build a computational model


how to design algorithms for the model


how to test and evaluate a computational algorithm


how to assess confidence of a prediction result
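
As one hedged illustration of how a computational prediction might be tested and evaluated, the sketch below compares binary predictions against known labels and reports sensitivity, specificity, and precision. These are standard measures, but the function name `confusion_metrics` and the 0/1 label encoding are assumptions made here, not something specified in the course material.

```python
def confusion_metrics(predicted, actual):
    """Compare binary predictions with known labels (1 = positive, 0 = negative)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives

    def ratio(numerator, denominator):
        # Return None when the measure is undefined (no cases in the denominator).
        return numerator / denominator if denominator else None

    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": ratio(tp, tp + fn),  # fraction of real positives found
        "specificity": ratio(tn, tn + fp),  # fraction of real negatives rejected
        "precision": ratio(tp, tp + fp),    # fraction of predictions that are right
    }


if __name__ == "__main__":
    predicted = [1, 1, 1, 0, 0, 0, 0, 0]
    actual = [1, 1, 0, 1, 0, 0, 0, 0]
    print(confusion_metrics(predicted, actual))
```

Reporting sensitivity and specificity together matters for noisy biological data, where false positives and false negatives carry different costs.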

Dong’s top 10 list for computational methods in BI

1. Dynamic programming

2. Neural network

3. Hidden Markov Model

4. Hypothesis test

5. Bayesian statistics

6. Clustering

7. Information theory

8. Support Vector Machine

9. Maximum likelihood

10. Sampling search (Gibbs, Monte Carlo, etc.)
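
To make the first item on the list concrete, here is a minimal sketch of dynamic programming applied to pairwise global sequence alignment (the Needleman-Wunsch recurrence with a linear gap penalty). It returns only the optimal score, not the alignment itself. The function name and the scoring values (match = 1, mismatch = -1, gap = -2) are illustrative assumptions, not parameters taken from the course.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (linear gap penalty)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Keeping only two rows of the table gives O(len(a) * len(b)) time with O(len(b)) memory; recovering the alignment itself would need the full table or a divide-and-conquer variant.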

Research Areas

1. “Solved” problems

2. “Developed” areas with remaining challenges hard to solve

3. Developing areas

4. Emergent areas

5. Future directions

“Solved” Problems

DNA sequence base calling and assembly

Pairwise sequence comparison

Protein secondary structure prediction

Disordered region in proteins

Transmembrane segment prediction

Subcellular localization

Signal peptide prediction

Protein geometry

Homology modeling

Physical/genetic mapping informatics

Developed areas with remaining challenges

Gene finding

Phylogenetic tree construction and evolution

Protein docking

Drug design

Protein design

Linkage analysis and quantitative traits (QTL)

Microarray data collection

Gene expression clustering

Developing Areas

Multiple sequence comparison and remote homolog search

Repetitive sequence analysis

Protein structure comparison

Protein tertiary structure prediction

RNA secondary structure prediction

Regulatory sequence analysis

Computational proteomics

Protein interaction networks

Gene ontology and function prediction

Computational neural science and applications in various species and systems (e.g., cancer)

Emergent Areas

Pathway (regulatory network) prediction

ChIP-chip analysis

Tiling array analysis

Haplotype/SNP analysis

Computational comparative genomics

Text (literature) mining

Small RNA and anti-sense regulation

Alternative splicing prediction

Computational metabolomics

Possible future directions

Genome semantics

Membrane protein structure prediction

RNA tertiary structure prediction

Post-translational modification

Dynamics of regulatory networks

Virtual cell/organism modeling

Phenotype-genotype relationship

… (nobody knows)

Where is the science going? (1)



Bioinformatics has been a “technology” for biological research: interpretation of data generated by bench biologists

We are starting to see a trend in which computational predictions guide experimental design

As more high-throughput technologies become available, discovery-driven science will play an increasingly important role in biological research

As computational techniques continue to mature for biological applications, we will see more and more computational applications with powerful prediction capabilities

Where is the science going? (2)



Like physics, where general rules and laws are taught at the start, biology will surely be presented to future generations of students as a set of basic systems ... duplicated and adapted to a very wide range of cellular and organismic functions, following basic evolutionary principles constrained by Earth’s geological history.

-- Temple Smith, Current Topics in Computational Molecular Biology


Major research centers (1)


National Center for Biotechnology Information (NCBI) of NIH (http://www.ncbi.nlm.nih.gov/)

the home of many important databases including GenBank

the home of many important bioinformatics tools including BLAST

European Molecular Biology Laboratory (EMBL) (http://www.embl-heidelberg.de/)

has some of the most powerful research groups in bioinformatics

has numerous tools and databases

Major research centers (2)


Sanger Institute (http://www.sanger.ac.uk/)



The Institute for Genomic Research (TIGR, http://www.tigr.org/)



Swiss-Prot (http://www.expasy.org/sprot/)

Major research centers (3)

Major universities in the US



University of California at Santa Cruz


University of California at San Diego


Washington University


University of Southern California


Stanford University


Columbia University


Boston University


Harvard University


MIT


Virginia Tech


Major journals



Bioinformatics


Nucleic Acids Research


Genome Research


Journal of Computational Biology


Journal of Bioinformatics and Computational Biology


In Silico Biology

Briefings in Bioinformatics

Applied Bioinformatics

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Proteins: Structure, Function, and Bioinformatics


Journal of Computer Science and Technology


Genomics, Proteomics and Bioinformatics




Major conferences



Intelligent Systems for Molecular Biology (ISMB)


Annual International Conference on Research in Computational Molecular Biology (RECOMB)


IEEE Computational Systems Bioinformatics Conference (CSB)


Pacific Symposium on Biocomputing (PSB)


European Conference on Computational Biology (ECCB)


IEEE International Conference on Bioinformatics and Bioengineering (BIBE)


International Workshop on Genome Informatics (GIW)


Asia-Pacific Bioinformatics Conference (APBC)





Academicians


Michael Waterman


Phil Green


Gene Myers


Barry Honig



No Nobel Prize winner yet…

Discussions


Scope of the new biology (large-scale)


Technology (tool development) vs. science (biological application)


Knowledge vs. prediction


Experimental vs. computational/theoretical


First principle vs. empirical / statistical


Automated vs. curated


One machine can do the work of fifty ordinary men. No machine can do the work of one extraordinary man.

Choosing Bioinformatics as a Career - 1



Field outlook


Must be a believer in bioinformatics (for its value to science)

Must have strong motivation and be willing to go the extra mile (learn more disciplines)


Technologist vs. technician


Choosing Bioinformatics as a Career - 2



Molecular & cellular and evolutionary biology


understanding the science



Computational, mathematical, and statistical sciences


mastering the techniques



High-throughput measurement technologies


Knowing what biological data are obtainable