CS-7010, course introduction


Oct 2, 2013


CS 7010: Computational Methods in Bioinformatics

(course introduction)


Dong Xu





Computer Science Department

109 Engineering Building West

E-mail: xudong@missouri.edu

573-882-7064

http://digbio.missouri.edu



Challenges of Our Civilization - 1



Top 125 unsolved problems in science over the next quarter-century (http://www.sciencemag.org/sciext/125th/)


The Top 25


What Is the Universe Made Of?


What is the Biological Basis of Consciousness?


Why Do Humans Have So Few Genes?


To What Extent Are Genetic Variation and Personal Health Linked?


Can the Laws of Physics Be Unified?


How Much Can Human Life Span Be Extended?


What Controls Organ Regeneration?


How Can a Skin Cell Become a Nerve Cell?

Challenges of Our Civilization - 2


How Does a Single Somatic Cell Become a Whole Plant?


How Does Earth's Interior Work?


Are We Alone in the Universe?


How and Where Did Life on Earth Arise?


What Determines Species Diversity?


What Genetic Changes Made Us Uniquely Human?


How Are Memories Stored and Retrieved?


How Did Cooperative Behavior Evolve?


How Will Big Pictures Emerge from a Sea of Biological Data?

Challenges of Our Civilization - 3


How Far Can We Push Chemical Self-Assembly?


What Are the Limits of Conventional Computing?


Can We Selectively Shut Off Immune Responses?


Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?


Is an Effective HIV Vaccine Feasible?


How Hot Will the Greenhouse World Be?


What Can Replace Cheap Oil -- and When?


Will Malthus Continue to Be Wrong?


Lecture Outline


What does bioinformatics do?


Course topics


Course Organization


Workload/grades



Technical Definitions

NIH (http://www.bisti.nih.gov/)

Bioinformatics: “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, represent, describe, store, analyze, or visualize such data”.

Computational Biology: “the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems”.

Scope of Bioinformatics: Studying Biology on Computer

data management; data mining; modeling; prediction; theory formulation

engineering aspect / scientific aspect

bioinformatics: an indispensable part of biological science with its own methodology

genes, proteins, protein complexes, pathways, cells, organisms, ecosystem

computer science, biology, statistics, physics, mathematics, chemistry, engineering,…

Why Is Bioinformatics So Hot? (I)


More than 80 universities offer graduate degrees in bioinformatics


At the intersection of two of the most active fields: computer science and molecular biology


Exponential growth in computer technologies (hardware, Internet) paves the way for bioinformatics development

Why Is Bioinformatics So Hot? (II)

[Figure: Analytical technology -> High-throughput data -> Biological knowledge -> Medicine & bioengineering]

What Can Computing Do for Biology?


Data interpretation in analytical technologies


Data management and computational infrastructure


Discovery from data mining


Modeling, prediction and design


Theoretical / in silico biology


Covers almost every area of computer science

Data Interpretation in Analytical Technologies (I)


Analytical technologies are the driving force of new (large-scale) biology:


DNA sequencing (genomics)


X-ray / NMR structure determination (structural genomics)


Protein identification using mass spectrometry (proteomics)


Microarray chips (functional genomics)

Data Interpretation in Analytical Technologies (II)

NMR protein structure determination

[Figure: peptide backbone with residues i-1 to i+4. Workflow labels: NMR spectra, peak assignment, structural restraint extraction, structure calculation, protein structure]

Data Interpretation in Analytical Technologies (III)


From image to data (image processing)



Large-scale data cannot be handled without computers



Noisy data (optimization with under-constrained / over-constrained problems)



Computer algorithms/programs can mimic the human interpretation process and do it much faster



Automation of experimental data interpretation
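
As a rough illustration of the "noisy data, over-constrained" case above: when there are more measurements than unknowns, no exact solution exists, and one common choice is a least-squares fit. The sketch below fits a straight line to noisy points. The function name `fit_line` and the sample data are invented for this example and do not come from the course material.

```python
def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) points.

    With more points than unknowns (a and b), the system is over-constrained;
    the fit minimizes the sum of squared vertical residuals.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical noisy measurements scattered around y = x.
noisy = [(0, 0.1), (1, 0.9), (2, 2.1), (3, 2.9)]
slope, intercept = fit_line(noisy)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

An under-constrained problem is the opposite situation: fewer independent measurements than unknowns, so extra assumptions (for example, prior knowledge or regularization) are needed to pick one solution.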



Data Management and Computational Infrastructure


Track instruments, experimental conditions and results at each step of a complicated biological experiment (LIMS in modern wet labs)



Data storage and retrieval (database)



Data visualization



Data query and analysis pipeline
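
A minimal sketch of "data storage and retrieval (database)", using Python's built-in `sqlite3` module with an in-memory database. The table layout, sequence identifiers, and residues are made up for illustration; a real laboratory system would have a much richer schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a few hypothetical sequence records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences (id TEXT PRIMARY KEY, organism TEXT, residues TEXT)"
    )
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?)",
        [
            ("seq1", "yeast", "MKTAYI"),        # invented example data
            ("seq2", "arabidopsis", "MSTNPK"),  # invented example data
            ("seq3", "yeast", "MAGW"),          # invented example data
        ],
    )
    conn.commit()
    return conn


def sequences_for(conn, organism):
    """Return (id, sequence length) pairs for one organism, ordered by id."""
    cursor = conn.execute(
        "SELECT id, length(residues) FROM sequences WHERE organism = ? ORDER BY id",
        (organism,),
    )
    return cursor.fetchall()


conn = build_demo_db()
print(sequences_for(conn, "yeast"))
```

The parameterized query (`?` placeholders) keeps user input separate from the SQL text, which is the usual way to build a query layer on top of stored experimental data.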



Discovery from Data Mining (I)


Pattern/knowledge discovery from data


much biological data is generated by biological processes that are not well understood


interpretation of such data requires discovery of convoluted relationships hidden in the data


which segment of a DNA sequence represents a gene or a regulatory region


which genes are possibly responsible for a particular disease



Complicated data


Large-scale, high-dimensional


Noisy (false positives and false negatives)
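
To make the question "which segment of a DNA sequence represents a gene" concrete, here is a deliberately naive sketch that scans the forward strand for open reading frames (a start codon ATG followed in-frame by a stop codon). Real gene finders use statistical models and handle both strands, introns, and regulatory signals; the function name `find_orfs` and the `min_codons` threshold are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) index pairs of forward-strand ORFs.

    start is the index of the A in ATG; end is exclusive and includes the
    stop codon. min_codons counts the codons before the stop (ATG included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


example = "GGATGAAACCCTAGTT"  # invented sequence
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

Even this toy version shows the noise problem listed above: a short ATG...stop stretch can occur by chance (a false positive), and a real gene interrupted by a sequencing error can be missed (a false negative).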




Discovery from Data Mining (II)

Modeling, Prediction and Design (I)


Modeling and prediction of biological objects/processes


modeling of biochemistry


enzyme reaction rates


modeling of biophysics


dynamics of biomolecules


modeling of evolution


prediction of phylogeny



Prediction of outcomes of biological processes


computing will become an integral part of modern biology through an iterative process of model formulation, computational prediction, and experimental validation



From prediction to engineering design


Protein structure prediction to protein engineering


Design of genetically modified species
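
As one possible instance of "modeling of biochemistry: enzyme reaction rates", the sketch below evaluates the Michaelis-Menten rate law, v = Vmax * [S] / (Km + [S]). The slides do not name a specific model, so this choice and the parameter values are assumptions made for illustration.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Reaction rate v = vmax * [S] / (km + [S]) under the Michaelis-Menten model.

    substrate: substrate concentration [S] (same units as km)
    vmax:      maximum rate at saturating substrate
    km:        Michaelis constant, the [S] at which v = vmax / 2
    """
    if substrate < 0 or vmax < 0 or km <= 0:
        raise ValueError("substrate and vmax must be >= 0, and km must be > 0")
    return vmax * substrate / (km + substrate)


# Hypothetical parameters: vmax = 10 (arbitrary rate units), km = 2 (arbitrary units).
for s in (0.0, 2.0, 20.0, 200.0):
    print(s, round(michaelis_menten_rate(s, vmax=10.0, km=2.0), 3))
```

This fits the iterative process described above: the rate law is the model formulation, evaluating it at chosen concentrations is the computational prediction, and measured rates would provide the experimental validation.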

Modeling, Prediction and Design (II)

Theoretical / In Silico Biology


Generate new hypotheses, formulate and test fundamental theories of biology


new hypotheses about detailed evolutionary history, through mining genomic sequence data?


new hypotheses about a particular signaling network, through data mining?


new hypotheses about protein folding pathways, through simulations?


Bioinformatics Application to Biological Systems

plants (Arabidopsis)

bacteria (Synechococcus)

viruses (SARS)

yeast (Saccharomyces cerevisiae)

neural systems (neurons)

Can Biology Help Computing?


Computational techniques inspired by biology:


Neural network (artificial intelligence)


Genetic algorithm, automata


A new driver of computer science:


Better hardware (supercomputers)


New data representation



Develop new theoretical framework:


DNA computing


Network communication (communication between ants, see http://news-service.stanford.edu/news/2003/may7/antchat-57.html)
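
To illustrate the "genetic algorithm" item above, here is a small sketch that evolves bit strings toward the all-ones string (the classic OneMax toy problem). The population size, mutation rate, and selection scheme are arbitrary choices for this example, not recommendations from the course.

```python
import random


def one_max(bits):
    """Fitness: the number of 1s in the bit string."""
    return sum(bits)


def evolve(length=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Run a simple genetic algorithm; return (initial best fitness, best individual)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(one_max(ind) for ind in population)
    for _ in range(generations):
        population.sort(key=one_max, reverse=True)
        next_gen = [population[0][:]]           # elitism: keep the best unchanged
        parents = population[: pop_size // 2]   # selection: fitter half breeds
        while len(next_gen) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)      # single-point crossover
            child = mom[:cut] + dad[cut:]
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=one_max)
    return initial_best, best


initial_best, best = evolve()
print("initial best fitness:", initial_best)
print("final best fitness:", one_max(best))
```

The biological analogy is direct: selection, crossover, and mutation stand in for natural selection, recombination, and random mutation, and elitism guarantees that the best fitness never decreases between generations.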

Computing versus Biology


what computer science is to molecular biology is like what mathematics has been to physics ......

-- Larry Hunter, ISMB’94



molecular biology is (becoming) an information science .......

-- Leroy Hood, RECOMB’00



Bioinformatics is still in its infancy!




Course Topics


Data interpretation in analytical technologies


Data management and computational infrastructure


Discovery from data mining


Modeling, prediction and design


Theoretical / in silico biology



Cover classical/mainstream bioinformatics problems from a computer science perspective

Course Schedule






See http://digbio.missouri.edu/cs7010/


First take-home exam: given on 9/29; due on 10/6


Second take-home exam: given on 11/17; due on 11/29


Three phases of project: 9/22, 10/20, 11/17; final report due 12/8

What I Will Teach


A general introduction to a few major problems in the field of bioinformatics


problem definitions: from biological problem to computable problem


some key computational techniques


A way of thinking: tackling a “biological problem” computationally


how to look at a biological problem from a computational point of view


how to formulate a computational problem to address a biological issue


how to collect statistics from biological data


how to build a computational model


how to design algorithms for the model


how to test and evaluate a computational algorithm


how to assess the confidence of a prediction result
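
As a small example of "how to test and evaluate a computational algorithm", the sketch below compares binary predictions against known answers and reports sensitivity and specificity. These two measures are one common choice among many; the function names and the sample labels are invented for illustration.

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for two equal-length boolean lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return tp, fp, tn, fn


def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity); a value is None when it is undefined."""
    tp, fp, tn, fn = confusion_counts(predicted, actual)
    sensitivity = tp / (tp + fn) if (tp + fn) else None  # fraction of real positives found
    specificity = tn / (tn + fp) if (tn + fp) else None  # fraction of real negatives rejected
    return sensitivity, specificity


# Hypothetical example: did a predictor call each of five candidates correctly?
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
print(confusion_counts(predicted, actual))
print(sensitivity_specificity(predicted, actual))
```

Reporting both numbers matters because a predictor that calls everything positive has perfect sensitivity and zero specificity, so neither figure alone says how trustworthy a prediction is.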

New Ways of Thinking


Critical thinking


Analytical thinking


Quantitative thinking


Algorithmic thinking




A Brief Survey


Register for the course?


Academic department?


Computer background?


Biology background?


Statistical background?


Taken another bioinformatics course?


Prerequisites


CS 2050 (Algorithm Design and Programming II) or equivalent training


Statistics 2500 (Introduction to Probability and Statistics I) or equivalent training


Programming skills in any programming language are required


No biology background is necessary

Course Info


Co-instructor: Trupti Joshi (joshitr@missouri.edu)


Course Web Site: http://digbio.missouri.edu/cs7010/


Reference Books - 1


• Neil C. Jones and Pavel A. Pevzner: An Introduction to Bioinformatics Algorithms (Computational Molecular Biology). MIT Press, 2004.

• Pavel Pevzner: Computational Molecular Biology - An Algorithmic Approach. MIT Press, 2000.

• Current Topics in Computational Molecular Biology, edited by Tao Jiang, Ying Xu, and Michael Zhang. MIT Press, 2002.



Reference Books - 2


• Pierre Baldi and Soren Brunak: Bioinformatics - The Machine Learning Approach (second edition). MIT Press, 2001.

• Dan Gusfield: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.

• Warren J. Ewens and Gregory R. Grant: Statistical Methods in Bioinformatics - An Introduction. Springer, 2001.

• Terry Speed: Statistical Analysis of Gene Expression Microarray Data. Chapman & Hall/CRC, 2003.


Lectures


3:30pm - 4:45pm, Tuesday and Thursday


PowerPoint slides for each lecture (posted before the lecture)


Questions/answers at the beginning and end of each lecture


Discussions are encouraged during the lecture (a topic discussion may be held at the end of a lecture)

Office Hours


4:45pm - 5:35pm, Tuesdays and Thursdays


The instructor who delivers the lecture will hold the office hour


Dong Xu: Room 109, Engineering Building West (882-7064)


Trupti Joshi: Room 317, Engineering Building North (884-3528)


Special office hours will be arranged close to the final


Appointments at other times




Minimum Requirement


Attend class regularly


Read the suggested class handout after class


Submit the two take-home exams


Deliver the final project (for graduate students)



Expected workload: 5-6 hours / week in addition to class attendance

How to Get the Most out of the Course


Study the suggested reading/slides before class


Study the optional reading


Ask questions in class


Visit office hours frequently


Do the homework assignments (not graded)



Not required (not counted in the final grade) but encouraged.


Grading


A final grade of A, B, C, etc. will be assigned.


2 take-home exams (20% each)



Project: 3 phase reports (5% each), final report (15%), software demo (15%), presentation (15%)


Final project


A working bioinformatics program that can be used by biologists, or a comprehensive computational analysis of bioinformatics tool outputs


One project per student (independent development) with consultation from instructors


Potential for publication

Three Phases of Project


Phase 1 (due 9/22): Define your project subject. A brief literature survey and illustration of its importance.


Phase 2 (due 10/20): Describe key methods.


Phase 3 (due 11/17): Present key results.



Final report: due 12/8


Discussion

What do you expect from this course?


- content?


- ways of teaching?


- how the instructors can help?


Assignments


Suggested reading:


http://bioinfo.mbb.yale.edu/e-print/whatis-mim/text.pdf



Bioboxes in “Neil C. Jones and Pavel A. Pevzner: An Introduction to Bioinformatics Algorithms (Computational Molecular Biology). MIT Press, 2004.”


Optional reading:


Chapter 1 in “Current Topics in Computational Molecular Biology, edited by Tao Jiang, Ying Xu, and Michael Zhang. MIT Press, 2002.”


http://www.ncbi.nih.gov/About/primer/bioinformatics.html