What does one do in Bioinformatics? - Ccsf


Oct 2, 2013 (3 years and 6 months ago)


Introduction to Bioinformatics

Alexandra M Schnoes

Univ. California San Francisco

Alexandra.Schnoes@ucsf.edu

What is Bioinformatics?


Intersection of Biology and Computers


Broad field


Often means different things to different people



Personal Definition:


The utilization of computation for biological investigation and discovery: the process by which you unlock the biological world through the use of computers.

What does one do in Bioinformatics? (a small sample)




Our Lab: Understanding Protein (Enzyme) Function

What does one do in Bioinformatics? (a small sample)


Discover new drug targets

computational docking

Atreya, C. E. et al. J. Biol. Chem. 2003;278:14092-14100

Shoichet, B. K. Nature. 2004;432:862-865

What does one do in Bioinformatics? (a small sample)


Systems Biology

sbw.kgi.edu/

www.sbi.uni-rostock.de/research.html

This lab: Nucleotide & Protein Informatics



Sequence analysis


Finding similar sequences


Multiple sequence alignment


Phylogenetic analysis

Sequence

Structure

Function

Process of Evolution


Sequences change due to


Mutation


Insertion


Deletion

Use Evolutionary Principles to Analyze Sequences


If sequence A and sequence B are similar

A and B are likely evolutionarily related

If sequences A, B and C are all similar, but A and B are more similar to each other than either is to C

A and B are more closely evolutionarily related to each other than to C
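The comparison above can be sketched with a simple similarity measure. This is a minimal illustration, assuming percent identity over already-aligned, equal-length sequences as the measure; the three toy sequences and the function name `percent_identity` are hypothetical and not from the slides.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical, already-aligned toy sequences.
seqs = {
    "A": "TASSWSYIVE",
    "B": "TASSWSYLVE",
    "C": "TATSFSYLVG",
}

# Identity for every pair of names, e.g. ("A", "B").
identities = {
    (p, q): percent_identity(seqs[p], seqs[q])
    for p, q in combinations(seqs, 2)
}

# The most similar pair is inferred to be the most closely related.
closest = max(identities, key=identities.get)
print(identities)
print("closest pair:", closest)
```

Here A and B share the most positions, so under this simple measure they would be treated as more closely related to each other than either is to C.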

Extremely Powerful Idea

1. Start with unknown sequence

2. Find what the unknown is similar to

3. Use information about the known to make predictions about the unknown

How do you know when sequences are similar?



Align two sequences together and score their similarity

TASSWSYIVE
TATSFSYLVG


Use substitution matrices to score the alignment

Substitution Matrices Give a Score for Each Mutation


Many different matrices available


BLOSUM matrices are standard in the field

BLOSUM62 scoring matrix

http://www.carverlab.org/images/

Scoring: Add up the positional Scores

Score of 30

TASSWSYIVE
TATSFSYLVG

Score of 1

TASSWSYIVE
  TATSFSYLVG
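The two scores can be reproduced by summing one substitution score per aligned position. The sketch below hard-codes only the BLOSUM62 entries needed for these two sequences. The function names and the `offset` parameter are illustrative assumptions, and residues that overhang the other sequence are simply ignored.

```python
# Subset of BLOSUM62 covering only the residue pairs that occur below.
# Keys are alphabetically sorted pairs so lookups are order-independent.
BLOSUM62_SUBSET = {
    ("T", "T"): 5, ("A", "A"): 4, ("S", "S"): 4, ("Y", "Y"): 7, ("V", "V"): 4,
    ("S", "T"): 1, ("F", "W"): 1, ("I", "L"): 2, ("E", "G"): -2,
    ("A", "S"): 1, ("T", "W"): -2, ("F", "Y"): 3, ("I", "S"): -2,
    ("V", "Y"): -1, ("E", "L"): -3,
}


def pair_score(x: str, y: str) -> int:
    """Look up the substitution score for one aligned residue pair."""
    return BLOSUM62_SUBSET[tuple(sorted((x, y)))]


def score_ungapped(top: str, bottom: str, offset: int = 0) -> int:
    """Sum positional scores with `bottom` shifted right by `offset` columns.

    Positions where only one sequence has a residue are ignored (an assumption).
    """
    total = 0
    for i, x in enumerate(top):
        j = i - offset
        if 0 <= j < len(bottom):
            total += pair_score(x, bottom[j])
    return total


print(score_ungapped("TASSWSYIVE", "TATSFSYLVG"))            # 30
print(score_ungapped("TASSWSYIVE", "TATSFSYLVG", offset=2))  # 1
```

Shifting the second sequence by two columns lines up mostly dissimilar residues, which is why the total drops from 30 to 1.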

Additional issues…


Gaps (insertions/deletions)


Scoring penalties apply for opening and for continuing a gap

TASSWSYIVE
TATSFLVG

TASSWSYIVE
TATSF--LVG
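A gapped alignment such as the one above can be scored by charging one penalty for the first gap column and a smaller one for each column that continues the gap. This is a rough sketch: the penalty values (-10 to open, -1 to continue) are hypothetical, the matrix is only the subset of BLOSUM62 needed here, and the code does not distinguish which sequence the gap is in.

```python
# Subset of BLOSUM62 for the residue pairs in this alignment (sorted keys).
BLOSUM62_SUBSET = {
    ("T", "T"): 5, ("A", "A"): 4, ("S", "S"): 4, ("V", "V"): 4,
    ("S", "T"): 1, ("F", "W"): 1, ("I", "L"): 2, ("E", "G"): -2,
}

GAP_OPEN = -10    # hypothetical penalty for the first column of a gap
GAP_EXTEND = -1   # hypothetical penalty for each further gap column


def score_gapped(top: str, bottom: str) -> int:
    """Score two aligned strings of equal length, where '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    in_gap = False
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += GAP_EXTEND if in_gap else GAP_OPEN
            in_gap = True
        else:
            total += BLOSUM62_SUBSET[tuple(sorted((x, y)))]
            in_gap = False
    return total


print(score_gapped("TASSWSYIVE", "TATSF--LVG"))  # 19 from residues, -11 from the gap: 8
```

Charging less to continue a gap than to open one favors a single long insertion or deletion over many scattered ones.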

How do we find similar sequences?

Start at the National Center for Biotechnology Information

http://www.ncbi.nlm.nih.gov/

Nucleotide Sequence Databases

Protein Sequence Databases

Similarity Search: BLAST

Basic Local Alignment Search Tool

BLAST is very quick but …


Only local alignments


Alignments aren’t great


Only pairwise alignments
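To make "local alignment" concrete, here is a small sketch of the Smith-Waterman dynamic program, which finds the best-scoring pair of subsequences. This is not how BLAST works internally (BLAST uses a faster heuristic). The scoring values (match +2, mismatch -1, gap -2) are arbitrary choices for illustration, not a substitution matrix.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment may start anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# The shared region SSWSY is found even though the sequences differ in length.
print(local_alignment_score("TASSWSYIVE", "SSWSY"))  # 10
```

Because only the best-matching region contributes, a high local score says nothing about how well the rest of the two sequences agree.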

Want better alignments …


Multiple alignment


Multiple sequences


Better signal to noise


More Sequences = Better alignment


More accurate reflection of evolution


ClustalW


Commonly used


Easy to use


Visualize the Multiple Alignment


Use the Alignment to Calculate Evolutionary Distances

See ‘how close’ sequences are to each other

Best way to tell what is ‘most similar’

Can calculate a simple tree with ClustalW

Taubenberger et al., Nature 437, 889-893, 2005
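As a rough illustration of going from an alignment to distances to a tree, the sketch below computes the fraction of differing positions (p-distance) for each pair in a toy alignment and then joins the closest clusters repeatedly (UPGMA). UPGMA is used here only because it is short to write; it is not necessarily the method ClustalW applies. The alignment and all names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions, skipping columns where either has a gap."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in cols) / len(cols)


def upgma(dist: dict, names: list) -> str:
    """Naive UPGMA clustering; returns the tree topology in Newick notation."""
    sizes = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)                        # frozenset({a, b}) -> distance
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Distance to the merged cluster is the size-weighted average.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Hypothetical multiple alignment ('-' marks a gap).
alignment = {
    "A": "TASSWSYIVE",
    "B": "TASSWSYLVE",
    "C": "TATSFSYLVG",
    "D": "TATSF-YIVG",
}

distances = {
    frozenset((p, q)): p_distance(alignment[p], alignment[q])
    for p, q in combinations(alignment, 2)
}
tree = upgma(distances, list(alignment))
print(tree)  # ((A,B),(C,D));
```

The output groups A with B and C with D, matching the pairs with the smallest distances in the toy alignment.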

Caveats!


In reality


Sequences (even parts of sequences) can evolve at different rates

We don’t have a good understanding of the relationship between sequence and function

High sequence identity does not always mean the same function

Getting good alignments and good trees can be very hard

Bioinformatics: Sequence Analysis

1. Start with unknown sequence

2. Find similar sequences

3. Create alignment

4. Create phylogenetic tree

5. Use information about knowns to make predictions about unknown

Mini Virus Intro



Often considered ‘not alive’


Extremely small (much smaller than a cell)


Cellular parasites


Have a genome but can only reproduce inside a host cell


Different Viruses


RNA & DNA viruses


Both single- and double-stranded



Influenza Virus

Influenza Virus (flu)


Small genome

8 RNA molecules


Evolves quickly


antigenic drift, antigenic shift

Influenza Virus (flu)


Sequencing

Reverse Transcriptase

DNA

Sequencing

Genomic Nucleotide Sequence
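The reverse transcription step can be pictured as building the DNA strand complementary to the RNA template. A minimal sketch follows, assuming a plain string of A, U, G and C as input; the function name and the example sequence are hypothetical.

```python
# Base-pairing rules for copying an RNA template into complementary DNA.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    # The new strand is antiparallel to the template, so read it in reverse.
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


print(reverse_transcribe("AUGGCU"))  # AGCCAT
```

The resulting DNA string is what a sequencing run would then read out as the nucleotide sequence.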


Influenza Pandemics


1918 Flu


Killed 50-100 million people worldwide


Considered to be one of the most deadly pandemics


Killed many of the young and healthy


Influenza A, subtype H1N1


Thought to have derived from Avian Influenza


Recently reconstituted from recovered human samples


Considerable ethical debate

Avian Influenza


Current fear of pandemic


High mortality rate (including young and healthy)


Current concern is Influenza A, subtype H5N1


Still only transmitted by contact with birds


Is now in Asia and Eastern and Western Europe

This lab: Nucleotide & Protein Informatics



Sequence analysis


Finding similar sequences


Multiple sequence alignment


Phylogenetic analysis