Oct 2, 2013

Bioinformatics and
Phylogenetic Analysis

Edgar Scott

Multicampus Bioinformatics
Education Specialist

What is Bioinformatics?


Interdisciplinary field that applies
principles and techniques from
computer science, probability and
statistics, and linguistics to the study of
genomic and proteomic sequences.


Biological databases for storing and
organizing DNA and protein sequences


Computational tools for analyzing
sequences

Phylogenetic Analysis and
Bioinformatics


Phylogenetics: the study of evolutionary
relationships


Phylogenetic trees used to represent
evolutionary relationships


Uses protein or DNA sequences, rather than
morphological characters, to detect relationships


Bioinformatics provides both sequence
repositories and sequence analysis software.

Overview


Acquiring Data Set


Text searching at the National Center for
Biotechnology Information (NCBI)


Sequence similarity and homology


Sequence similarity searching with Basic Local
Alignment Search Tool (BLAST)


Analyzing Data Set


Phylogenetic Analysis with Molecular Evolutionary
Genetics Analysis (MEGA) 3.1 software


Build multiple sequence alignments of sequences using
ClustalW


Build phylogenetic trees

Text Searching at NCBI


NCBI maintains and provides molecular
information and bioinformatics tools for
the scientific community


GenBank: an archival DNA and protein
sequence database


RefSeq: a curated DNA and protein
sequence database


Entrez Gene: a gene-centered database
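
A text search of these databases can also be scripted. The sketch below only builds a query URL for NCBI's E-utilities ESearch service and does not send it. The helper name `build_esearch_url` and the example search term are hypothetical, chosen for illustration.

```python
from urllib.parse import urlencode

# Base address of the ESearch endpoint of NCBI's E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_results=20):
    """Return an ESearch URL that text-searches one Entrez database.

    database    -- Entrez database name, e.g. "protein", "nucleotide", "gene"
    term        -- Entrez query string, optionally with field tags
    max_results -- maximum number of record identifiers to return
    """
    params = {
        "db": database,
        "term": term,
        "retmax": max_results,
        "retmode": "json",
    }
    return EUTILS_BASE + "?" + urlencode(params)


# Hypothetical query: human hemoglobin protein records.
url = build_esearch_url(
    "protein", "hemoglobin[Protein Name] AND Homo sapiens[Organism]"
)
print(url)
```

Opening the printed URL in a browser should return a list of matching record identifiers, which can then be retrieved in FASTA format for the later alignment steps.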

Sequence Similarity and
Homology


Homology: the relationship between sequences
that share a common ancestral sequence


Paralogs: arise via gene duplication


Orthologs: arise via a speciation event


Xenologs: arise via horizontal gene transfer


Evolutionarily related sequences tend to be
similar.


Sequence differences reflect the amount
of change that has occurred since the sequences
last shared a common ancestral sequence.

Sequence Alignments


Sequence Alignment: a process that identifies a
series of characters or character patterns that are in
the same order in both sequences.


Pairwise Global alignment


Pairwise Local alignment


Optimal alignment: an alignment between
sequences in which the number of matching
characters is maximized and the number of
mismatching characters is minimized.


Quantifying alignments


Alignment score of the optimal alignment


Percent identity scores


Percent similarity scores
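
To make the alignment score and percent identity concrete, here is a minimal sketch of a pairwise global alignment using the Needleman-Wunsch dynamic programming method. The scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions for illustration. Real tools use substitution matrices and affine gap penalties, and percent identity can be defined with other denominators than the alignment length used here.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment.

    Returns (optimal score, aligned a, aligned b), with "-" marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both rows."""
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


best, row_a, row_b = global_align("ACGT", "AGT")
print(best, row_a, row_b, percent_identity(row_a, row_b))
```

A local alignment (Smith-Waterman) uses the same table but floors every cell at zero and traces back from the highest-scoring cell instead of the corner.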



Sequence Similarity Searching


Basic Local Alignment Search Tool (BLAST)


blastp, blastn, blastx, tblastn, and tblastx


Local alignments are reported


Expectation Value (E-value): the number of
alignments with a score as good as or better than
the score under consideration that an investigator
can expect to find by chance when searching a
database of the same size.
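
As a rough illustration of how the expectation value behaves, the sketch below assumes the commonly cited relation E = m × n × 2^(-S'), where S' is the bit score, m the query length, and n the total length of the database. The function name and the example numbers are hypothetical, and BLAST itself applies effective-length corrections, so reported values will differ.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    bit_score       -- normalized alignment score S' in bits
    query_length    -- m, number of residues in the query
    database_length -- n, total number of residues in the database
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical hit: 250-residue query against a database of 10**9 residues.
for bits in (30, 40, 50):
    print(bits, expect_value(bits, 250, 10**9))
```

Each additional bit of score halves the expected number of chance hits, while doubling the database size doubles it. This is why the same alignment can look less significant in a larger database.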

Steps to Build a Tree


Build a multiple sequence alignment of
the data set.


Analyze multiple sequence alignment
using either distance based methods or
character based methods.


Molecular Evolutionary
Genetics Analysis (MEGA) 3.1


Phylogenetic Analysis program


Constructs multiple sequence alignment using
ClustalW


Provides tree building methods


Distance based Methods


UPGMA


Neighbor-joining method


Minimum Evolution


Character based Method


Maximum Parsimony


Provides a great help document!


Multiple Sequence Alignment


Multiple Sequence Alignment: an alignment
between three or more sequences.


Computationally classified as NP-hard


Programs


ClustalW: fast, applies a progressive method


T-Coffee: slower, applies an advanced
progressive method


Dialign: slow, applies an iterative method


Combine: combines multiple sequence
alignments


Tree Building methods


UPGMA, Neighbor-Joining, Minimum Evolution


Distance based methods


Analyze the multiple sequence alignment to
calculate a distance matrix.


Clustering algorithm analyzes the distance matrix
to determine which sequences should be
clustered.
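
The two distance-based steps can be sketched in a few lines. The example below computes p-distances (the proportion of differing sites) from a small made-up alignment and clusters them with UPGMA. The sequences, names, and function names are hypothetical, and this is a teaching sketch rather than how MEGA implements the method. It returns only the tree topology as nested tuples, without branch lengths.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(alignment):
    """Cluster aligned sequences with UPGMA; return a nested-tuple topology."""
    names = list(alignment)
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                      # cluster id -> leaf count
    # Step 1: distance matrix, keyed by (smaller id, larger id).
    dist = {
        (i, j): p_distance(alignment[names[i]], alignment[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    # Step 2: repeatedly merge the closest pair of clusters.
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)
        merged_size = sizes[i] + sizes[j]
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            # Size-weighted mean equals the average over all leaf pairs.
            dist[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / merged_size
        del dist[(i, j)]
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = merged_size
        del sizes[i], sizes[j]
        next_id += 1
    return next(iter(trees.values()))


# Hypothetical four-sequence alignment.
alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
    "D": "TCGAACTA",
}
print(upgma(alignment))
```

UPGMA assumes that all lineages evolve at the same rate. Neighbor-joining and minimum evolution relax that assumption but start from the same kind of distance matrix.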


Maximum parsimony


Character based method


Analyze the multiple sequence alignment to create
a tree whose tree length has been minimized.
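
Tree length here means the minimum number of character changes a tree requires to explain the alignment. As a sketch, the code below scores two candidate topologies with Fitch's counting method. The alignment, tree shapes, and function names are hypothetical. A full maximum parsimony analysis would also search over many topologies, which this example does not do.

```python
def fitch_site(tree, states):
    """Fitch's method for one alignment column.

    tree   -- a leaf name (str) or a pair (left_subtree, right_subtree)
    states -- dict mapping each leaf name to its character at this site
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: one change is required on this branch pair.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        column = {name: seq[site] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical four-sequence alignment and two candidate topologies.
alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
    "D": "TCGAACTA",
}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(tree_length(tree_1, alignment), tree_length(tree_2, alignment))
```

Under the parsimony criterion, the topology with the smaller tree length is preferred.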



Tree Reliability


Bootstrapping: a method for assessing
the reliability of trees.


Steps


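The columns of the original alignment are
resampled with replacement many times (e.g., 1000).


For each resampling, a tree is built.


The trees created from the resampling
iterations are compared to the original
tree; the fraction of resampled trees that
contain a given grouping is its bootstrap support.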
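
The resampling steps can be sketched as follows. To keep the example short, the "tree builder" is a hypothetical stand-in (`closest_pair_clade`) that reports only the grouping of the two most similar sequences. In practice any tree-building method would be plugged in at that point. The alignment and all function names are assumptions for illustration.

```python
import random
from itertools import combinations


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def closest_pair_clade(alignment):
    """Stand-in tree builder: the set holding one clade, the closest pair."""
    def differences(p, q):
        return sum(x != y for x, y in zip(alignment[p], alignment[q]))
    pair = min(combinations(sorted(alignment), 2),
               key=lambda names: differences(*names))
    return {frozenset(pair)}


def bootstrap_support(alignment, build_clades, clade, replicates=1000, seed=0):
    """Fraction of bootstrap replicates whose tree contains the given clade."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        resampled = bootstrap_alignment(alignment, rng)
        if frozenset(clade) in build_clades(resampled):
            hits += 1
    return hits / replicates


# Hypothetical four-sequence alignment.
alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
    "D": "TCGAACTA",
}
support = bootstrap_support(alignment, closest_pair_clade, {"A", "B"})
print(f"Support for grouping A with B: {support:.2f}")
```

Groupings recovered in a high fraction of replicates are considered well supported by the data.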



Review


Acquiring Data Set


Text searching at the National Center for
Biotechnology Information (NCBI)


Sequence similarity and homology


Sequence similarity searching with Basic Local
Alignment Search Tool (BLAST)


Analyzing Data Set


Phylogenetic Analysis with Molecular Evolutionary
Genetics Analysis (MEGA) 3.1 software


Build multiple sequence alignments of sequences using
ClustalW


Build phylogenetic trees