Introduction to Bioinformatics - RVC Bioinformatics Resource


Introduction to Bioinformatics

http://bioinformatics.rvc.ac.uk

Victoria Offord

PhD Workshop

December 6th 2012

What is bioinformatics?

Goal


Answer biological questions computationally (in silico)


To discover new biological insights using both computers and biology

Definition


Bioinformatics is the computational analysis and storage of biological data


Interface between computer science and the life sciences

Why do we need bioinformatics?

High-throughput technologies generate vast amounts of data




Biological questions have become more complex and require algorithms that can cope with the size and complexity of the data





Models are required to infer relationships between the components of complex biological systems

Storage!

Analysis!

Interpretation!

What can we use bioinformatics for?

BIOINFORMATICS

Algorithms
Protein Modelling
Genome Mapping
Microarray Analysis
Phylogenetics
Genome Annotation
Machine Learning
Database Mining

Genes and Genomes

There are over 3 billion nucleotides in the human genome

Human Genome
~60,000 genes
~22,000 proteins
>50,000,000 variations

Genetics Tools & Resources

Genome Browsers
Ensembl (www.ensembl.org)
UCSC Genome Browser (http://genome.ucsc.edu/)

NCBI Resources (www.ncbi.nlm.nih.gov)
Nucleotide
Protein
Expressed Sequence Tag (EST)
Single Nucleotide Polymorphism (SNP)
Taxonomy
BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi)


Primers
Primer3 (http://frodo.wi.mit.edu/)
Primer-BLAST (www.ncbi.nlm.nih.gov/tools/primer-blast/)
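Primer3 and Primer-BLAST do proper primer design. As a rough illustration of two numbers such tools report, the Python sketch below (not taken from either tool) computes GC content and a melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C) °C. The Wallace rule is a crude approximation meant for short oligonucleotides; Primer3 uses more detailed thermodynamic models, so treat these values as a sanity check only. The primer sequence is hypothetical.

```python
def gc_content(primer: str) -> float:
    """Fraction (0.0 to 1.0) of G and C bases in the primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in degrees C via the Wallace rule:
    Tm = 2 * (A + T) + 4 * (G + C). Only a rough guide for short oligos."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primer = "AGCGTACGTTAGCCTAGGTA"  # hypothetical 20-mer
    print(gc_content(primer), wallace_tm(primer))  # 0.5 60
```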


Nucleotide Translation
Transeq (www.ebi.ac.uk/Tools/st/)
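Transeq translates a nucleotide sequence into protein. A minimal sketch of six-frame translation with the standard genetic code (NCBI translation table 1) follows; it is not Transeq's implementation, ignores alternative genetic codes, and its frame labels (+1 to +3 on the given strand, -1 to -3 on the reverse complement) are this sketch's own convention.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate one reading frame. Stops become '*'; codons containing
    anything other than A/C/G/T (e.g. N) become 'X'. Trailing bases are ignored."""
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X")
        for pos in range(frame, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Translate all three frames of the sequence and of its reverse complement."""
    seq = seq.upper().replace("U", "T")  # accept RNA input too
    rc = reverse_complement(seq)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(seq, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```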


Sequence Management
CLC Main Workbench (http://www.clcbio.com/index.php?id=92)

NCBI Databases

GenBank
Open access, annotated collection of all publicly available genomic DNA, collated genes (UniGene), transcripts, proteins and expressed sequence tags (ESTs)

RefSeq
Open access, annotated collection of curated, publicly available genomic DNA, transcripts and proteins
Provides a single reference sequence per molecule (DNA/RNA/protein) for major organisms
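GenBank and RefSeq records can be downloaded in FASTA format. A minimal Python sketch of a FASTA parser, assuming each header line starts with '>' and that the first word after it is the identifier; a repeated identifier overwrites the earlier record. The record names in the example are hypothetical.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-separated word after '>';
    sequence lines belonging to one record are concatenated.
    """
    records = {}
    current_id = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # close previous record
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            chunks = []
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


if __name__ == "__main__":
    example = ">seq1 hypothetical nucleotide record\nATGGCC\nTAA\n>seq2\nMKV\n"
    print(parse_fasta(example))  # {'seq1': 'ATGGCCTAA', 'seq2': 'MKV'}
```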


GenBank Records

Always double-check the annotation!

Ensembl Genome Browser

Always double-check the Ensembl release number! The same transcript ID can describe a different gene model in different releases:

Release 69 (Oct 2012)
Transcript ID ENSCAFT00000018059
2 exons encoding 1422 amino acids

Release 60 (Nov 2010)
Transcript ID ENSCAFT00000018059
1 exon encoding 858 amino acids

Basic Local Alignment Search Tool (BLAST)

Used to identify an unknown sequence or find similar sequences
blastn - nucleotide query / nucleotide database
blastp - protein query / protein database
blastx - translated nucleotide query / protein database
tblastx - translated nucleotide query / translated nucleotide database
tblastn - protein query / translated nucleotide database
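BLAST is fast because it does not fully align the query against every database sequence: it first looks for short word matches (seeds) and only extends alignments around those hits. The Python sketch below shows only a simplified nucleotide seeding step with exact word matches; real BLAST then extends and scores the hits, and protein searches use neighbourhood words scored with a substitution matrix. The word size and sequences are illustrative assumptions.

```python
from collections import defaultdict


def build_word_index(database_seq: str, word_size: int) -> dict:
    """Map every word (k-mer) of length word_size to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(database_seq) - word_size + 1):
        index[database_seq[pos:pos + word_size]].append(pos)
    return index


def find_seed_hits(query: str, database_seq: str, word_size: int = 11) -> list:
    """Return (query_pos, db_pos) pairs where a query word exactly matches the database.
    Hits lying on the same diagonal (db_pos - query_pos) suggest a longer shared region."""
    index = build_word_index(database_seq, word_size)
    hits = []
    for qpos in range(len(query) - word_size + 1):
        for dpos in index.get(query[qpos:qpos + word_size], []):
            hits.append((qpos, dpos))
    return hits


if __name__ == "__main__":
    print(find_seed_hits("ACGTAC", "TTACGTACGG", word_size=4))  # [(0, 2), (1, 3), (2, 4)]
```

All three hits sit on the same diagonal (offset 2), which is the kind of cluster an extension step would grow into a full local alignment.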


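Protein vs Nucleotide BLAST

When possible, query with the protein sequence! Protein sequences are more conserved than the DNA that encodes them (synonymous codon changes leave the protein unchanged), so distant homologues are easier to detect at the protein level.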

Evolution / Phylogenetics

The genetic connections and relationships between species
Compare specific characters/sequences from different species
Assume that species with similar characters/sequences are closely related
The term for these evolutionary relationships is phylogeny
Evolutionary relationships are often presented as phylogenetic trees
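To make "similar sequences are closely related" concrete, distance-based tree methods start from pairwise distances between aligned sequences. The Python sketch below computes the p-distance (proportion of differing sites, skipping gapped columns) and the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3) for nucleotides. The species names and sequences are hypothetical, and real packages offer many other substitution models.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites that differ; columns with a gap in either sequence are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected nucleotide distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        return math.inf  # correction undefined: the sequences look saturated
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment: dict) -> dict:
    """Pairwise Jukes-Cantor distances for {name: aligned sequence}, keyed by (name, name)."""
    names = list(alignment)
    return {
        (x, y): jukes_cantor(p_distance(alignment[x], alignment[y]))
        for x in names
        for y in names
    }


if __name__ == "__main__":
    # Hypothetical aligned sequences.
    aln = {"sp1": "ACGTACGTAC", "sp2": "ACGTACGTTC", "sp3": "TCGAACGTTC"}
    for pair, d in distance_matrix(aln).items():
        print(pair, round(d, 4))
```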

Phylogenetics Tools & Resources

Sequence Alignment
ClustalW2 (www.ebi.ac.uk/Tools/msa/clustalW2)
MAFFT (http://mafft.cbrc.jp/alignment/server/)
T-Coffee (www.tcoffee.org/Projects/tcoffee/)
MUSCLE (www.ebi.ac.uk/Tools/msa/muscle/)
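Multiple alignment tools such as ClustalW2 build on pairwise alignment: ClustalW aligns all pairs to build a guide tree, then aligns the sequences progressively. The Python sketch below is the classic Needleman-Wunsch global pairwise alignment with a simple linear gap penalty. The match, mismatch and gap scores are assumptions for illustration, not the defaults of any tool listed above, which use substitution matrices and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b with a linear gap penalty.
    Returns (score, aligned_a, aligned_b) for one optimal alignment."""
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(total)
    print(top)
    print(bottom)
```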


Phylogenetics Analysis Packages

MEGA (www.megasoftware.net)

Parsimony Programs
PAUP (http://paup.csit.fsu.edu/)
PHYLIP (http://evolution.genetics.washington.edu/phylip.html)
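Parsimony programs such as PAUP and PHYLIP prefer the tree that explains the data with the fewest character changes. Scoring one fixed tree can be done with Fitch's algorithm, sketched below in Python for unordered characters on a rooted binary tree written as nested tuples. Searching over possible trees, the hard part those programs handle, is not shown; the species names and sequences are hypothetical.

```python
def fitch(tree, site):
    """Fitch small-parsimony pass for one alignment column.

    tree: a leaf name (str) or a (left, right) tuple of subtrees.
    site: dict mapping leaf name -> character in this column.
    Returns (candidate ancestral states, minimum number of changes in the subtree).
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_states, left_cost = fitch(tree[0], site)
    right_states, right_cost = fitch(tree[1], site)
    shared = left_states & right_states
    if shared:
        # The children can agree on a state: no extra change at this node.
        return shared, left_cost + right_cost
    # The children disagree: keep either state and count one change.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total Fitch changes over every column of {name: aligned sequence}."""
    length = len(next(iter(alignment.values())))
    total = 0
    for col in range(length):
        site = {name: seq[col] for name, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


if __name__ == "__main__":
    # Hypothetical aligned sequences for four species.
    aln = {"A": "AC", "B": "AC", "C": "GT", "D": "GT"}
    tree_1 = (("A", "B"), ("C", "D"))
    tree_2 = (("A", "C"), ("B", "D"))
    print(parsimony_score(tree_1, aln), parsimony_score(tree_2, aln))  # 2 4
```

The lower score for tree_1 means it needs fewer changes, so parsimony would prefer grouping A with B and C with D.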


Phylogenetics Tree Viewers

TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)
MacClade (www.macclade.org/)


Phylogenetics Tree Databases

TreeBASE (www.treebase.org/treebase-web/home.html)

What is a protein?

Protein Statistics

[Figure: total and yearly counts of Protein Sequences, Protein Structures and Protein Folds, 1990 to 2012; y-axes scaled variously in millions, thousands and hundreds]
Domain Identification Resources

Conserved Domain Database (CDD)
www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml

SMART
http://smart.embl-heidelberg.de/

Pfam
http://pfam.sanger.ac.uk/search/sequence

SignalP (signal peptides)
http://www.cbs.dtu.dk/services/SignalP/

TMHMM (transmembrane helices)
http://www.cbs.dtu.dk/services/TMHMM/
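TMHMM predicts transmembrane helices with a hidden Markov model. A much simpler, older approach is a Kyte-Doolittle hydropathy plot: average residue hydrophobicity over a sliding window and flag strongly hydrophobic windows. The Python sketch below uses the Kyte-Doolittle scale with the commonly quoted 19-residue window and 1.6 threshold. It is not how TMHMM works and is prone to false positives (hydrophobic signal peptides, for example); the test sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Mean hydropathy of each window; list index = 0-based window start.
    Non-standard residue letters raise KeyError."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_windows(protein: str, window: int = 19, threshold: float = 1.6) -> list:
    """Start positions of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, score in enumerate(hydropathy_profile(protein, window)) if score > threshold]


if __name__ == "__main__":
    seq = "K" * 10 + "L" * 19 + "K" * 10  # hypothetical: hydrophobic core flanked by lysines
    print(candidate_tm_windows(seq))  # [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```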

Comparative Modelling


Use known structures from the Protein Data Bank (PDB) to generate predicted
models of unknown protein structures
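A common way to compare a predicted model with a reference structure (for example, an experimental structure of the same protein) is the root-mean-square deviation (RMSD) over matching atoms such as C-alpha atoms: RMSD = sqrt((1/N) × sum of squared distances). The Python sketch below assumes the two coordinate sets are already superposed and paired atom by atom; it does not perform the optimal superposition that structure-comparison tools carry out first. The coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """RMSD between two equal-length lists of (x, y, z) coordinates.
    Assumes the structures are already superposed; no fitting is done here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0)]      # hypothetical model atom
    reference = [(3.0, 4.0, 0.0)]  # hypothetical reference atom
    print(rmsd(model, reference))  # 5.0
```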






Protein Tools & Resources

Protein Databases
Protein Data Bank (www.rcsb.org)
UniProt (www.uniprot.org)

Identifying Protein Domains
InterPro (www.ebi.ac.uk/interpro/)
Conserved Domain Database (www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml)
Transmembrane Helices - TMHMM (www.cbs.dtu.dk/services/TMHMM/)
Signal Peptides - SignalP (www.cbs.dtu.dk/services/SignalP/)

Secondary Structure Prediction and Alignment
PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/)
HHpred (http://toolkit.tuebingen.mpg.de/hhpred)

Comparative Modelling
MODELLER (http://salilab.org/modeller/)
I-TASSER (http://zhanglab.ccmb.med.umich.edu/I-TASSER/)

Model Validation
SAVES (http://nihserver.mbi.ucla.edu/SAVES/)

RVC Bioinformatics Resource

http://bioinformatics.rvc.ac.uk


Upcoming Events


News


RVC Publication Search


Bioinformatics Resources


Helpdesk


FAQs

Introduction to Bioinformatics

http://bioinformatics.rvc.ac.uk


Be able to design primers


Navigate many of the key biological databases


Understand how to identify sequences using BLAST


Understand sequence alignment techniques


Be able to build a phylogenetic tree


Understand the principles and techniques for comparative protein
modelling

Wednesday 16th January 2013, 1-5pm, Room S79

This talk can be found under: Resources -> Talks and Presentations -> RVC Presentations