Protein functions prediction - Swiss EMBnet

fleagoldfish, Biotechnology

2 Oct 2013

Swiss Institute of Bioinformatics

Institut Suisse de Bioinformatique

LF - 2003.10

Protein functions prediction


Introduction


Signal peptides


Transmembrane regions and topology


PTM (post-translational modifications)


Low complexity and biased regions


Repeats


Coils


Secondary structure


Antigenic peptides


Domain/Motifs


Tools


The EMBOSS package


Different techniques


Algorithms


Sliding window, Nearest Neighbor


Patterns, regular expression


Weight matrices


HMM, profiles


Neural Networks


Rules


Sliding window

THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW


Score 1, Score 2, ..., Score n (one score per window position)

Width or Size=11, Step=5

Results are usually displayed as a graph.

[Figure: example graph of the window scores]

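The sliding-window slide above can be illustrated with a minimal Python sketch. It assumes the per-residue score is the Kyte-Doolittle hydropathy value and that the window score is the mean over the window; the slide does not say which score is used, so both choices are assumptions, and the function name `sliding_window_scores` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scoring scale, not from the slide)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_scores(sequence, scale, width=11, step=5):
    """Return (1-based start, mean score) for each window of `width` residues."""
    scores = []
    for start in range(0, len(sequence) - width + 1, step):
        window = sequence[start:start + width]
        mean = sum(scale[residue] for residue in window) / width
        scores.append((start + 1, mean))
    return scores


if __name__ == "__main__":
    # Test sequence from the slide, with Width=11 and Step=5
    seq = "THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW"
    for start, score in sliding_window_scores(seq, KYTE_DOOLITTLE):
        print(f"{start:3d} {score:6.2f}")
```

Plotting the second column against the first gives the kind of graph the slide refers to.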

Patterns / regular expression


Pattern: <A-x-[ST](2)-x(0,1)-{V}

Regexp: ^A.[ST]{2}.?[^V]

Text: The sequence must start with an alanine, followed by any amino acid, followed by a serine or a threonine, two times, followed by any amino acid or nothing, followed by any amino acid except a valine.

Only the syntax differs…
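To make the correspondence between the two notations concrete, here is a small Python sketch that translates a PROSITE-style pattern into a regular expression. The function name `prosite_to_regex` is hypothetical, and the sketch only covers the elements used on this slide (`x`, `[..]`, `{..}`, repeat counts, and the `<` / `>` anchors).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (2) or (0,1)
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = match.group(1), match.group(2)
        if core == "x":               # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any amino acid except these
        repeat = "{" + count + "}" if count else ""
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("<A-x-[ST](2)-x(0,1)-{V}")
    print(regex)
    print(bool(re.match(regex, "AGSTL")), bool(re.match(regex, "AGSTV")))
```

The generated expression writes the optional residue as `.{0,1}`, which is equivalent to the `.?` shown on the slide.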


Weight matrices (PSSM)


HMM / profiles


Neural Networks

General principle:



Example:


Signals found in proteins


N-ter
  exportation - secretion
  mitochondria
  chloroplast
internal
  NLS (nuclear localization signal)
C-ter
  GPI-anchor (Glycosyl Phosphatidyl Inositol)
  other membrane anchors (see PTM)
other unknown?


Signal detection tools


SignalP


MitoProt


ChloroP


Predotar


PSort


TargetP


Sigcleave (EMBOSS)


Big-PI


DGPI


Transmembrane regions


Detection (signal peptide, hydropathy, helices)


Organisation (topology)


Transmembrane detection tools


TMHMM


TMPred


TopPred2


DAS


HMMTop


Tmap (EMBOSS)




Post-translational modifications

Phosphorylation
  S - T - Y
N-glycosylation
  N
O-glycosylation
  S - T - (HO)K
Acetylation, methylation
  D - E - K
Sulfation
  Y
Farnesylation, myristoylation, palmitoylation, geranylgeranylation, GPI-anchor
  C - Nter - Cter
Ubiquitination and family
  K - Nter
Inteins (protein splicing)
Pre-translational
  Selenoprotein
    C


PTM detection


Pattern prediction (PROSITE)
  Short or weak signal
  Frequent hit producer
Best method is experimental
  MS/MS detection
Most methods use « rules » that combine pattern detection and knowledge to predict sites.
  NetOGlyc - Prediction of mucin-type O-glycosylation sites in mammalian proteins
  DictyOGlyc - Prediction of GlcNAc O-glycosylation sites in Dictyostelium
  YinOYang - O-beta-GlcNAc attachment sites in eukaryotic protein sequences
  NetPhos - Prediction of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins
  NMT - Prediction of N-terminal N-myristoylation
  Sulfinator - Prediction of tyrosine sulfation sites
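A short Python sketch shows why pattern-based PTM prediction is a "frequent hit producer". It scans a sequence for the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} (entry PS00001), written here as a regular expression. The function name and the toy sequence are made up for illustration; a match only marks a candidate site, not an experimentally confirmed modification.

```python
import re

# PROSITE N-glycosylation pattern N-{P}-[ST]-{P} as a regular expression.
# The lookahead lets overlapping candidate sites be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(sequence):
    """Return (1-based position, matched motif) for every candidate site."""
    return [(m.start() + 1, m.group(1)) for m in N_GLYCOSYLATION.finditer(sequence)]


if __name__ == "__main__":
    toy_sequence = "MNGSANPTLNKTPW"  # made-up sequence, for illustration only
    print(find_n_glycosylation_sites(toy_sequence))
```

Because the motif is only four residues long, it matches by chance in many proteins, which is why such hits need supporting knowledge or experimental confirmation.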




Low complexity regions


repeats


compositional bias


PEST


Low complexity / Repeats


DUST (DNA) / SEG
  de novo detection
RepeatMasker (DNA)
  search collection
REP
  search collection
REPRO, Radar
  de novo detection
PEST, PESTFind
  de novo detection
EMBOSS (DNA)
  einverted
  equicktandem
  etandem
  palindrome
EMBOSS (protein)
  oddcomp
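As a rough illustration of de novo low-complexity detection, the sketch below flags windows whose Shannon entropy falls under a threshold. This is a simplified stand-in and not the actual algorithm of SEG or DUST; the window width, the threshold and the function names are assumptions chosen for illustration.

```python
from collections import Counter
from math import log2


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def low_complexity_windows(sequence, width=12, threshold=2.2):
    """Return 1-based starts of windows whose entropy is below `threshold`."""
    return [
        start + 1
        for start in range(len(sequence) - width + 1)
        if shannon_entropy(sequence[start:start + width]) < threshold
    ]


if __name__ == "__main__":
    # Made-up sequence with a poly-Q stretch in the middle
    seq = "ACDEFGHIKLMN" + "Q" * 15 + "ACDEFGHIKLMN"
    print(low_complexity_windows(seq))
```

A window made of a single residue has entropy 0, while a window of twelve different residues reaches log2(12) bits, so biased regions stand out clearly.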


Coils


Helix of helix


coiled-coil


Leu-zipper


Coils detection


COILS
  Weight matrices
Paircoil, Multicoil
  Pairwise correlation
Marcoil
  HMM
Pepcoil (EMBOSS)
  Weight matrices




Secondary structure


Structure to predict


Alpha-helices


Beta-sheets


Turns


Random coil


Garnier (EMBOSS)


PHD


DSC


PREDATOR


NNSSP


Jpred


Jnet


Many others


Antigenic peptide


Peptides binding to MHC
  class I
    8, 9, 10 mers
  class II
    15 mers (3+9+3)
  Depends highly on MHC type
Use of experimental knowledge
  Databases of known peptides
SYFPEITHI
HLA_Bind (BIMAS)
MAPPP combined expert
Antigenic (EMBOSS)
Many more
Prediction of proteasome cleavage sites
  NetChop
  PaProc
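The first step of any MHC class I prediction is to enumerate the candidate 8, 9 and 10-mers of the protein. The sketch below does only that; scoring each peptide against a given MHC type, as the tools listed above do, is not attempted here. The function name `candidate_peptides` is hypothetical.

```python
def candidate_peptides(sequence, lengths=(8, 9, 10)):
    """Yield (1-based start, peptide) for every sub-peptide of the given lengths."""
    for length in lengths:
        for start in range(len(sequence) - length + 1):
            yield start + 1, sequence[start:start + length]


if __name__ == "__main__":
    # Made-up 10-residue sequence: 3 octamers, 2 nonamers, 1 decamer
    for start, peptide in candidate_peptides("ACDEFGHIKL"):
        print(start, peptide)
```

A protein of length L yields L - k + 1 peptides of length k, so the candidate list grows linearly with the protein length.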



Domain / Motif


All the protein domain descriptors


PROSITE


PFAM


SMART


PRODOM


BLOCKS


PRINTS





Federation: InterPro


Many techniques


Patterns, Regexp


PSSM (PSI-BLAST)


Profiles


HMM



Other Tools


You can find some of them on our servers


www.ch.embnet.org


Or on the ExPASy server


www.expasy.org/tools


Or ask Google!!


www.google.com


European Molecular Biology Open Software Suite


How to use EMBOSS/Jemboss at SIB



Free Open Source (for most Unix platforms)
GCG successor (compatible with GCG file format)
More than 150 programs (ver. 2.7.1)
Easy to install locally
  but no interface, requires local databases
Unix command-line only
Interfaces
  Jemboss, www2gcg, w2h, wemboss… (with account)
  Pise, EMBOSS-GUI, SRSWWW (no account)
  Staden, Kaptain, CoLiMate, Jemboss (local)
Access: www.emboss.org



Format USA
  'asis' :: Sequence [start : end : reverse]
  Format :: '@' ListFile [start : end : reverse]
  Format :: 'list' : ListFile [start : end : reverse]
  Format :: Database : Entry [start : end : reverse]
  Format :: Database - SearchField : Word [start : end : reverse]
  Format :: File : Entry [start : end : reverse]
  Format :: File : SearchField : Word [start : end : reverse]
  Format :: Program Program-parameters '|' [start : end : reverse]
  Example: fasta::Swissprot:UBP5_HUMAN[200:300]

Databases
  Any can be added, use showdb to display the available databases

Some details
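To show how a USA such as `fasta::Swissprot:UBP5_HUMAN[200:300]` breaks down into its parts, here is a small Python sketch. It is not EMBOSS code: the function `parse_usa` and its regular expression are hypothetical, they only cover the simple `Format::Database:Entry[start:end]` and `File:Entry` forms, and they assume that the reverse flag is written as a trailing `:r`.

```python
import re

USA_PATTERN = re.compile(
    r"^(?:(?P<format>\w+)::)?"       # optional sequence format, e.g. fasta
    r"(?P<source>[^:\[\]]+)"         # database or file name
    r"(?::(?P<entry>[^:\[\]]+))?"    # optional entry name
    r"(?:\[(?P<start>-?\d+):(?P<end>-?\d+)(?::(?P<reverse>r))?\])?$"
)


def parse_usa(usa):
    """Split a simple USA string into its components (illustrative only)."""
    match = USA_PATTERN.match(usa)
    if match is None:
        raise ValueError(f"Unsupported USA: {usa!r}")
    fields = match.groupdict()
    for key in ("start", "end"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    fields["reverse"] = fields["reverse"] is not None
    return fields


if __name__ == "__main__":
    print(parse_usa("fasta::Swissprot:UBP5_HUMAN[200:300]"))
```

For the example on the slide this gives the format `fasta`, the database `Swissprot`, the entry `UBP5_HUMAN` and the sub-sequence from residue 200 to residue 300.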



showdb

Displays information on the currently available databases

# Name          Type  ID  Qry  All  Comment
# ====          ====  ==  ===  ===  =======
ipr_fetch       P     OK  OK   OK   InterPro current by fetch
ipi_fetch       P     OK  OK   OK   IPI current by fetch
refseq_fetch    P     OK  OK   OK   refseq current by fetch
repbase_fetch   P     OK  OK   OK   repbase current by fetch
swiss_fetch     P     OK  OK   OK   SwissProt current by fetch
swissprot       P     OK  OK   OK   SWISSPROT sequences
trembl          P     OK  OK   OK   TREMBL sequences
trembl_fetch    P     OK  OK   OK   trembl current by fetch
tremblnew       P     OK  OK   OK   TREMBL New sequences
ug_fetch        P     OK  OK   OK   Unigene by fetch
embl            N     OK  OK   OK   EMBL release
emhum           N     OK  OK   OK   EMBL release, Human section by emboss index
emrod           N     OK  OK   OK   EMBL release, Rodent section by emboss index
emvrt           N     OK  OK   OK   EMBL release, Vertebrate (nonhuman, nonrodent)

seqret (seqretall, seqretset, seqretsplit)

entret (for complete untouched entry, e.g., for unigene, interpro, swissprot…)

Possible to define your own « .embossrc » file

databases



Some tools for DNA


redata    Search REBASE for enzyme name, references, suppliers etc
remap     Display a sequence with restriction cut sites, translation etc
restover  Finds restriction enzymes that produce a specific overhang
restrict  Finds restriction enzyme cleavage sites
showseq   Display a sequence with features, translation etc
silent    Silent mutation restriction enzyme scan
cirdna    Draws circular maps of DNA constructs
lindna    Draws linear maps of DNA constructs
revseq    Reverse and complement a sequence
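What revseq does can be sketched in a few lines of Python. This is an illustration of the operation, not the EMBOSS implementation; it assumes plain A, C, G, T letters and does not handle IUPAC ambiguity codes.

```python
# Translation table for the four DNA bases (upper and lower case)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(sequence):
    """Return the complementary strand, read in the same direction."""
    return sequence.translate(COMPLEMENT)


def reverse_complement(sequence):
    """Return the complementary strand, read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(complement("GACACCATCG"))          # CTGTGGTAGC
    print(reverse_complement("GACACCATCG"))  # CGATGGTGTC
```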





Example: remap


ECLAC E.coli lactose operon with lacI, lacZ, lacY and lacA genes.

Hin6I
TaqI | HhaI
| Bsc4I | Bsu6I
| | Hin6I | BssKI
| | | HhaI AciI | | BsiSI
\ \ \ \ \ \ \ \
GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT
        10        20        30        40        50        60
----:----|----:----|----:----|----:----|----:----|----:----|
CTGTGGTAGCTTACCGCGTTTTGGAAAGCGCCATACCGTACTATCGCGGGCCTTCTCTCA
/ / / / / / / ///
| TaqI | Hin6I AciI | | ||BssKI
Bsc4I HhaI | | |BsiSI
| | Bsu6I
| Hin6I
HhaI

# Enzymes that cut  Frequency  Isoschizomers
AciI                1
Bsc4I               1
BsiSI               1
BssKI               1
Bsu6I               1
HhaI                2
Hin6I               2          HinP1I,HspAI
TaqI                1

# Enzymes that do not cut
AclI BamHI BceAI Bse1I BshI ClaI EcoRI EcoRII
Hin4I HindII HindIII HpyCH4IV KpnI NotI
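The frequency table above can be partly reproduced with a naive search for recognition sites. The Python sketch below is not how remap or restrict work internally (they read REBASE data and report cut positions); it only looks up three enzymes whose recognition sequences are well known (TaqI: TCGA, HhaI: GCGC, EcoRI: GAATTC) and reports where each site starts. The function names are hypothetical.

```python
# Recognition sequences of three enzymes; all three are palindromic,
# so searching one strand is enough.
RECOGNITION_SITES = {"TaqI": "TCGA", "HhaI": "GCGC", "EcoRI": "GAATTC"}


def find_sites(sequence, site):
    """Return 1-based start positions of every occurrence, overlaps included."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(site, start + 1)
    return positions


def cut_frequencies(sequence, enzymes=RECOGNITION_SITES):
    """Return the number of recognition sites found for each enzyme."""
    return {name: len(find_sites(sequence, site)) for name, site in enzymes.items()}


if __name__ == "__main__":
    # First 60 bases of the ECLAC sequence shown in the remap output
    eclac = "GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT"
    print(cut_frequencies(eclac))
```

On these 60 bases the counts agree with the table: TaqI once, HhaI twice, and EcoRI among the enzymes that do not cut.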


Example: cirdna


File: ../../data/data.cirp

Start 1001

End 4270

group

label

Block 1011 1362 3

ex1

endlabel

label

Tick 1610 8

EcoR1

endlabel

label

Block 1647 1815 1

endlabel

label

Tick 2459 8

BamH1

endlabel

label

Block 4139 4258 3

ex2

endlabel

endgroup

group

label

Range 2541 2812 [ ] 5

Alu

endlabel

label

Range 3322 3497 > < 5

MER13

endlabel

endgroup




Example: plotorf


EMBOSS format input/output


UFO (Uniform Feature Object)
  gff, swissprot, embl, pir, nbrf (with or without sequence)
Alignments
  Multiple and pairwise, many flavors (FASTA, MSF, SRS…)
Reports
  Feature (UFO), SRS, motif, seqtable, excel, diffseq, listfile (USA), etc…
Sequences (compatible with USA)
  Many!!! E.g., fasta, clustal, gcg, paup, gff, embl, swissprot, acedb, abi, etc…


Web interfaces


PISE (Pasteur Institute Software Environment)
  http://www-alt.pasteur.fr/~letondal/Pise/
EMBOSS-GUI (Canada) (not yet at SIB)
  http://bioinfo.pbi.nrc.ca/~lukem/EMBOSS/



Pise: a tool to generate Web interfaces for Molecular Biology programs

http://emboss.ch.embnet.org/Pise


http://bioinfo.pbi.nrc.ca:8090/EMBOSS/

GUI (Canada)


Launch Jemboss

http://emboss.ch.embnet.org/Jemboss


Launch Jemboss

First time only…

Each time…


Jemboss windows


Jemboss windows other systems


Summary


Anonymous web access through Pise


Registered access through Jemboss


Registered access through command-line (requires UNIX skills)




Please report problems!


Exercises


DEA exercises: web-based sequence analysis

The goal of this exercise is to use web-based tools for protein sequence analysis.

a) Take this TrEMBL sequence (Q9X252) and try a BLAST against swissprot with the complete protein or with the first 70 residues. Explain the difference. Use TMPred, SignalP, and COILS to help you.

b) Pass this sequence through PFSCAN and search all databases. Compare with this command on ludwig-sun1/2:

hits -b "prf pat pfam" tr:Q9X252

c) Use the different profile, motif, and pattern databases to get more information about the domain(s) you found.

d) How do you evaluate the PRINTS tropomyosin annotation in this TrEMBL entry (Q9WZH0)?

List of useful links:

basic BLAST or advanced BLAST or PSI-BLAST

TMPred prediction tool for transmembrane regions (or TMHMM)

COILS prediction tool for coiled-coil regions

SignalP prediction tool for signal-peptide cleavage site

Profile, domain, motifs databases and search sites:

PFSCAN

InterPro (Pfam, PRINTS, PROSITE, SMART)

HITS