Topics in Bioinformatics


Proteins and Protein Function

Charles Yan

Spring 2006


Amino Acids


General structure of an amino acid








There are 20 standard amino acids, each with a different R group.

Amino Acid       3-letter code   1-letter code
Alanine          Ala             A
Arginine         Arg             R
Asparagine       Asn             N
Aspartate        Asp             D
Cysteine         Cys             C
Glutamine        Gln             Q
Glutamate        Glu             E
Glycine          Gly             G
Histidine        His             H
Isoleucine       Ile             I
Leucine          Leu             L
Lysine           Lys             K
Methionine       Met             M
Phenylalanine    Phe             F
Proline          Pro             P
Serine           Ser             S
Threonine        Thr             T
Tryptophan       Trp             W
Tyrosine         Tyr             Y
Valine           Val             V

Table 1. 20 standard amino acids
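The codes in Table 1 can be held in a lookup table so that sequences can be converted between the two notations. The following is a minimal sketch; the names THREE_TO_ONE, ONE_TO_THREE, to_one_letter and to_three_letter are chosen here for illustration and do not come from the course material.

```python
# Three-letter to one-letter codes for the 20 standard amino acids (Table 1).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Reverse mapping, built from the table above.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(three_letter_codes):
    """Convert a list of 3-letter codes, e.g. ["Val", "Ser"], to a 1-letter string."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in three_letter_codes)


def to_three_letter(sequence):
    """Convert a 1-letter sequence string to a list of 3-letter codes."""
    return [ONE_TO_THREE[residue] for residue in sequence.upper()]


print(to_one_letter(["Val", "Ser", "Gln", "Leu", "Leu"]))  # VSQLL
print(to_three_letter("VSQ"))  # ['Val', 'Ser', 'Gln']
```

A code that is not in the table raises KeyError, which is a deliberate choice in this sketch so that typos are not silently ignored.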

Amino Acid                         3-letter code   1-letter code
Asparagine (N) or aspartate (D)    Asx             B
Glutamine (Q) or glutamate (E)     Glx             Z
Any amino acid                     Xaa             X

Authority: IUPAC-IUB Joint Commission on Biochemical Nomenclature.

Reference: IUPAC-IUB Joint Commission on Biochemical Nomenclature. Nomenclature and Symbolism for Amino Acids and Peptides. Eur. J. Biochem. 138:9-37 (1984).

Amino Acid Abbreviations (IUPAC)
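The ambiguity codes B, Z and X can be handled by mapping each code to the set of residues it may stand for. The sketch below assumes that only the 20 standard one-letter codes plus B, Z and X are allowed; the names STANDARD, AMBIGUOUS, possible_residues and is_valid_sequence are illustrative.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

# IUPAC ambiguity codes and the residues each one may represent.
AMBIGUOUS = {
    "B": {"N", "D"},   # asparagine or aspartate
    "Z": {"Q", "E"},   # glutamine or glutamate
    "X": STANDARD,     # any amino acid
}


def possible_residues(code):
    """Return the set of standard residues a one-letter code can stand for."""
    code = code.upper()
    if code in STANDARD:
        return {code}
    if code in AMBIGUOUS:
        return set(AMBIGUOUS[code])
    raise ValueError(f"not a recognized amino acid code: {code!r}")


def is_valid_sequence(sequence):
    """True if every character is a standard or ambiguity code."""
    return all(r in STANDARD or r in AMBIGUOUS for r in sequence.upper())


print(sorted(possible_residues("B")))  # ['D', 'N']
print(is_valid_sequence("VSQBZX"))     # True
```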

Proteins

Two separate amino acids can be linked together by a peptide bond.

A chain of amino acids linked by peptide bonds is called a polypeptide.

A protein is made up of one or more polypeptide chains.

For simplicity, in this course, a protein is a chain of amino acids linked by peptide bonds, e.g.



VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI
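Under this simplified view, a protein is just a string over the one-letter alphabet, so basic properties such as length and amino acid composition can be computed directly. A minimal sketch using the example sequence above; the function name composition is an illustrative choice.

```python
from collections import Counter

# The example sequence from the slide, as a plain string of one-letter codes.
sequence = "VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI"


def composition(seq):
    """Count how many times each residue occurs in the sequence."""
    return Counter(seq.upper())


counts = composition(sequence)
print("length:", len(sequence))
# Print residues from most to least frequent.
for residue, count in counts.most_common():
    print(residue, count)
```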


Protein Database

UniProt (Universal Protein Resource) (http://www.pir.uniprot.org/) is the world's most comprehensive catalog of information on proteins. It is a collaboration between

    Swiss Institute of Bioinformatics (SIB)
    Department of Bioinformatics and Structural Biology of the Geneva University
    European Bioinformatics Institute (EBI)
    Georgetown University Medical Center's Protein Information Resource (PIR)

It includes three components:

UniProt Knowledgebase (UniProtKB): the central access point for extensive curated protein information.

    UniProtKB/Swiss-Prot: a manually annotated protein sequence database which provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. UniProtKB/Swiss-Prot Release 48.7 of 20-Dec-2005: 204,086 entries.

    UniProtKB/TrEMBL: a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot. UniProtKB/TrEMBL Release 31.7 of 20-Dec-2005: 2,506,886 entries.

UniProt Reference Clusters (UniRef): databases that combine closely related sequences into a single record to speed up searches.

UniProt Archive (UniParc): a comprehensive repository, reflecting the history of all protein sequences.


Gene Ontology

Protein synthesis

Translation

Goal: find all the proteins that are involved in protein synthesis

Volkswagen Golf

Golf

"I like golf." "Me too!"

Ontology

n. the branch of metaphysics dealing with the nature of being.
(The New Oxford American Dictionary, edited by Elizabeth J. Jewell and Frank Abate, Oxford University Press, 2001, p. 1197.)

Metaphysics

n. the branch of philosophy that deals with the first principles of things, including abstract concepts such as being, knowing, substance, cause, identity, time, and space.
(The New Oxford American Dictionary, edited by Elizabeth J. Jewell and Frank Abate, Oxford University Press, 2001, p. 1074.)


The Gene Ontology (GO) project (http://www.geneontology.org/) is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD). Since then, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes.

Develop structured, controlled vocabularies (ontologies) that describe gene products.

Make associations between the ontologies and the genes and gene products in the collaborating databases.

Develop tools that facilitate the creation, maintenance and use of ontologies.

The use of GO terms facilitates uniform queries across databases.


The three components of GO are molecular function, biological process and cellular component.

GO terms are organized in structures called directed acyclic graphs (DAGs), which differ from hierarchies in that a child, or more specialized, term can have many parents, or less specialized, terms.

Example: hexose biosynthesis has two parents, hexose metabolism and monosaccharide biosynthesis.
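The difference between a DAG and a strict hierarchy can be sketched by storing, for each term, a list of parents rather than a single parent. The example below uses the hexose biosynthesis terms from the slide; the extra term "monosaccharide metabolism" is added here only for illustration and is not on the slide, and the names PARENTS and ancestors are hypothetical.

```python
# Each term maps to a list of parent (less specialized) terms.
# A list, not a single value, because a DAG allows several parents.
PARENTS = {
    "hexose biosynthesis": ["hexose metabolism", "monosaccharide biosynthesis"],
    "hexose metabolism": ["monosaccharide metabolism"],            # assumed for illustration
    "monosaccharide biosynthesis": ["monosaccharide metabolism"],  # assumed for illustration
    "monosaccharide metabolism": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # a term can be reached by more than one path
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("hexose biosynthesis")))
```

The `seen` set matters in a DAG: a shared ancestor is reachable through both parents and would otherwise be visited twice.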


The controlled vocabularies are structured so that you can query them at different levels.

GO browser: AmiGO (http://www.godatabase.org/cgi-bin/amigo/go.cgi)
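Querying at different levels can be sketched as follows: a query for a general term should also return proteins annotated with any more specialized term beneath it. This is only an illustration of the idea, not how AmiGO is implemented; the protein names are hypothetical, the term "monosaccharide metabolism" is an assumed parent, and PARENTS, ANNOTATIONS, is_term_or_descendant and proteins_for are names invented for this sketch.

```python
# Parent links between GO-style terms (a child may have several parents).
PARENTS = {
    "hexose biosynthesis": ["hexose metabolism", "monosaccharide biosynthesis"],
    "hexose metabolism": ["monosaccharide metabolism"],            # assumed for illustration
    "monosaccharide biosynthesis": ["monosaccharide metabolism"],  # assumed for illustration
    "monosaccharide metabolism": [],
}

# Hypothetical annotations: protein name -> terms it is annotated with.
ANNOTATIONS = {
    "protein_A": {"hexose biosynthesis"},
    "protein_B": {"hexose metabolism"},
    "protein_C": {"monosaccharide biosynthesis"},
}


def is_term_or_descendant(term, query, parents=PARENTS):
    """True if `term` equals `query` or lies below it in the graph."""
    if term == query:
        return True
    return any(is_term_or_descendant(p, query, parents) for p in parents.get(term, []))


def proteins_for(query, annotations=ANNOTATIONS, parents=PARENTS):
    """Proteins annotated with `query` or with any more specialized term."""
    return sorted(
        protein
        for protein, terms in annotations.items()
        if any(is_term_or_descendant(t, query, parents) for t in terms)
    )


print(proteins_for("hexose biosynthesis"))        # specific level
print(proteins_for("monosaccharide metabolism"))  # general level
```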

Protein function

Three steps to get a set of proteins that have a certain function:

Search for the GO term (http://www.godatabase.org/cgi-bin/amigo/go.cgi)

Search for the proteins that belong to that GO term (http://www.pir.uniprot.org/search/textSearch.shtml)

Save the sequences in FASTA format
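In FASTA format, each record is a header line starting with ">" followed by one or more lines of sequence. The last step can be sketched as a small writer and reader, assuming records are (header, sequence) pairs. The header text below is hypothetical and the function names write_fasta and read_fasta are illustrative; they are not part of any UniProt tool.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs in FASTA format, wrapping sequence lines."""
    for header, sequence in records:
        handle.write(f">{header}\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical record; the header is made up for this example.
records = [("example_protein hypothetical header",
            "VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI")]

buffer = io.StringIO()  # stands in for a file opened with open(path, "w")
write_fasta(records, buffer, width=20)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch self-contained; passing a real file handle instead saves the sequences to disk.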
