Bioinformatic Databases

signtruculentBiotechnology

Oct 2, 2013 (3 years and 11 months ago)


Online (DNA and amino acid) Sequence Information

Lecture 9


Introduction


Annotation of genes


Basic bioinformatics Databases


NCBI home page


Query and return results


DNA sequence results page


Protein sequence results page

Bioinformatics Databases


The biological data generated by various labs is submitted to and stored in specific databases.

The data consists of nucleotide (DNA and mRNA (cDNA)) and protein sequences.


The main “primary” nucleotide sequence databases are:

United States: GenBank (NCBI)

Europe: Nucleotide Sequence Database (EMBL)

Japan: DNA Data Bank of Japan (DDBJ)


These databases also contain sequences related to:

Expressed sequence tags (ESTs): short (about 800 bp) stretches of mRNA sequence that can be used to see which genes are expressed…


Protein Databases


The main protein database is:

UniProt (Universal Protein Resource)

The UniProt Knowledgebase (UniProtKB) contains data from:

Swiss-Prot (most up-to-date information)

TrEMBL (translations of coding sequences)

PIR database
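The "translation of coding sequences" idea behind TrEMBL can be illustrated with a minimal sketch. It assumes the standard genetic code and a coding sequence that starts in frame; the names `CODON_TABLE` and `translate` are hypothetical helpers for illustration, not part of any database's tooling.

```python
# Standard genetic code, laid out with bases ordered T, C, A, G
# at the first, second and third codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence up to the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept mRNA input as well as DNA
    protein = []
    # Walk the sequence codon by codon, ignoring any incomplete trailing codon.
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[pos:pos + 3], "X")  # X = unrecognised codon
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))  # prints MAW
```

Real database entries are produced with far more care (alternative genetic codes, partial sequences, frame annotation); the sketch only shows the core codon lookup.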



Both the nucleotide and protein databases contain much more detail than the sequences alone; this detail is referred to as annotation.




Annotation of sequences


Once the gene sequences have been determined, the data must be annotated (Klug 2010):

Identify regulatory regions

Other sequences of interest: exons/introns, coding sequences (CDS), polyA signal

Protein annotation includes the corresponding mRNA sequences

Other organisms in which the DNA/amino acid sequence is found

Journals/references showing where the data came from.
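One way to picture annotation is as a list of labelled coordinate ranges attached to a sequence. The sketch below assumes 1-based inclusive coordinates and a toy sequence; `Feature` and `spliced_exons` are hypothetical names used only for illustration and do not reflect the actual record format of any database.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    kind: str   # e.g. "exon", "intron", "polyA_signal"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def spliced_exons(sequence: str, features: List[Feature]) -> str:
    """Join the exon ranges, in order of position, into one spliced sequence."""
    exons = sorted((f for f in features if f.kind == "exon"), key=lambda f: f.start)
    return "".join(sequence[f.start - 1:f.end] for f in exons)


# Toy example (not real data): two exons in upper case, one intron in lower case.
toy_sequence = "ATGGCCgtaagtTGGTAA"
toy_features = [
    Feature("exon", 1, 6),
    Feature("intron", 7, 12),
    Feature("exon", 13, 18),
]

print(spliced_exons(toy_sequence, toy_features))  # prints ATGGCCTGGTAA
```

With the annotation stored separately from the raw sequence, the same record can answer different questions (where the introns are, what the spliced sequence looks like) without changing the sequence itself.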





Global Sequence

Bioinformatics Database


Bioinformatic databases contain information for various types of biological data.

To facilitate finding information, there are a number of specific search engines:

NCBI has Entrez

EMBL has SRS



Consider the following query:


What are the DNA and amino acid sequences for the following gene: Human BTEB
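The lecture runs this query through the Entrez web page. As a hedged sketch of how the same search could be expressed programmatically, the code below builds a request URL for NCBI's E-utilities `esearch` endpoint. The choice of E-utilities, the search term wording and the helper name `build_esearch_url` are assumptions for illustration; the code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search service (programmatic access to Entrez).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, db: str = "nucleotide", retmax: int = 5) -> str:
    """Return an esearch URL for the given Entrez database and search term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# One URL per database: DNA records and protein records for the same gene.
query = "BTEB AND Homo sapiens[Organism]"
print(build_esearch_url(query, db="nucleotide"))
print(build_esearch_url(query, db="protein"))
```

Searching the nucleotide and protein databases separately mirrors the two result pages listed in the lecture outline.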



More detail on the terms can be found by looking at a sample record:
http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord






NCBI Entrez search page


Nucleic Record


Coding section of gene


The exon/intron structure is also available in graphic form

Protein records


Other databases


The nucleotide (GenBank and EMBL) and protein (UniProt) databases contain the “raw data” and are referred to as primary databases.


More specific databases derive data from these and are referred to as secondary databases; examples include protein family and sequence similarity databases such as PROSITE and PRINTS.


There are databases that contain information about specific organisms, such as E. coli; these can be found using the Genomes OnLine Database (GOLD).

Other databases


Databases for specific types of sequences such
as those associated with promoters and other
regulatory elements.


Others include structural databases from the
Protein Data Bank


Online Mendelian Inheritance in Man (OMIM), which contains information on human genes and genetic disorders.

Bioinformatics Search Engines


The Entrez (NCBI) search engine retrieves information from NCBI databases and can be used to obtain other information, including publications (PubMed), 3D protein structures, Online Mendelian Inheritance in Man…. A tutorial can be found at:


Entrez: Making use of its power


The EMBL uses the ExPASy site, which utilises the open source application Sequence Retrieval System (SRS); a tutorial can be found at:


SRS tutorial: quick tour



Other important information sources


PubMed: literature research: journal articles/conference proceedings/books etc.


Search under many fields: keyword, author….


Returns: journal articles/abstracts


Two types: general/review.



NCBI account: set up an NCBI account to manage
previous searches….



BTEB PubMed search found at:

http://www.ncbi.nlm.nih.gov/pubmed?term=BTEB&cmd=DetailsSearch
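The PubMed address above carries the search in its query string. As a small sketch, the code below splits that URL into its parts and rebuilds it with a different search term; `with_term` is a hypothetical helper, and the replacement term is only an example. Nothing is sent to NCBI.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PUBMED_URL = "http://www.ncbi.nlm.nih.gov/pubmed?term=BTEB&cmd=DetailsSearch"

# Break the URL into host, path and query parameters.
parts = urlsplit(PUBMED_URL)
params = parse_qs(parts.query)
print(parts.netloc)  # www.ncbi.nlm.nih.gov
print(parts.path)    # /pubmed
print(params)        # {'term': ['BTEB'], 'cmd': ['DetailsSearch']}


def with_term(url: str, term: str) -> str:
    """Return the same search URL with the 'term' parameter replaced."""
    split = urlsplit(url)
    query = parse_qs(split.query)
    query["term"] = [term]
    return split._replace(query=urlencode(query, doseq=True)).geturl()


print(with_term(PUBMED_URL, "BTEB protein"))
```

The `term` parameter holds the keyword typed into the search box, so changing it is equivalent to running a new keyword search.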

BTEB PubMed search result