10/3/2013 1:47 AM
Introduction to Bioinformatics databases: Nucleic Acid Databases
Dinesh Gupta
ICGEB
Biological databases: why?
• The need for storing and communicating large datasets has grown.
• To make biological data available to scientists.
• To make biological data available in computer-readable form.
Different classifications of databases
• Type of data
– nucleotide sequences
– protein sequences
– protein sequence patterns or motifs
– macromolecular 3D structure
– gene expression data
– metabolic pathways
Different classifications of databases….
• Primary or derived databases
– Primary databases: experimental results entered directly into the database
– Secondary databases: results of analysis of primary databases
– Aggregates of many databases
  • Links to other data items
  • Combination of data
  • Consolidation of data
Different classifications of databases….
• Technical design
– Flat files
– Relational database (SQL)
– Exchange/publication technologies (FTP, HTML, CORBA, XML, ...)
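The difference between the flat-file and relational designs listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two records are made up, the layout is a heavily simplified imitation of a GenBank-style flat file (keyword lines, an ORIGIN block, and a `//` terminator), and the table name `entry` and the helper names are hypothetical.

```python
import sqlite3

# Made-up records in a simplified GenBank-like flat-file layout (illustrative only).
FLAT_FILE = """\
LOCUS       DEMO0001
DEFINITION  Hypothetical demo sequence.
ORIGIN
        1 atggcgtacg ttagc
//
LOCUS       DEMO0002
DEFINITION  Second hypothetical demo sequence.
ORIGIN
        1 ttgacc
//
"""

def parse_flat_file(text):
    """Yield one dict per record; each record ends with a '//' line."""
    record, in_sequence = {"sequence": ""}, False
    for line in text.splitlines():
        if line.startswith("//"):
            yield record
            record, in_sequence = {"sequence": ""}, False
        elif in_sequence:
            # Sequence lines: drop the leading position number and the spaces.
            record["sequence"] += "".join(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith("LOCUS"):
            record["locus"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()

def load_into_sqlite(records):
    """Store the parsed records in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry (locus TEXT PRIMARY KEY, definition TEXT, sequence TEXT)"
    )
    conn.executemany(
        "INSERT INTO entry VALUES (:locus, :definition, :sequence)", records
    )
    return conn

conn = load_into_sqlite(list(parse_flat_file(FLAT_FILE)))
rows = conn.execute(
    "SELECT locus, length(sequence) FROM entry ORDER BY locus"
).fetchall()
print(rows)
```

The flat file has to be read record by record, whereas once the same data sit in a table they can be queried with SQL.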
Different classifications of databases….
• Availability
– Publicly available, no restrictions
– Available, but with copyright
– Accessible, but not downloadable
– Academic, but not freely available
– Proprietary, commercial; possibly free for academics
Where do I find the database I am interested in?
http://www3.oup.co.uk/nar/database/c/
Nucleotide sequence databases
• EMBL, GenBank, and DDBJ are the three primary nucleotide sequence databases
• EMBL: www.ebi.ac.uk/embl/
• GenBank: www.ncbi.nlm.nih.gov/Genbank/
• DDBJ: www.ddbj.nig.ac.jp
GenBank
• An annotated collection of all publicly available nucleotide and protein sequences
• Set up in 1979 at LANL (Los Alamos).
• Maintained since 1992 by NCBI (Bethesda).
• http://www.ncbi.nlm.nih.gov
EMBL Nucleotide Sequence Database
• An annotated collection of all publicly available nucleotide and protein sequences
• Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.
• Maintained since 1994 by EBI, Cambridge.
• http://www.ebi.ac.uk/embl.html
http://www3.ebi.ac.uk/Services/DBStats/
DDBJ – DNA Data Bank of Japan
• An annotated collection of all publicly available nucleotide and protein sequences
• Started in 1984 at the National Institute of Genetics (NIG) in Mishima.
• Still maintained at this institute by a team led by Takashi Gojobori.
• http://www.ddbj.nig.ac.jp
Other NCBI nucleic acid DBs
• EST database: A collection of expressed sequence tags, or short, single-pass sequence reads from mRNA (cDNA).
• GSS database: A database of genome survey sequences, or short, single-pass genomic sequences.
• HomoloGene: A gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs.
• HTG database: A collection of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences.
• SNPs database: A central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms.
• RefSeq: A database of non-redundant reference sequence standards, including genomic DNA contigs, mRNAs, and proteins for known genes. Multiple collaborations, both within NCBI and with external groups, support data-gathering efforts.
• STS database: A database of sequence tagged sites, or short sequences that are operationally unique in the genome.
• UniSTS: A unified, non-redundant view of sequence tagged sites (STSs).
• UniGene: A collection of ESTs and full-length mRNA sequences organized into clusters, each representing a unique known or putative human gene annotated with mapping and expression information and cross-references to other sources.
Sequence submission
• Data come mainly from direct submissions by the authors.
• Submissions through the Internet:
– Web forms.
– Email.
• Sequences are shared/exchanged between the 3 centers on a daily basis:
– The sequence content of the banks is identical.
Derived databases
• CUTG: Codon usage tabulated from GenBank
http://www.kazusa.or.jp/codon/
• Genetic Codes: Deviations from the standard genetic code in various organisms and organelles
http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
• TIGR Gene Indices: Organism-specific databases of EST and gene sequences
http://www.tigr.org/tdb/tgi.shtml
• UniGene: Unified clusters of ESTs and full-length mRNA sequences
http://www.ncbi.nlm.nih.gov/UniGene/
• ASAP: Alternative spliced isoforms
http://www.bioinformatics.ucla.edu/ASAP
• Intronerator: Introns and alternative splicing in C. elegans and C. briggsae
http://www.cse.ucsc.edu/~kent/intronerator/
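As a rough illustration of how a derived database such as a codon usage table can be computed from primary sequence data, the sketch below counts codons in a coding sequence. It is not the actual CUTG procedure: the sequence is made up, the function names are hypothetical, and the per-thousand scaling is simply one common way of reporting codon frequencies.

```python
from collections import Counter

def codon_usage(cds):
    """Count codons in a coding sequence, read in frame from position 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def usage_per_thousand(counts):
    """Convert raw codon counts to frequencies per 1000 codons."""
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}

demo_cds = "ATGGCTGCTAAATAA"  # made-up coding sequence: ATG GCT GCT AAA TAA
counts = codon_usage(demo_cds)
for codon, freq in sorted(usage_per_thousand(counts).items()):
    print(codon, counts[codon], freq)
```

A real table would be built by summing such counts over all annotated coding regions of an organism in the primary database.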
Nucleic acid structure databases
• NDB: Nucleic acid-containing structures
http://ndbserver.rutgers.edu/
• NTDB: Thermodynamic data for nucleic acids
http://ntdb.chem.cuhk.edu.hk/
• RNABase: RNA-containing structures from PDB and NDB
http://www.rnabase.org/
• SCOR: Structural classification of RNA: RNA motifs by structure, function and tertiary interactions
http://scor.lbl.gov/
Database searching tips
• Look for links to Help or Examples
• Try Boolean searches
• Be careful with UK/US spelling differences
– leukaemia vs leukemia
– haemoglobin vs hemoglobin
– colour vs color
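The two tips on Boolean searches and spelling differences can be combined: a term with a UK/US variant is expanded into an OR group before the terms are joined with AND. The sketch below is a minimal illustration; the helper names are hypothetical, and the uppercase `AND`/`OR` syntax is an assumption, since each database documents its own query syntax on its Help page.

```python
# UK -> US spelling pairs taken from the examples above.
SPELLING_VARIANTS = {
    "leukaemia": "leukemia",
    "haemoglobin": "hemoglobin",
    "colour": "color",
}

def expand_term(term):
    """Return '(uk OR us)' if the term has a known spelling variant, else the term."""
    lookup = {}
    for uk, us in SPELLING_VARIANTS.items():
        lookup[uk] = (uk, us)
        lookup[us] = (uk, us)
    variants = lookup.get(term.lower())
    if variants is None:
        return term
    return "(" + " OR ".join(variants) + ")"

def build_query(terms):
    """Join all terms with AND, expanding spelling variants into OR groups."""
    return " AND ".join(expand_term(t) for t in terms)

query = build_query(["haemoglobin", "human"])
print(query)  # (haemoglobin OR hemoglobin) AND human
```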
Exercises
• Study the statistics of the three primary nucleic acid databases: Are they matching?
• Look for a gene of your interest in the three primary nucleic acid databases: compare the information given in each one of them.
• Read NAR DB paper and NAR DB index site: search for different nucleic acid databases based on different search terms.
• Self study:
– http://www3.oup.co.uk/nar/database/c/
– Download NAR database paper (NARDB2004) from: ftp://cbag.sc.mahidol.ac.th/pub/Course_Materials/dinesh