
Oct 2, 2013

Data Content of the BioCyc Databases



BioCyc Tier 1 Databases

SRI International

Bioinformatics

EcoCyc Project: EcoCyc.org

E. coli Encyclopedia

Review-level Model-Organism Database for E. coli

Tracks evolving annotation of the E. coli genome and cellular networks


The two paradigms of EcoCyc

"Multi-dimensional annotation of the E. coli K-12 genome"

Positions of genes; functions of gene products
  76% / 66% exp
Gene Ontology terms; MultiFun terms
Gene product summaries and literature citations
Evidence codes
Multimeric complexes
Metabolic pathways
Regulation of gene expression and of protein activity

Nuc. Acids Res. 35:7577 2007
ASM News 70:25 2004
Science 293:2040

Karp, Gunsalus, Collado-Vides, Paulsen



EcoCyc = E. coli Dataset + Pathway/Genome Navigator

EcoCyc v13.6
URL: EcoCyc.org

Genes: 4,492
Proteins: 4,479
Complexes: 895
RNAs: 285
Reactions:
  Metabolic: 1,394
  Transport: 246
Pathways: 246
Compounds: 1,830
Gene Regulation:
  Operons: 3,369
  Trans Factors: 196
  Promoters: 1,796
  TF Binding Sites: 2,205
Citations: 19,000



EcoCyc Gene and Protein Information


Gene locations and protein functions updated
through literature curation and in collaboration
with RefSeq, EcoGene, and UniProt


EcoCyc curators author minireview summaries
for gene products, complexes, pathways, and
transcription units


Gene Ontology terms curated by EcoCyc and
imported regularly from UniProt


Protein features imported regularly from UniProt



EcoCyc Regulation


Multiple types of regulatory information present in
EcoCyc


Transcriptional regulation and operon organization


Attenuation


Regulation of translation by small RNAs and proteins


Regulation of protein activity by covalent and non-covalent means




Other E. coli Genomes in BioCyc


Currently BioCyc contains ~40 other E. coli and Shigella genomes


New genomes will be included from RefSeq as
BioCyc expands


SRI is building orthology-based curation tools that will allow us to propagate curation from EcoCyc to these other strains



EcoCyc Accelerates Science


Experimentalists
  E. coli experimentalists
  Experimentalists working with other microbes
  Analysis of expression data

Computational biologists
  Biological research using computational methods
  Genome annotation
  Study connectivity of E. coli metabolic network
  Study phylogenetic extent of metabolic pathways and enzymes in all domains of life

Bioinformaticists
  Training and validation of new bioinformatics algorithms that predict operons, promoters, protein functional linkages, and protein-protein interactions

Metabolic engineers
  "Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents"

Educators



EcoliHub Resource: www.ecolihub.org

Hub search
  Simultaneously searches 12 different E. coli databases

EcoliHub Omics
  Omics data repository and analysis for E. coli

EcoliHouse
  Queryable MySQL server containing multiple E. coli databases

EcoliWiki
  Community-contributed content about E. coli


MetaCyc: Metabolic Encyclopedia

Describe a representative sample of every experimentally determined metabolic pathway

Describe properties of metabolic enzymes

Literature-based DB with extensive references and commentary

Pathways, reactions, enzymes, substrates

Jointly developed by
  P. Karp, R. Caspi, C. Fulcher, SRI International
  L. Mueller, A. Pujar, Boyce Thompson Institute
  S. Rhee, P. Zhang, Carnegie Institution

Nucleic Acids Research 2008


MetaCyc Data -- Version 14.0

Pathways: 1,471
Reactions: 8,409
Enzymes: 6,198
Small Molecules: 8,572
Organisms: 1,861
Citations: 22,459


Taxonomic Distribution of MetaCyc Pathways (version 13.1)

Bacteria: 883
Green Plants: 607
Fungi: 199
Mammals: 159
Archaea: 112


MetaCyc Pathway Ontology


Provides a classification system for metabolic
pathways



Biosynthesis [902]


Amino acids Biosynthesis [105]


Aromatic Compounds Biosynthesis [13]


Carbohydrates Biosynthesis [70]


Cell structures Biosynthesis [31]


Cofactors, Prosthetic Groups, Electron Carriers Biosynthesis [160]


Hormones Biosynthesis [40]


Fatty Acids and Lipids Biosynthesis [101]


Metabolic Regulators Biosynthesis [4]


Nucleosides and Nucleotides Biosynthesis [20]


Amines and Polyamines Biosynthesis [32]


Secondary Metabolites Biosynthesis [351]


Antibiotic Biosynthesis [20]


Fatty Acid Derivatives Biosynthesis [7]


Flavonoids Biosynthesis [70]


Nitrogen-Containing Secondary Compounds Biosynthesis [64]


Alkaloids Biosynthesis [43]


Phenylpropanoid Derivatives Biosynthesis [46]


Phytoalexins Biosynthesis [25]


Sugar Derivatives Biosynthesis [10]


Terpenoids Biosynthesis [103]


Siderophore Biosynthesis [7]




Degradation/Utilization/Assimilation [639]


Alcohols Degradation [14]


Aldehyde Degradation [12]


Amines and Polyamines Degradation [40]


Amino Acids Degradation [113]


Aromatic Compounds Degradation [152]


C1 Compounds Utilization and Assimilation [24]


Carbohydrates Degradation [52]


Carboxylates Degradation [30]


Chlorinated Compounds Degradation [39]


Cofactors, Prosthetic Groups, Electron Carriers Degradation [2]


Fatty Acid and Lipids Degradation [18]


Inorganic Nutrients Metabolism [72]


Nitrogen Compounds Metabolism [15]


Phosphorus Compounds Metabolism [3]


Sulfur Compounds Metabolism [54]


Nucleosides and Nucleotides Degradation and Recycling [9]


Secondary Metabolites Degradation [58]


Nitrogen Containing Secondary Compounds Degradation [13]


Sugar Derivatives Degradation [31]


Terpenoids Degradation [10]




Detoxification [16]


Acid Resistance [2]


Arsenate Detoxification [3]


Mercury Detoxification [1]


Methylglyoxal Detoxification [8]




Generation of precursor metabolites and energy [124]


Chemoautotrophic Energy Metabolism [14]


Hydrogen Oxidation [2]


Electron Transfer [11]


Fermentation [34]


Glycolysis [6]


Methanogenesis [12]


Pentose Phosphate Pathways [4]


Photosynthesis [6]


Respiration [25]


Aerobic Respiration [9]


Anaerobic Respiration [14]


TCA cycle [9]
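
The class listing above can be held as a small tree, which makes it easy to look up where a class sits in the ontology. The following is only a sketch: the `PathwayClass` type and the `lineage` and `outline` helpers are hypothetical names, not part of any BioCyc software, and the nesting shown (Hydrogen Oxidation under Chemoautotrophic Energy Metabolism; Aerobic and Anaerobic Respiration under Respiration) is an assumption, since the slide's indentation is not preserved here. The counts are copied from the listing.

```python
from dataclasses import dataclass, field


@dataclass
class PathwayClass:
    """One ontology class with the pathway count shown in brackets on the slide."""
    name: str
    count: int
    children: list = field(default_factory=list)


# Assumed nesting of the "Generation of precursor metabolites and energy" branch.
energy = PathwayClass("Generation of precursor metabolites and energy", 124, [
    PathwayClass("Chemoautotrophic Energy Metabolism", 14, [
        PathwayClass("Hydrogen Oxidation", 2),
    ]),
    PathwayClass("Electron Transfer", 11),
    PathwayClass("Fermentation", 34),
    PathwayClass("Glycolysis", 6),
    PathwayClass("Methanogenesis", 12),
    PathwayClass("Pentose Phosphate Pathways", 4),
    PathwayClass("Photosynthesis", 6),
    PathwayClass("Respiration", 25, [
        PathwayClass("Aerobic Respiration", 9),
        PathwayClass("Anaerobic Respiration", 14),
    ]),
    PathwayClass("TCA cycle", 9),
])


def lineage(node, target, path=()):
    """Return the list of class names from the root down to `target`, or None."""
    path = path + (node.name,)
    if node.name == target:
        return list(path)
    for child in node.children:
        found = lineage(child, target, path)
        if found is not None:
            return found
    return None


def outline(node, depth=0):
    """Render the tree as indented 'Name [count]' lines."""
    lines = [f"{'  ' * depth}{node.name} [{node.count}]"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(energy)))
print(lineage(energy, "Aerobic Respiration"))
```

The child counts need not add up to the parent count, so the sketch stores each count as given rather than deriving it.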


Tier 3 Databases


Curation Level


EcoCyc and MetaCyc have many types of data
that you will not see in Tier 3 databases



Examples:


Regulation


Minireview summaries


Citations


GO terms


Protein features



BioCyc Ortholog Data


Currently BioCyc ortholog data obtained from CMR all-vs-all protein BLAST comparisons

Require bidirectional best BLAST hits, at least 10% identity, at least 40% similarity, P-value under 1



Not all organisms contain ortholog data currently


CMR lacks entries for some organisms


Some BioCyc genomes not obtained from CMR
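
The bidirectional-best-hit rule described above can be sketched as follows. This is a minimal illustration, not the BioCyc or CMR implementation: the `Hit` record, the function names, and the use of a score to rank hits are all assumptions. The identity and similarity defaults come from the slide; the P-value cutoff is left as a required argument because the value on the slide appears incomplete, and the `1e-3` used in the example is purely hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST hit (hypothetical record; field names are illustrative)."""
    query: str
    subject: str
    identity: float    # percent identity
    similarity: float  # percent similarity
    p_value: float
    score: float       # used here to rank hits; higher is better


def best_hits(hits, min_identity, min_similarity, max_p):
    """Map each query to its top-scoring subject among hits passing the filters."""
    best = {}
    for h in hits:
        if h.identity < min_identity or h.similarity < min_similarity:
            continue
        if h.p_value >= max_p:
            continue
        if h.query not in best or h.score > best[h.query].score:
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a, max_p,
                            min_identity=10.0, min_similarity=40.0):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    fwd = best_hits(a_vs_b, min_identity, min_similarity, max_p)
    rev = best_hits(b_vs_a, min_identity, min_similarity, max_p)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Made-up hits between proteins of genome A (a1..a3) and genome B (b1, b2).
a_vs_b = [
    Hit("a1", "b1", 55.0, 70.0, 1e-30, 200.0),
    Hit("a1", "b2", 30.0, 50.0, 1e-5, 80.0),
    Hit("a2", "b2", 45.0, 60.0, 1e-20, 150.0),
    Hit("a3", "b1", 35.0, 52.0, 1e-8, 100.0),
]
b_vs_a = [
    Hit("b1", "a1", 55.0, 70.0, 1e-30, 200.0),
    Hit("b1", "a3", 35.0, 52.0, 1e-8, 100.0),
    Hit("b2", "a1", 30.0, 50.0, 1e-5, 80.0),
    Hit("b2", "a2", 45.0, 60.0, 1e-20, 150.0),
]

pairs = bidirectional_best_hits(a_vs_b, b_vs_a, max_p=1e-3)
print(pairs)
```

In this example `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so `a3` gets no ortholog; that one-sided case is what the bidirectional requirement filters out.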