Overview - Bioinformatics Research Group at SRI International


2 Oct 2013


Overview of the Pathway Tools Software and Pathway/Genome Databases



SRI International

Bioinformatics

Introductions



BRG Staff


Peter Karp


Tomer Altman


Joe Dale


Fred Gilham


John Myers


Suzanne Paley


Markus Krummenacker


Ingrid Keseler


Ron Caspi


Alex Shearer


Carol Fulcher



Attendees


Where from, what genome?


What do you hope to get out of the tutorial?


SRI International


Private nonprofit research institute

No permanent funding sources

1300 staff in Menlo Park




Founded in 1946 as Stanford Research Institute



Separated from Stanford University in 1970



Name changed to SRI International in 1977


SRI Organization

Information and Computing Sciences

Engineering Systems and Sciences

Physical Sciences

Biopharmaceuticals and Pharmaceutical Discovery

Education and Policy

Bioinformatics Research Group


Research in the SRI Bioinformatics Research Group


BioCyc Database Collection


EcoCyc


MetaCyc


Pathway Tools


BioWarehouse



Outline for Tutorial


Monday


Introduction


Pathway/Genome Navigator


Introduction to Pathway/Genome Editors


Tuesday


PathoLogic tutorial


PathoLogic lab session


Build initial version of PGDB


Pathway hole filler lecture+lab


Wednesday


PathoLogic: Creating protein complexes, operon predictor, transport inference parser


Pathway Tools Schema


Model organism database projects


Thursday


Advanced Pathway/Genome Editors


Friday


Overviews and Omics Viewers


Comparative analysis


Structured Advanced Query Form


Metabolite Tracing


Regulation


Tutorial Goals


General familiarity with Pathway Tools goals and functionality

Ability to create, edit, and navigate a new PGDB

Create new PGDB for genome(s) you brought with you

Familiarity with information resources available about Pathway Tools to continue your work


SRI’s Support for Pathway Tools


NIH grant finances software development and user support

Additional grants finance other software development

Email us bug reports, suggestions, questions

Comprehensive bug reports are required for us to fix the problem you reported



Keep us posted regarding your progress




Administrative Details


Please wear badge at all times


Escort required outside this room/hallway


Let us know when you are leaving



Use E-Bldg Entrance


Phone numbers to call from entrance



Meals



Restrooms


Tutorial Format


Questions welcome during presentations



Lab sessions will take different amounts of time for different people


Refine your PGDB


Read Pathway Tools manuals



Computer logins



Internet connectivity





Pathway/Genome Database

Chromosomes

Plasmids

Genes

Proteins

RNAs

Reactions

Pathways

Compounds

CELL

Operons

Promoters

DNA Binding Sites

Regulatory Interactions

Sequence Features


BioCyc Collection of Pathway/Genome Databases


Pathway/Genome Database (PGDB) combines information about


Pathways, reactions, substrates


Enzymes, transporters


Genes, replicons


Transcription factors/sites, promoters, operons



Tier 1: Literature-Derived PGDBs


MetaCyc


EcoCyc -- Escherichia coli K-12



Tier 2: Computationally-derived DBs, Some Curation -- 20 PGDBs


HumanCyc


Mycobacterium tuberculosis



Tier 3: Computationally-derived DBs, No Curation -- 349 DBs



Terminology


Pathway Tools Software


PathoLogic


Predicts operons, metabolic network, pathway hole fillers, from genome


Computational creation of new Pathway/Genome Databases



Pathway/Genome Editors


Distributed curation of PGDBs


Distributed object database system, interactive editing tools



Pathway/Genome Navigator


WWW publishing of PGDBs


Querying, visualization of pathways, chromosomes, operons


Analysis operations


Pathway visualization of gene-expression data


Global comparisons of metabolic networks

Bioinformatics 18:S225 2002


Pathway Tools Software: PGDBs Created Outside SRI


1000+ licensees: 75+ groups applying software to 150+ organisms


Saccharomyces cerevisiae, SGD project, Stanford University


pathway.yeastgenome.org/biocyc/


Mouse, MGD, Jackson Laboratory


dictyBase, Northwestern University


Under development:


CGD (Candida albicans), Stanford University


Drosophila, P. Ebert in collaboration with FlyBase


C. elegans, P. Ebert in collaboration with WormBase


Planned:


RGD (Rat), Medical College of Wisconsin


Arabidopsis thaliana, TAIR, Carnegie Institution of Washington


Tomato and Potato, Cornell University


GrameneDB, Cold Spring Harbor Laboratory


Medicago truncatula, Samuel Roberts Noble Foundation




Pathway Tools Software: PGDBs Created Outside SRI


NIAID BRCs: BioHealthBase (M. tuberculosis, F. tularensis), PATRIC, ApiDB (Cryptosporidium)


F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa


V. Schachter, Genoscope, Acinetobacter


M. Bibb, John Innes Centre, Streptomyces coelicolor


G. Church, Harvard, Prochlorococcus marinus, multiple strains


E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis


R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579


Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major


Herbert Chiang, Washington University, Bacteroides thetaiotaomicron


Sergio Encarnacion, UNAM, Sinorhizobium meliloti


Gregory Fournier, MIT, Mesoplasma florum


Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis


Michael Gottfert, Technische Universitat Dresden, Bradyrhizobium japonicum


Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC 12472


Kenneth J. Kauffman, University of California, Riverside, Desulfovibrio vulgaris


Pathway Tools Software: PGDBs Created Outside SRI


Mike McLeod, University of British Columbia, Rhodococcus sp. RHA1


Robert S. Munson, Children's Research Institute, Ohio, Haemophilus ducreyi, Haemophilus influenzae 86-026NP


John Nash, Canadian NRC, Campylobacter jejuni


Christopher S. Reigstad, Washington University, Escherichia coli UTI89


Haluk Resat, Pacific Northwest Lab, Rhodobacter sphaeroides


Gary Xie, Los Alamos Lab, Bacillus cereus


Large scale users:


C. Medigue, Genoscope, 107 PGDBs


G. Burger, U Montreal, 48 PGDBs


Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes


Partial listing of outside PGDBs at BioCyc.org


Terminology


“Database” = “DB” = “Knowledge Base” = “KB” = “Pathway/Genome Database” = “PGDB”


Why Create PGDBs?



Extract more information from your genome



Create an up-to-date computable information repository about an organism



Perform analyses on the genome and pathway complement of the organism


Analyses of omics data


Analyses of cellular systems (dead-end metabolites)


Reports generated by Pathway Tools



Perform comparative analyses with other organisms



Generate a genome poster and metabolic wall chart
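
To make the "dead-end metabolites" analysis above concrete, here is a minimal sketch in Python. It assumes a deliberately simplified definition: a metabolite is a dead end if the network only produces it or only consumes it. The toy network, the reaction IDs, and the function name are all hypothetical; this is not the Pathway Tools implementation, which may also account for transport, reversibility, and compartments.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: reaction id -> (substrates, products).
Reaction = Tuple[List[str], List[str]]


def dead_end_metabolites(reactions: Dict[str, Reaction]) -> Set[str]:
    """Return metabolites that are only consumed or only produced.

    Simplifying assumption: every reaction runs left to right and
    there is no transport in or out of the cell.
    """
    consumed: Set[str] = set()
    produced: Set[str] = set()
    for substrates, products in reactions.values():
        consumed.update(substrates)
        produced.update(products)
    # Symmetric difference: in exactly one of the two sets.
    return consumed ^ produced


# Toy network (illustrative only).
toy_network: Dict[str, Reaction] = {
    "R1": (["A"], ["B"]),
    "R2": (["B"], ["C"]),
    "R3": (["B"], ["D"]),
    "R4": (["D"], ["B"]),
}

if __name__ == "__main__":
    print(sorted(dead_end_metabolites(toy_network)))
```

In this toy network, A is never produced and C is never consumed, so both are reported; a curator would then decide whether a transporter or a missing reaction explains each one.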



Sequence Project Workflow

Raw Sequence

Phred

Phrap

BLAST, BLOCKS

GeneMark/Glimmer

PathoLogic

P/G Navigator

P/G Editors

WWW Publishing

Analyses

Pathway Tools


EcoCyc Project


EcoCyc.org


E. coli Encyclopedia


Review-level Model-Organism Database for E. coli


Tracks evolving annotation of the E. coli genome and cellular networks


The two paradigms of EcoCyc



“Multi-dimensional annotation of the E. coli K-12 genome”


Positions of genes; functions of gene products


76% / 66% exp


Gene Ontology terms; MultiFun terms


Gene product summaries and literature citations


Evidence codes


Multimeric complexes


Metabolic pathways


Regulation of transcription initiation

Nuc. Acids Res. 35:7577 2007

ASM News 70:25 2004

Science 293:2040


Karp, Gunsalus, Collado-Vides, Paulsen


Paradigm 1: EcoCyc as Textual Review Article


All gene products for which experimental literature exists are curated with a minireview summary


Found on protein and RNA pages, not gene pages!


3257 gene products contain summaries


Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more



Additional summaries found in pages for operons, pathways



EcoCyc cites 15,880 publications


Paradigm 2: EcoCyc as Computational Symbolic Theory


Highly structured, high-fidelity knowledge representation provides computable information


Each molecular species defined as a DB object


Genes, proteins, small molecules


Each molecular interaction defined as a DB object


Metabolic reactions


Transport reactions


Transcriptional regulation of gene expression


220 database fields capture extensive properties
and relationships
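
As an illustration of "each molecular species and interaction defined as a DB object", the following Python sketch models objects as frames with named fields and answers one query by following links between them. The class name, the slot names (`product`, `catalyzes`, `left`, `right`), and the IDs are assumptions chosen for readability; they are not taken from the Pathway Tools schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Frame:
    """A database object with named, multi-valued fields (slots)."""
    frame_id: str
    slots: Dict[str, List[str]] = field(default_factory=dict)

    def get(self, slot: str) -> List[str]:
        # A missing slot is treated as having no values.
        return self.slots.get(slot, [])


# Hypothetical toy knowledge base: gene -> protein -> reaction.
kb: Dict[str, Frame] = {
    "gene-1": Frame("gene-1", {"product": ["protein-1"]}),
    "protein-1": Frame("protein-1", {"catalyzes": ["rxn-1"]}),
    "rxn-1": Frame("rxn-1", {"left": ["cpd-A"], "right": ["cpd-B"]}),
}


def genes_for_reaction(kb: Dict[str, Frame], reaction_id: str) -> List[str]:
    """Find genes whose product catalyzes the given reaction."""
    genes = []
    for frame in kb.values():
        for product_id in frame.get("product"):
            product = kb.get(product_id)
            if product and reaction_id in product.get("catalyzes"):
                genes.append(frame.frame_id)
    return sorted(genes)


if __name__ == "__main__":
    print(genes_for_reaction(kb, "rxn-1"))
```

Because relationships are stored as fields rather than free text, a question such as "which genes encode enzymes for this reaction?" becomes a short traversal.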



EcoCyc Procedures


DB updates performed by 5 staff curators


Information gathered from biomedical literature


Enter data into structured database fields


Author extensive summaries


Update evidence codes


Corrections submitted by E. coli researchers



Four releases per year



Quality assurance of data and software


Evaluate database consistency constraints


Perform element balancing of reactions


Run other checking programs
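
The element-balancing check listed above can be sketched as follows. This is a minimal Python illustration, assuming that each compound's formula is already available as a mapping from element to atom count; the function names and data layout are hypothetical, and charge balance is not checked.

```python
from collections import Counter
from typing import Dict, List, Tuple

Formula = Dict[str, int]            # element symbol -> atom count
Side = List[Tuple[int, Formula]]    # (stoichiometric coefficient, formula)


def element_totals(side: Side) -> Counter:
    """Sum atoms of each element over one side of a reaction."""
    totals: Counter = Counter()
    for coefficient, formula in side:
        for element, count in formula.items():
            totals[element] += coefficient * count
    return totals


def imbalance(left: Side, right: Side) -> Dict[str, int]:
    """Return right-minus-left atom counts for unbalanced elements.

    An empty result means the reaction is element-balanced.
    """
    l, r = element_totals(left), element_totals(right)
    return {e: r[e] - l[e] for e in set(l) | set(r) if r[e] != l[e]}


# Neutral formulas for: glucose + ATP -> glucose 6-phosphate + ADP
glucose = {"C": 6, "H": 12, "O": 6}
atp = {"C": 10, "H": 16, "N": 5, "O": 13, "P": 3}
g6p = {"C": 6, "H": 13, "O": 9, "P": 1}
adp = {"C": 10, "H": 15, "N": 5, "O": 10, "P": 2}

if __name__ == "__main__":
    print(imbalance([(1, glucose), (1, atp)], [(1, g6p), (1, adp)]))
```

A non-empty result names the elements that a curator should inspect, for example a missing water or phosphate.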


MetaCyc: Metabolic Encyclopedia


Describe a representative sample of every experimentally determined metabolic pathway


Describe properties of metabolic enzymes



Literature-based DB with extensive references and commentary


Pathways, reactions, enzymes, substrates



Jointly developed by


P. Karp, R. Caspi, C. Fulcher, SRI International


L. Mueller, A. Pujar, Cornell Univ


S. Rhee, P. Zhang, Carnegie Institution

Nucleic Acids Research 2008


MetaCyc Data -- Version 11.6

Pathways: 1,010

Reactions: 6,576

Enzymes: 4,582

Small Molecules: 6,561

Organisms: 1,077

Citations: 15,875


Taxonomic Distribution of MetaCyc Pathways

Bacteria: 517

Green Plants: 372

Mammals: 90

Fungi: 89

Archaea: 65


Family of Pathway/Genome Databases

MetaCyc

EcoCyc

CauloCyc

AraCyc

MtbRvCyc

HumanCyc


Comparison of BioCyc to KEGG: The Data


KEGG approach: Static collection of pathway diagrams that are color-coded to produce organism-specific views



KEGG vs MetaCyc: Resource on literature-derived pathways


KEGG pathway maps are composites of pathways in many organisms -- they do not identify which specific pathways were elucidated in which organisms


KEGG pathway maps encompass multiple biological pathways; are 2-4 times the size of MetaCyc pathways


KEGG has no literature citations, no summaries, less enzyme detail



KEGG vs BioCyc organism-specific PGDBs


KEGG re-annotates entire genome for each organism


KEGG does not curate or customize pathway networks for each organism



Comparison of Pathway Tools to KEGG: The Software



KEGG has no pathway hole filler or transport inference parser or operon predictor



KEGG has no interactive editing tools: you cannot refine a KEGG pathway DB



KEGG has no algorithmic visualization tools: pathway diagrams are pre-drawn


May become out of date


Cannot show pathways at multiple detail levels



KEGG genome browser has very limited functionality


KEGG has one overview diagram with limited functionality


KEGG has no metabolite tracing tool


KEGG has no Structured Advanced Query Tool


Overviews and Omics Viewers


Genome-scale Visualizations


Metabolic map


Transcriptional regulatory network


Genome map



Overlay gene expression, proteomics, metabolomics data


Obtain pathway based visualizations of omics data


Numerical spectrum of expression values mapped to a color spectrum


Steps of overview painted with color corresponding to expression level(s)
of genes that encode enzyme(s) for that step
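
The mapping from expression values to colors described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a linear blue-to-red scale, one color per gene with data, and hypothetical function and gene names. The color scale and data handling of the actual Omics Viewer may differ.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def value_to_color(value: float, vmin: float, vmax: float,
                   low: RGB = (0, 0, 255), high: RGB = (255, 0, 0)) -> RGB:
    """Linearly interpolate between two colors; clamp out-of-range values."""
    t = 0.5 if vmax == vmin else (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    return tuple(round(l + t * (h - l)) for l, h in zip(low, high))


def step_colors(step_genes: Dict[str, List[str]],
                expression: Dict[str, float]) -> Dict[str, List[RGB]]:
    """Assign each reaction step one color per encoding gene with data."""
    vmin, vmax = min(expression.values()), max(expression.values())
    return {
        step: [value_to_color(expression[g], vmin, vmax)
               for g in genes if g in expression]
        for step, genes in step_genes.items()
    }


# Hypothetical data: log-ratio expression values and step-to-gene links.
expression = {"geneA": -2.0, "geneB": 2.0}
step_genes = {"step1": ["geneA"], "step2": ["geneA", "geneB", "geneC"]}

if __name__ == "__main__":
    print(step_colors(step_genes, expression))
```

Steps catalyzed by several gene products receive several colors, matching the "expression level(s)" wording on the slide; genes without a measurement (here geneC) are simply skipped.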




Environment for Computational Exploration of Genomes



Powerful ontology opens many facets of the biology to computational exploration



Global characterization of metabolic network


Analysis of interface between transport and metabolism


Nutrient analysis of metabolic network




Pathway Tools Implementation Details


Allegro Common Lisp


Sun, Linux, Windows, Macintosh platforms



Ocelot object database



370,000+ lines of code



Lisp-based WWW server at BioCyc.org


Manages 370+ PGDBs



The Common Lisp Programming Environment


Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11:21 2000)



Survey



Please complete survey at end of each day


PGDB(s) That You Build




Before you leave


Tar up your PGDB directory and FTP it home, email it home, or copy it to flash disk


We will create a backup copy of your PGDB directory if the directory is still there at the end of the tutorial


Delete the PGDB directory if you don’t want us to back it up


We will not give the backed up data to anyone else
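
For attendees who prefer a script to the `tar` command, the archiving step above could look like the following Python sketch. The function name and the example directory names are hypothetical; the real location of your PGDB directory depends on your installation.

```python
import tarfile
from pathlib import Path


def archive_pgdb(pgdb_dir, out_dir) -> Path:
    """Write <out_dir>/<pgdb name>.tar.gz containing the PGDB directory.

    Assumption: out_dir already exists and lies outside pgdb_dir,
    so the archive does not end up containing itself.
    """
    pgdb_dir = Path(pgdb_dir)
    archive = Path(out_dir) / f"{pgdb_dir.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the PGDB name.
        tar.add(pgdb_dir, arcname=pgdb_dir.name)
    return archive


# Example call (paths are placeholders, not actual tutorial paths):
# archive_pgdb("/path/to/mycyc", "/path/to/flash-disk")
```

The resulting `.tar.gz` file can then be transferred by FTP, email, or flash disk as described above.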


Information Sources


Pathway Tools User’s Guide


/root/aic-export/pathway-tools/ptools/11.5/doc/manuals/userguide.pdf


NOTE: Location of the aic-export directory can vary across different computers



Pathway Tools Web Site


Publications, FAQ, programming examples, etc.


http://bioinformatics.ai.sri.com/ptools/


BioCyc Publications Page


http://biocyc.org/publications.shtml


MetaCyc Guide


http://metacyc.org/MetaCycUserGuide.shtml



Slides from this tutorial


http://bioinformatics.ai.sri.com/ptools/tutorial/



BioCyc Webinars


http://biocyc.org/webinar.shtml



Reporting Pathway Tools Problems


ptools-support@ai.sri.com



Tell us:


What platform you are running on


What version of Pathway Tools you are running


The error message


Result of [1] EC(2): :zoom :count :all


What operation were you performing when the error occurred?



New patches are automatically downloaded and loaded when Pathway Tools starts up



Auto-Patch


Tools -> Instant Patch -> Download and Activate All Patches


Summary



Pathway Tools and Pathway/Genome Databases


Not just for pathways!


Computational inferences


Operons, metabolic pathways, pathway hole fillers


Editing tools


Analysis tools: Omics data on pathways


Web publishing of PGDBs



Main classes of users:


Develop PGDB to extract more information from genome for genome paper


Develop a model-organism DB for the organism that is updated regularly and published on the web