General - Microarray Gene Expression Data Society


Oct 2, 2013


SRI International

Bioinformatics

Ontologies for Gene Expression




History of ontologies in bioinformatics


BioOntologies Consortium



Ontologies for the biochemical networks that
control gene expression




Ontologies



Clear thinking about how to structure information



Clearly understand each field in a database


Formal and informal definitions for database elements


Type of value, range of values


Product field of Gene class can be a Protein or an RNA



Ability to enforce data correctness


Ability to compute with database elements in a reliable fashion
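
As a rough illustration of how a declared value type lets software enforce data correctness, here is a minimal Python sketch. The class names mirror the Gene, Protein and RNA classes named on the slide; the `product` attribute name, the use of dataclasses and the example values are assumptions made for illustration, not the representation used by any SRI tool.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Protein:
    name: str


@dataclass(frozen=True)
class RNA:
    name: str


@dataclass
class Gene:
    name: str
    # Range of the Product field: a Protein or an RNA, nothing else.
    product: Union[Protein, RNA]

    def __post_init__(self) -> None:
        # Reject values outside the declared range at creation time.
        if not isinstance(self.product, (Protein, RNA)):
            raise TypeError(
                f"Product of gene {self.name!r} must be a Protein or an RNA, "
                f"got {type(self.product).__name__}"
            )


# Hypothetical example values.
coding_gene = Gene("trpA", Protein("TrpA"))
rna_gene = Gene("example-rna-gene", RNA("example-rna"))
```

Because the check runs when the object is created, later code can rely on every Gene having a product of a known type.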


History of Ontologies in Bioinformatics



1994 Meeting on Interoperation of Molecular Biology Databases (MIMBD-94)



BioOntologies meetings in 1997, 1998, 1999, 2000,
2001


Ontology tutorials at ISMB conference


BioOntologies Consortium



BioOntologies Consortium



Concerned with ontology infrastructure for
bioinformatics



Exchange of ontologies


Beware: bioinformatics ontologies are all expressed in different ontology languages



Software for constructing, interpreting, applying
ontologies



http://bioontology.ingenuity.com/




BioOntologies Consortium


ISMB-2000 paper evaluating ontology exchange languages for bioinformatics


Define criteria for evaluating existing languages


No existing languages satisfy all criteria


Desired: XML syntax, frame semantics



1999: Karp and Chaudhri develop XOL language



2000: OIL/DAML succeeds XOL




BioOntologies Consortium



Potential Interactions



Standards and tools


DAML/OIL


SRI’s GKB Editor ontology editor



Collaborate on ontology development



Post ontologies on BioOntologies web site



Be Precise About Ontology Uses



Data submission


Data exchange among databases


High-level database design



Mapping from ontologies to database
management systems essential


Beware of flatfiles


Beware of XML



ArrayExpress



Ontology for specifying experiments



MAML import and export



SQL query access




EcoCyc Project Overview


E. coli Encyclopedia and model organism database


Tracks the evolving annotation of the E. coli genome


Over 3000 literature citations


Collaborative development via internet


Karp (SRI) -- Bioinformatics architect


Riley (MBL) -- Metabolic pathways, signal transduction


Saier (UCSD) and Paulsen (TIGR) -- Transport


Collado (UNAM) -- Regulation of gene expression



Ontology: 1000 biological classes


Database content: 16,000 instances



Over 3,300 registered users


Encoding Transcriptional Regulation in EcoCyc -- Goals



Capture transcriptional regulatory mechanisms within a well-structured ontology


Provide a training set for inference of gene networks


Interpret gene-expression datasets in the context of known regulatory mechanisms



Compute with regulatory mechanisms and pathways


Summary statistics


Pattern discovery


Complex queries


Consistency checking



Pathway Tools Extensions for Transcriptional Regulation



Integration of RegulonDB (Collado et al.)



Regulation ontology



Editing tools for regulatory interactions



New visualizations



EcoCyc Ontology for Transcriptional Regulation


Terminology: Transcription Unit


Definition: A set of coding regions and associated control
regions that yield a single transcript


“Operons” must have more than one gene


Prokaryotic terminology



Key features of ontology


Model gross structure of transcription units, transcription
factors, RNA polymerase


Model all molecular interactions as biochemical reactions


Binding of transcription factors to ligands and to DNA sites


Binding of RNA polymerase to promoter


Ontology for Transcriptional Regulation


Current Limitations




Focused on prokaryotic regulation



Mechanisms based on control of transcription
initiation only, e.g., no attenuation





Ontology for Regulatory Interactions


Common slots

Citations, Comment, Common-Name, Synonyms


Class DNA-Regions

Left-End-Position, Right-End-Position, Relative-Start-Distance


Class Transcription-Units

Components (Promoter, transcription-factor binding sites, genes, terminator)


Class Promoters

Component-Of

Promoter-Strength-Exp, Promoter-Strength-Seq

Promoter-Evidence


Ontology for Regulatory Interactions


Class DNA-Binding-Sites

Component-Of

Regulated-Promoter, Relative-Center-Distance

Type-Of-Evidence


Classes Protein-Complexes, Polypeptides

Components / Component-Of


Class Binding-Reactions

Reactants

Activators

Inhibitors
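
The classes and slots listed above can be pictured as linked objects. The following Python sketch is only an illustration under stated assumptions: the slot names (Components, Component-Of, Regulated-Promoter, Reactants, Activators, Inhibitors) come from the slides, while the `Frame` base class, the `add_component` helper, the `int-example` identifier and the choice of which objects take part in the reaction are hypothetical. EcoCyc itself is a frame knowledge base, not a set of Python classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare frames by identity; safe with cyclic links
class Frame:
    frame_id: str
    components: List["Frame"] = field(default_factory=list)    # Components slot
    component_of: List["Frame"] = field(default_factory=list)  # Component-Of slot

    def add_component(self, part: "Frame") -> None:
        """Fill Components and its inverse, Component-Of, together."""
        self.components.append(part)
        part.component_of.append(self)


class Gene(Frame):
    pass


class Promoter(Frame):
    pass


class TranscriptionUnit(Frame):
    pass


@dataclass(eq=False)
class DNABindingSite(Frame):
    regulated_promoter: Optional[Promoter] = None  # Regulated-Promoter slot


@dataclass(eq=False)
class BindingReaction:
    frame_id: str
    reactants: List[object] = field(default_factory=list)
    activators: List[object] = field(default_factory=list)
    inhibitors: List[object] = field(default_factory=list)


# Object names follow the trp example used in the slides.
trp_unit = TranscriptionUnit("trpLEDCBA")
promoter = Promoter("pro001")
site = DNABindingSite("site001", regulated_promoter=promoter)
genes = [Gene(n) for n in ("trpL", "trpE", "trpD", "trpC", "trpB", "trpA")]
for part in [promoter, site, *genes]:
    trp_unit.add_component(part)

# Hypothetical reaction: polymerase binding to the promoter, inhibited by
# the repressor complex. The identifier "int-example" is made up.
polymerase_binding = BindingReaction(
    "int-example",
    reactants=["RpoSig70", promoter],
    inhibitors=["TrpR*trp"],
)
```

Keeping Components and Component-Of in step through one helper means a query can walk the structure in either direction, from a transcription unit to its parts or from a promoter to the units that contain it.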


EcoCyc Ontology for Transcriptional Regulation


One DB object defined for each biological entity
and for each molecular interaction


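[Figure: trp operon example. Objects shown: transcription unit trpLEDCBA; genes trpL, trpE, trpD, trpC, trpB, trpA; promoter pro001; binding site site001; interactions Int001, Int003, Int005; RpoSig70; TrpR*trp; apoTrpR; trp]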


Integration of RegulonDB


RegulonDB has been loaded into EcoCyc


RegulonDB originally relational


Lisp loader tools developed for relational table dumps



Statistics:


528 transcription units


620 promoters


617 DNA binding sites


83 transcription factors




Consistency Checks on RegulonDB Data



Find transcription units containing:


Undefined components


No gene components


Genes that are not contiguous


Genes with conflicting transcription directions
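
The four checks listed above could be sketched in Python as follows. This is a hypothetical sketch, not the Lisp code used for EcoCyc: the `Gene` record, its `order` field (the gene's rank along the chromosome) and the `check_unit` function are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Gene:
    name: str
    order: int   # hypothetical: rank of the gene along the chromosome
    strand: str  # "+" or "-", the direction of transcription


def check_unit(component_ids: List[str],
               genes: Dict[str, Gene],
               other_ids: Set[str]) -> List[str]:
    """Return a list of consistency problems for one transcription unit."""
    problems = []

    # 1. Components that are not defined anywhere in the database.
    undefined = [c for c in component_ids
                 if c not in genes and c not in other_ids]
    if undefined:
        problems.append(f"undefined components: {sorted(undefined)}")

    # 2. A transcription unit must contain at least one gene.
    unit_genes = [genes[c] for c in component_ids if c in genes]
    if not unit_genes:
        problems.append("no gene components")
        return problems

    # 3. The genes must occupy consecutive positions on the chromosome.
    orders = sorted(g.order for g in unit_genes)
    if orders != list(range(orders[0], orders[0] + len(orders))):
        problems.append("genes are not contiguous")

    # 4. All genes must be transcribed in the same direction.
    if len({g.strand for g in unit_genes}) > 1:
        problems.append("conflicting transcription directions")

    return problems


# Toy data with made-up identifiers.
toy_genes = {
    "a": Gene("a", 1, "+"),
    "b": Gene("b", 2, "+"),
    "c": Gene("c", 4, "-"),
}
toy_others = {"pro1"}
```

Returning a list of messages, not stopping at the first failure, lets a curator see every problem with a unit in one pass.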



Interactive Editing Tools



SRI created interactive tools for creating and
modifying regulatory mechanisms



Ongoing updates to RegulonDB occur in EcoCyc




Visualization Capabilities



Transcription units


Transcription unit containing a gene: araA


Details of a transcription unit


Regulons: CRP, NARL


Pathway control


Overview: show rxns controlled by a TF (CRP, FNR), show
other rxns controlled by same TF(s) (use a rxn in purine
biosyn)



Characterization of the E. coli Genetic Network



551 transcription units include 1115 genes (25%)



Controlled by 86 transcription factors



All experimentally determined





Genes per Transcription Unit [chart]

Binding Sites per Transcription Unit [chart]

Transcription Factor Reach [chart]

Transcription Units per Pathway [chart]

Pathways per Transcription Unit [chart]

Visualization of the Full E. coli Genetic Network



Influences of transcription factors on other
transcription factors


50 of 85 TFs do not affect other TFs


Maximum network depth of 3


Only CRP has a branching factor greater than 2


No feedback loops other than autoregulation


Negative autoregulation is the dominant form of feedback
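
Properties like those listed above can be computed from a directed graph of TF-to-TF influences. The Python sketch below is hypothetical and runs on a made-up toy network, not on EcoCyc data. It makes two assumptions: "depth" means the number of TFs on the longest regulatory chain, and "branching factor" means the number of other TFs a TF regulates directly. The slides do not define either term.

```python
from typing import Dict, Set


def analyze(network: Dict[str, Set[str]]) -> dict:
    """Summarize a TF -> regulated-TFs graph."""
    # Autoregulation: a TF that lists itself among its targets.
    autoregulated = {tf for tf, targets in network.items() if tf in targets}

    # Drop self-loops and make sure every target also appears as a key.
    graph = {tf: set(targets) - {tf} for tf, targets in network.items()}
    for targets in list(graph.values()):
        for target in targets:
            graph.setdefault(target, set())

    depth_cache: Dict[str, int] = {}
    on_path: Set[str] = set()

    def depth(tf: str) -> int:
        # Meeting a TF already on the current path means a feedback loop
        # that is not plain autoregulation.
        if tf in on_path:
            raise ValueError(f"feedback loop involving {tf}")
        if tf not in depth_cache:
            on_path.add(tf)
            depth_cache[tf] = 1 + max((depth(t) for t in graph[tf]), default=0)
            on_path.remove(tf)
        return depth_cache[tf]

    return {
        "no_tf_targets": sorted(tf for tf, t in graph.items() if not t),
        "autoregulated": sorted(autoregulated),
        "max_depth": max((depth(tf) for tf in graph), default=0),
        "max_branching": max((len(t) for t in graph.values()), default=0),
    }


# Made-up toy network; the letters are placeholders, not real TFs.
toy_network = {
    "A": {"A", "B", "C"},  # A autoregulates and controls B and C
    "B": {"D"},
    "C": set(),
    "D": {"D"},            # D only autoregulates
}
summary = analyze(toy_network)
```

The depth computation doubles as the feedback check: if it finishes without raising, the network has no loops other than autoregulation.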