Database Modeling in Bioinformatics - The Gene Ontology

Oct 2, 2013

The Gene Ontology project

Jane Lomax

Ontology
(for our purposes)


“an explicit specification of some topic”


Stanford Knowledge Systems Lab



Includes:


a vocabulary of terms (names)


defined logical relationships to each other






GO Project Goals:

Compile structured vocabularies describing aspects of molecular biology

Describe gene products using vocabulary terms (annotation)

Develop tools:
  to query and modify the vocabularies and annotations
  annotation tools for curators



The Three Ontologies

Molecular Function: elemental activity or task
  nuclease, DNA binding, transcription factor

Biological Process: broad objective or goal
  mitosis, signal transduction, metabolism

Cellular Component: location or complex
  nucleus, ribosome, origin recognition complex

DAG Structure

Directed acyclic graph: each child may have one or more parents

The True Path Rule

Every path from a node back to the root must be biologically accurate
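The two points above (a child may have several parents, and every path to the root must hold) can be sketched in a few lines of Python. This is a minimal illustration, not GO's own software: the graph below is a toy whose term names are borrowed from the chitin example in this deck, the edges are an assumption, and `ancestors` and `paths_to_root` are hypothetical helper names.

```python
from typing import Dict, List, Set

# Toy graph (an assumption for illustration): child term -> parent terms.
# Term names stand in for identifiers; real GO terms carry numeric IDs.
PARENTS: Dict[str, List[str]] = {
    "GO process": [],
    "cell wall biosynthesis": ["GO process"],
    "cuticle synthesis": ["GO process"],
    "chitin metabolism": ["cell wall biosynthesis", "cuticle synthesis"],
    "chitin biosynthesis": ["chitin metabolism"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable by following parent links from `term`."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def paths_to_root(term: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """List every path from `term` up to a root (a term with no parents)."""
    if not parents[term]:
        return [[term]]
    return [
        [term] + rest
        for parent in parents[term]
        for rest in paths_to_root(parent, parents)
    ]


# Each printed path is one that a curator would need to check for accuracy.
for path in paths_to_root("chitin biosynthesis", PARENTS):
    print(" -> ".join(path))
```

Because "chitin metabolism" has two parents, the loop prints two paths. Under the true path rule, both would have to be biologically accurate for every gene product annotated to "chitin biosynthesis".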

True Path Rule

[Figure: chitin example. Original arrangement: under "GO process", the term "chitin metabolism" (with "chitin biosynthesis" and "chitin catabolism") sits below both "cuticle synthesis" and "cell wall biosynthesis". New GO terms: "cell wall chitin metabolism", "cell wall chitin biosynthesis", "cell wall chitin catabolism", "cuticle chitin metabolism", "cuticle chitin biosynthesis", "cuticle chitin catabolism". Revised arrangement, panel labels: "GO process", "cell wall biosynthesis (fungi)", "cuticle synthesis", "chitin metabolism", "chitin catabolism", "cell wall chitin metabolism", "cell wall chitin catabolism".]



Relationship Types

is-a: subclass; a is a type of b

part-of:
  physical part of (component)
  subprocess of (process)
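One way to picture the two relationship types is as labels on the edges of the graph. The sketch below is an assumption for illustration only: `RelType`, `Edge`, and `parents_of` are hypothetical names, and the two example edges are not copied from the GO files.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RelType(Enum):
    IS_A = "is-a"        # subclass: child is a type of parent
    PART_OF = "part-of"  # child is a component or subprocess of parent


@dataclass(frozen=True)
class Edge:
    child: str
    parent: str
    rel: RelType


# Illustrative edges (assumptions, chosen to show one edge of each type).
EDGES: List[Edge] = [
    Edge("chitin biosynthesis", "chitin metabolism", RelType.IS_A),
    Edge("nucleus", "cell", RelType.PART_OF),
]


def parents_of(term: str, rel: RelType, edges: List[Edge]) -> List[str]:
    """Return the parents of `term` reached through edges of type `rel`."""
    return [e.parent for e in edges if e.child == term and e.rel is rel]


print(parents_of("nucleus", RelType.PART_OF, EDGES))
```

Keeping the relationship type on the edge, rather than on the term, lets the same term take part in both kinds of relationship.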



What GO is NOT:

Not a way to unify biological databases

Not a dictated standard

Does not define evolutionary relationships

Additional ontologies needed to model biology and experimentation



Terms outside the Scope of GO

Names of gene products

Protein domains

Protein sequence features

Phenotypes; diseases

Anatomical terms (except as part of terms generated by cross-products)

Advantages of GO


Cross-species comparisons
  already used by an increasing number of databases

More comprehensive
  many terms per gene product
  not a strict hierarchy: many-to-many relationships possible

Simplify querying
  Uses restricted vocabulary developed by curators and annotators

Use of evidence codes




Annotation Features:

Database object: gene or gene product

GO term ID

Reference: publication or computational method

Evidence supporting annotation
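The four annotation features map naturally onto a small record type. The sketch below is a hedged illustration, not the format of any GO annotation file: the class and field names are hypothetical, the database object and reference values are placeholders, and the only check is that the term ID has the shape "GO:" followed by seven digits. GO:0003677 (DNA binding) and the evidence code IDA are used as examples.

```python
import re
from dataclasses import dataclass

# GO term IDs have the form "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class Annotation:
    db_object: str  # gene or gene product identifier
    go_id: str      # GO term ID
    reference: str  # publication or computational method
    evidence: str   # evidence code supporting the annotation

    def __post_init__(self) -> None:
        # Reject term IDs that do not have the expected shape.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO term ID: {self.go_id!r}")


# Placeholder values: the object and reference identifiers are hypothetical.
example = Annotation(
    db_object="EXAMPLE:gene1",
    go_id="GO:0003677",
    reference="PMID:0000000",
    evidence="IDA",
)
print(example)
```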

DAG Structure

Annotate to any level within DAG
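Annotating at any level of the DAG means a query for a general term should also find gene products annotated to its more specific descendants. The following is a minimal sketch under stated assumptions: the small "binding" hierarchy, the gene names, and the helper `genes_for_term` are all hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Set, Tuple

# Toy hierarchy (an assumption for illustration): child term -> parent terms.
PARENTS: Dict[str, List[str]] = {
    "molecular function": [],
    "binding": ["molecular function"],
    "nucleic acid binding": ["binding"],
    "DNA binding": ["nucleic acid binding"],
}

# Hypothetical annotations at different depths: (gene, annotated term).
ANNOTATIONS: List[Tuple[str, str]] = [
    ("geneA", "DNA binding"),  # specific annotation
    ("geneB", "binding"),      # general annotation
]


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable by following parent links from `term`."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def genes_for_term(
    term: str,
    parents: Dict[str, List[str]],
    annotations: List[Tuple[str, str]],
) -> Set[str]:
    """Genes annotated to `term` itself or to any of its descendants."""
    return {
        gene
        for gene, annotated in annotations
        if annotated == term or term in ancestors(annotated, parents)
    }


print(sorted(genes_for_term("binding", PARENTS, ANNOTATIONS)))
print(sorted(genes_for_term("DNA binding", PARENTS, ANNOTATIONS)))
```

A query for "binding" returns both genes, while a query for "DNA binding" returns only the gene annotated at that level or below.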



GOA: GO Annotation at EBI

GO Annotations for:
  Human proteins
  All SWISS-PROT/TrEMBL proteins
  Annotation sets for completely sequenced proteomes



GOA: GO Annotation at EBI

Methods:
  Manual curation
  SWISS-PROT keyword <-> GO term mapping
  EC number <-> GO term mapping
  InterPro entry <-> GO term mapping
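The mapping-based methods can be pictured as a lookup from an external vocabulary into GO. The sketch below is only an illustration of that idea, not GOA's pipeline: the mapping table, the protein identifiers, and the function `annotate_by_mapping` are assumptions, and tagging the results with the evidence code IEA (inferred from electronic annotation) is likewise an assumption of this example.

```python
from typing import Dict, List, Set, Tuple

# Illustrative keyword -> GO term mapping (an assumption for this sketch).
KEYWORD_TO_GO: Dict[str, str] = {
    "Nucleus": "GO:0005634",      # nucleus
    "DNA-binding": "GO:0003677",  # DNA binding
}

# Hypothetical proteins with the keywords attached to their entries.
PROTEIN_KEYWORDS: Dict[str, Set[str]] = {
    "EXAMPLE:protein1": {"Nucleus", "DNA-binding"},
    "EXAMPLE:protein2": {"Unmapped keyword"},
}


def annotate_by_mapping(
    protein_keywords: Dict[str, Set[str]],
    mapping: Dict[str, str],
    evidence: str = "IEA",
) -> List[Tuple[str, str, str]]:
    """Turn keywords into (protein, GO ID, evidence) annotations via lookup."""
    annotations: List[Tuple[str, str, str]] = []
    for protein in sorted(protein_keywords):
        for keyword in sorted(protein_keywords[protein]):
            go_id = mapping.get(keyword)
            if go_id is not None:  # keywords without a mapping are skipped
                annotations.append((protein, go_id, evidence))
    return annotations


for row in annotate_by_mapping(PROTEIN_KEYWORDS, KEYWORD_TO_GO):
    print(row)
```

The same lookup shape would apply to EC numbers or InterPro entries by swapping in a different mapping table.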



GO Tools

Browsers:
  DAG-Edit
  AmiGO
  “QuickGO” at EBI
  EP:GO browser



The Future of GO:

Developmental processes
  DAG cross-products with anatomy terms

Physiological processes

Relational database

Expand relationship types



FlyBase & Berkeley Drosophila Genome Project
WormBase
Saccharomyces Genome Database
DictyBase
Mouse Genome Informatics
Compugen, Inc
The Arabidopsis Information Resource
Swiss-Prot/TrEMBL/InterPro
Pathogen Sequencing Unit (Sanger Institute)
PomBase (Sanger Institute)
Rat Genome Database
Genome Knowledge Base (CSHL)
The Institute for Genomic Research

www.geneontology.org

The Gene Ontology Consortium is
supported by NHGRI grant HG02273
(R01). The Gene Ontology project thanks
AstraZeneca for financial support. The
Stanford group acknowledges a gift from
Incyte Genomics.