



This work is licensed under the Creative Commons Attribution-Share Alike 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.


Last Edited: Paula de Matos (November 2009)

Understanding the ChEBI ontology


This tutorial covers an introduction to the purpose and structure of ontologies within biosciences, and the ChEBI ontology in particular. You will learn about the organisation of the ChEBI ontology into its three sub-ontologies, and you will learn about the particular ontology relationships which are used within ChEBI.

This is the third of four training blocks in the ChEBI training course.


Block 1: Introduction to ChEBI

Block 2: Searching and browsing ChEBI

Block 3: Understanding the ChEBI ontology

Block 4: Download and programmatic access



Contents

Introduction to Ontologies in Bioinformatics
What is an ontology?
Bioinformatics Data
Ontologies in Bioinformatics
The ChEBI ontology
Exploring the ChEBI sub-ontologies
ChEBI ontology relationships
Viewing the ChEBI ontology online
Ontology Lookup Service (OLS)
Worked example: tryptophan
The OBO File format
For more information
Exercises









Introduction to Ontologies in Bioinformatics

What is an ontology?


The term ‘ontology’ derives from a branch of Philosophy, in which it means the theory or study of being as such. The word ontology comes from the Greek ontos, for being, and logos, for word. It is a relatively new term in the long history of philosophy, introduced to distinguish the study of being as such from the study of various kinds of being in the natural sciences. The term in common use before was Aristotle's word category, which he used for classifying anything that can be asserted about anything. Below is an extract of Aristotle’s classification of animals, which he arrived at through careful observation (Figure 1).



Figure 1 Aristotle's classification of animals


Within the field of computer science, the term ontology is used to mean an explicit specification of a conceptualisation (Tom Gruber, 1993). Within the world of the computer, what can ‘be’ is really what can be ‘specified’. The explicit specification of a conceptualisation within a particular domain is usually represented by a set of objects of that domain (instances) and their attributes, reflected in a representational vocabulary, organised by relationships into a classification which may take the form of a hierarchy or graph.

As illustrated in the example below, we can define a vocabulary in the domain of ‘animals’ which represents the instances and some classes and attributes within the domain, and then define the relationships between the objects, classes and attributes. By so doing, we arrive at an explicit representation of knowledge, from which we can derive conceptual meaning.



Figure 2 Ontology as an explicit specification of a conceptualisation

The significance of the hierarchical classification within an ontology is that a class at a higher level subsumes a class at a lower level. That is, any attribute of the class at the higher level is an attribute of a class or instance at a lower level. By such organisation, each attribute usually only has to be specified once, at the highest level of the ontology at which it is relevant, and it is then understood to apply to all child classes and instances. This results in a very powerful representational structure.
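The idea that an attribute is specified once and then applies to everything below it can be sketched in a few lines of Python. This is only an illustration: the class names and attributes (animal, mammal, dog, warm_blooded and so on) are invented for the example and are not taken from the figures in this tutorial.

```python
# Toy hierarchy: each class names its parent and the attributes it introduces.
# All class and attribute names are illustrative assumptions.
HIERARCHY = {
    "animal": {"parent": None, "attributes": {"is_alive": True}},
    "mammal": {"parent": "animal", "attributes": {"warm_blooded": True}},
    "dog": {"parent": "mammal", "attributes": {"barks": True}},
}


def inherited_attributes(name):
    """Collect the attributes of a class together with those of all its ancestors."""
    collected = {}
    while name is not None:
        node = HIERARCHY[name]
        for key, value in node["attributes"].items():
            # Attributes set lower in the hierarchy take precedence.
            collected.setdefault(key, value)
        name = node["parent"]
    return collected


dog_attributes = inherited_attributes("dog")
print(dog_attributes)
```

Here "is_alive" is written only once, on the top-level class, yet it is reported for "dog" because the lookup walks up the chain of parents.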


Moving towards meaning: Semantics

The current World Wide Web has allowed the distribution and dissemination of information at previously unheard-of levels. Whereas in the past, knowledge building happened largely in parallel in various different locations, and dissemination was slow, with the advent of the World Wide Web, global dissemination may be instantaneous, and knowledge building is increasingly happening as a global enterprise. However, while the information represented on the web is usually easily understandable by humans, it may not be so for computers. For example, consider the following table:



Which may be represented on a standard Web page in standard HTML as:



When asked to answer the following question,

“What are the names of all the animals of type Mammal?”

humans can answer it easily, but computers may not be able to without very contingent programming based on accidents of layout, which would be broken if the layout changed even slightly.

Tim Berners-Lee proposed a vision of the semantic web as “An extension of the World Wide Web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation”. And when it comes to dealing with very large volumes of data, such as those found in the field of Bioinformatics, most tasks (such as finding all instances of a certain type of data which correspond to a certain category) require the cooperation of humans and computers to perform adequately. To better enable this, in contrast to the above example, data needs to be associated with semantics.



The association of semantics (in this case through the use of XML tags) allows computers, as well as humans, to resolve the meaning of the data.
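As a rough sketch of why semantic tags help, the Python snippet below answers the question “What are the names of all the animals of type Mammal?” from tagged data, without relying on layout. The XML document, its tag names (animals, animal, name, type) and the animals listed are assumptions made for this illustration; they are not the data from the original figures.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantically tagged data; tag names and values are assumptions.
SEMANTIC_XML = """
<animals>
  <animal><name>Dog</name><type>Mammal</type></animal>
  <animal><name>Eagle</name><type>Bird</type></animal>
  <animal><name>Whale</name><type>Mammal</type></animal>
</animals>
"""


def names_of_type(xml_text, wanted_type):
    """Return the names of all animals whose <type> equals wanted_type."""
    root = ET.fromstring(xml_text)
    return [
        animal.findtext("name")
        for animal in root.findall("animal")
        if animal.findtext("type") == wanted_type
    ]


mammals = names_of_type(SEMANTIC_XML, "Mammal")
print(mammals)
```

Because the query refers to the tags rather than to row or column positions, it keeps working if the order of the elements changes.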


Moving towards meaning: Standards


Even if we associate semantics with our data as discussed above, there is still a problem of interoperability if different datasets which actually represent the same or similar data are associated with semantics in different ways. For example,







In this situation, a computer will not immediately be able to determine any relationship between the two datasets, nor answer questions accurately across the two datasets (such as performing a search across both datasets). It is clear that we need not only semantics associated with our data, but also to agree on standard ways in which to represent those semantics.
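The interoperability problem can be sketched as follows. Two hypothetical datasets describe the same kind of data under different field names, and a search across both only works once they are mapped onto one agreed vocabulary. The field names, the mapping and the records are all assumptions made for this illustration.

```python
# Two hypothetical datasets that tag equivalent data differently.
DATASET_A = [{"name": "Dog", "type": "Mammal"}]
DATASET_B = [{"animal_label": "Whale", "class": "Mammal"}]

# Agreed mapping from dataset B's field names to the shared vocabulary (assumed).
FIELD_MAP_B = {"animal_label": "name", "class": "type"}


def normalise(records, field_map):
    """Rename the fields of each record according to field_map."""
    return [
        {field_map.get(key, key): value for key, value in record.items()}
        for record in records
    ]


combined = DATASET_A + normalise(DATASET_B, FIELD_MAP_B)
mammal_names = [record["name"] for record in combined if record["type"] == "Mammal"]
print(mammal_names)
```

Without the mapping step, the query on "type" would fail for dataset B even though both datasets contain the same kind of information.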

An ontology is an attempt to explicitly specify terms and their associated semantic meaning, which represent an agreement or standardisation within a particular field of application or domain of knowledge. This has particular relevance in the field of Bioinformatics, where the volumes of data are so vast, and are being generated in such geographically dispersed efforts, that the interpretation and representation of knowledge relies heavily on the cooperation of humans and computers.


Bioinformatics Data


The core bioinformatics data is made up of protein and nucleotide sequences, along with measurements of three-dimensional structures. There are many levels above this core data, which consist of the representation of knowledge at various different levels, such as the knowledge about whole genomes, gene expression patterns, protein families and interactions, systems biology and pathway models.



Figure 3 Bioinformatics Data

At each different level and in each different database, knowledge is represented and enhanced by means of annotations, which are usually captured in the form of human-readable free text.





Annotation of bioinformatics data


Annotation of bioinformatics data is essential for capturing and transmitting the knowledge associated with data in bioinformatics databases. All data other than the core data (sequences etc.) which is present in the databases, such as names, descriptions, or literature references, is annotation.

Annotations are often captured in the form of free text, which is easy for a human audience to read and understand, but is difficult for computers to parse. Free text can also vary in quality from database to database, and can use different terminology to mean the same thing (even within the same database, if for example different human annotators used different terminology).

An example of annotation present within the UniProt knowledgebase, taken from the UniProt accession number P00325, alcohol dehydrogenase 1B (http://beta.uniprot.org/uniprot/P00325):


Figure 4 Annotation in free text


Additionally, in many databases, efforts are underway to use computers to extend the annotation process through automatic annotation. This usually involves extending the human annotation of a core subset of data to a larger set of data, using computer algorithms to determine whether the human annotations apply to similar data in the larger dataset.

Several ontologies within the field of bioinformatics have been created in order to address the need for the standardisation and specification of the meaning of terminology which is used in annotation.


Ontologies in Bioinformatics


Some examples of common ontologies within the field of bioinformatics are discussed below.


NCBI Taxonomy

The NCBI Taxonomy database is a curated controlled vocabulary of the Linnaean names of organisms which have been genetically sequenced, organised into a hierarchy using Is A relationships. For example, the abbreviated NCBI taxonomy of Homo sapiens can be represented as:





Figure 5 Taxonomy of Homo sapiens


In this case, the taxonomy forms a strict hierarchy of parent-child relationships. However, in general, ontologies need not be confined to this format, and indeed, structures which allow relationships to multiple parents (such as the Directed Acyclic Graph structure) are common.
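The difference between a strict hierarchy and a structure with multiple parents can be checked mechanically. In the sketch below each term maps to a list of its direct parents. The first mapping uses a fragment of the Homo sapiens lineage; the second uses placeholder term names ("term A" and so on) that are purely illustrative.

```python
def is_strict_hierarchy(parents):
    """Return True when every term has at most one direct parent."""
    return all(len(direct_parents) <= 1 for direct_parents in parents.values())


# Fragment of a taxonomy: each term has exactly one parent (or none at the top).
TAXONOMY_LIKE = {
    "Homo sapiens": ["Homo"],
    "Homo": ["Hominidae"],
    "Hominidae": [],
}

# Placeholder graph in which "term C" has two parents, as a DAG allows.
DAG_LIKE = {
    "root": [],
    "term A": ["root"],
    "term B": ["root"],
    "term C": ["term A", "term B"],
}

print(is_strict_hierarchy(TAXONOMY_LIKE))
print(is_strict_hierarchy(DAG_LIKE))
```

Storing a list of parents per term, rather than a single parent, is what lets the same data structure represent both cases.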


Enzyme Taxonomy

Enzyme classification also takes the form of a hierarchical taxonomy in which each enzyme is classified at four levels of depth. For example, classification of the enzyme flavonol 3-sulfotransferase is given below.


Figure 6 Enzyme Taxonomy
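The four levels of depth are visible in the dotted form of an EC number, where each additional field narrows the classification. The sketch below splits an EC number into its four nested levels. It uses EC 1.1.1.1 (alcohol dehydrogenase) as input rather than the enzyme shown in the figure, and the function name ec_levels is made up for this example.

```python
def ec_levels(ec_number):
    """Split an EC number into its four nested classification levels."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    # Level 1 is the first field, level 2 the first two fields, and so on.
    return [".".join(parts[:depth]) for depth in range(1, 5)]


levels = ec_levels("1.1.1.1")
print(levels)
```

Each entry in the returned list is the parent of the next one, which is the strict hierarchy described above.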

Gene Ontology

The Gene Ontology Consortium (http://www.geneontology.org/) develops and maintains the Gene Ontology, which provides a controlled vocabulary to describe gene and gene product attributes in any organism. It is organised by three organising principles, namely ‘molecular function’, ‘biological process’ and ‘cellular component’.


The ChEBI ontology


The ChEBI ontology is an ontology for biologically interesting chemistry. It consists of three sub-ontologies, namely

Molecular Structure, in which molecular entities or parts thereof are classified according to their structure;

Role, in which entities are classified on the basis of their role within a biological context, e.g. as antibiotics, antiviral agents, coenzymes, enzyme inhibitors, or on the basis of their intended use by humans, e.g. as pesticides, detergents, healthcare products, fuel; and

Subatomic Particle, in which are classified particles which are smaller than atoms.







Figure 7 ChEBI ontology for (R)-adrenaline


Exploring the ChEBI sub-ontologies


We will take a brief look at the kinds of data you will find classified under each of the three sub-ontologies. (By “classified under”, we mean “has an unbroken is a relationship path with”.)
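The definition of “classified under” as an unbroken is a path can be expressed as a short graph search. The sketch below uses the three terms and the two is_a links that appear in the OBO extract in the section “The OBO File format” (ions, molecular entities, molecular structure); the function name and the dictionary layout are assumptions made for this example, not part of ChEBI.

```python
# Direct is_a parents, taken from the OBO extract later in this tutorial.
IS_A = {
    "CHEBI:24870": ["CHEBI:23367"],  # ions is_a molecular entities
    "CHEBI:23367": ["CHEBI:24431"],  # molecular entities is_a molecular structure
    "CHEBI:24431": [],               # molecular structure (no parent in the extract)
}


def is_classified_under(term, ancestor, is_a=IS_A):
    """Return True if an unbroken chain of is_a links leads from term to ancestor."""
    to_visit = list(is_a.get(term, []))
    seen = set()
    while to_visit:
        current = to_visit.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            to_visit.extend(is_a.get(current, []))
    return False


print(is_classified_under("CHEBI:24870", "CHEBI:24431"))
```

The search follows every parent of every term it meets, so it also works when a term has more than one parent.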


Molecular structure

Molecular entities with defined connectivity are classified under the molecular structure sub-ontology. These include the chemical compounds which themselves could exist in some form in the real world, such as drugs, vitamins, insecticides, and different forms of alcohol.


In addition, classes of molecular entities are classified under the molecular structure sub-ontology. Classes may be structurally defined; a class does not represent a single structural definition, but rather a generalisation of the structural features which all members of that class share.

It is often useful to define the interesting parts of molecular entities as groups. Groups have a defined connectivity with one or more specified attachment points.





Figure 8 Molecular structure ontology


Role

The role sub-ontology is further divided into two distinct types of role, namely biological role and application. Roles do not themselves have structures; rather, items in the role ontology are linked to the molecular entities which have those roles.


Figure 9 Role ontology


Subatomic particle

The subatomic particle sub-ontology is the smallest sub-ontology graph, consisting only of those particles which are smaller than an atom.

ChEBI ontology relationships


The ChEBI ontology uses two generic ontology relationships, namely

Is a: Entity A is an instance of Entity B. For example, chloroform is a chloromethanes.

Has part: Indicates the relationship between part and whole, for example, tetracyanonickelate(2−) is part of potassium tetracyanonickelate(2−).


Figure 10 ChEBI ontology generic relationships


In addition, the ChEBI ontology contains several chemistry-specific relationships which are used to convey additional semantic information about the entities in the ontology. These are:

Is conjugate base of and is conjugate acid of: Cyclic relationships used to connect acids with their conjugate bases, for example, pyruvic acid is the conjugate acid of the pyruvate anion, while pyruvate is the conjugate base of the acid.

Is tautomer of: Cyclic relationship used to show the relationship between two tautomers, for example, L-serine and its zwitterion are tautomers.

Is enantiomer of: Cyclic relationship used in instances when two entities are mirror images and non-superposable upon each other. For example, D-alanine is enantiomer of L-alanine and vice versa.

Has functional parent: Denotes the relationship between two molecular entities or classes, one of which possesses one or more characteristic groups from which the other can be derived by functional modification. For example, 16α-hydroxyprogesterone can be derived by functional modification (i.e. 16α-hydroxylation) of progesterone.

Has parent hydride: Denotes the relationship between an entity and its parent hydride, for example, 1,4-naphthoquinone has parent hydride naphthalene.

Is substituent group from: Indicates the relationship between a substituent group/atom and its parent molecular entity, for example, the L-valino group is derived by a proton loss from the N atom of L-valine.

Has role: Denotes the relationship between a molecular entity and the particular behaviour which the entity may exhibit either by nature or by human application, for example, morphine has role opioid analgesic.


Figure 11 ChEBI chemistry-specific relationships
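The “cyclic” relationships listed above come in pairs: stating one direction implies the other. A minimal sketch of this idea is shown below, using the examples given in the text as (subject, relationship, object) triples. The underscore spellings of the relationship names follow the style seen in the OBO extract in Exercise 3 (is_enantiomer_of, is_conjugate_acid_of); the remaining spellings, the INVERSE table and the function name are assumptions for illustration and do not describe how ChEBI itself stores these links.

```python
# Relationship that is implied in the opposite direction (assumed spellings).
INVERSE = {
    "is_conjugate_acid_of": "is_conjugate_base_of",
    "is_conjugate_base_of": "is_conjugate_acid_of",
    "is_tautomer_of": "is_tautomer_of",
    "is_enantiomer_of": "is_enantiomer_of",
}

# Examples taken from the relationship descriptions in this tutorial.
STATEMENTS = [
    ("pyruvic acid", "is_conjugate_acid_of", "pyruvate"),
    ("D-alanine", "is_enantiomer_of", "L-alanine"),
    ("morphine", "has_role", "opioid analgesic"),
]


def with_inverses(statements):
    """Return the statements plus the implied reverse statement for cyclic relationships."""
    result = set(statements)
    for subject, relation, obj in statements:
        if relation in INVERSE:
            result.add((obj, INVERSE[relation], subject))
    return result


closed = with_inverses(STATEMENTS)
for triple in sorted(closed):
    print(triple)
```

Non-cyclic relationships such as has role are left as they are, since morphine having the role opioid analgesic does not imply any statement in the reverse direction.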


The structural meaning of the chemical ontology relationships is further illustrated below.



[Structure diagrams illustrating each relationship: Is Conjugate Base Of, Is Conjugate Acid Of, Is Tautomer Of, Is Enantiomer Of, Has Functional Parent, Has Parent Hydride, Is Substituent Group From. Diagram annotation: “A set of family relationships”.]

Viewing the ChEBI ontology online

The ChEBI ontology is part of the main entry view of a ChEBI entry. For example, the ChEBI entry for L-cysteine is shown below.


Figure 12 ChEBI entry for L-cysteine


Scrolling down reveals the section titled ‘ChEBI Ontology’, in which the parent and children relationships are listed. The default view is the parents and children view; however, clicking on the link marked ‘Tree View’ displays the full hierarchical tree of relationships to this term.


Figure 13 Ontology in parents and children view


Figure 14 Ontology in tree view


The ChEBI ontology may also be browsed. To access the browse facility, select the ‘browse’ link from the main left-hand menu bar.



This link leads to the Ontology Lookup Service, an ontology browsing and searching utility which provides access to several different ontologies within the bioinformatics field.



Ontology Lookup Service (OLS)


The Ontology Lookup Service is a facility which provides a centralised query interface for ontology and controlled vocabulary lookup. It is available online at http://www.ebi.ac.uk/ontology-lookup/. The link to browse the ChEBI ontology opens the following screen:



Figure 15 Ontology Lookup Service - ChEBI ontology


The Ontology Lookup Service can integrate any ontology which is available in OBO format, and at present (as at the last release) contains 61 ontologies, including



GO



ChEBI



Molecular interaction (PSI MI)



Pathway ontology (PW)



Human disease (DOID)



and many more…


OLS provides facilities for the searching and browsing of ontologies, as well as displaying a graph of terms and relationships between terms, similar to what AmiGO displays for GO.


Figure 16 DOID term 'Mental Retardation' and graph

Worked example: tryptophan


In this example we will browse the ChEBI ontology surrounding the chemical tryptophan by using
the Ontology Lookup Service.

Step 1: Search

Open the Ontology Lookup Service home page at http://www.ebi.ac.uk/ontology-lookup/. Select the Chemical Entities of Biological Interest ontology from the ontology selection drop-down box.

Type in the search box the first few letters of the term ‘tryptophan’. You will see that the search is performed in the background and a list of matching terms is displayed in a drop-down.



Step 2: Select term

Select the term Tryptophan [CHEBI:27897]. Additional information about the term is displayed below the search box.

1. Select ChEBI ontology

2. Type first few letters of search term



Step 3: Browse ontology

After selecting the term, click ‘browse’. You are taken to the ontology browser with the relevant
term displayed as the root.


Step 4: Viewing graphical tree for this entry

Scrolling down and to the right reveals the graphical tree for this entry.




3. Additional information defined in ChEBI OBO file

4. Click ‘Browse’

5. Browse full ontology

6. Browse with tryptophan as root

7. Tryptophan

The OBO File format


The Open Biomedical Ontologies (OBO) is an umbrella organisation for ontologies and structured shared controlled vocabularies for use across all biological and biomedical domains (http://sourceforge.net/projects/obo). This organisation has defined the OBO file format, which is an ontology representation format designed specifically with the following goals in mind:



Human readability



Ease of parsing



Extensibility



Minimal redundancy

The OBO format models a subset of the concepts modelled in OWL (Web Ontology Language) with extensions for metadata (such as synonyms).

An extract from the ChEBI Ontology downloaded in OBO format illustrates the overall layout of data within the OBO file. The first paragraph in the file contains general header information about that particular file, and is then followed by one or more terms separated by blank lines. Each term contains an identifier, a name, a definition, relationships to other terms within the ontology, and may contain additional metadata such as synonyms.


format-version: 1.2
date: 28:01:2009 05:57
saved-by: pmatos
default-namespace: chebi_ontology
remark: ChEBI subsumes and replaces the Chemical Ontology first
remark: developed by Michael Ashburner & Pankaj Jaiswal.
remark: Author: ChEBI curation team
remark: ChEBI Release version 53
remark: For any queries contact chebi-help@ebi.ac.uk
synonymtypedef: IUPAC_NAME "IUPAC NAME"
synonymtypedef: FORMULA "FORMULA"
synonymtypedef: SMILES "SMILES"
synonymtypedef: InChI "InChI"
synonymtypedef: InChIKey "InChIKey"
synonymtypedef: BRAND_NAME "BRAND NAME"
synonymtypedef: INN "INN"

[Term]
id: CHEBI:24431
name: molecular structure
def: "A description of the molecular entity or part thereof based on its composition and/or the connectivity between its constituent atoms." []

[Term]
id: CHEBI:23367
name: molecular entities
def: "A molecular entity is any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer etc., identifiable as a separately distinguishable entity." []
synonym: "entidad molecular" RELATED [IUPAC:]
synonym: "entidades moleculares" RELATED [IUPAC:]
synonym: "entite moleculaire" RELATED [IUPAC:]
synonym: "molecular entity" EXACT IUPAC_NAME [IUPAC:]
synonym: "molekulare Entitaet" RELATED [ChEBI:]
is_a: CHEBI:24431


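In the extract, the lines before the first [Term] are the general header information, the synonymtypedef lines declare the synonym types used in terms, and the is_a lines give relationships to other terms.

Synonyms in OBO format may be ‘related’ or ‘exact’. In the ChEBI OBO file, IUPAC names are considered ‘exact’ synonyms and all others ‘related’.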

[Term]
id: CHEBI:24870
name: ions
def: "An ion is a molecular entity having a net electric charge." []
synonym: "ion" EXACT IUPAC_NAME [IUPAC:]
is_a: CHEBI:23367
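Because the layout is a header followed by [Term] stanzas of "tag: value" lines, a small reader is easy to sketch. The Python below is a minimal illustration only: it handles just the features visible in the extract above (an abridged copy of which is embedded as a string), it is not a complete OBO parser, and the function name parse_obo is made up for this example.

```python
# Abridged copy of the extract shown above, used as sample input.
OBO_TEXT = """\
format-version: 1.2
default-namespace: chebi_ontology

[Term]
id: CHEBI:24431
name: molecular structure

[Term]
id: CHEBI:23367
name: molecular entities
synonym: "molecular entity" EXACT IUPAC_NAME [IUPAC:]
is_a: CHEBI:24431

[Term]
id: CHEBI:24870
name: ions
synonym: "ion" EXACT IUPAC_NAME [IUPAC:]
is_a: CHEBI:23367
"""


def parse_obo(text):
    """Return (header, terms); each is a dict mapping a tag to a list of values."""
    header = {}
    terms = []
    current = header
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "[Term]":
            current = {}
            terms.append(current)
            continue
        if line.startswith("["):
            # Other stanza types are read but not kept in this sketch.
            current = {}
            continue
        tag, _, value = line.partition(": ")
        current.setdefault(tag, []).append(value)
    return header, terms


header, terms = parse_obo(OBO_TEXT)
names_by_id = {term["id"][0]: term["name"][0] for term in terms}
for term in terms:
    for parent_id in term.get("is_a", []):
        print(term["name"][0], "is a", names_by_id[parent_id])
```

Values are kept as lists because a tag such as synonym or is_a may appear more than once within a single term.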



The OBO Foundry

“The OBO Foundry is a collaborative experiment involving developers of science-based ontologies who are establishing a set of principles for ontology development with the goal of creating a suite of orthogonal interoperable reference ontologies in the biomedical domain.” The OBO Foundry can be found at http://www.obofoundry.org/



For more information

For further information, email the ChEBI team at chebi-help@ebi.ac.uk, or log on to the SourceForge forum at https://sourceforge.net/projects/chebi/.

Additional information about using ChEBI can be found by examining the User Manual at http://www.ebi.ac.uk/chebi/userManualForward.do.

The latest news, updates and developments are announced via an RSS feed.






Exercises

You will need access to ChEBI online at http://www.ebi.ac.uk/chebi to complete these exercises.


1. Dichlorvos (CHEBI:34690) is a well-known insecticide. Open the ChEBI entry for dichlorvos (CHEBI:34690). Scroll down the entry to the “ChEBI ontology”. Can you determine whether it can also be used as a fungicide?

______________________________________________________________________


2. On the same entry as above (CHEBI:34690), click on the “Tree View” to display the entire ontology tree. Follow the tree path from dichlorvos to its parent organophosphate insecticide (CHEBI:25708). Click on this parent organophosphate insecticide (CHEBI:25708). This brings you to the ontology view of the parent. From looking at the children of this entry, can you write down any other insecticide?

______________________________________________________________________


3. The following term appears in the ChEBI OBO file.

[Term]
id: CHEBI:32762
name: L-tyrosinium
synonym: "(1S)-1-carboxy-2-(4-hydroxyphenyl)ethanaminium" RELATED [IUPAC:]
synonym: "L-tyrosine cation" RELATED [JCBN:]
synonym: "L-tyrosinium" EXACT IUPAC_NAME [IUPAC:]
synonym: "C9H12NO3" RELATED FORMULA [ChEBI:]
synonym: "[NH3+][C@@H](Cc1ccc(O)cc1)C(O)=O" RELATED SMILES [ChEBI:]
synonym: "InChI=1/C9H11NO3/c10-8(9(12)13)5-6-1-3-7(11)4-2-6/h1-4,8,11H,5,10H2,(H,12,13)/p+1/t8-/m0/s1/fC9H12NO3/h10,12H/q+1" RELATED InChI [ChEBI:]
xref: Gmelin:1150138 "Gmelin Registry Number"
is_a: CHEBI:32786
relationship: is_enantiomer_of CHEBI:32775
relationship: is_conjugate_acid_of CHEBI:17895


Can you describe the relationships that this term has to other entities within the ontology?

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________

4. Using the Advanced Search, can you find all entries which have the application ‘pharmaceutical’ (CHEBI:52217)? How many entries are there? Can you name one?

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________


5. Using the Advanced Search, can you find all entries which have the role ‘epitope’ (CHEBI:53000)? How many entries are there? Can you name one?

______________________________________________________________________

______________________________________________________________________

______________________________________________________________________