D5 - EMBRACE














LHSG-CT-2004-512092


EMBRACE

A European Model for Bioinformatics Research and Community Education


Network of Excellence


Life Sciences, Genomics and Biotechnology for Health


D5.3.11B: The EMBRACE Data Types and Methods Ontology (EDAM) available on the portal



Due date of deliverable: n/a

Actual submission date: 14.9.2010

Start date of project: 1.2.2005

Duration: 66 months

Organisation name of lead contractor for this deliverable: CMBI

D5.3.11


The EMBRACE Data Types and Methods Ontology (EDAM) available on the portal




The EDAM document,
which was partly the outcome of the three Amsterdam workshops which
are described in deliverable D5.2.17, is publicly available from
http://edamontology.sourceforge.net and linked to the portal.






Overview of the EDAM ontology (the result of several man-years of work, including the three Amsterdam ontology workshops described in deliverable D5.2.17).








What is EDAM?

EDAM (EMBRACE Data and Methods) is an ontology for bioinformatics tools and data. It includes a set of defined terms, relationships between terms and rules that govern the terms and relations.

EDAM is available for download from http://sourceforge.net/projects/edamontology/files/ where the
latest release is beta08.

EDAM provides a controlled vocabulary for the semantic description of things such as:

Web services, e.g. WSDL files
XSD data schema, e.g. associated with a WSDL file
Standalone tools
Web servers
Databases
Ontologies
Data objects
Data syntax and file formats

For example, annotation of a web service (WS) might include:



Topic - general area(s) the service belongs to
Operation - what exactly each operation does
Data type - the input and output data (in semantic terms)
Data format - the input and output data (in syntactic terms)
Data resources - databases and ontologies that are used in the background

The goal is for EDAM to describe, at a coarse level at least, all major bioinformatics databases, data and
tools currently in use.

The "beta" release covers tools (and associated data) listed in the EMBRACE Registry:

http://www.embraceregistry.net/

For background information such as plans for EDAM see the EDAM Wiki:

http://sourceforge.net/apps/mediawiki/edamontology/index.php?title=Main_Page/

Download and Status

A "beta" version is available in OBO (Open Biomedical Ontologies) format:

http://sourceforge.net/projects/edamontology/files/

It provides a starting point for service nomenclature. Coverage is quite broad in general and quite deep for sequence analysis. It includes:

Over 2000 terms with definitions

16 types of relations

The "beta" version is intended for testing and feedback. It uses the terms, relations and rules below and adheres to the Guidelines for Developers. Suggestions or requirements are welcome.

EDAM is being actively developed:

Future versions might contain many new terms but should not be a fundamental departure from the term
types, relationships and hierarchy in EDAM_beta04.obo.

Term names, definitions and hierarchy (is_a relations) in all branches are reasonably stable from EDAM_beta04.obo on.

Relations are defined but not used in many term definitions. Relations will be added in the future depending on requirements.

OBO format uses identifiers (IDs) to uniquely identify terms. EDAM IDs will persist between versions: a given ID is guaranteed to identify the same concept. This does *not* imply term names, definitions and other fields will remain constant, but they will remain true to the concept.

Terms that are made obsolete will also persist; they will not be removed and will maintain their ID.

Viewing

EDAM may be viewed in your browser, a text editor, or the OBO Ontology Editor (OBOEdit) Version 2:

http://oboedit.org

To load EDAM, select "File ... Load Ontologies".

The most convenient view is to have the "Ontology Tree Editor" (from "Editors" menu) and the "Text
Editor" (from "Editors" menu) side by side.

"Classes" and "Relations" will appear in the "Ontology Tree Editor". To see the EDAM terms, expand "Classes".

The view is much cleaner if you only show the is_a relationships. To do this, select the small "f" from the "Ontology Tree Editor" and then select "Show a single relationship..." and then "is_a".

EDAM is available in the following web-based browsers:

NCBO Ontology Browser
EBI SRS server

It will soon be available in:

EBI Ontology Lookup Service

License

EDAM is made available to all without any constraint or license on its use or redistribution other than:

EDAM is clearly acknowledged as the source of the product.

EDAM files displayed publicly include the publication date and/or version number.

EDAM files are not altered and subsequently redistributed under their original name or with the same
term identifiers.

Contacts

All enquiries to Jon Ison
(jison@ebi.ac.uk)

Thanks to Peter Rice, Mahmut Uludag, Hamish McWilliam, Matus Kalas, James Malone and others for
valuable contributions.

Mailing lists

Please feel free to subscribe to either of the mailing lists for the EDAM Ontology:

https://lists.sourceforge.net/lists/listinfo/edamontology-users
https://lists.sourceforge.net/lists/listinfo/edamontology-developers

Once subscribed, you can mail the lists:

edamontology-users@lists.sourceforge.net
edamontology-developers@lists.sourceforge.net

edamontology-developers is for technical discussions between EDAM developers / contributors.

edamontology-users is for general discussions and announcements.

Please feel free to let other folks know. Traffic will be kept to a minimum.

Motivation

Biological data and analytical tools are increasingly available and diverse. Meanwhile, researchers demand ever more convenient means to identify, connect and reuse the available resources. As providers of data and tooling move over to service-based architectures, there is a need to identify, compare, select, compose and reuse services. Semantic annotations will be key to automating such tasks; EDAM aims to support efforts in service discovery and interoperability.

Scope

Namespaces

EDAM includes 6 ontologies (branches of terms in their own namespace):

entity - Any biological thing (or part of a thing) with a physical existence, a physical part, region or feature that can be mapped to such a thing, a collection of such things or an observable phenomenon or occurrence.

topic - A general field of bioinformatics study, data, processing and analysis or technology.

operation - A specific, singular function or process performed by a tool, for example a WS operation. What is done, but not (typically) how or in what context.

resource - A category of content of a data resource including databases, ontologies and servers.

data - A semantic description of a data entity (datum) commonly used in bioinformatics.

format - A reference (typically a URL) of a data format specification.

EDAM will (eventually) only include terms strictly in the domain of "bioinformatics tool and data description", as defined by the concepts above. Terms not specific to this domain but required for tool description, e.g. "Email address", might (eventually) be removed. The entity branch (which provides biological context for other branches) might also (eventually) be removed.

Conceptual model

Terms in a namespace may be related to one another according to the model below. Bold text within a box indicates a namespace (top-level term), non-bold text within a box indicates a minor branch, and text next to lines indicates a relation between two terms.

Principles

These are:

Clearly defined scope (see above)
A purpose-independent design, not tied to a particular use case
Relevant to annotation of current:
  WSDL files
  XSD schema
  Standalone databases, servers and tools
Comprehensive, with enough terms to be useful
Comprehensible, with terms and relations that are simple and intuitive
Uncluttered, including only commonly used terms and with as few relation types as possible
Navigable, with a simple class (is_a) hierarchy
General, including terms of general use and excluding fine-grained specialised concepts
Complement (not duplicate) other established ontologies; overlap with relevant, specialised ontologies will be minimised
Compatible (e.g. cross-referenced) with existing resources
Integrity, compatible (so far as possible) with "upper level" ontologies (e.g. BioTop, SUMO and OpenCyc)
Extensible, with clear guidelines for developers
Convenient, with clear guidelines for annotators
Ideally, support automated logical inference (reasoning software)
Validatable


Limitations

EDAM is/does not:

Describe syntax or file formats in detail (although the format namespace will provide references)
Define data structures. Although has_part / is_part_of relations are defined, they are not currently used.
Include terms for every conceptual part of things. For example, as a rule (with exceptions) a datatype is only listed if it is known to be in common use.
A catalogue of individual data structures, databases and so on. EDAM terms correspond to classes; specific instances of the semantic types are not included.
A full-strength ontology. For example, disjoints, unions and intersections of terms are not defined. Many features of the domain that could be expressed, e.g. in OWL format, are not modelled.
A way (in itself) to identify or unify all services and data (but it might help).
Complete (and arguably never can be).

Sources

Current version

Various sources were considered in constructing the "beta" version.

Software collections and registries:
  EMBRACE Web Services
  EBI Web Services
  EBI databases and retrievable fields known to the EB-eye web services
  EMBOSS including EMBASSY packages (more than 200 applications)
  WHAT-IF data and services (see also WHAT-IF help)
  Lists of tools from the Web

Domain ontologies:
  myGrid ontology
  NAR Databases
  NAR web servers
  Sequence (sequence-related terms)
  Sequence service (sequence service terms)

For database-related terms:
  dbxref.txt (databases cross-referenced in UniProtKB/Swiss-Prot)
  List of databases collated by the ELIXIR project
  Lists of databases from the web

Some resources were considered but were not a significant source of terms:
  MI (molecular interactions)
  MIRIAM Resources
  bio2rdf


Structure

EDAM Components

EDAM has 4 components:

Terms
Hierarchy (is_a and intersection_of relations)
Relations
Rules

Terms - Each EDAM term corresponds to a well-established concept or class with one or more intrinsic properties. The class must have these properties to retain its identity. Individuals (unique instances) of these classes are not included.

Hierarchy - Every term (excluding top-level terms) is related to (typically) one other term by an is_a (subclass) relationship. These relations define the term hierarchy. All "child" terms must share the intrinsic property of their "parent", in addition to having their own intrinsic property. In the future, intersection_of might be used instead of is_a in the few cases where there are two "parent" terms.

Relations - Terms are related to each other by defined relationships. Relations (other than is_a and intersection_of) can be considered as properties of the classes.

Rules - There are rules dictating how different terms may be related. They define which relations must (or may) be specified for which terms. They reflect well-established or self-evident principles.



Term structure


An OBO term consists of:

Unique identifier
Name
Namespace
Definition
Comment (optional)
Synonym(s) (optional)
Cross-reference(s) (optional)
Relationships to other terms


For example:


[Term]
id: EDAM:0000970
name: Citation
namespace: data
def: "A bibliographic citation providing references to scientific article, book or other published material." [EDAM:EBI "EMBRACE definition"]
comment: A citation might include the authors, title and journal name, date and (possibly) an abstract of the publication or link to the full-text if it's freely available.
synonym: "Reference" EXACT []
xref: Moby:GCP_SimpleCitation
xref: Moby:Publication
is_a: EDAM:0000006 ! Data
is_attribute_of: EDAM:0000008 ! Undefined
is_input_of: EDAM:0000223 ! Undefined
is_output_of: EDAM:0000223 ! Undefined
has_source: EDAM:0000606 ! Literature data resource
has_identifier: EDAM:0000841 ! Undefined
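A stanza like the one above is simple enough to read programmatically. The following Python sketch is a minimal, hypothetical parser (parse_obo_stanza is not part of any EDAM or OBO tool); it handles only plain "tag: value" lines like those shown, not the full OBO grammar (escaped characters, trailing modifiers, whole multi-stanza files).

```python
from collections import defaultdict


def parse_obo_stanza(text: str) -> dict[str, list[str]]:
    """Parse the tag-value lines of a single OBO [Term] stanza.

    Every line has the form "tag: value". Tags such as xref and is_a may
    repeat, so each tag maps to a list of values in file order.
    """
    fields = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the "[Term]" header
        tag, _, value = line.partition(":")  # split on the first colon only
        value = value.strip()
        # Drop a trailing "! label" (a human-readable name for a referenced
        # ID). Naive: assumes " ! " never occurs inside quoted text.
        value = value.split(" ! ", 1)[0].rstrip()
        fields[tag.strip()].append(value)
    return dict(fields)


stanza = """[Term]
id: EDAM:0000970
name: Citation
namespace: data
synonym: "Reference" EXACT []
xref: Moby:GCP_SimpleCitation
xref: Moby:Publication
is_a: EDAM:0000006 ! Data
"""
term = parse_obo_stanza(stanza)
print(term["is_a"])  # ['EDAM:0000006']
```

Keeping every value in a list, rather than overwriting, is what lets repeated tags such as the two xref lines survive.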


Unique identifier

This 7 digit number uniquely identifies an EDAM term. From the release of the "beta" version, identifiers
will persist between EDAM versions.
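Because identifiers persist between versions, software can safely key on them. A small sketch for checking and formatting IDs, assuming (from the examples in this document, such as EDAM:0000970) that an ID is the prefix "EDAM:" followed by exactly 7 digits; the helper names are hypothetical.

```python
import re

# Assumed pattern, inferred from the IDs shown in this document
# (e.g. EDAM:0000970): the prefix "EDAM:" followed by exactly 7 digits.
EDAM_ID = re.compile(r"EDAM:(\d{7})")


def parse_edam_id(term_id: str) -> int:
    """Return the numeric part of an EDAM ID; raise ValueError if malformed."""
    match = EDAM_ID.fullmatch(term_id)
    if match is None:
        raise ValueError(f"not a 7-digit EDAM ID: {term_id!r}")
    return int(match.group(1))


def format_edam_id(number: int) -> str:
    """Render a number as a zero-padded, 7-digit EDAM ID."""
    if not 0 <= number <= 9_999_999:
        raise ValueError(f"does not fit in 7 digits: {number}")
    return f"EDAM:{number:07d}"


print(parse_edam_id("EDAM:0000970"))  # 970
print(format_edam_id(970))            # EDAM:0000970
```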


Name

There are rules for naming terms.


Namespace

This is one of:

entity
topic
operation
resource
data
format

The namespace defines the sub-ontology and clarifies the term type (topic, operation etc.) where this is not obvious from the term name.


Definition

There are rules for defining terms.

Comment

The comment is optional and (typically) clarifies the definition.

Synonyms

These include related phrases, alternative spellings or true synonyms.

Cross references

The following resources are cross-referenced (example in parentheses):

WHAT-IF operations (xref: WHATIF: CorrectedPDBasXML)
Abbreviations used for database cross-references (xref: http://www.geneontology.org/doc/GO.xrf_abbs: FB)
BioMoby datatypes (xref: Moby:GI_Gene)
BioMoby namespaces (xref: Moby_namespace:iHOPorganism)
Sequence Ontology terms (xref: SO:0000348)

Relationships

Several types of relation are defined. They are:

Defined between pairs of terms
Directional
Transitive (propagated from child to parent terms), e.g. if A is_a B is_a C we can infer A is_a C.
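Transitivity means the full set of superclasses of a term can be computed by walking is_a links upwards. A minimal sketch over a hypothetical, simplified fragment of the operation branch (the real EDAM hierarchy may place intermediate terms between these):

```python
def ancestors(term: str, is_a: dict[str, list[str]]) -> set[str]:
    """Return every term reachable from `term` by following is_a upwards.

    Because is_a is transitive, each returned term is an inferred superclass
    of `term`. The `seen` set stops revisits (and guards against cycles).
    """
    seen: set[str] = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


# Hypothetical, simplified fragment of the operation branch (names, not IDs).
IS_A = {
    "Pairwise sequence alignment": ["Sequence alignment"],
    "Sequence alignment": ["Operation"],
}
print(sorted(ancestors("Pairwise sequence alignment", IS_A)))
# ['Operation', 'Sequence alignment']
```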

Obsolete terms

Obsolete terms use these fields:

is_obsolete: specifies whether a term is obsolete
replaced_by: a term which replaces an obsolete term
consider: specifies an alternative for an obsolete term
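A sketch of how software might map an obsolete ID to current terms using these fields. It assumes terms are held as dictionaries of field lists (as an OBO parser might produce), follows replaced_by first and falls back to consider: when no replacement is recorded. The function name and the term keys are hypothetical.

```python
def resolve(term_id: str, terms: dict[str, dict[str, list[str]]]) -> list[str]:
    """Map a possibly obsolete term ID to current (non-obsolete) term IDs.

    replaced_by links are followed first; if an obsolete term has none,
    its consider: alternatives are used instead. IDs missing from `terms`
    are treated as current and returned unchanged.
    """
    result: list[str] = []
    seen: set[str] = set()
    pending = [term_id]
    while pending:
        current = pending.pop(0)
        if current in seen:
            continue  # already handled (also breaks replaced_by cycles)
        seen.add(current)
        fields = terms.get(current, {})
        if fields.get("is_obsolete", ["false"])[0] != "true":
            result.append(current)  # a live term: use it as-is
        elif fields.get("replaced_by"):
            pending.extend(fields["replaced_by"])
        else:
            pending.extend(fields.get("consider", []))
    return result


# Hypothetical terms, in the dict-of-lists shape an OBO parser might produce.
TERMS = {
    "old_a": {"is_obsolete": ["true"], "replaced_by": ["old_b"]},
    "old_b": {"is_obsolete": ["true"], "replaced_by": ["new_c"]},
    "old_d": {"is_obsolete": ["true"], "consider": ["new_c", "new_e"]},
    "new_c": {"name": ["Current term C"]},
    "new_e": {"name": ["Current term E"]},
}
print(resolve("old_a", TERMS))  # ['new_c']
print(resolve("old_d", TERMS))  # ['new_c', 'new_e']
```

Following chains matters because, since obsolete terms are never removed, a replacement may itself later become obsolete.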


Terms

Summary

Top-level terms correspond to individual namespaces and corresponding branches of terms. There are also a few sub-branches worth highlighting (term names are given):

"Topic"
"Biological entity"
  "Discrete entity"
  "Entity feature"
  "Entity collection"
  "Phenomena"
"Operation"
"Data resource"
"Data"
  "Identifier"
"Data format"


Biological entity

Any biological thing (or part of a thing) with a physical existence, a physical part, region or feature that can be mapped to such a thing, a collection of such things or an observable phenomenon or occurrence.

e.g. "Gene", "Amino acid residue cluster", "Active site", "Atom-atom interaction".

They include:

Discrete entity - Any biological thing with a distinct, discrete physical existence
e.g. "Atom", "Amino acid", "Nucleic acid", "Protein"

Entity feature - A physical part or region of a discrete biological entity, or a feature that can be mapped to such a thing
e.g. "Gene", "Restriction site", "Protein domain"

Entity collection - A collection of discrete biological entities
e.g. "Genome", "Proteome"

Phenomena - A physical, observable biological occurrence or event.
e.g. "Metabolic pathway", "Mutation"

Entity terms provide biological context of what is / isn't covered.


Topic

A general field of bioinformatics study,
data, processing and analysis or technology.

e.g. "Sequence analysis", "Alignment", "Sequencing", "Microarrays".

A topic might concern a group of related biological entities, databases or tool operations.


Operation

A specific, singular function or process performed by a tool, for example a WS operation. What is done, but not (typically) how or in what context.

e.g. "Sequence alignment", "Pairwise sequence alignment", "Sequence database search".


Data resource

A category of content of a data resource including databases, ontologies and servers.

e.g. "Sequence data resource", "Plant genome data resource"



Data

A semantic description of a data entity (datum) commonly used in bioinformatics.

e.g. "Sequence alignment", "Comparison matrix", "Phylogenetic tree" etc.

They include:

Identifier - Something that identifies (typically uniquely) another concept, for example an accession number.

Data terms:

Include primitive types and derived types that are composites of primitives or other derived types
Cover everything from simple parameters and basic bioinformatics datatypes through to complex derived types
Might reflect, but do not describe in detail, how the data is specified/represented (syntax)
Can be somewhat (necessarily) overlapping
Can reflect quite broad or imprecisely defined concepts. Such higher-level terms serve as placeholders for other, more specific terms lower down in the tree


Data format

A reference (typically a URL) of a data format specification.

e.g. "FASTA format", "PDB format", "mmCIF format" etc.

This branch includes:

Basic flat file formats (text files with a specific layout)
Other formatted reports of data, such as presentational (web) formats used for database entries and tool outputs
XSD schema types (complex and simple types)


Relations

Note that most of the relations defined below are not currently used in all term definitions.



is_a

This is an OBO core relation. Defines a term as a subclass of another term, relating a term to a single parent. A is a specialisation of B, and B a generalisation of A. A is_a B if any object that instantiates A also instantiates B. The is_a relationship is transitive: if termA is_a termB, and termB is_a termC, then every instance of termA is also an instance of termB and termC.

e.g. "Pairwise sequence alignment" is_a "Sequence alignment"

e.g. "Protein sequence data resource" is_a "Sequence data resource"



intersection_of

This relation is not currently used.


This is an OBO core relation. Defines a term as a subclass of all of two or more other terms, relating a term to more than one parent.


has_part / is_part_of

This relation is not currently used.



Defines a term ("Biological entity" or "Data") as having a conceptual part ("Biological entity" or "Data").

e.g. "Enzyme" has_part "Active site"

e.g. "Sequence entry" has_part "Sequence"

Or conversely, defines that a term is a conceptual part of some other term.

e.g. "Active site" is_part_of "Enzyme"

e.g. "Sequence" is_part_of "Sequence entry"



concerns / is_concern_of

Defines a term ("Topic") as concerning (covering) another term ("Biological entity", "Data resource", "Data" or "Operation").

e.g. "topic:Nucleic acid sequence analysis" concerns "operation:PolyA signal identification"

Or conversely, that a term is a concern of another term.

e.g. "operation:PolyA signal identification" is_concern_of "topic:Nucleic acid sequence analysis"



has_input / is_input_of

Defines a term ("Operation") as reading (inputting) another term ("Data").

e.g. "operation:Sequence alignment" has_input "data:Sequence"

Or conversely, defines a term as an input of another term.

e.g. "data:Sequence" is_input_of "operation:Sequence alignment"



has_output / is_output_of

Defines a term ("Operation") as writing (outputting) another term ("Data").

e.g. "operation:Sequence alignment" has_output "data:Sequence alignment"

Or conversely, defines a term as an output of another term.

e.g. "data:Sequence alignment" is_output_of "operation:Sequence alignment"



has_source / is_source_of

Defines a term ("Data") as having a source ("Data resource").

e.g. "Sequence" has_source "Sequence data resource"

Or conversely, defines that a term is a source of another term.

e.g. "Sequence data resource" is_source_of "Sequence"



has_identifier / is_identifier_of

Defines that a term ("Biological entity", "Data resource" or "Data") has (or can have) an identifier ("Identifier").

e.g. "Sequence database entry" has_identifier "Sequence accession number"

Or conversely, defines that a term is an identifier of another term.

e.g. "Sequence accession number" is_identifier_of "Sequence database entry"



has_attribute / is_attribute_of

Defines that a term ("Biological entity") has (or can have) an attribute ("Data").

e.g. "Protein" has_attribute "Protein sequence"

e.g. "Protein" has_attribute "Isoelectric point"

Or conversely, defines that a term is an attribute of another term.

e.g. "Protein sequence" is_attribute_of "Protein"

e.g. "Isoelectric point" is_attribute_of "Protein"



has_format / is_format_of

Defines that a term ("Data") has a data format specification ("Data format").

e.g. "Sequence record" has_format "Sequence format"

Or conversely, defines that a term is a format of another term.

e.g. "Sequence format" is_format_of "Sequence record"
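Each relation above (other than is_a and intersection_of) comes with an inverse, so one direction can be derived from the other. A minimal Python sketch: the INVERSE table is transcribed from this section, while with_inverses is a hypothetical helper working on (subject, relation, object) triples.

```python
# Relation/inverse pairs as listed in this section.
INVERSE = {
    "has_part": "is_part_of",
    "concerns": "is_concern_of",
    "has_input": "is_input_of",
    "has_output": "is_output_of",
    "has_source": "is_source_of",
    "has_identifier": "is_identifier_of",
    "has_attribute": "is_attribute_of",
    "has_format": "is_format_of",
}
# Make the lookup work in both directions.
INVERSE.update({inv: rel for rel, inv in list(INVERSE.items())})


def with_inverses(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Return the triples plus the inverse of each (subject, relation, object)."""
    out = set(triples)
    for subj, rel, obj in triples:
        if rel in INVERSE:  # is_a has no inverse here, so it is left alone
            out.add((obj, INVERSE[rel], subj))
    return out


facts = {("operation:Sequence alignment", "has_input", "data:Sequence")}
for triple in sorted(with_inverses(facts)):
    print(triple)
```

Deriving inverses this way means an annotator only needs to state one direction of each link.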



Rules

Rules define how terms may be related. They are organised below by term and relationship type.


Rules by term type


"Topic"

"Topic" is_a "Topic" (1 only) ... a specialisation of a topic.
"Topic" intersection_of "Topic" (if given, at least 2) ... a specialisation of two or more topics (use instead of is_a).
"Topic" concerns "Biological entity" (0 or more) ... concerns (covers) an entity.
"Topic" concerns "Operation" (1 or more) ... concerns a tool operation.
"Topic" concerns "Data resource" (0 or more) ... concerns a data resource.
"Topic" concerns "Data" (0 or more) ... concerns a datatype.

"Biological entity"

"Biological entity" is_a "Biological entity" (1 only) ... a specialisation of an entity.
"Biological entity" intersection_of "Biological entity" (if given, at least 2) ... a specialisation of two or more entities (use instead of is_a).
"Biological entity" is_concern_of "Topic" (0 or more) ... within a topic.
"Biological entity" has_identifier "Identifier" (0 or more) ... with an identifier.
"Biological entity" has_attribute "Data" (0 or more) ... with an attribute.
"Biological entity" has_part "Biological entity" (0 or more) ... with a conceptual part.
"Biological entity" is_part_of "Biological entity" (0 or more) ... which is a conceptual part of another entity.

"Operation"

"Operation" is_a "Operation" (1 only) ... a specialisation of a tool operation.
"Operation" intersection_of "Operation" (if given, at least 2) ... a specialisation of two or more tool operations (use instead of is_a).
"Operation" is_concern_of "Topic" (1 or more) ... within a topic.
"Operation" has_input "Data" (0 or more, typically 1 or more) ... inputs a semantic datatype.
"Operation" has_output "Data" (0 or more, typically 1 or more) ... outputs a semantic datatype.

"Data resource"

"Data resource" is_a "Data resource" (1 or more, typically 1 only) ... a specialisation of a data resource.
"Data resource" intersection_of "Data resource" (if given, at least 2) ... a specialisation of two or more data resources (use instead of is_a).
"Data resource" is_concern_of "Topic" (0 or more) ... within a topic.
"Data resource" is_source_of "Data" (1 or more) ... a source of a semantic datatype.
"Data resource" has_identifier "Identifier" (0 or more) ... with an identifier.
"Data resource" has_part "Data resource" (0 or more) ... with a conceptual part.
"Data resource" is_part_of "Data resource" (0 or more) ... as a conceptual part of another data resource.

"Data"

"Data" is_a "Data" (1 or more, typically 1 only) ... a specialisation of a semantic datatype.
"Data" intersection_of "Data" (if given, at least 2) ... a specialisation of two or more datatypes (use instead of is_a in exceptional circumstances only).
"Data" is_concern_of "Topic" (0 or more) ... within a topic.
"Data" is_attribute_of "Biological entity" (0 or more) ... an attribute of an entity.
"Data" is_input_of "Operation" (0 or more) ... input to a tool operation.
"Data" is_output_of "Operation" (0 or more) ... output of a tool operation.
"Data" has_source "Data resource" (0 or more) ... with a source in a data resource.
"Data" has_identifier "Identifier" (0 or more) ... with an identifier.
"Data" has_part "Data" (0 or more) ... with a conceptual part.
"Data" is_part_of "Data" (0 or more) ... as a conceptual part of another datatype.
"Data" has_format "Data format" (0 or more) ... with a data format specification.

"Data format"

"Data format" is_a "Data format" (1 only) ... a specialisation of a data format.
"Data format" is_format_of "Data" (1 only) ... a format specification of a datatype.

"Identifier"

"Identifier" is_identifier_of "Biological entity" (1 or more) ... identifier of an entity.
"Identifier" is_identifier_of "Data resource" (1 or more) ... identifier of a data resource.
"Identifier" is_identifier_of "Data" (1 or more) ... identifier of a datatype.
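The cardinalities above can be checked mechanically. This sketch transcribes only the "Operation" and "Data format" rules and compares them with how many links of each relation a term has; intersection_of is left out since it is not currently used. RULES and violations are hypothetical names.

```python
from typing import Optional

# (min, max) number of links allowed per relation; max=None means unbounded.
# A hand-transcribed subset of the rules above, for two term types only.
RULES: dict[str, dict[str, tuple[int, Optional[int]]]] = {
    "Operation": {
        "is_a": (1, 1),
        "is_concern_of": (1, None),
        "has_input": (0, None),
        "has_output": (0, None),
    },
    "Data format": {
        "is_a": (1, 1),
        "is_format_of": (1, 1),
    },
}


def violations(term_type: str, counts: dict[str, int]) -> list[str]:
    """List the cardinality rules that a term's relation counts break."""
    problems = []
    for rel, (low, high) in RULES.get(term_type, {}).items():
        n = counts.get(rel, 0)
        if n < low or (high is not None and n > high):
            upper = "n" if high is None else str(high)
            problems.append(f"{rel}: {n} given, expected {low}..{upper}")
    return problems


print(violations("Operation", {"is_a": 1, "is_concern_of": 1, "has_input": 2}))
# []
print(violations("Data format", {"is_a": 1}))
# ['is_format_of: 0 given, expected 1..1']
```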


Rules by relation type


is_a

"Biological entity" is_a "Biological entity"
"Topic" is_a "Topic"
"Operation" is_a "Operation"
"Data resource" is_a "Data resource"
"Data" is_a "Data"

intersection_of

"Biological entity" intersection_of "Biological entity"
"Topic" intersection_of "Topic"
"Operation" intersection_of "Operation"
"Data resource" intersection_of "Data resource"
"Data" intersection_of "Data"

has_part

"Biological entity" has_part "Biological entity"
"Data resource" has_part "Data resource"
"Data" has_part "Data"

This relation is not currently used.

is_part_of

"Biological entity" is_part_of "Biological entity"
"Data resource" is_part_of "Data resource"
"Data" is_part_of "Data"

This relation is not currently used.

concerns

"Topic" concerns "Biological entity"
"Topic" concerns "Operation"
"Topic" concerns "Data resource"
"Topic" concerns "Data"

is_concern_of

"Biological entity" is_concern_of "Topic"
"Operation" is_concern_of "Topic"
"Data resource" is_concern_of "Topic"
"Data" is_concern_of "Topic"

has_input

"Operation" has_input "Data"

is_input_of

"Data" is_input_of "Operation"

has_output

"Operation" has_output "Data"

is_output_of

"Data" is_output_of "Operation"

has_source

"Data" has_source "Data resource"

is_source_of

"Data resource" is_source_of "Data"

has_identifier

"Biological entity" has_identifier "Identifier"
"Data resource" has_identifier "Identifier"
"Data" has_identifier "Identifier"

is_identifier_of

"Identifier" is_identifier_of "Biological entity"
"Identifier" is_identifier_of "Data resource"
"Identifier" is_identifier_of "Data"

has_attribute

"Biological entity" has_attribute "Data"

is_attribute_of

"Data" is_attribute_of "Biological entity"

has_format

"Data" has_format "Data format"

is_format_of

"Data format" is_format_of "Data"
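The rules by relation type amount to a set of allowed (subject branch, relation, object branch) triples. A sketch that encodes them and checks one relation at a time; it assumes each term has already been mapped to its branch (with "Identifier" treated as its own branch, as in the rules), and ALLOWED and is_allowed are hypothetical names.

```python
BRANCHES = ("Biological entity", "Topic", "Operation", "Data resource", "Data")

ALLOWED: set[tuple[str, str, str]] = set()
for t in BRANCHES:
    ALLOWED |= {(t, "is_a", t), (t, "intersection_of", t)}
# "Data format" is_a "Data format" appears under the rules by term type.
ALLOWED.add(("Data format", "is_a", "Data format"))
for t in ("Biological entity", "Data resource", "Data"):
    ALLOWED |= {(t, "has_part", t), (t, "is_part_of", t),
                (t, "has_identifier", "Identifier"),
                ("Identifier", "is_identifier_of", t)}
for t in ("Biological entity", "Operation", "Data resource", "Data"):
    ALLOWED |= {("Topic", "concerns", t), (t, "is_concern_of", "Topic")}
ALLOWED |= {
    ("Operation", "has_input", "Data"), ("Data", "is_input_of", "Operation"),
    ("Operation", "has_output", "Data"), ("Data", "is_output_of", "Operation"),
    ("Data", "has_source", "Data resource"), ("Data resource", "is_source_of", "Data"),
    ("Biological entity", "has_attribute", "Data"),
    ("Data", "is_attribute_of", "Biological entity"),
    ("Data", "has_format", "Data format"), ("Data format", "is_format_of", "Data"),
}


def is_allowed(subject_type: str, relation: str, object_type: str) -> bool:
    """Check one relation against the (subject, relation, object) rules above."""
    return (subject_type, relation, object_type) in ALLOWED


print(is_allowed("Operation", "has_input", "Data"))  # True
print(is_allowed("Data", "has_input", "Operation"))  # False
```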



Guidelines for Developers


EDAM development should adhere to the guidelines below.


Adding Terms

Only add terms that are within scope. If a term is strictly required but out of scope, add a comment to the term definition.

Only add terms that are in common use and/or re-used in multiple contexts.

Ensure terms are added to the correct branch/namespace.


Terms should:

Correspond to well-established concepts
Have at least one unique intrinsic property; terms must be distinct from each other. Specify phrases describing essentially the same thing using synonym: (see term structure).
Correspond to classes, not individuals. Terms should typically have a single is_a (subclass) relation or (in exceptional cases only) more than one. In future, intersection_of might be used to handle multi-parent terms.
Be correctly assigned; in particular, do not use is_a when what is really meant is is_part_of (a common mistake). Terms for parts of a data entity are (typically) of a different class and should be located appropriately, not necessarily near the term for the composition! EDAM defines but does not currently use has_part / is_part_of relations.
Be correctly constituted. Do not erroneously place into one class terms of a fundamentally different nature (e.g. physical entity, tool operation and database), which in EDAM or "upper level" ontologies are kept separate.
Ideally, have all appropriate relations specified; relationships should be self-evident and well established.
Be well named and defined (see below).


Naming Terms

Term names should:

Be unique within a namespace (the same name may be used in different namespaces)
Reflect their definition: the meaning of a term should be reasonably obvious from its name
Ideally, reflect their type (operation, datatype, identifier etc.) following the patterns in use
Be short and simple
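The first of these rules is easy to test automatically. A sketch that reports name clashes within a namespace; comparing names case-insensitively is an assumption, and the example terms are illustrative (the same name in two different namespaces is allowed, as with "Sequence alignment").

```python
from collections import Counter


def duplicate_names(terms: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (namespace, name) keys shared by more than one term.

    Names are compared case-insensitively (an assumption; the guideline
    only says names must be unique within a namespace).
    """
    counts = Counter((ns, name.casefold()) for ns, name in terms)
    return sorted(key for key, n in counts.items() if n > 1)


terms = [
    ("operation", "Sequence alignment"),
    ("data", "Sequence alignment"),  # same name, different namespace: fine
    ("data", "Sequence record"),
    ("data", "Sequence Record"),     # hypothetical clash within "data"
]
print(duplicate_names(terms))  # [('data', 'sequence record')]
```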


Defining terms

Term definitions should:

Describe at least one unique intrinsic property
Be clear and simple; avoid jargon and obscure acronyms
Be informative
Be unambiguous: avoid words (e.g. "can", "may", "should") that introduce modality or ambiguity
Be short: use the comment: field for extended comments
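Two of these guidelines lend themselves to a simple automated check. This sketch flags the modal words named above and overly long definitions; the 40-word limit is an assumed threshold, not part of the guidelines, and a clean result does not by itself mean a definition is good.

```python
import re

# Words the guideline above asks definitions to avoid.
MODAL = re.compile(r"\b(can|may|should)\b", re.IGNORECASE)
MAX_WORDS = 40  # assumed threshold; the guideline only says "be short"


def lint_definition(text: str) -> list[str]:
    """Return heuristic warnings for a draft term definition."""
    warnings = [f"modal word: {m.group(0)!r}" for m in MODAL.finditer(text)]
    if len(text.split()) > MAX_WORDS:
        warnings.append(f"over {MAX_WORDS} words; move detail to comment:")
    return warnings


print(lint_definition("A molecular sequence that may be aligned."))
# ["modal word: 'may'"]
```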


New relationship types

These should not be needed, but any new relationship type would need:

Clearly stated partner terms (class membership)
A clearly stated type (transitive, symmetric, inverse_of etc.)
A name that makes clear the nature of the relationship


Guidelines for Annotators


General guidelines

Use "Topic" terms for coarse-grained annotation of tools

Use "Operation" terms for fine-grained annotation of tool functions

Use "Data resource" terms for annotating data resources such as databases and servers into broad categories based on content-type

Use "Data" and "Data format" terms for annotating data (and data formats) in semantic and syntactic terms respectively


Picking Terms

If you have many annotations to do, it will help to familiarise yourself with EDAM first. Use a text editor or OBOEdit.

Identify the correct branch/namespace ("Operation", "Data" etc.) of terms, considering what is being annotated (see above).

Search EDAM using keywords to find candidate terms. Multiple searches using synonyms, alternative spellings and so on are preferable.

Pick the most specific term(s) available, bearing in mind that some concepts are necessarily overlapping or general.

Only pick a correct term. If it doesn't exist, request that it be added to EDAM.
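When there are many annotations to do, the keyword search described above can be scripted. A minimal sketch; the term dictionary shape and its keys are hypothetical, and matching is a plain case-insensitive substring test rather than the search of any EDAM browser.

```python
from typing import Optional


def search(terms: dict[str, dict], keywords: list[str],
           namespace: Optional[str] = None) -> list[str]:
    """Rank term IDs by how many keywords occur in their name or synonyms.

    `terms` maps an ID to {"name": str, "namespace": str, "synonyms": [str]}.
    Matching is a case-insensitive substring test; ties sort by ID.
    """
    hits = []
    for term_id, t in terms.items():
        if namespace is not None and t["namespace"] != namespace:
            continue  # restrict to the chosen branch, e.g. "data"
        text = " ".join([t["name"], *t.get("synonyms", [])]).casefold()
        score = sum(kw.casefold() in text for kw in keywords)
        if score:
            hits.append((-score, term_id))  # negate so the best match sorts first
    return [term_id for _, term_id in sorted(hits)]


# Hypothetical term records keyed by placeholder IDs.
TERMS = {
    "t1": {"name": "Citation", "namespace": "data", "synonyms": ["Reference"]},
    "t2": {"name": "Sequence record", "namespace": "data"},
    "t3": {"name": "Sequence alignment", "namespace": "operation"},
}
print(search(TERMS, ["sequence", "alignment"]))  # ['t3', 't2']
```

Searching synonyms as well as names is what lets "reference" find the "Citation" term.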


Use of other ontologies

The expectation is for EDAM to be used alongside other ontologies for annotation where possible and desirable. For example, an operation that predicts specific features of a molecular sequence could be annotated with GO terms for the features.


Annotation of Web Services

Model of a Web Service

A WS is considered as an arbitrary (but usually related) set of one or more operations, reducing the problem of WS interoperation to one of compatibility between operations.


Operation

Discrete unit of functionality performing (typically) one or more definite functions

Reads an input

Writes an output

Uses zero or more data resources


Input

Payload of SOAP message passed in operation call

Name and (ideally) description are given in WSDL file

Input has one or more XML elements which must be set (input values)


Output

Payload of SOAP message returned from operation call

Name and (ideally) description are given in WSDL file

Output has one or more XML elements which are written (output values)


XML elements

Simple or complex XSD types given in XSD schema associated with a WSDL file

Correspond to values that are input or output by a service

Name and (ideally) description of element are given in schema

Element values are instances of a particular datatype with a semantic type and a specific syntax.

Most element values have a syntax fully specified by the schema

Some element values correspond to text in a specific file format which is not specified by the schema. Such reports may be a composite of different semantic types.


Data resources

Databases or ontologies used in the background

Not passed in a WS call

Might be specified indirectly via a parameter. For example, an operation might read a database whose name is specified by an input parameter.
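The WS model above can be sketched as plain Python data classes, with interoperation reduced to a compatibility check between operations. This is an illustrative sketch only: the class names, the element names `asequence`/`sequence` and the operation name `cons` are hypothetical, and exact equality of EDAM data and format terms is a deliberately naive stand-in for a real compatibility test (which would also need to consider the term hierarchy).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Value:
    """An input or output value: an XML element with a semantic type and a syntax."""
    element: str                       # element name from the XSD schema
    data: Optional[str] = None         # EDAM "Data" term (semantic type)
    data_format: Optional[str] = None  # EDAM "Data format" term (syntax)

@dataclass
class Operation:
    """A discrete unit of functionality that reads an input and writes an output."""
    name: str
    inputs: List[Value] = field(default_factory=list)
    outputs: List[Value] = field(default_factory=list)
    # Databases or ontologies used in the background; not passed in the WS call.
    data_resources: List[str] = field(default_factory=list)

@dataclass
class WebService:
    """A WS modelled as a set of one or more (usually related) operations."""
    name: str
    operations: List[Operation]

def can_feed(producer: Operation, consumer: Operation) -> bool:
    """Naive check: some annotated output of `producer` has the same EDAM
    data and format terms as some input of `consumer`."""
    return any(
        out.data is not None
        and (out.data, out.data_format) == (inp.data, inp.data_format)
        for out in producer.outputs
        for inp in consumer.inputs
    )

# Hypothetical elements; EDAM:0000849 = Sequence record, EDAM:0000863 = Sequence alignment.
align = Operation("runAndWaitFor",
                  inputs=[Value("asequence", "EDAM:0000849")],
                  outputs=[Value("outfile", "EDAM:0000863")])
consensus = Operation("cons", inputs=[Value("sequence", "EDAM:0000863")])
service = WebService("water", [align])
print(can_feed(align, consensus), can_feed(consensus, align))  # True False
```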


Levels of annotation


Annotation of a WSDL file or associated XSD schema is possible at several levels. Assuming SAWSDL annotation, the XML elements that may be annotated are:


Operation (<wsdl:operation>)

Ideally one "Operation" term for each WSDL operation (more than one in exceptional circumstances)


Input (parameter) values (<xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute>)

One "Data" term

One "Data format" term


Output values (<xs:element>, <xs:complexType>, <xs:simpleType>, <xs:attribute>)

One "Data" term

One "Data format" term


The following annotations might be useful but are not supported by SAWSDL:


Web service (<wsdl:service>)

One or more "Topic" terms to describe the general area(s) the service operates in

One or more "Data resource" terms to describe the data resources used by the service


Operation input (<input>)

One or more "Data" terms for the input(s) of each operation (if needed)

Operation output (<output>)

One or more "Data" terms for the output(s) of each operation (if needed)

The expectation is for annotation of operation inputs and outputs to go into the XSD schema, although the WSDL file (<input> and <output> elements) might also be used.
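The annotation levels listed above can be captured as a small lookup table and used to sanity-check a planned annotation. This is a sketch under assumptions: the short branch labels (`"format"` for "Data format", `"resource"` for "Data resource") and the `wsdl:input`/`wsdl:output` keys are shorthand invented here, and `check_annotation` is a hypothetical helper rather than part of any SAWSDL toolkit.

```python
from typing import Dict, List

# Rule for input/output values: one "Data" and one "Data format" term.
VALUE_RULE = {"allowed": {"data", "format"}, "at_most_one": {"data", "format"}, "sawsdl": True}

# Which EDAM branches may annotate which element, and whether SAWSDL supports it.
LEVELS: Dict[str, dict] = {
    "wsdl:operation": {"allowed": {"operation"}, "at_most_one": set(), "sawsdl": True},
    **{tag: VALUE_RULE for tag in ("xs:element", "xs:complexType",
                                   "xs:simpleType", "xs:attribute")},
    "wsdl:service": {"allowed": {"topic", "resource"}, "at_most_one": set(), "sawsdl": False},
    "wsdl:input": {"allowed": {"data"}, "at_most_one": set(), "sawsdl": False},
    "wsdl:output": {"allowed": {"data"}, "at_most_one": set(), "sawsdl": False},
}

def check_annotation(element: str, branches: List[str]) -> List[str]:
    """List problems with annotating `element` with terms from `branches`."""
    rule = LEVELS.get(element)
    if rule is None:
        return [f"{element}: no EDAM annotation level defined"]
    problems = [f"{element}: '{b}' term not expected here"
                for b in branches if b not in rule["allowed"]]
    problems += [f"{element}: more than one '{b}' term"
                 for b in sorted(rule["at_most_one"]) if branches.count(b) > 1]
    if not rule["sawsdl"]:
        problems.append(f"{element}: not supported by SAWSDL")
    return problems

print(check_annotation("xs:element", ["data", "format"]))  # []
print(check_annotation("wsdl:operation", ["data"]))  # ["wsdl:operation: 'data' term not expected here"]
```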


Annotation of sequences

When annotating sequence data, the following terms (and their children) may be used:


id: EDAM:0000849
name: Sequence record
namespace: data
def: "A molecular sequence and associated metadata (possibly including a feature table), typically corresponding to a full entry from a molecular sequence database." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0002044 ! Sequence data


id: EDAM:0002043
name: Sequence record lite
namespace: data
def: "A molecular sequence and minimal associated metadata, typically an identifier of the sequence and/or a comment." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0000849 ! Sequence record


id: EDAM:0000848
name: Sequence
namespace: data
def: "A raw molecular sequence which might include ambiguity, unknown positions and non-sequence characters." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0002044 ! Sequence data


id: EDAM:0002176
name: Cardinality
namespace: data
def: "The number of a certain thing." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0000006 ! Data


The basic term to use is either "Sequence record" or "Sequence record lite".



Terms under "Sequence" may be used to annotate the basic sequence type, e.g.:

"Pure sequence"
"Nucleotide sequence"
"DNA sequence"
"Unambiguous pure protein sequence"
etc.


Terms under "Cardinality" may be used to annotate the number of sequences:

"Exactly 1"
"1 or more"
"Exactly 2"
"2 or more"


See, however, the caveats on handling cardinality below.
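Whether a term lies "under" another is a question of following `is_a` links upwards. The sketch below hard-codes only the `is_a` edges from the stanzas above; `IS_A` and `is_under` are hypothetical names, and counting a term as being under itself is an assumption made here for convenience.

```python
from typing import Dict, List

# is_a edges (child -> parents), copied from the stanzas above.
IS_A: Dict[str, List[str]] = {
    "EDAM:0000849": ["EDAM:0002044"],  # Sequence record -> Sequence data
    "EDAM:0002043": ["EDAM:0000849"],  # Sequence record lite -> Sequence record
    "EDAM:0000848": ["EDAM:0002044"],  # Sequence -> Sequence data
    "EDAM:0002176": ["EDAM:0000006"],  # Cardinality -> Data
}

def is_under(term: str, ancestor: str, is_a: Dict[str, List[str]] = IS_A) -> bool:
    """True if `term` is `ancestor` or reaches it by following is_a edges."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return False

print(is_under("EDAM:0002043", "EDAM:0002044"))  # True
print(is_under("EDAM:0000848", "EDAM:0000849"))  # False
```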


Sequence sets

Sets of sequences, where these correspond to the typical database entry records, would be annotated in the way indicated above, i.e. using:

One of "Sequence record" or "Sequence record lite"

A term under "Sequence" (for sequence type)

A term under "Cardinality" (for number of sequences).

For sets of sequences which do not correspond to the typical database entry records, such as sets of sequences produced from analysis with derived metadata, terms under "Sequence set" might be appropriate:


[Term]
id: EDAM:0000850
name: Sequence set
namespace: data
def: "Any collection of multiple molecular sequences and associated metadata that do not (typically) correspond to common sequence database records or database entries." [EDAM:EBI "EMBRACE definition"]
comment: This term may be used for arbitrary sequence sets and associated data arising from processing.
is_a: EDAM:0002044 ! Sequence data


[Term]
id: EDAM:0001233
name: Protein sequence set
namespace: data
def: "Any collection of multiple protein sequences and associated metadata that do not (typically) correspond to common sequence database records or database entries." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0000850 ! Sequence set


[Term]
id: EDAM:0001234
name: Nucleotide sequence set
namespace: data
def: "Any collection of multiple nucleotide sequences and associated metadata that do not (typically) correspond to common sequence database records or database entries." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0000850 ! Sequence set


For example:


id: EDAM:0001238
name: Proteolytic digest
namespace: data
def: "A protein sequence cleaved into peptide fragments (by enzymatic or chemical cleavage) with fragment masses." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0001233 ! Protein sequence set
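The choice between the record terms and the "Sequence set" terms can be summarised as a small decision function. This is a sketch only: `base_term` and its parameters are hypothetical, and it returns just the basic term; for database-entry records the text above still expects a term under "Sequence" and one under "Cardinality" alongside it.

```python
from typing import Optional

SEQUENCE_RECORD = "EDAM:0000849"
SEQUENCE_RECORD_LITE = "EDAM:0002043"
SEQUENCE_SET = "EDAM:0000850"
PROTEIN_SEQUENCE_SET = "EDAM:0001233"
NUCLEOTIDE_SEQUENCE_SET = "EDAM:0001234"

def base_term(database_entries: bool, minimal_metadata: bool = False,
              molecule: Optional[str] = None) -> str:
    """Pick the basic EDAM term for one or more sequences.

    database_entries: do the sequences correspond to typical database entry records?
    minimal_metadata: only an identifier and/or comment per sequence?
    molecule: 'protein', 'nucleotide', or None if unknown or mixed.
    """
    if database_entries:
        # Combine with a term under "Sequence" (type) and "Cardinality" (number).
        return SEQUENCE_RECORD_LITE if minimal_metadata else SEQUENCE_RECORD
    # Derived sets (e.g. analysis output) fall under "Sequence set".
    return {"protein": PROTEIN_SEQUENCE_SET,
            "nucleotide": NUCLEOTIDE_SEQUENCE_SET}.get(molecule, SEQUENCE_SET)

print(base_term(True))                        # EDAM:0000849
print(base_term(False, molecule="protein"))   # EDAM:0001233
```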

Sequence annotation

When annotating data that describe a sequence, such as sequence features, the following terms may be used:


id: EDAM:0000855
name: Sequence annotation
namespace: data
def: "A report of general information, properties or features of one or more molecular sequences." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0002048 ! Annotation


id: EDAM:0001254
name: Sequence property
namespace: data
def: "A report on general properties (non-positional) of molecular sequence(s) derived from sequence data." [EDAM:EBI "EMBRACE definition"]
synonym: "Sequence properties report" EXACT []
is_a: EDAM:0000855 ! Sequence annotation


[Term]
id: EDAM:0001255
name: Sequence feature annotation
namespace: data
def: "Annotation of features of molecular sequence(s) that can be mapped to position(s) in the sequence." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0000855 ! Sequence annotation


The basic distinction between "Sequence property" and "Sequence feature annotation" is that the latter is used for positional features and the former for non-positional properties.

Under "Sequence feature annotation" there are:


id: EDAM:0001270
name: Sequence feature table
namespace: data
def: "Annotation of molecular sequence features, organized in a feature table in any known sequence feature table format." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0001255 ! Sequence feature annotation


id: EDAM:0001271
name: Sequence feature report
namespace: data
def: "Sequence feature annotation not in standard feature table format, including the location or matches of motifs and patterns in a sequence." [EDAM:EBI "EMBRACE definition"]
comment: Is a source of sequence feature table information although internal conversion would be required.
is_a: EDAM:0001255 ! Sequence feature annotation


The distinction here is between feature tables and everything else (non-standard reports on features). There are many terms under "Sequence feature report" to capture the outputs of the various operations listed in EDAM.
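The two distinctions above (positional or not; standard feature table or not) can be written as a two-step choice. This is a sketch: `annotation_term` is a hypothetical helper, and in practice a more specific child of "Sequence feature report" should be preferred where one fits, in line with the advice to pick the most specific term available.

```python
SEQUENCE_PROPERTY = "EDAM:0001254"        # non-positional properties
SEQUENCE_FEATURE_TABLE = "EDAM:0001270"   # positional, standard feature table format
SEQUENCE_FEATURE_REPORT = "EDAM:0001271"  # positional, any other report

def annotation_term(positional: bool, standard_feature_table: bool = False) -> str:
    """Choose the general EDAM term for data annotating a sequence."""
    if not positional:
        return SEQUENCE_PROPERTY
    # Positional features: feature tables versus everything else.
    return SEQUENCE_FEATURE_TABLE if standard_feature_table else SEQUENCE_FEATURE_REPORT

print(annotation_term(positional=True))  # EDAM:0001271
```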

Handling cardinality

EDAM terms do not (typically) reflect cardinality (the number of a certain thing). It is assumed that this will (typically) be handled elsewhere; for example, the permissible number of data elements (such as input sequences) is specified explicitly in an XSD schema.

Some applications, however, may need to annotate cardinality. The following basic terms are provided:


id: EDAM:0002176
name: Cardinality
def: "The number of a certain thing." [EDAM:EBI "EMBRACE definition"]
...


id: EDAM:0002176
name: Exactly 1
def: "A single thing." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0002176 ! Cardinality


[Term]
id: EDAM:0002176
name: 1 or more
def: "One or more things." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0002176 ! Cardinality


[Term]
id: EDAM:0002176
name: Exactly 2
def: "Exactly two things." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0002176 ! Cardinality


[Term]
id: EDAM:0002176
name: 2 or more
def: "Two or more things." [EDAM:EBI "EMBRACE definition"]
is_a: EDAM:0002176 ! Cardinality


These terms should be used conservatively, and never in cases which might conflict with an explicit statement of cardinality.
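One way to respect this rule is to compare a cardinality term against the occurrence bounds an XSD schema already states for the element (`minOccurs`/`maxOccurs`, both defaulting to 1 in XSD). The sketch below treats any mismatch as a conflict, which is a conservative reading assumed here; terms are keyed by name because the identifiers given above for the child terms are not distinct.

```python
import math
from typing import Tuple

# (min, max) implied by each cardinality term; math.inf means unbounded.
CARDINALITY = {
    "Exactly 1": (1, 1),
    "1 or more": (1, math.inf),
    "Exactly 2": (2, 2),
    "2 or more": (2, math.inf),
}

def xsd_occurs(min_occurs: str = "1", max_occurs: str = "1") -> Tuple[int, float]:
    """Convert XSD minOccurs/maxOccurs attribute values into numeric bounds."""
    upper = math.inf if max_occurs == "unbounded" else int(max_occurs)
    return int(min_occurs), upper

def conflicts(term_name: str, min_occurs: str = "1", max_occurs: str = "1") -> bool:
    """True if the EDAM cardinality term disagrees with the schema's bounds."""
    return CARDINALITY[term_name] != xsd_occurs(min_occurs, max_occurs)

print(conflicts("1 or more", "1", "unbounded"))  # False
print(conflicts("Exactly 2", "1", "unbounded"))  # True
```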


SAWSDL annotation


The proposed format of SAWSDL annotation is a URI, pointing to the term definition, that includes the term namespace and unique identifier:



<element name="elementName" sawsdl:modelReference="http://purl.org/edam/namespace/id">

Where:

element is the XML element being annotated

elementName is the name of the XML element

namespace is the namespace of the EDAM term, e.g. "operation"

id is the unique identifier of the term, e.g. "0000295"



The term name, if required, could be given as an XML comment after the annotated element:



<element name="elementName" sawsdl:modelReference="http://purl.org/edam/namespace/id">
<!-- term_name -->


This is not recommended, however, as term names are not guaranteed to remain constant.

The value of the sawsdl:modelReference attribute is a URI pointing to the term definition. The proposal is to use PURLs (Persistent Uniform Resource Locators), which include the term namespace.
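Building and parsing these PURLs is simple string handling. In the sketch below, `edam_purl` and `parse_edam_purl` are hypothetical helpers, and the assumption that namespaces are lower-case words and identifiers are seven digits is taken from the examples in this document rather than from any formal specification.

```python
import re
from typing import Tuple

PURL_BASE = "http://purl.org/edam"
# Assumes lower-case namespace names and 7-digit ids, as in the examples here.
PURL_RE = re.compile(r"http://purl\.org/edam/([a-z_]+)/(\d{7})")

def edam_purl(namespace: str, term_id: str) -> str:
    """Build a PURL from a namespace and an id given as 'EDAM:0000292' or '0000292'."""
    digits = term_id.split(":")[-1]
    if not re.fullmatch(r"\d{7}", digits):
        raise ValueError(f"not a 7-digit EDAM id: {term_id!r}")
    return f"{PURL_BASE}/{namespace}/{digits}"

def parse_edam_purl(purl: str) -> Tuple[str, str]:
    """Inverse of edam_purl: return (namespace, 'EDAM:NNNNNNN')."""
    match = PURL_RE.fullmatch(purl)
    if match is None:
        raise ValueError(f"not an EDAM PURL: {purl!r}")
    return match.group(1), f"EDAM:{match.group(2)}"

print(edam_purl("operation", "EDAM:0000292"))               # http://purl.org/edam/operation/0000292
print(parse_edam_purl("http://purl.org/edam/data/0000863"))  # ('data', 'EDAM:0000863')
```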


So for these 3 terms:


[Term]
id: EDAM:0000182
name: Sequence alignment
namespace: topic
...

[Term]
id: EDAM:0000292
name: Sequence alignment
namespace: operation
...

[Term]
id: EDAM:0000863
name: Sequence alignment
namespace: data
...


We would have:

http://purl.org/edam/topic/0000182

http://purl.org/edam/operation/0000292

http://purl.org/edam/data/0000863


These can be used in SAWSDL annotation, e.g.:

<service name="water" sawsdl:modelReference="http://purl.org/edam/topic/0000182">

<operation name="runAndWaitFor" sawsdl:modelReference="http://purl.org/edam/operation/0000292">

<xs:element name="outfile" sawsdl:modelReference="http://purl.org/edam/data/0000863">
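Such annotations can also be added programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` and the SAWSDL namespace `http://www.w3.org/ns/sawsdl`; the small inline schema and the `annotate` helper are hypothetical stand-ins for a real service schema and tool.

```python
import xml.etree.ElementTree as ET

SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
XS_NS = "http://www.w3.org/2001/XMLSchema"
# Serialise with the familiar prefixes instead of ns0/ns1.
ET.register_namespace("sawsdl", SAWSDL_NS)
ET.register_namespace("xs", XS_NS)

MODEL_REF = f"{{{SAWSDL_NS}}}modelReference"

def annotate(element: ET.Element, *purls: str) -> None:
    """Append EDAM PURLs to the element's space-separated sawsdl:modelReference."""
    existing = element.get(MODEL_REF, "").split()
    element.set(MODEL_REF, " ".join(existing + [p for p in purls if p not in existing]))

# A hypothetical minimal schema with one output element.
schema = ET.fromstring(
    f'<xs:schema xmlns:xs="{XS_NS}"><xs:element name="outfile" type="xs:string"/></xs:schema>'
)
outfile = schema.find(f"{{{XS_NS}}}element[@name='outfile']")
annotate(outfile, "http://purl.org/edam/data/0000863")
print(ET.tostring(schema, encoding="unicode"))
```

Annotating an element that already carries a reference appends to the list rather than overwriting it, which matches the multi-valued form described next.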


If more than one annotation of an element is required, these can be given in the sawsdl:modelReference attribute delimited by space characters:


<service name="water" sawsdl:modelReference="http://purl.org/edam/topic/0000182
http://purl.org/edam/topic/0000181">


Such multiple annotations need not be in the same namespace.
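A consumer of such annotations needs to split the attribute value on whitespace and, since the values may come from different namespaces, group them accordingly. `group_model_references` below is a hypothetical helper; non-EDAM URIs are kept under a placeholder key chosen here for illustration.

```python
from collections import defaultdict
from typing import Dict, List

EDAM_PREFIX = "http://purl.org/edam/"

def group_model_references(value: str) -> Dict[str, List[str]]:
    """Split a space-delimited sawsdl:modelReference value and group EDAM ids by namespace."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for uri in value.split():  # split() treats spaces and line breaks alike
        if uri.startswith(EDAM_PREFIX):
            namespace, _, digits = uri[len(EDAM_PREFIX):].partition("/")
            groups[namespace].append(f"EDAM:{digits}")
        else:
            groups["(non-EDAM)"].append(uri)
    return dict(groups)

value = ("http://purl.org/edam/topic/0000182\n"
         "http://purl.org/edam/topic/0000181 http://purl.org/edam/operation/0000292")
print(group_model_references(value))
# {'topic': ['EDAM:0000182', 'EDAM:0000181'], 'operation': ['EDAM:0000292']}
```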


EDAM term end-points

Note: PURLs for EDAM terms have not (yet, as of May 2010) been created!


When pasted into a browser, the PURLs in the examples above:


http://purl.org/edam/topic/0000182

http://purl.org/edam/operation/0000292

http://purl.org/edam/data/0000863


... will (eventually) resolve to:


http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182

http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292

http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863


These are complete OBO term statements in plain text (OBO format). It will also be possible to give a format specifier. For example, these PURLs:


http://purl.org/edam/topic/0000182?style=html

http://purl.org/edam/operation/0000292?style=html

http://purl.org/edam/data/0000863?style=html


... will resolve to OBO term statements in HTML, such that terms referred to in the statements (via relations) will be clickable to allow navigation:


http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000182?style=html

http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292?style=html

http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000863?style=html
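The intended resolution from PURL to dbfetch URL, as illustrated by the examples above, drops the namespace segment, keeps the term identifier and carries over any `?style=...` query. The sketch below expresses that rewrite rule; `purl_to_dbfetch` is hypothetical, and since the PURLs had not been created at the time of writing, this reflects only the examples given, not a deployed redirect.

```python
from urllib.parse import urlsplit

DBFETCH_BASE = "http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam"

def purl_to_dbfetch(purl: str) -> str:
    """Rewrite an EDAM PURL to the dbfetch URL it is intended to resolve to."""
    parts = urlsplit(purl)
    term_id = parts.path.rstrip("/").split("/")[-1]  # namespace segment is dropped
    url = f"{DBFETCH_BASE}/{term_id}"
    return f"{url}?{parts.query}" if parts.query else url

print(purl_to_dbfetch("http://purl.org/edam/operation/0000292?style=html"))
# http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000292?style=html
```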


The eventual final list of end-points will provide other formats/views:

Plain text in OBO format (default)

HTML

OBO-XML

JSON

The term in a web browser, e.g. NCBO Ontology Browser.


http://ebi.ac.uk/edam/0000182%format=txt

http://ebi.ac.uk/edam/0000182%format=html (the default)

http://ebi.ac.uk/edam/0000182%format=xml

http://ebi.ac.uk/edam/0000182%format=json

http://ebi.ac.uk/edam/0000182%format=browser


For now, you can see this in action for this term:

http://purl.org/edam/entity/0000002

http://purl.org/edam/entity/0000002?style=html


EDAM in SRS

EDAM is available in the EBI SRS server:

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+EDAM

And from the EBI dbfetch:

http://wwwdev.ebi.ac.uk/Tools/dbfetch/

This allows the terms to be addressed in the ways described above:

http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000352 (plain text view)

or

http://wwwdev.ebi.ac.uk/Tools/dbfetch/dbfetch/edam/0000352?style=html (HTML view)

Note: http://wwwdev.ebi.ac.uk/ will eventually be migrated to http://www.ebi.ac.uk/.


Applications (annotation examples)


EMBOSS

EMBOSS applications have been annotated using EDAM and these annotations appear in corresponding web services.


Annotated WSDL files (and associated XSD data schema) are available from:


http://wwwdev.ebi.ac.uk/soaplab/typed/services/list


You will see a list of service end-points with WSDL URLs. For example:


http://wwwdev.ebi.ac.uk/soaplab/typed/services/alignment_consensus.cons.sa?wsdl


To see the data schema associated with a WSDL, replace "?wsdl" with "?xsd=1", "?xsd=2" or "?xsd=3". For example:


http://wwwdev.ebi.ac.uk/soaplab/typed/services/alignment_consensus.cons.sa?xsd=1
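The URL substitution described above can be sketched as follows. `xsd_urls` is a hypothetical helper; the default of three schema URLs simply mirrors the three variants mentioned in the text and is an assumption, as a given service may expose fewer.

```python
from typing import List

def xsd_urls(wsdl_url: str, count: int = 3) -> List[str]:
    """Derive candidate schema URLs by replacing a trailing '?wsdl' with '?xsd=N'."""
    suffix = "?wsdl"
    if not wsdl_url.endswith(suffix):
        raise ValueError(f"expected a URL ending in {suffix!r}: {wsdl_url}")
    base = wsdl_url[: -len(suffix)]
    return [f"{base}?xsd={n}" for n in range(1, count + 1)]

wsdl = "http://wwwdev.ebi.ac.uk/soaplab/typed/services/alignment_consensus.cons.sa?wsdl"
print(xsd_urls(wsdl)[0])
# http://wwwdev.ebi.ac.uk/soaplab/typed/services/alignment_consensus.cons.sa?xsd=1
```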


BioXSD

The BioXSD XML schema (XSD) defines exchange formats of everyday bioinformatics data types. BioXSD aims to serve as the common, canonical data model for bioinformatics Web services. It includes commonly used types including sequences, sequence annotations, alignments and references to resources:


http://bioxsd.org/


iHOP Web Services

The iHOP web services are annotated using EDAM terms, either directly or via their use of BioXSD:


http://ubio.bioinfo.cnio.es/biotools/iHOP/

