downloading - Projects at ARCS

Nov 29, 2012

Ontology Annotation Service (OAS) (based on “PODD - OAS overview.pdf”)


Goals:



Annotate items in current systems and/or RDF documents using references to
community/domain accepted vocabularies



Provide browsing and search facilities for a range of vocabularies that are useful based
on the user's/application's domain


Deliverables



Annotation database



Links ontologies with data items



Can be specific or non-specific. (Note: the form for these annotations was
specifically not in scope for the project description, but the design still needs to
know about the different options to effectively produce an interface)


Specific tag format using ontology term in a known domain specific context



Gene/Protein/Plant/Animal: Could include references to the object along with
other items


Generic tag using ontology term in a generic context :



A publication contains the same basic Dublin Core metadata as an experiment


Generic tag using ontology term in a specific context:



Any ontology term can be used; however, the context is limited to
items from a particular dataset



Web Service to be used by computers and client applications to annotate items using
SOAP/REST/Other methods



Fetch the super and sub classes for ontology classes



Fetch the entire RDF description for an object using CBD (Concise Bounded
Description) semantics, as is common with SPARQL DESCRIBE in many implementations


http://www.w3.org/Submission/CBD/


This would include the rdf:type and any annotations



Search through a set of ontologies with a search term and return the top matches



Apply annotation from ontology to object URI
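
The class hierarchy and description operations listed above could be sketched roughly as
follows. This is only an illustration, assuming a toy in-memory triple store in place of a
real repository; the TripleStore class, its method names, and the "_:" prefix convention for
blank nodes are hypothetical, and the describe method is a simplified reading of CBD that
ignores reified statements.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) string triples.

    Hypothetical stand-in for a real RDF repository.
    """

    def __init__(self, triples=()):
        self.triples = set(triples)

    def objects(self, subject, predicate):
        return {o for s, p, o in self.triples if s == subject and p == predicate}

    def subjects(self, predicate, obj):
        return {s for s, p, o in self.triples if p == predicate and o == obj}

    @staticmethod
    def _closure(start, step):
        # Follow `step` repeatedly, collecting every node reached from `start`.
        seen, todo = set(), [start]
        while todo:
            for nxt in step(todo.pop()):
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        seen.discard(start)
        return seen

    def super_classes(self, cls):
        """All direct and indirect rdfs:subClassOf ancestors of cls."""
        return self._closure(cls, lambda c: self.objects(c, RDFS_SUBCLASS_OF))

    def sub_classes(self, cls):
        """All direct and indirect rdfs:subClassOf descendants of cls."""
        return self._closure(cls, lambda c: self.subjects(RDFS_SUBCLASS_OF, c))

    def describe(self, resource):
        """Simplified CBD: triples with the resource as subject, expanded
        through blank-node objects (assumed here to start with "_:")."""
        result, seen, todo = set(), set(), [resource]
        while todo:
            node = todo.pop()
            if node in seen:
                continue
            seen.add(node)
            for s, p, o in self.triples:
                if s == node:
                    result.add((s, p, o))
                    if o.startswith("_:"):
                        todo.append(o)
        return result
```

Because the description starts from the object's own triples, it includes the rdf:type
statements and any annotations stored with the object as subject.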



Web Interface to be used by humans to directly annotate items using web pages



Show super and sub classes for ontology classes



Show rdf:type for OWL Individuals and any other rdf triples that are known



Show rdfs:label/dc:title as human readable label



Autocomplete or another search facility for searching through ontologies with a
search term to return the top matches



Apply annotation from ontology to object URI



Fetching ontologies from inside a browser safely likely requires a proxy, see
proxy.php provided in rdfquery for an example



https://code.google.com/p/rdfquery/source/browse/trunk/proxies/proxy.php
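
The search facility behind the autocomplete (return the top matches for a search term)
might look something like the sketch below. It assumes the human readable labels
(rdfs:label/dc:title) have already been loaded into a dictionary keyed by term URI; the
function name, the ranking order (exact, then prefix, then substring matches) and the
example.org URIs are illustrative assumptions, not part of the project description.

```python
def search_labels(labels, term, limit=10):
    """Return up to `limit` (uri, label) pairs whose label matches `term`.

    `labels` maps term URI -> human readable label.
    Ranking is an assumption: exact matches first, then prefix matches,
    then substring matches; shorter labels win ties.
    """
    needle = term.strip().lower()
    if not needle:
        return []
    scored = []
    for uri, label in labels.items():
        text = label.lower()
        if text == needle:
            rank = 0
        elif text.startswith(needle):
            rank = 1
        elif needle in text:
            rank = 2
        else:
            continue
        scored.append((rank, len(text), label, uri))
    scored.sort()
    return [(uri, label) for _, _, label, uri in scored[:limit]]


# Hypothetical labels, for illustration only.
EXAMPLE_LABELS = {
    "http://example.org/onto#compound_leaf": "compound leaf",
    "http://example.org/onto#leaf": "leaf",
    "http://example.org/onto#leaf_margin": "leaf margin",
    "http://example.org/onto#root": "root",
}
```

Calling search_labels(EXAMPLE_LABELS, "leaf", limit=2) would return the exact match followed
by the prefix match, which is the shape of result an autocomplete widget needs.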




Configurable lists of accepted ontologies for fields related to PODD projects and/or ALA
Biodiversity collections that will be used as the basis for the product



Keep the date retrieved and the location it was retrieved from for each ontology



Manipulating or developing these ontologies further is out of scope per the
project description
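
One possible shape for the configurable lists of accepted ontologies is sketched below,
recording the retrieval date and location for each ontology as required. All class and
field names are hypothetical, and treating a term as accepted when its URI starts with
the ontology URI is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class OntologySource:
    """One accepted ontology, with its provenance."""
    ontology_uri: str       # namespace/base URI of the ontology
    retrieved_from: str     # location the copy was downloaded from
    retrieved_at: datetime  # when the copy was downloaded


class AcceptedOntologies:
    """Maps a field name to the ontologies accepted for that field."""

    def __init__(self):
        self._by_field = {}

    def accept(self, field, source):
        self._by_field.setdefault(field, []).append(source)

    def for_field(self, field):
        return list(self._by_field.get(field, []))

    def is_accepted(self, field, term_uri):
        # Assumption: a term belongs to an ontology if it shares its URI prefix.
        return any(term_uri.startswith(src.ontology_uri)
                   for src in self.for_field(field))
```

Keeping the provenance next to the configuration means a later refresh of an ontology can
be compared against the date and location of the copy that annotations were made with.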


Background:



Semantic Web:



Semantic Web technologies such as Linked Data and RDF provide the basis for
standardised domain data, supporting data interchange between peers as well
as standardised data translation between semantically compatible, but
syntactically incompatible, datasources. Rule based systems, such as the
semantic rule based R2R framework (by Free University of Berlin) and the
Bio2RDF system, which includes a range of syntactic and semantic normalisation
rules [my implementation and design], enable scientists to encode their data
translation methods in a textual form. Other scientists can then reuse and
integrate different datasources using a combination of rules from different
scientists, lowering the barriers to collaboration.



The use of HTTP URIs in Linked Data enables scientists to resolve data to
Human Readable and RDF documents using HTTP Accept header based
Content Negotiation. However, HTTP URIs also have permanent utility as distinct
keys, without relying on the DNS system and the HTTP specifications, as they
can be used by discovery tools such as Federated SPARQL to discover the
locations of data based solely on the URI. This ability may not always extend to
Human Readable documents if they do not encode semantic information using
RDFa or similar standards. RDF enables scientists to reference both official and
alternative representations in their investigations using different, but linked, URIs,
making it possible to replicate experiments and publish the replicated results
without overlapping the original results.
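
As an illustration of Accept header based Content Negotiation, the sketch below picks a
representation for a resource from the formats a server can produce. It is a simplified
reading of the HTTP rules (q-values and wildcards only, no media type parameters), the
function names are hypothetical, and falling back to a default instead of answering
406 Not Acceptable is an assumption made to keep the example short.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, best first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        # Sort by descending q, keeping header order for ties.
        entries.append((-q, position, media_type))
    entries.sort()
    return [(media_type, -neg_q) for neg_q, _, media_type in entries]


def negotiate(accept_header, available, default="text/html"):
    """Choose one of `available` (an ordered list of media types)."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
    return default


AVAILABLE = ["text/html", "application/rdf+xml", "text/turtle"]
```

A browser sending a header that prefers text/html would receive the Human Readable page,
while an RDF client asking for application/rdf+xml would receive the RDF document for the
same URI.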



These innovations, when integrated into a data management system, enable
scientists to collaborate using distributed computing resources and datasets,
while enhancing the trustworthiness of past publications as a result of automated
replication.



Ontology annotations using owl:AnnotationProperty



AnnotationProperties are completely separate from data, i.e., they cannot be used
to add semantic information to Classes or Individuals. If they are, OWL
reasoning will fail, or at best ignore the statements



OWL defines some inbuilt annotation properties that cannot be used for semantic
information


owl:versionInfo


rdfs:label


rdfs:comment


rdfs:seeAlso


rdfs:isDefinedBy



http://www.w3.org/TR/owl-ref/#Annotations
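
Since annotation statements carry no semantic information, an implementation may want to
keep them apart from the triples that are passed to a reasoner. A minimal sketch of that
separation follows, assuming triples are plain (subject, predicate, object) tuples; the
function name and the extra_annotation_properties parameter are hypothetical.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# The inbuilt annotation properties listed above.
BUILT_IN_ANNOTATION_PROPERTIES = {
    OWL + "versionInfo",
    RDFS + "label",
    RDFS + "comment",
    RDFS + "seeAlso",
    RDFS + "isDefinedBy",
}


def split_annotations(triples, extra_annotation_properties=()):
    """Split triples into (annotations, others) by predicate.

    `extra_annotation_properties` holds any properties that an ontology
    itself declares as owl:AnnotationProperty.
    """
    annotation_props = BUILT_IN_ANNOTATION_PROPERTIES | set(extra_annotation_properties)
    annotations = [t for t in triples if t[1] in annotation_props]
    others = [t for t in triples if t[1] not in annotation_props]
    return annotations, others
```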



Simple tagging ontologies



Rely on plain text to annotate items



http://www.holygoat.co.uk/owl/redwood/0.1/tags/



Semantic tagging ontologies



http://moat-project.org/ontology



Can define a URI to give the tag a link to an existing URI



Does not rely on AnnotationProperty, so the results can be mixed into the RDF
Triples for the item that is being annotated.




Annotea



Domain independent



Uses non-standard URIs. For example, it could have used Dublin Core for author,
modified, created etc. However, nothing stops us from selecting some Annotea
terms and reusing DC for the rest; it isn’t XML after all!



http://www.w3.org/2000/10/annotation-ns#



Any form of permanent annotation requires storage, either on users' computers, or
ideally on an internet server. Access to annotations is then through either RDF based
methods, such as RDFa/RDF/XML/N3/RDF-JSON, or through non-RDF based methods
such as SOAP Web Services (including non-RDF-JSON based web services)


Ontology browsers:



Ontology browsers generally take an ontology and display it in a tree-like form based on
sub-class and/or property hierarchies



NCBO BioPortal:



Repository for ontologies with basic metrics and term browser


http://bioportal.bioontology.org/ontologies/1128


The following link is a version of the above ontology, so it has a different
ontology identifier


http://bioportal.bioontology.org/ontologies/40648?p=terms



Free text annotation


Paragraphs of text are matched against text found in ontology properties and
classes


http://bioportal.bioontology.org/annotator#



Annotation web service


Methodology:


http://www.bioontology.org/wiki/index.php/Annotator_Web_service


http://www.bioontology.org/wiki/index.php/Annotator_User_Guide


Example code:


https://code.google.com/p/genewiki/source/browse/java/Miner/src/org/gnf/ncbo/web/AnnotatorClient.java


GeneWiki is used as part of the Wikipedia gene wiki project


https://secure.wikimedia.org/wikipedia/en/wiki/Portal:Gene_Wiki



Gene Ontology



Annotation file format for Genes


http://www.geneontology.org/GO.format.gaf-2_0.shtml


Very limited, relies on the sole use of genes for annotations and the
recognition of each column in the annotation format


Could easily be converted to RDF triples, using each row as a record and
each column as a property
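
A rough sketch of that conversion is shown below, using each non-comment row as a record
and each of the GAF 2.0 columns as a property. The http://example.org/gaf/ namespace, the
record numbering scheme and the property names derived from the column headings are all
assumptions for illustration; every value is kept as a plain literal, and the sample row
is made up rather than taken from a real annotation file.

```python
# Column names follow the GAF 2.0 column order.
GAF_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB_Reference", "Evidence_Code", "With_or_From", "Aspect",
    "DB_Object_Name", "DB_Object_Synonym", "DB_Object_Type", "Taxon",
    "Date", "Assigned_By", "Annotation_Extension", "Gene_Product_Form_ID",
]

BASE = "http://example.org/gaf/"  # hypothetical namespace


def gaf_to_triples(lines, base=BASE):
    """Turn GAF rows into (record URI, property URI, literal) triples."""
    triples = []
    record_number = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank lines and "!" comment/header lines
        record_number += 1
        record = f"{base}record/{record_number}"
        for name, value in zip(GAF_COLUMNS, line.split("\t")):
            if value:  # empty columns produce no triple
                triples.append((record, f"{base}column/{name}", value))
    return triples


# Made-up sample row, for illustration only.
SAMPLE = [
    "!gaf-version: 2.0",
    "UniProtKB\tP12345\tGENE1\t\tGO:0005515\tPMID:0000000\tIPI\t\tF"
    "\t\t\tprotein\ttaxon:9606\t20110101\tUniProt\t\t",
]
```

A fuller conversion would likely map identifiers such as the GO ID to URIs instead of
literals, so that the annotations link directly to the ontology terms.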



Uses the AmiGO ontology browser by default


Limited to use with GO


http://wiki.geneontology.org/index.php/AmiGO


AmiGO is a Perl script


http://wiki.geneontology.org/index.php/AmiGO_Manual:_Installation



Other Gene Ontology browser clients


http://www.geneontology.org/GO.tools_by_type.browser.shtml



CO-ODE Ontology browser



https://code.google.com/p/ontology-browser/



Supports RDF and OWL code together, as long as the RDF code can be loaded
into an OWL repository using OWLAPI



Java Servlets



EBI ontology browser



http://www.ebi.ac.uk/ontology-lookup/init.do



https://code.google.com/p/ols-ebi/



Requires ontologies to be available in the OBO format



Java Servlets, with Lucene and RDBMS backend



Pellet ontology browser



OwlSight



http://pellet.owldl.com/ontology-browser/



No source code to download and no license for the JavaScript code that is
downloaded



JOWL



Javascript owl query/visualiser



Extension for JQuery javascript framework



https://code.google.com/p/jowl-plugin/



Last development was in 2008


Javascript frameworks



Dojo Toolkit



Well developed



Plugins available



RDF library for dojo



https://code.google.com/p/dojos/source/browse/




Not very extensive, but provides the ability to parse RDF/JSON, NTriples
and Turtle



http://www.dojotoolkit.org/



AFL/BSD licenses http://dojotoolkit.org/license



JQuery



Well developed, but sparse on inbuilt features



Plugins available



RDF library for jquery



https://code.google.com/p/rdfquery/




Extensive support for RDF, including RDF/XML, RDF/JSON,
Turtle and RDFa



Good Documentation and examples



Seems to be based on jquery-1.3.2 (Update: works with jquery 1.6.2)



Integrated an example of its usage into PODD-Webapp to debug
the new RDFa annotations from within the page itself



See oas-rdf.js



Auto-Complete plugin



http://view.jquery.com/trunk/plugins/autocomplete/demo/



See oas-autocomplete.js



Tree plugin



http://www.jstree.com/



Tree control for ontology browser



See oas-ontologybrowser.js



MIT/GPL dual license: under MIT, we essentially only need to keep the copyright and
license notice intact http://jquery.org/license/



YUI



Widely used



Well developed



http://developer.yahoo.com/yui/



BSD license :
http://developer.yahoo.com/yui/license.html



Javascript unit tests



QUnit



http://docs.jquery.com/Qunit



See working tests for rdfquery at
https://code.google.com/p/rdfquery/source/browse/trunk/tests/jquery.alltests.html



JSTestDriver



https://code.google.com/p/js-test-driver/wiki/GettingStarted



Seems useful, but could not get it to work yet



Advantages: Tries to automatically launch multiple browsers and run tests
simultaneously across them



DOH



http://dojotoolkit.org/reference-guide/util/doh.html





Ontologies



OBO



A wide range of independently developed ontologies



Each ontology has a unique license; many are free for use by academics, but it is not
clear how many are not



PODD



A scientific workbench management ontology



License: GPLv3?



DC



Basic descriptions for document metadata



CC-BY license http://dublincore.org/



FOAF



Person to person related metadata, with the exception of foaf:page, which has
been widely used outside of person related scopes to describe the webpage for
any “thing”



CC-BY license http://xmlns.com/foaf/spec/



Tag



Simple folksonomy tagging, using RDF






Annotea



http://www.w3.org/2000/10/annotation-ns#



MOAT



Designed to provide a URI from an ontology to give a link for each textual tag



Bio2RDF data or ontologies



A range of biological and chemical ontologies



Different license for each: the license can be found using
http://bio2rdf.org/license/namespace:identifier
where namespace is one of the namespaces, for example geneid, and identifier is the
id inside of the database
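
The license lookup pattern above could be wrapped in a small helper such as the one
below. The function name is hypothetical and the identifier in the usage note is a
placeholder, not a real database id.

```python
def bio2rdf_license_url(namespace, identifier):
    """Build the Bio2RDF license URL for a namespace:identifier pair."""
    if not namespace or not identifier:
        raise ValueError("both namespace and identifier are required")
    return f"http://bio2rdf.org/license/{namespace}:{identifier}"
```

For example, bio2rdf_license_url("geneid", "12345") builds the license URL for the
placeholder id 12345 in the geneid namespace.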


Ontology storage database



Virtuoso Open Source version



GPL v2 only http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSLicense



Database connectors, including sesame and jena providers, are available
outside of the GPL if used separately



Sesame native storage



Java database engine



BSD license : http://www.openrdf.org/download.jsp



Sesame with database (Mysql/Postgresql/etc.)



BSD license : http://www.openrdf.org/download.jsp



Jena with database



SDB: Uses relational databases http://www.openjena.org/SDB/
Not considered to scale as well as TDB



TDB: pure Java database engine http://www.openjena.org/TDB/
Considered to scale better than SDB



BSD license, although soon to be Apache License per http://incubator.apache.org/jena/



Some others available if necessary


Questions:


1. What does this goal mean: “Develop or extend internal OWL/RDFS ontologies that
support the architecture of this functionality.”

2. What server and database resources are available to support this goal: “Maintain the
ontology annotation for the life of the system.”

3. Where is the source code going to be stored? Is it okay to use GitHub, or will the PODD
repository be mandatory?

4.