Project Stage 3 - Exercises

Chris Maloney, 2013-07-22

This paper is divided into five sections, describing five aspects of my project, with examples and mockups as needed.

1. NCBI RDF URI Standards document

As described in the project proposal, I have written a document to standardize NCBI RDF URIs. This is attached as a separate PDF file, NCBIRDFURIStandards.pdf.

The biggest difficulty with this task was getting the decision-makers at NCBI to spend the time to look at it and approve it, since it was not high on their agenda. It has, however, been tentatively approved for use at NCBI.

2. Perl CGI Script

The backbone of my project is a driver script, written in Perl, that acts as a proxy and transformation engine between NCBI E-utilities and a web client. The figure from the project proposal is reproduced here, for reference.


The main functions of the script have been implemented, and you can see the results on GitHub: latest, snapshot as of 7/22.

This script is deployed to my home server, at the URL http://chrismaloney.org/eutils/einfo.cgi (but note that this server might not be active at any given time).

The script currently performs the following functions:

• Responds to HTTP GET requests for E-utilities resources,
• Checks that a valid E-utilities script name is given,
• Invokes the remote NCBI E-utilities service,
• Passes the result through an XSLT transformation,
• Serializes the result to the client.
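
The request flow listed above can be sketched in a few lines. This is only an illustration in Python, not the Perl script itself: the whitelist in VALID_SCRIPTS, the function names, and the choice of status codes and content type are all assumptions, and the transformation is passed in as a plain function where the real script applies an XSLT stylesheet.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
# Assumed whitelist; the real script may accept a different set of names.
VALID_SCRIPTS = {"einfo", "esearch", "esummary", "efetch", "elink"}


def fetch_remote(url):
    """Default fetcher: perform the upstream HTTP GET and return the body."""
    with urlopen(url) as response:
        return response.read()


def handle_request(script, params, transform, fetch=fetch_remote):
    """Return (status, content_type, body) for one proxied GET request."""
    # Reject anything that is not a known E-utilities script name.
    if script not in VALID_SCRIPTS:
        return 404, "text/plain", b"Unknown E-utilities script: " + script.encode()
    # Build the upstream URL and invoke the remote NCBI service.
    url = EUTILS_BASE + script + ".fcgi"
    if params:
        url += "?" + urlencode(params)
    upstream_xml = fetch(url)
    # Pass the result through the transformation (XSLT in the real script).
    rdf_xml = transform(upstream_xml)
    # Hand the serialized result back to the caller, i.e. the web client.
    return 200, "application/rdf+xml", rdf_xml
```

Passing the fetcher in as a parameter is a design choice made for this sketch, so that the flow can be exercised without contacting NCBI.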

There are several things that still need to be done. These include the items marked with “FIXME” in the script, as well as:

• Error handling, in general,
• Dispatching to the correct XSLT stylesheet depending on the request,
• Handling of the “retmode=rdf” query-string parameter.
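
One possible shape for the last two items is a lookup table keyed on the E-utility name, consulted only when the client asks for RDF. The sketch below is in Python rather than Perl, and everything in it is an assumption: the stylesheet file names are hypothetical, and so is the decision to pass the XML through unchanged when “retmode=rdf” is absent and to strip that parameter before forwarding the request upstream.

```python
# Hypothetical stylesheet file names, used only for illustration.
STYLESHEETS = {
    "einfo": "einfo2rdf.xsl",
    "esummary": "esummary2rdf.xsl",
    "elink": "elink2rdf.xsl",
}


def choose_stylesheet(script, params):
    """Return the stylesheet to apply, or None to pass the XML through unchanged."""
    # Assumption: RDF is produced only when the client asks for it explicitly.
    if params.get("retmode") != "rdf":
        return None
    try:
        return STYLESHEETS[script]
    except KeyError:
        raise ValueError("No RDF stylesheet for E-utility: " + script)


def upstream_params(params):
    """Drop retmode=rdf before forwarding; in this sketch the proxy handles it."""
    return {k: v for k, v in params.items() if not (k == "retmode" and v == "rdf")}
```

Raising an error for an unmapped E-utility, rather than silently returning XML, is one way to make the missing error handling visible to the client.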

3. Example XSLT conversion: einfo

I have written the first XSLT stylesheet, which converts the top-level NCBI EInfo output into RDF. You can see this stylesheet on GitHub: latest, snapshot as of 7/22.

The input to this XSLT is from the NCBI EInfo API, at http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi (snapshot as of 7/22).

The results are converted into RDF/XML, and are available from http://chrismaloney.org/eutils/einfo.cgi (again, this server may or may not be available at any given time; see this snapshot from 7/22).

I checked these results with the W3C RDF validation service, which reports that they are valid RDF/XML.
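
To give a feel for the conversion, here is a rough Python stand-in for what the stylesheet does with the top-level EInfo response, which lists database names in DbName elements. It is not the XSLT itself. The base URI in DB_BASE is hypothetical (the real URIs follow the standards document from section 1), and using rdfs:label for the database name is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical base URI; the real one is set by the URI standards document.
DB_BASE = "http://example.org/entrez/db/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rdfs", RDFS_NS)


def einfo_to_rdf(einfo_xml):
    """Turn each <DbName> in an EInfo response into an rdf:Description."""
    source = ET.fromstring(einfo_xml)
    root = ET.Element("{%s}RDF" % RDF_NS)
    for db in source.iter("DbName"):
        name = (db.text or "").strip()
        desc = ET.SubElement(root, "{%s}Description" % RDF_NS)
        desc.set("{%s}about" % RDF_NS, DB_BASE + name)
        # Assumption: the database name is recorded as an rdfs:label.
        label = ET.SubElement(desc, "{%s}label" % RDFS_NS)
        label.text = name
    return ET.tostring(root, encoding="unicode")


# Abbreviated sample input with two database names.
sample = (
    "<eInfoResult><DbList>"
    "<DbName>pubmed</DbName><DbName>pmc</DbName>"
    "</DbList></eInfoResult>"
)
print(einfo_to_rdf(sample))
```

Output produced this way can be pasted into the same validation service to confirm that it parses as RDF/XML.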

4. Example of integrating an existing ontology: Dublin Core

Another E-utilities output that I will convert into RDF, as described in the project proposal, is the NCBI ESummary output for a PMC article, for example, from http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pmc&id=3159421&version=2.0 (see this snapshot from 7/22).

For this example, I wrote a preliminary mockup of some of the elements that will be included in that RDF output, using some terms from the Dublin Core ontology [1]. You can see this mockup on GitHub: latest, snapshot as of 7/22.

This mockup is not intended to be a complete representation of the ESummary output. If you look at the original XML and the RDF side-by-side, you will see that there is a lot of information missing. Rather, the data that maps into Dublin Core terms has been identified, and written into the mockup.
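
The selective nature of this mapping can be shown with a small Python sketch: only elements that have a Dublin Core counterpart produce triples, and everything else is skipped. The element paths in DC_MAP and the sample record are assumptions made for illustration; the sample is made up and is not the ESummary output for the article above.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
# Assumed mapping from ESummary element paths to Dublin Core terms.
DC_MAP = {
    "Title": DC + "title",
    "PubDate": DC + "date",
    "Source": DC + "source",
    "Authors/Author/Name": DC + "creator",
}


def summary_to_triples(doc_summary, subject):
    """Yield (subject, predicate, literal) for each mapped element that has text."""
    for path, predicate in DC_MAP.items():
        for element in doc_summary.findall(path):
            if element.text and element.text.strip():
                yield (subject, predicate, element.text.strip())


# Made-up, abbreviated record for illustration; not real ESummary output.
sample = ET.fromstring(
    "<DocumentSummary uid='0'>"
    "<Title>An example article</Title>"
    "<PubDate>2011 Jan 1</PubDate>"
    "<Authors><Author><Name>Doe J</Name></Author>"
    "<Author><Name>Roe R</Name></Author></Authors>"
    "<Volume>1</Volume>"
    "</DocumentSummary>"
)
triples = list(summary_to_triples(sample, "http://example.org/pmc/0"))
for triple in triples:
    print(triple)
```

In this sample the Volume element yields no triple, because it has no entry in the mapping. That corresponds to the information that is missing from the mockup.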

Further work will include mapping the remaining terms to other vocabularies (such as SPAR) and to custom ontologies.

5. Example of a new Entrez ontology

The final example is a preliminary ontology file, written in RDF/XML, for some of the classes and properties that have been identified related to the NCBI Entrez API. You can see this ontology file on GitHub: latest, snapshot as of 7/22.

In this ontology, I have identified several classes, including:

• NCBI Entrez Database
• Entrez database record
• Entrez link, a subclass of rdf:Property
• A nuccore (nucleotide) sequence
• A gene record
• A PMC article

I’ve also specified some of the hierarchy among these, with rdfs:subClassOf relationships.

Finally, I’ve encoded information about two Entrez links. These are relationships between items within NCBI Entrez databases, and map very nicely to RDF properties.

An example that illustrates a link between a nuccore record and the gene records (that is, it answers the question, “What gene corresponds to this nucleotide sequence?”) is http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nuccore&db=gene&id=312836839.

This is mapped to the RDF property entrez:nuccore_gene, which is an rdfs:subPropertyOf entrez:link, and the ontology specifies its domain and range.

The reciprocal link is entrez:gene_nuccore (see this query result, for example: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=gene&db=nuccore&id=159). This is specified as the owl:inverseOf the entrez:nuccore_gene link.
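
The two property names follow the dbfrom and db parameters of the ELink request, so the naming and the inverse relationship can be sketched mechanically. The Python below is only an illustration. The namespace URI bound to ENTREZ is hypothetical, the helper names are invented for this sketch, and the simple split on the first underscore assumes link names of the plain form dbfrom_db.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical namespace URI for the entrez: prefix.
ENTREZ = "http://example.org/entrez#"


def link_property(elink_url):
    """Derive the link property name (dbfrom_db) from an ELink request URL."""
    query = parse_qs(urlparse(elink_url).query)
    return ENTREZ + query["dbfrom"][0] + "_" + query["db"][0]


def inverse_property(prop):
    """Swap the two database names, mirroring the owl:inverseOf declaration."""
    local = prop[len(ENTREZ):]
    source_db, target_db = local.split("_", 1)
    return ENTREZ + target_db + "_" + source_db


def with_inverses(triples):
    """Add the reversed triple for every link triple (a tiny inverseOf inference)."""
    result = set(triples)
    for s, p, o in triples:
        result.add((o, inverse_property(p), s))
    return result


url = ("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
       "?dbfrom=nuccore&db=gene&id=312836839")
prop = link_property(url)
print(prop)
print(inverse_property(prop))
```

A reasoner that understands owl:inverseOf would derive the reversed triples on its own; with_inverses only imitates that step for link triples whose subjects and objects are already known.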

References

[1] Expressing Dublin Core metadata using the Resource Description Framework (RDF)