C-BRASS Canadian Bioinformatics Resources as Semantic Services


4 October 2013


Mark Wilkinson, UBC (Lead PI)
Michel Dumontier, Carleton (Co-PI)
Christopher J. O. Baker, UNBSJ (Co-PI)


Mandate

Expose Canadian bioinformatics Web resources in a unified and automatable manner using the Semantic Web Services framework.

Bioinformatics data and tools will be easier to discover, utilize, and integrate, hastening discovery.

First widespread deployment of a grid framework in which the messages are “meaningful” to the machine and can be interpreted or re-interpreted under a wide range of scenarios.



Goals

Utilize novel SWS technologies to expose Canadian informatics resources on the emergent Semantic Web.

Create toolkits for semantically “lifting” legacy resources into an SWS framework.

Create prototype applications demonstrating a variety of ways of constructing, utilizing, visualizing, and interpreting the services, analytical pipelines, and resulting semantically-enriched datasets.


Web Service Adoption

The low uptake of modern Web integration frameworks by the bioinformatics community stems from two primary factors:

Challenges in implementing these solutions.

A gap between the abilities of existing technologies and the needs and skills of the target end-user.


SOAP

Simple Object Access Protocol (SOAP) messaging has been successful only within well-defined, often project-specific, situations.

"Lack of semantics" in the Web Service interface descriptions precludes the automated discovery of appropriate services and the automated pipelining of data between those services.


Semantic Web Service (SWS)

SWS frameworks have achieved only a modest level of automated interoperability, owing to limitations in the way the semantics of Web Services are modeled:

SWS frameworks are implemented to support legacy data representation frameworks, in particular XML and XML Schema.

SWS frameworks annotate XML Schema components, describing services based on the "meaning" of their various input and output fields.

Semantic Web Services (SWS)

Automating workflow construction and semantically validating the "sensibility" of the connections between services (often referred to as schema mapping).

XML Schema is semantically opaque; applying semantics to it through annotation is extremely limited.

A semantically-annotated XML tag can have only one interpretation.
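To make automated workflow construction concrete, the Python sketch below composes a pipeline by breadth-first search over services described only by an input type and an output type. The service and type names are hypothetical placeholders, and exact string equality stands in for the semantic check of whether a connection is "sensible"; that single-label matching is exactly the one-interpretation limitation described above.

```python
from collections import deque

# Hypothetical services: name -> (input type, output type). The type names are
# placeholders, not terms from any real service ontology.
SERVICES = {
    "getSequenceForGene": ("GeneRecord", "ProteinSequence"),
    "runBlast": ("ProteinSequence", "BlastReport"),
    "extractHomologs": ("BlastReport", "HomologList"),
    "getGOTerms": ("GeneRecord", "GOAnnotationList"),
}


def compose(start_type, goal_type, services=SERVICES):
    """Breadth-first search for the shortest chain of services that turns
    start_type into goal_type; returns a list of service names or None."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current, path = queue.popleft()
        if current == goal_type:
            return path
        for name, (in_type, out_type) in services.items():
            # A connection counts as "sensible" only if the type labels are identical.
            if in_type == current and out_type not in seen:
                seen.add(out_type)
                queue.append((out_type, path + [name]))
    return None


print(compose("GeneRecord", "HomologList"))
```

Because the search is breadth-first, the first chain it finds is also one of the shortest.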

SWS Frameworks describe:

Input and output data structures.

Operations of a Web Service.

BioMoby

Service Type ontology: a vocabulary describing analytical operations.

OWL-S and WSMO/WSML Process Model

Before and After states.

Transformations during that state change.

Single-term semantics: too simplistic.

Process Models: too complex, no adoption.


In transition

Data on the Semantic Web is encoded in RDF, while data in most Web Service frameworks is encoded in XML.

From XML/Schema-based to OWL/RDF-based data representation.

SAWSDL (W3C Recommendation, 2007): inputs and outputs of Web Services can be described in terms of ontological models.
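A minimal sketch of SAWSDL-style annotation, written in Python with only the standard library. The schema fragment attaches sawsdl:modelReference attributes (namespace http://www.w3.org/ns/sawsdl) to XML Schema elements; the example.org ontology URIs, the XML record, and the lift() helper are hypothetical. Real SAWSDL delegates the XML-to-RDF step to a liftingSchemaMapping (for example an XSLT transform); here lift() simply reuses the first model reference as a predicate to illustrate the move from XML-based to RDF-based data.

```python
import xml.etree.ElementTree as ET

SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
XS_NS = "http://www.w3.org/2001/XMLSchema"

# Illustrative schema fragment; the example.org ontology URIs are placeholders.
SCHEMA = f"""<xs:schema xmlns:xs="{XS_NS}" xmlns:sawsdl="{SAWSDL_NS}">
  <xs:element name="accession" type="xs:string"
      sawsdl:modelReference="http://example.org/onto#hasAccession"/>
  <xs:element name="sequence" type="xs:string"
      sawsdl:modelReference="http://example.org/onto#hasProteinSequence"/>
</xs:schema>"""


def model_references(schema_xml):
    """Map each annotated xs:element name to its list of model reference URIs."""
    root = ET.fromstring(schema_xml)
    refs = {}
    for el in root.iter(f"{{{XS_NS}}}element"):
        value = el.get(f"{{{SAWSDL_NS}}}modelReference")
        if value:
            # SAWSDL allows a whitespace-separated list of URIs in this attribute.
            refs[el.get("name")] = value.split()
    return refs


def lift(instance_xml, refs, subject):
    """Hypothetical lifting step: one (subject, predicate, object) triple per
    annotated child element, using its first model reference as the predicate."""
    root = ET.fromstring(instance_xml)
    return [(subject, refs[child.tag][0], child.text)
            for child in root if child.tag in refs]


# Illustrative XML record; the sequence value is a truncated placeholder.
RECORD = "<protein><accession>P04637</accession><sequence>MEEPQSDPSV</sequence></protein>"

triples = lift(RECORD, model_references(SCHEMA), "http://example.org/protein/P04637")
for t in triples:
    print(t)
```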

User Communities (I)

The end-user community does not usually have a "process model" or "business model" in mind when searching for a Service.

Biologists execute a BLAST alignment NOT because they wish to apply a sequence similarity matrix to their input data, BUT because they are interested in finding sequences that are related to their input sequence by homology.

The key is the relationship between the input and output data.

Bioinformatics Community

Needs:

New metadata, i.e. Bioinformatics Web Service annotations that describe the biological relationship between the input and the output generated by that Web Service.







SADI facilitates novel data discovery, interoperability, and integrative behaviours that closely mirror the needs and expectations of our end-user community, simply by indexing services based on the predicate that links each service's input to its output.



Semantic Web data vs. data derived from Web Services.







SADI simply comprises a set of standards-compliant conventions and suggested best practices for data representation and exchange between Web Services that fully utilizes Semantic Web technologies.



SADI mandates the inclusion of a single required annotation in the Web Service metadata that describes the biological relationship ("predicate") that is created between the input and output data of that Service.
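Because each service declares exactly one predicate, discovery can reduce to a lookup keyed on that predicate. The Python sketch below models such an index; the ServiceDescription fields, endpoint URLs, service names, and class names are hypothetical and are not the actual SADI registry schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical registry record for a SADI-style service."""
    name: str
    endpoint: str      # placeholder URL, not a real service
    input_class: str
    predicate: str     # the single relationship the service creates


class PredicateRegistry:
    """Index services by the predicate they create between input and output."""

    def __init__(self):
        self._by_predicate = defaultdict(list)

    def register(self, service):
        self._by_predicate[service.predicate].append(service)

    def find(self, predicate):
        # Discovery is a plain lookup on the relationship the user asks for.
        return list(self._by_predicate.get(predicate, []))


registry = PredicateRegistry()
registry.register(ServiceDescription(
    "getProteinSequence", "http://example.org/sadi/getProteinSequence",
    "UniProtRecord", "pred:hasProteinSequence"))
registry.register(ServiceDescription(
    "getGOTerms", "http://example.org/sadi/getGOTerms",
    "UniProtRecord", "pred:hasGOTerm"))

print([s.name for s in registry.find("pred:hasGOTerm")])
```

A user asking for GO terms never needs to know a service's name or operation signature, only the relationship of interest, which matches the way biologists frame their questions.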


SADI Web Service Discovery

Figure (hasProteinSequence): Predicate-based web service invocation. Using the hasProteinSequence predicate in a query automatically invokes a web service capable of obtaining the amino acid sequence for UniProt entry P04637.

SADI: Standards-compliant recommendations for implementation



SADI consists of several bioinformatics services


SADI Services are stateless and atomic.


SADI Services consume and provide data via HTTP POST and GET.


SADI Services consume and produce data in RDF format.


SADI Service interfaces are defined in terms of OWL-DL classes; the property restrictions on these OWL classes define what specific data elements are required by the Service and what data will be provided by the Service, respectively.
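As a rough illustration of how property restrictions determine what a service will accept, the Python sketch below treats a class as a set of properties an instance must carry (loosely analogous to OWL someValuesFrom restrictions) and checks RDF-like triples against it. This is a closed-world simplification with hypothetical class and predicate names; a real SADI service would rely on an OWL-DL reasoner to classify input nodes.

```python
# Triples are (subject, predicate, object) tuples; all names are illustrative.
INPUT_CLASS = {"name": "AccessionRecord", "requires": {"pred:hasAccession"}}
OUTPUT_CLASS = {"name": "SequencedRecord", "requires": {"pred:hasProteinSequence"}}


def classifies_into(node, triples, owl_class):
    """True if node has at least one value for every required property."""
    present = {p for s, p, _ in triples if s == node}
    return owl_class["requires"] <= present


def instances_of(triples, owl_class):
    """All subjects in the data that satisfy the class's restrictions."""
    subjects = {s for s, _, _ in triples}
    return sorted(n for n in subjects if classifies_into(n, triples, owl_class))


data = [
    ("ex:p1", "pred:hasAccession", "P04637"),
    ("ex:p2", "rdfs:label", "record without an accession"),
]
print(instances_of(data, INPUT_CLASS))   # only ex:p1 is a valid input
print(instances_of(data, OUTPUT_CLASS))  # nothing carries a sequence yet
```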


Input RDF data

The data must be compliant with (i.e. classify into) the input OWL class; it is "decorated" or "annotated" by the service provider to include new properties reflecting the activities performed by the Web Service.

Output RDF data

The output is an instance of the OWL class that defines the output of the service.
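The decoration pattern can be sketched as a stateless Python function: it reads input triples and, for each input node it can handle, emits new triples whose subject is that same node, so the client can merge them back into its data. The lookup table, predicate names, and function name are hypothetical, and a deployed service would exchange serialized RDF over HTTP rather than Python tuples.

```python
# Hypothetical backend lookup; the sequence value is a truncated placeholder.
SEQUENCES = {"P04637": "MEEPQSDPSV"}


def get_protein_sequence(input_triples):
    """Stateless, atomic service sketch.

    Keeps no memory between calls. For every input node carrying
    pred:hasAccession, it returns a new triple attached to the same
    subject URI, i.e. the input node is decorated rather than replaced.
    """
    output = []
    for subject, predicate, obj in input_triples:
        if predicate == "pred:hasAccession" and obj in SEQUENCES:
            output.append((subject, "pred:hasProteinSequence", SEQUENCES[obj]))
    return output


request = [("ex:p1", "pred:hasAccession", "P04637")]
response = get_protein_sequence(request)
merged = request + response  # the client merges the decorations into its data
print(response)
```

Keeping the subject URI unchanged is what lets the decorated node be recognized as an instance of the output class without any extra mapping step.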

SADI Registry


Predicate Map


What can it do?

SADI provides the functionality to automatically and dynamically discover, access, and integrate relevant data from distributed, non-uniform data sources using disparate ontologies.

Key promises of the Semantic Web!



The SHARE implementation allows users to query over data that might not exist at the time they pose their query.

A query-specific database is dynamically generated as a query is being processed; effectively, the database required to answer the question is automatically generated as a result of the question being posed.
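The Python sketch below shows one way a query-driven, just-in-time database could work in the spirit of SHARE: triple patterns are evaluated in order, and whenever a pattern uses a predicate that some service can supply, that service is invoked for the subjects bound so far and its triples are added to the store before matching. This is a simplification under stated assumptions: the services are plain Python functions keyed by predicate (standing in for registry lookups and HTTP calls), all identifiers and term names are placeholders, and there is no DL reasoning of the kind the real SHARE/Pellet setup performs.

```python
# Hypothetical services, keyed by the predicate each one attaches.
def go_term_service(nodes):
    table = {"ex:protein1": ["ex:goTerm1"], "ex:protein2": ["ex:goTerm2"]}
    return [(n, "pred:hasGOTerm", t) for n in nodes for t in table.get(n, [])]


def term_name_service(nodes):
    names = {"ex:goTerm1": "example process A", "ex:goTerm2": "example process B"}
    return [(n, "pred:hasTermName", names[n]) for n in nodes if n in names]


SERVICES = {"pred:hasGOTerm": go_term_service, "pred:hasTermName": term_name_service}


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p_item, t_item in zip(pattern, triple):
        if is_var(p_item):
            if new.get(p_item, t_item) != t_item:
                return None
            new[p_item] = t_item
        elif p_item != t_item:
            return None
    return new


def answer(query, store):
    """Evaluate triple patterns left to right, fetching data on demand."""
    store = list(store)
    bindings = [{}]
    for s, p, o in query:
        if p in SERVICES:
            subjects = {b[s] for b in bindings if s in b} if is_var(s) else {s}
            # Only call the service for subjects that still lack this predicate.
            missing = sorted(n for n in subjects
                             if not any(t[0] == n and t[1] == p for t in store))
            store.extend(SERVICES[p](missing))
        next_bindings = []
        for b in bindings:
            for triple in store:
                extended = match((s, p, o), triple, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings, store


local_store = [
    ("ex:protein1", "ont:hasTag", "keyword:parkinson"),
    ("ex:protein2", "ont:hasTag", "keyword:diabetes"),
]
query = [
    ("?protein", "ont:hasTag", "keyword:parkinson"),
    ("?protein", "pred:hasGOTerm", "?term"),
    ("?term", "pred:hasTermName", "?name"),
]
results, final_store = answer(query, local_store)
print([(r["?term"], r["?name"]) for r in results])
print(len(final_store))
```

The GO term for ex:protein2 is never fetched: the store grows only with the data that this particular question needs.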


Find
Gene Ontology terms (biological process,
cellular component, and molecular function
annotations) for proteins associated with
Parkinson's disease:

PREFIX pred: <http://es-01.chibi.ubc.ca/~benv/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX keyword: <http://biordf.net/moby/Global_Keyword/>

SELECT ?term ?name
WHERE {
    ?protein ont:hasTag keyword:parkinson .
    ?protein pred:hasGOTerm ?term .
    ?term pred:hasTermName ?name .
}



SHARE connects SADI middleware to the Pellet SPARQL query engine and DL reasoner.

Semantic Health And Research Environment (SHARE) prototype.

SADI Toolkit

"RDFizing"
Virtuoso Sponger
Bio2RDF

Native Service Provision and "Wrapping" legacy CGI and WSDL
Seahawk
Dashboard

Core SADI Service Codebase
SADI::Service::Core
jSADI

Quality of Service Testing
myGrid/Moby unit-test and the Testing Agent

Ontology Development Tools
Protege 4 and TopBraid Composer

Client Applications
Taverna
SHARE
IO Informatics Sentient Knowledge Explorer plug-in

SADI Training Course Curriculum

Target Audience: The target audience for the training sessions includes primary or secondary data/service providers, as well as the full spectrum of bioinformatics students and professionals from academia and industry.

Syntactic Web vs. Semantic Web
Interoperability
Knowledge Representation Standards
RDF 101
OWL 101
Ontology Editors and Ontology Design
Inference and Reasoning
Reasoning Engines
Web Service Description Languages
Web Service Registries and Service Discovery
Service Ontologies
Workflow Composition
SAWSDL
myGrid
SADI 101
Bioinformatics Web Service Requirements
SADI-Enabled Services
SADI Toolkit

Action Plan

Tier 1 involves active, hands-on migration of native resources to a Semantically-enabled Service.

Tier 2 involves “wrapping” resources from non-participating providers via Services hosted on C-BRASS servers.

Tier 3 involves on-site training in Semantic Web Service technologies, and support for their self-directed resource migration.


Success Criteria

Number of Services created/migrated, and their use by consumers worldwide (minimum 400 in Canada);

Number of software tools created, and their use by third parties;

Number of Canadian highly qualified personnel (HQP) trained in the construction of Semantic Web Services.


Deliverables

A fully-documented definition of the SADI Semantic Web Service framework, including submission of this to an appropriate standards body (e.g. OASIS or OMG).

A set of core ontologies describing properties and relationships for entities in the biomedical domain.

A costing model, for use by future Semantic Web Service providers, outlining the establishment and maintenance costs for the migration from legacy Web or Web Service resources to a Semantic Web Service framework.

