Life Sciences and Other Domains

21 Oct 2013

Contact:

©2011 Cambridge Semantics Inc. All rights reserved. Company Confidential.

Semantic Web Technologies on HPC for Life Sciences and Other Domains

Sean Martin

Founder & CTO

Cambridge Semantics Inc.

sean@cambridgesemantics.com

+1 617 606 341



What is/are Semantic Technologies anyway?


Semantics (from Greek sēmantiká, neuter plural of sēmantikós) is the study of meaning.


10 Semantics experts in a room = 11 opinions










Little “s” semantics

Usually proprietary, mostly heuristics/statistics based

Search (not query)

Usually extracts meaning from unstructured data (text, video, etc.)

Examples:

Enterprise search

Entity extraction, automated tagging, text analytics

Natural Language Processing (NLP) technologies

Automated translation, e.g. Google Translate

SMILA & UIMA open source frameworks



Big “S” Semantics

W3C recommendations (open data standards)

Machine readable, query (not search) & instant data integration

The Semantic Web

Also known as “Linked Open Data”

Also known as “Web 3.0”

Examples:

Google “rich snippets”

OpenGraph

The GoodRelations ontology

Public government data (USA, Europe, UK)

All sorts of startup activity

Paint starting to dry


What are the W3C’s Open Data Standards?


RDF


OWL


SPARQL


There are others, but these are the key ones


RDF


Self-describing (tagged) instance data

Facts or triples:

<subject> <predicate> <object/value>

A collection of triples forms a directed labeled graph

<subject> and <predicate> are globally unique strings or URIs, e.g. http://www.cambridgesemantics.com/people/sean
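
As a minimal sketch of the triple model described above, the Python below stores a few facts as (subject, predicate, object) tuples and indexes them as a directed labeled graph. Only the first URI comes from the slide; the other URIs, the predicate names and the helper `build_graph` are assumptions made up for illustration.

```python
from collections import defaultdict

# Only SEAN appears in the slides; every other identifier is hypothetical.
SEAN = "http://www.cambridgesemantics.com/people/sean"
COMPANY = "http://example.org/org/cambridge-semantics"   # assumed
WORKS_FOR = "http://example.org/vocab/worksFor"          # assumed
NAME = "http://example.org/vocab/name"                   # assumed

triples = [
    (SEAN, NAME, "Sean Martin"),                  # object is a literal value
    (SEAN, WORKS_FOR, COMPANY),                   # object is another subject
    (COMPANY, NAME, "Cambridge Semantics Inc."),
]


def build_graph(triples):
    """Index triples as subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(triples)

# Follow a labeled edge from one node to another, then read a value there.
employer = [o for p, o in graph[SEAN] if p == WORKS_FOR][0]
employer_name = [o for p, o in graph[employer] if p == NAME][0]
print(employer_name)
```

Because the object of one triple can be the subject of another, following predicates is simply walking edges in the graph.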




OWL


OWL (Web Ontology Language)

Describes data models the way a domain expert would

What triples or facts are needed to properly describe something and its relationship to other similarly described things?

Relationships for inference and other kinds of reasoning
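
To give a feel for what "relationships for inference" can mean, here is a small sketch that applies two subclass rules until no new facts appear. It is not a real OWL reasoner: the rules are the simple RDFS-style subclass rules that OWL reasoners also apply, and the `ex:` class and instance names are hypothetical examples chosen for a life sciences flavour.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical model and instance data.
triples = {
    ("ex:Kinase", SUBCLASS_OF, "ex:Enzyme"),
    ("ex:Enzyme", SUBCLASS_OF, "ex:Protein"),
    ("ex:ABL1", TYPE, "ex:Kinase"),
}


def infer(triples):
    """Forward-chain two rules until a fixed point is reached."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if o != s2 or p2 != SUBCLASS_OF:
                    continue
                if p == SUBCLASS_OF:   # A sub B, B sub C  =>  A sub C
                    new.add((s, SUBCLASS_OF, o2))
                elif p == TYPE:        # x type A, A sub B  =>  x type B
                    new.add((s, TYPE, o2))
        if new <= facts:               # nothing new was derived
            return facts
        facts |= new


inferred = infer(triples)
print(("ex:ABL1", TYPE, "ex:Protein") in inferred)
```

The fact that ABL1 is a protein was never stated; it follows from the model, which is the kind of reasoning the slide refers to.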


SPARQL


The first standards-based distributed query language for RDF data & the Web


Wow!
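
The core idea behind a SPARQL query is matching triple patterns that contain variables against the graph and joining the results. The sketch below imitates that in plain Python; it is an illustration of the idea only, not a SPARQL engine, and the `ex:` identifiers and the functions `match` and `query` are hypothetical.

```python
def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; None if it cannot."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                       # a variable
            if result.setdefault(term, value) != value:
                return None                            # conflicts with earlier binding
        elif term != value:                            # a constant must match exactly
            return None
    return result


def query(patterns, triples):
    """Evaluate a list of triple patterns as a conjunctive (join) query."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


triples = [
    ("ex:sean", "ex:worksFor", "ex:csi"),
    ("ex:csi", "ex:name", "Cambridge Semantics Inc."),
    ("ex:sean", "ex:name", "Sean Martin"),
]

# Roughly: SELECT ?person ?orgName
#          WHERE { ?person ex:worksFor ?org . ?org ex:name ?orgName }
results = query(
    [("?person", "ex:worksFor", "?org"), ("?org", "ex:name", "?orgName")],
    triples,
)
print(results)
```

The shared variable `?org` is what joins the two patterns, which is why complex queries over RDF data tend to involve many joins.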





Important properties of RDF


Machine readable model / programs can “understand”

Unique identity of every data element

Subject is a unique identifier

Predicate (the relationship) is also a unique identifier

Object can be a unique identifier pointing to another subject

That’s how we get directed graphs

Allows annotation (the unique subject string provides an “anchor” for 3rd-party metadata)

Allows provenance (especially useful when data travels beyond its source system or needs to be updated)

Semantic type (not just primitive data types)

Lets programs immediately know what type of data they are dealing with, allowing automated contextualization of information
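
The annotation and provenance points above can be sketched as follows: two independent sources make statements about the same subject URI, and merging them is a simple union in which each triple remembers who asserted it. The source names, the `ex:` predicates and the function `merge_with_provenance` are assumptions for illustration, not part of any standard API.

```python
from collections import defaultdict

SEAN = "http://www.cambridgesemantics.com/people/sean"

# Each source is a set of (subject, predicate, object) triples.
# Source names and predicates are hypothetical.
sources = {
    "hr-system": {
        (SEAN, "rdf:type", "ex:Person"),          # semantic type, not a primitive type
        (SEAN, "ex:title", "Founder & CTO"),
    },
    "third-party-notes": {
        # Annotation added elsewhere, anchored on the same subject URI.
        (SEAN, "ex:spokeAbout", "Semantic Web Technologies on HPC"),
    },
}


def merge_with_provenance(sources):
    """Union all triples, recording which sources asserted each one."""
    provenance = defaultdict(set)
    for source, triples in sources.items():
        for triple in triples:
            provenance[triple].add(source)
    return provenance


merged = merge_with_provenance(sources)
about_sean = {(p, o) for (s, p, o) in merged if s == SEAN}
print(len(about_sean))
```

No schema change or key mapping is needed to combine the two sources, because the shared URI already identifies the same thing in both.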


So what does any of this change?


Adoption of the semantic standards will be disruptive in at least two ways that create enormous value

1. Who can do what. Much easier.

Pushing the bar further and further towards end-user self-service

2. How long it takes. Much faster.

Each new wave of technology brings at least an order-of-magnitude productivity increase, often more

Recent waves: Web Services/SOA; Java (no manual memory management); virtualization, etc.

Semantic technology is another wave



Where do these benefits come from?

Using semantic technologies, the end user’s understanding of their data need be the only system or application model required

This allows the construction of applications & systems to move from what have until now been carefully planned, structure-dependent “all up front” designs to malleable conceptual representations that can be evolved quickly

Systems go from being brittle to flexible

Systems can change at the speed the business does

End users can increasingly make more of these changes directly themselves




Preserving the end user’s model

Traditional middleware

Relational Model Physical

Relational Model Logical

Object Relational Model

Business Objects Model

User Interface Model

User’s idea of the model

Semantic middleware

User’s Model*

*Warning: dramatically oversimplified to make a point



Paying the price for all this flexibility

Exploding data volumes

Tagging creates 10x more data

Random access is expensive

>35 years of optimization around RDBMS is not helping

Too many “self-joins” on a three-column table

No index support

Adding an additional layer of indirection is expensive

Every time you want to display a value you need to dereference it
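
To make the self-join point concrete, the sketch below stores triples in a single three-column SQLite table and answers one modest question. The table layout, the `ex:` identifiers and the data are assumptions invented for illustration; real triple stores are organised differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:sean", "ex:worksFor", "ex:csi"),
        ("ex:sean", "ex:name", "Sean Martin"),
        ("ex:csi", "ex:name", "Cambridge Semantics Inc."),
        ("ex:csi", "ex:locatedIn", "ex:city1"),      # made-up location
        ("ex:city1", "ex:name", "Example City"),
    ],
)

# "Name of each person and the name of the city their employer is in".
# Four triple patterns on one table means four aliases, i.e. three self-joins.
# Two of those joins exist only to turn identifiers into displayable names.
rows = conn.execute(
    """
    SELECT pn.o, cn.o
    FROM triples AS w
    JOIN triples AS pn ON pn.s = w.s AND pn.p = 'ex:name'
    JOIN triples AS l  ON l.s  = w.o AND l.p  = 'ex:locatedIn'
    JOIN triples AS cn ON cn.s = l.o AND cn.p = 'ex:name'
    WHERE w.p = 'ex:worksFor'
    """
).fetchall()
print(rows)
conn.close()
```

Each extra hop in the graph adds another join against the same table, and each join is a random-access lookup, which is the cost the slide is pointing at.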


Paying the price for all this flexibility

Enabling trends

W3C semantic standards

A decade of semantic middleware + storage R&D

Multi-core CPUs

Fast networking

Cheap RAM

Web 2.0 blazing the trail with a new RAM-based application model?

Disk is the new tape?

Twitter, Facebook, LinkedIn and iostat

SSD

The changing cost of the sub-4k random access read and what it means to transaction processing systems and the applications that run on them






Spot the difference


Then..


Now


And finally, so what does any of this have to do with HPC?


Cray’s XMT Systems

+ Very large quantities of RAM arranged in a contiguous block

+ Very low latency memory access

+ Large number of CPUs

+ Large number of cheap threads

= Full pipelines

Great for interactive applications creating random-access query patterns, particularly complex ones requiring many joins





Other HPC related Semantic efforts


Raytheon BBN’s SPARQL on MapReduce clusters

WebPIE

VU University Amsterdam’s OWL Horst inference on MapReduce

Clustered RDF triple stores

OpenLink’s Virtuoso data store

Ontotext’s BigOWLIM

Franz Inc.’s AllegroGraph


Semantics & the Enterprise


not waiting for the network effect



Overview of Cambridge Semantics Middleware Platform

A W3C-based semantic middleware for real-time, user-driven operational intelligence

Allows business users & customers/partners to:

Discover & connect to any data in databases & other systems on the fly

Create dashboards & applications on demand

Allows IT to:

Rapidly integrate data across silos and firewalls

Expose business policies, rules & workflow to business users

Implement manual intervention with automated response

Enterprise-class security, governance, provenance, …



Thanks for listening


Further Interest and a completely different view


Sir Tim Berners-Lee’s TED Talk on the next web


Questions/Objections?


Stop me & ask/state


Contact details again


Sean Martin

Cambridge Semantics Inc.

sean@cambridgesemantics.com

+1 617 606 341