Open Conceptual Data Models


© 2008 OpenLink Software, All rights reserved

Making the Conceptual Layer Real via HTTP based Linked Data


Situation Analysis

Linked Data Vision:


The transition of the HTTP based Webs (Intranet, Extranet, or Internet) from Webs of Linked Documents to Webs of interlinked Structured Data Items (aka entities, data objects, resources)


Concurrent trend in the IT industry:


A recognition of the benefits of conceptual data models over logical data models


The Big Question:


To what extent does Linked Data support conceptual level data models?


Open Conceptual Data Models

Topics:



Conceptual & Logical Data Models


Conceptual Models for the Semantic Web


Realizing Conceptual Models through Ontologies & Linked Data


Virtuoso’s RDF based Linked Data Views


ADO.NET Data Services & the Entity Data Model


Data Model Layers




Physical

How data is physically represented on disk

Logical (aka logical schema)

Expresses problem domain in terms of data management technology (tables / columns)

e.g. relational schema

Conceptual (aka conceptual schema)

Purely semantic description of problem space

Describes things (entities), their characteristics (attributes) & associations between things (relationships)


Logical Data Model


Most prominent of the three data model types



Main focus of database driven applications


Due to pervasiveness of relational database driven applications within the enterprise and across the Web


Weaknesses


Impedance mismatch


Loss of semantics during development process


Heterogeneous databases & interoperability


Logical Data Model Weaknesses

Impedance Mismatch


SQL expresses queries in terms of tables / views



=> targets logical schema


Normalization fragments the data model


Entities & their attributes may be split across several tables

Navigation between objects requires relational joins over two or more tables

Table rows must be reconstituted into higher level conceptual entities


Conceptual level data model is desirable to:


Remove impedance mismatch


Isolate application from changes to logical data model


Provide framework for productive human level interaction
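
The mismatch can be sketched in a few lines of Python using the standard-library sqlite3 module. The artist/album tables, their columns, and the Artist class are illustrative assumptions rather than anything taken from a real schema; the point is only that the application has to join tables and rebuild the entity by hand.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Artist:
    """Conceptual-level entity: an artist together with the albums they made."""
    name: str
    albums: list = field(default_factory=list)


# Hypothetical normalized schema: the "artist" concept is split over two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album  (album_id  INTEGER PRIMARY KEY,
                         artist_id INTEGER REFERENCES artist(artist_id),
                         title     TEXT);
    INSERT INTO artist VALUES (1, 'John Lennon');
    INSERT INTO album  VALUES (10, 1, 'Imagine');
    INSERT INTO album  VALUES (11, 1, 'Mind Games');
""")


def load_artist(conn, artist_id):
    """Join the tables and reconstitute the rows into one Artist entity.

    Uses an inner join, so an artist without albums is reported as None;
    a LEFT JOIN would be needed to cover that case.
    """
    rows = conn.execute(
        """SELECT a.name, b.title
             FROM artist a JOIN album b ON b.artist_id = a.artist_id
            WHERE a.artist_id = ?
            ORDER BY b.album_id""",
        (artist_id,),
    ).fetchall()
    if not rows:
        return None
    return Artist(name=rows[0][0], albums=[title for _, title in rows])


lennon = load_artist(conn, 1)
```

Every application that wants the artist as a single object has to repeat this join-and-rebuild step, and each copy breaks if the tables are reorganized.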


Logical Data Model Weaknesses

Loss of Semantic Fidelity During Development


Process:


Develop conceptual model (E-R modelling)


Transform to logical model for implementation


DBMS generates physical model



Problems:


Each move to a lower level model degrades the semantic fidelity of the higher level model

Conceptual Model semantics fragmented across schema / business rules / application code

Application & Users must understand logical data model

Must be hardcoded or inferred (imperfectly) from system tables


Logical Data Model Weaknesses

Heterogeneous Databases & Interoperability


Logical data model


Describes problem domain in terms of tables/columns


Requires costly table joins to navigate model


Application


Exposed to specifics of a particular vendor’s RDBMS

In heterogeneous database environment, must handle


Different SQL dialects


Different schemas


No explicit data model. No explicit semantics.


Interoperability/integration = perpetual problem for IT depts




Conceptual Models for Linked Data Webs

Explosion of User Generated Data from Web 2.0 applications and their Data Silos is driving the recognition of the need to move from logical to conceptual models, exemplified by:



Microsoft’s Entity Data Model / Entity Framework


W3C’s Semantic Web Project which includes powerful technologies for this paradigm shift such as:

Resource Description Framework (RDF Data Model and Data Representation Formats)

Web Ontology Language (OWL)

SPARQL (Query Language, RESTful Interface, and Query Result Serialization Formats)





Benefits of Conceptual Models



More faithfully represents human view of domain of interest

Conceptual model & semantics

Explicit & available globally

Not implicit & fragmented across business logic / UI etc

Better / explicit semantics facilitates move from “search” to “esoteric precision find”

Much easier heterogeneous data integration

User Generated Data is inherently heterogeneous & disparately located



Application Areas


Present & Future


Social Media, eCommerce, Distributed Collaborative Apps.


Require shareable, standards-based, cross-platform conceptual views of data

Data portability

Needed as users maintain multiple points of presence & identity across blogs, social network accounts etc.

Open business models

Require exchange & integration of large amounts of data

Scientific research: sharing of knowledge & findings

Requires transparent access to distributed heterogeneous data

Requires database integration using global schema

Autonomous intelligent agents

Free humans from large-volume data processing



Semantic Web Project Technologies

These technologies offer:


Ontologies


For representing common semantics


Spanning databases, applications, enterprises, on-line communities

Deliver shared conceptual model

Provide common schemas (Dublin Core, FOAF, SIOC, GoodRelations etc)

Common Semantics (Ontologies) & Common Data Representation (RDF)

Enable cross data source querying using SPARQL

Data across several databases (or data spaces) can be meshed, expanded, and explored


Querying using proprietary APIs unnecessary


Brute force data merging via code is unnecessary


Open Data Formats, Platform Independence, Common Models


Facilitate data portability, accessibility, and integration.


Realizing Conceptual Models

Ontologies


Provide the building blocks for conceptual models


Define the concepts and their relationships in a domain of interest (or world view)


Describing Classes & Properties


Ontology Languages


RDFS


Introduces the notions of concepts (classes) & instances


OWL


Adds more vocabulary for describing:


relations between classes


cardinality


richer typing of properties, etc.



Goodness of Fit



RDF was designed from the ground up as a metadata data model

RDF / RDFS / OWL work directly at the level of conceptual models

Conceptual model terminology matches RDF/OWL terminology


Concepts, entities, attributes, relationships.


A natural fit!

RDF lends itself naturally to describing conceptual models.




Semantic Expressivity Comparison

Data Definition Language (DDL)-based Relational Model

Relationship between two entities isn’t explicit

Foreign key relating two rows in separate tables doesn’t express the nature of the relationship

Semantics must often be inferred from table definitions

RDF-based Conceptual Model

Relationship between two entities is stated explicitly by predicate in subject-predicate-object triple

Semantic expressivity of RDF/RDFS/OWL is much better than DDL

Has richer semantic content than equivalent DDL-based logical/relational model
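
A small Python sketch can make the contrast concrete. The example.org namespace, the predicate names, and the row layout are all hypothetical; triples are modelled as plain tuples rather than with any RDF library.

```python
EX = "http://example.org/music/"  # hypothetical namespace, not from the slides

# Logical view: the foreign key says album 10 is related to artist 1,
# but not what that relationship means.
album_row = {"album_id": 10, "title": "Imagine", "artist_id": 1}

# Conceptual view: every relationship is named by the predicate of a
# (subject, predicate, object) triple, so two entities can be related
# in several distinct, explicit ways.
triples = {
    (EX + "album/10", EX + "title", "Imagine"),
    (EX + "album/10", EX + "performedBy", EX + "artist/1"),
    (EX + "album/10", EX + "producedBy", EX + "artist/1"),
}


def relationships(triples, subject, obj):
    """Return the sorted predicates that directly link subject to obj."""
    return sorted(p for s, p, o in triples if s == subject and o == obj)


links = relationships(triples, EX + "album/10", EX + "artist/1")
```

Here `links` holds both named relationships, whereas the single `artist_id` column could only record that some association exists.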


RDF Conceptual Model


Artist / Records / Tracks


Global Granular Information Sharing

Traditional Logical/Relational Data Model


Schema described by DDL is internal to DBMS


Primary keys identifying an individual table row (i.e. entity instance) not globally unique, not easily usable outside host DBMS

Gives rise to ‘data silos’

RDF’s use of Generic HTTP-based URIs

Externalises the data and schema

Makes both globally accessible & scalable

Provides globally unique IDs for entities/relations/classes

A vehicle for granular, global information sharing down to the equivalent of the record level.
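
As a rough illustration of why HTTP URIs avoid the primary-key problem, the sketch below mints a record-level identifier from a host name, a dataset name, an entity type, and a local key. The function name, the host names, and the path layout are assumptions chosen for illustration (the layout loosely echoes the Customer/ALFKI#this style used later in these slides).

```python
from urllib.parse import quote


def entity_uri(host, dataset, entity_type, key):
    """Build a globally unique HTTP URI for one record (hypothetical layout)."""
    # Percent-encode the key so characters such as "/" cannot alter the path.
    return (
        f"http://{host}/{quote(dataset)}/{quote(entity_type)}/"
        f"{quote(str(key), safe='')}#this"
    )


# The same local primary key in two different databases ...
uri_a = entity_uri("db1.example.org", "Northwind", "Customer", "ALFKI")
uri_b = entity_uri("db2.example.org", "Northwind", "Customer", "ALFKI")
# ... yields two distinct identifiers that remain usable outside either DBMS.
```

The key "ALFKI" alone is ambiguous once it leaves its host database; the two URIs are not.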





Linked Data - What is It?

A method for exposing, sharing & connecting data on HTTP based Data Networks.

A term coined by Tim Berners-Lee that describes a RESTful mechanism for HTTP based Data Access & Manipulation by Reference

A record level HTTP based Open Data Access & Connectivity mechanism

A richer hyperlinking mechanism that takes us from Hypertext Links (Document to Document) to Hyperdata Links (Data Item to Data Item).




Linked Data - Why Is It Important

It exposes the compound nature of Data Containers (e.g., Documents) such that

Data Containers are uniquely identified & referenceable

Data Items within Data Containers are uniquely identified & referenceable

It provides a conceptual model oriented Open Data Access & Connectivity mechanism

It delivers a powerful mechanism for meshing disparate and heterogeneous data sources.
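
One common way to give a data item and its container separate identifiers is a fragment identifier, as in the Customer/ALFKI#this style of URI that appears in the Northwind slides of this deck. The sketch below, using only the Python standard library, splits such a URI into the container an HTTP client would actually request and the item named inside it; treating the fragment as the item name is an assumption about one URI style, not a rule of Linked Data.

```python
from urllib.parse import urldefrag

# Data item URI in the style used by the Northwind example in this deck.
item_uri = "http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"

# The part before "#" identifies the data container (the document an HTTP
# client requests; fragments are not sent to the server). The fragment
# identifies the data item described inside that container.
container_uri, fragment = urldefrag(item_uri)
```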


Linked Data Model

Changes the focus from linked documents to linked entities

The document as a data container becomes less relevant


Hyperdata Links Between Data Items


Linked Data Benefits


Data Exploration

Natural Navigation Through Typed Links


RDF entities (instance data, classes, and properties) are identified by dereferenceable HTTP URIs

Navigating from one data item to another is easy via:

Single LINK click from any HTTP user agent commences data item relationship navigation

Linked Data Browsers such as OpenLink Data Explorer

Relational/Logical Model

Cumbersome

Requires SQL joins + typically Object-Relational mapping

e.g. in C#:

track = lennonAlbum.Tracks["Imagine"];
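
For comparison, navigation through typed links can be sketched as repeatedly following a named predicate from one data item to the next. The triples, the example.org namespace, and the predicate names below are illustrative assumptions; a real client would obtain the triples by dereferencing each URI.

```python
EX = "http://example.org/music/"  # hypothetical namespace

triples = [
    (EX + "artist/lennon", EX + "made", EX + "album/imagine"),
    (EX + "album/imagine", EX + "track", EX + "track/imagine"),
    (EX + "album/imagine", EX + "track", EX + "track/jealous-guy"),
    (EX + "track/imagine", EX + "title", "Imagine"),
    (EX + "track/jealous-guy", EX + "title", "Jealous Guy"),
]


def follow(triples, start, *predicates):
    """Follow a chain of typed links from start, one predicate per hop."""
    current = {start}
    for predicate in predicates:
        current = {o for s, p, o in triples if s in current and p == predicate}
    return current


# artist -> albums -> tracks -> titles, with no joins or mapping layer.
titles = follow(triples, EX + "artist/lennon",
                EX + "made", EX + "track", EX + "title")
```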



Linked Data Benefits - Aggregatable Data

Often desirable to have an integrated view of all the data available about an item or topic

Database Realm

Integration problematic, difficult to combine logical schemas

Semantic Web

Data aggregation is easy: every resource has a unique URI

Individual items can be linked

Conceptual models can be linked

Cross-domain links enrich domain knowledge

Different facets of the same data item may be described by different URIs minted by different authors

Can be linked, e.g. via owl:sameAs, rdf:type predicates

May expose facts not directly represented in any one source
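
The aggregation step can be sketched as follows: URIs connected by owl:sameAs are grouped with a small union-find structure, and the property/value pairs of every URI in a group are merged under one representative. The source URIs and property names are hypothetical, and this is only an illustration of the idea, not how any particular triple store implements owl:sameAs reasoning.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def aggregate(triples):
    """Merge descriptions of subjects that are linked by owl:sameAs.

    Returns {representative_uri: {(predicate, object), ...}}. Only subjects
    are canonicalized; object URIs are left exactly as written.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # The lexicographically smaller URI becomes the representative.
            parent[max(ra, rb)] = min(ra, rb)

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)

    merged = {}
    for s, p, o in triples:
        if p != OWL_SAME_AS:
            merged.setdefault(find(s), set()).add((p, o))
    return merged


# Two authors describe the same person under different URIs (hypothetical).
A = "http://source-a.example/people/lennon"
B = "http://source-b.example/artist/42"
NAME = "http://example.org/vocab/name"
BIRTHPLACE = "http://example.org/vocab/birthPlace"

merged = aggregate([
    (A, NAME, "John Lennon"),
    (B, BIRTHPLACE, "Liverpool"),
    (A, OWL_SAME_AS, B),
])
```

After merging, the single entry for `A` carries both the name from the first source and the birthplace from the second, a combined view that neither source held on its own.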


Linked Data


Data Aggregation


Linked Data Benefits - Self Describing Data

Resource Description Framework (RDF)

A technology for creating self-describing Web resources

Data Item’s type definition ‘accompanies’ it via rdf:type relations

RDF based data can be queried using SPARQL without knowing anything beforehand about the data definition (schema comes last in this realm)

Provides the basis for powerful deductive data exploration tools

Logical / Relational Schema

Users / applications need a detailed understanding of the schema to use and navigate the data

Application’s knowledge of the schema typically hardcoded

Ad-hoc end-user data exploration potentially error prone
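
To illustrate "schema comes last", the sketch below takes a set of triples it knows nothing about and works out which types are present and which properties instances of each type carry, using only the rdf:type statements that travel with the data. The example.org namespace and the sample data are assumptions for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/hr/"  # hypothetical namespace

triples = [
    (EX + "dept/accounts", RDF_TYPE, EX + "Department"),
    (EX + "dept/accounts", EX + "name", "Accounts"),
    (EX + "dept/accounts", EX + "headcount", 12),
    (EX + "emp/7", RDF_TYPE, EX + "Employee"),
    (EX + "emp/7", EX + "worksIn", EX + "dept/accounts"),
]


def summarize(triples):
    """Map each type found in the data to the properties its instances use."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    properties_by_type = {}
    for s, p, o in triples:
        if p != RDF_TYPE:
            for t in types.get(s, ()):
                properties_by_type.setdefault(t, set()).add(p)
    return properties_by_type


summary = summarize(triples)
```

No table definitions or hardcoded column names were needed; the shape of the data was recovered from the data itself.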





Linked Data Benefits - SPARQL

If a user agent has no built-in knowledge of a particular Data Item, it can dereference its Generic HTTP URI to obtain such information

The Power of SPARQL

Discover what sorts of things a data source contains

select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }

Determine all the properties of a data item’s class

select * where { <http://my.org/resourceTypes/Department> ?property ?hasValue }

Determine all the properties and values of a data item instance

DESCRIBE <http://my.org/resource/Accounts>

No prior knowledge of the RDF data source is needed
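
Because SPARQL also defines an HTTP protocol, a query like the first one above can be sent to an endpoint as an ordinary GET request with the query text in a `query` parameter. The sketch below only builds the request URL; the endpoint address is a hypothetical placeholder, and no request is actually made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint address

# The type-discovery query from the slide, unchanged.
DISCOVER_TYPES = "select distinct ?URI ?ObjectType where { ?URI a ?ObjectType }"


def sparql_get_url(endpoint, query):
    """Return the GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, DISCOVER_TYPES)
# Any HTTP client could now fetch `url`, typically with an Accept header such
# as "application/sparql-results+json" to choose the result serialization.
```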


Virtuoso - Linked Data Generation Options

Conceptual layer insulates Linked Data consumers from RDFization infrastructure & data source heterogeneity


Virtuoso RDF based Linked Data Views


Expose relational model data as RDF graph model data


Provide the means to move from a logical model to a conceptual model view

Available for querying through SPARQL or SPASQL (SPARQL embedded in SQL)


No physical regeneration of relational data



RDF Views = Virtuoso RDF Meta-Schema + Meta-Schema Language (MSL)

MSL = A domain specific, declarative language for mapping a logical SQL data model to a conceptual RDF data model



Northwind Demo Database: RDF View Definition Extract

Demo.demo.Customers

CustomerID | CompanyName | ContactName | ContactTitle | Address | City | PostalCode | Country | Phone | Fax

Northwind RDF View Definition

prefix northwind: <http://www.openlinksw.com/schemas/northwind#>

create iri class northwind:Customer <http://^{URIQADefaultHost}^/Northwind/Customer/%U#this> (in customer_id varchar not null)

alter quad storage virtrdf:DefaultQuadStorage
  from Demo.demo.Customers as customers
  from Demo.demo.Orders as orders … {
  create virtrdf:NorthwindDemo as graph iri ("http://^{URIQADefaultHost}^/Northwind") {
    northwind:Customer(customers.CustomerID) a foaf:Organization as virtrdf:Customer-CustomerID ;
      northwind:companyName customers.CompanyName as … ;
      northwind:fax customers.Fax as virtrdf:Customer-fax .
    northwind:Customer(orders.CustomerID)
      northwind:has_order northwind:Order(orders.OrderID) as virtrdf:Order-has_order .
} }


Northwind Demo Database: Customer Table to RDF Data Item Mapping

Customers Table

CustomerID | CompanyName | ContactName | ContactTitle | Address | City | PostalCode | Country | Phone | Fax
ALFKI | Alfreds Futterkiste | Maria Anders | Sales Representative | Obere Str. 57 | Berlin | 12209 | Germany | 030-0074321 | 030-0076545

Orders Table

OrderID | CustomerID
10643 | ALFKI
10692 | ALFKI

RDF data item (prefix <http://demo.openlinksw.com/Northwind/>)

Customer/ALFKI#this companyName Alfreds Futterkiste
Customer/ALFKI#this contactName Maria Anders
Customer/ALFKI#this contactTitle Sales Representative
Customer/ALFKI#this address Obere Str. 57
Customer/ALFKI#this city Berlin
Customer/ALFKI#this PostalCode 12209
Customer/ALFKI#this country Germany
Customer/ALFKI#this phone 030-0074321
Customer/ALFKI#this fax 030-0076545
Customer/ALFKI#this has_order Order/10643#this
Customer/ALFKI#this has_order Order/10692#this
Order/10643#this has_customer Customer/ALFKI#this
Order/10692#this has_customer Customer/ALFKI#this
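
The mapping pictured above can be approximated in Python to show what "no physical regeneration" means in practice: triples are computed from the relational rows each time the view is read, so nothing is copied or stored. This is a sketch under stated assumptions, not Virtuoso's implementation. It uses sqlite3 in place of Virtuoso, a hand-written column-to-predicate dictionary in place of MSL, a subset of the Customers columns, and simple string formatting in place of the %U IRI pattern.

```python
import sqlite3

BASE = "http://demo.openlinksw.com/Northwind/"             # prefix from the slide
SCHEMA = "http://www.openlinksw.com/schemas/northwind#"    # northwind: prefix

# Declarative part of the mapping: column name -> predicate local name.
CUSTOMER_COLUMNS = {
    "CompanyName": "companyName",
    "ContactName": "contactName",
    "City": "city",
}


def customer_iri(customer_id):
    return f"{BASE}Customer/{customer_id}#this"


def order_iri(order_id):
    return f"{BASE}Order/{order_id}#this"


def rdf_view(conn):
    """Yield (subject, predicate, object) triples computed from live tables."""
    cols = ", ".join(CUSTOMER_COLUMNS)
    for row in conn.execute(
        f"SELECT CustomerID, {cols} FROM Customers ORDER BY CustomerID"
    ):
        subject = customer_iri(row[0])
        for local_name, value in zip(CUSTOMER_COLUMNS.values(), row[1:]):
            yield (subject, SCHEMA + local_name, value)
    for order_id, customer_id in conn.execute(
        "SELECT OrderID, CustomerID FROM Orders ORDER BY OrderID"
    ):
        # The foreign key becomes two explicitly named relationships.
        yield (customer_iri(customer_id), SCHEMA + "has_order", order_iri(order_id))
        yield (order_iri(order_id), SCHEMA + "has_customer", customer_iri(customer_id))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT,
                            ContactName TEXT, City TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
    INSERT INTO Customers VALUES ('ALFKI', 'Alfreds Futterkiste',
                                  'Maria Anders', 'Berlin');
    INSERT INTO Orders VALUES (10643, 'ALFKI');
    INSERT INTO Orders VALUES (10692, 'ALFKI');
""")

triples = list(rdf_view(conn))
```

Inserting another row into Orders and calling `rdf_view(conn)` again yields the extra has_order and has_customer triples immediately, since the view reads the tables on each call.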


LinqToRdf + Virtuoso


LinqToRdf to MusicBrainz - Conceptual Model Veneer


ADO.NET Data Services & Entity Data Model

A framework for exposing ‘pure data’ services over HTTP

No support for RDF

Gains none of RDF’s inherent benefits

Lack of platform independence & standards compliance

Supports REST-style interfaces

Supports Atom, JSON and XML payloads

But

Server-side: Windows only

Consuming Astoria (the ADO.NET Data Services codename) services at a higher level requires a Windows .NET client or Silverlight-supported browser


ADO.NET Data Services & Entity Data Model

Server-side only conceptual model

Powerful URL addressing to query/navigate/sort/filter etc

Customers collection: http://myserver/data.svc/Customers

Customer ALFKI: http://myserver/data.svc/Customers('ALFKI')

Customer ALFKI's orders: http://myserver/data.svc/Customers('ALFKI')/Orders

But

Client must know conceptual schema

e.g. to construct above URIs

Lack of Dereferenceable Entity IDs

Ability to discover entities and dereference their descriptions (attributes/relations) is confined to the facilities offered by .NET

c.f. SPARQL’s ability to handle unknown data sources
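
The dependence on prior schema knowledge shows up as soon as a client tries to build the URLs listed above. In the sketch below the helper functions are hypothetical, and the entity set name, key, and navigation property all have to be supplied by the caller; keys are assumed to be simple strings that need no escaping.

```python
SERVICE_ROOT = "http://myserver/data.svc"  # service root from the slide


def entity_set_url(service_root, entity_set):
    """URL of a whole collection, e.g. all customers."""
    return f"{service_root.rstrip('/')}/{entity_set}"


def entity_url(service_root, entity_set, key):
    """URL of one entity addressed by its string key (no escaping applied)."""
    return f"{entity_set_url(service_root, entity_set)}('{key}')"


def navigation_url(service_root, entity_set, key, navigation_property):
    """URL of the entities related to one entity via a named association."""
    return f"{entity_url(service_root, entity_set, key)}/{navigation_property}"


# "Customers", "ALFKI" and "Orders" all come from knowing the schema up front.
customers = entity_set_url(SERVICE_ROOT, "Customers")
alfki = entity_url(SERVICE_ROOT, "Customers", "ALFKI")
alfki_orders = navigation_url(SERVICE_ROOT, "Customers", "ALFKI", "Orders")
```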


ADO.NET Data Services & Entity Data Model

No Support for Non-SQL Data Sources

Astoria is aimed exclusively at making relational data Web accessible

c.f. Linked Data Realm

Recognizes that vast amounts of data reside in unstructured and semi-structured data sources

Support for embedding RDF into existing (X)HTML

RDFa, GRDDL, eRDF

Emerging tools for converting non-RDF data to RDF model data

Emerging tools for exposing Relational data as RDF Graph Model data

Astoria lacks scalability & scope of Semantic Web technologies