Relational

burnwholeInternet και Εφαρμογές Web

5 Φεβ 2013 (πριν από 4 χρόνια και 7 μήνες)

145 εμφανίσεις

Bx

Tutorial, Database Flavor:

Updatable or Invertible Mappings

James F. Terwilliger

Microsoft Corporation

How Does the DB Field
Use Mappings?

DB

DB’

DB

DB

DB

DB

DB

DB

DB

DB

DB

DB

DB

DB

DB

App Model
Over Store

Data
Warehouse,
Schema
Versioning

Federated
System

Exchanged
Data Between
Applications

Model (Metadata) Management

S

T

M

Model?

Virtual?

Model?

Virtual?

Language?

Capabilities?

In General, Two Approaches

S

T

M

S

T

M

The View Update Problem

S

T

M

Concrete
Database

Application Model,
External Schema


Early work abstracted away the exact language of M, focusing
on what it means to be an updatable view


As work progressed, focus shifted somewhat to a choice of M


SQL


and deciding when an update policy can be computed

The View Update Problem

S

T

M

Relational

(Concrete)

Relational

(Tables only)

(Virtual)

SQL

Query

Query

Update

Let’s use the declarative query tool
we know and love


SQL


as a way to
express views!

(What could possibly go wrong!)

View Update References


Francois
Bançilhon

and Nicolas
Spyratos
.
Update
semantics
of relational views.
TODS
,
6(4):557

575,
December 1981.


Umeshwar

Dayal

and Philip A. Bernstein. On
the
correct
translation of update operations on
relational
views
. TODS, 7(3):381

416, September 1982
.


G
.
Gottlob
, P.
Paolini
, and R.
Zicari
. Properties
and
update
semantics of consistent views.
TODS,

13(4
):486

524, 1988
.


Jens
Lechtenbörger
. The impact of the
constant
complement
approach towards view updating.
PODS,
49

55, 2003.

V

u(V)

D

u(D)

u

f


f


u

View Updates: The Basics

View definition

Update
statement

(
Unique
) Transformed
update against the
physical database

Update translations available for some
syntactic restrictions on f

Constant Complement

(Semantics of View Updates)

D

V

V’

D


Updates leave the view
complement unchanged


Complement may not be
unique (must be chosen to
determine update semantics)

Great! Where Can I Get It?


Most database vendors do not implement past
the SQL92 standard


View must have:


No set operators


No distinct, no grouping


No joins or multiple FROM items


No smoking, talking, or chewing gum


Basically, only simple select/project queries

View Update Limitations (Among Many)


Large
queries are hard to debug (and read!)


Given a large query, how to report to the user
why a query is not updatable?


DB


Tables, not DB


DB


Syntactic restrictions are very strict



It is assumed that a query language can make
a good view expression language

Outline


A brief look at updatable views


Transformation
-
based approaches


PRISM/PRIMA


Guava/Channels


Holistic approaches


Object
-
relational mappings


Data
exchange and model management



Whole schemas on both ends


Interesting and notable mappings


Support for interesting update/bidirectional scenarios

PRISM/PRIMA (PANTA RHEI)

Slides adapted with permission from Carlo
Curino

PRISM

S

T

M

Application
Model
Version N

Application
Model
Version N+1

PRISM

S

T

M

Relational

(Virtual)

Relational

(Concrete)

SMOs

(Schema Modification
Operators)

Query

Update

Schema Evolution: common practice




Starting point
:



a Schema
S1
,



a database
DB1

under
S1
, and



a set of queries
Q1

formulated over
S1




Evolution in the
real
world:



The DBA defines an SQL DDL script modifying
S1

into
S2



The DBA defines an SQL DML script migrating data from
DB1

to
DB2



Queries
in
Q1

might
fail
, the DBA
adapts
them
manually:

Q2 = Q1’
+ Q2_new

(new queries added on
S2
)

Schema Evolution: common practice


Schema Evolution: common practice




Evolution in the
real
world:



The DBA defines an SQL DDL script modifying
S2

into
S3



The DBA defines an SQL DML script migrating data from
DB2

to
DB3



Queries
in
Q2

might
fail
, the DBA
adapts
them
manually
as in
Q3 =
Q2’ + Q3_new

(new queries added on
S3
)

Schema Evolution: common practice




DB Administrator (DBA) nightmares:



Data Migration:

Data loss, redundancy, efficiency of the migration,
efficiency of the new design



Impact on Queries
and applications




What is the real impact of schema evolution? [
iceis2008
]

Schema Evolution: Ideal
World




Evolution in an
ideal
world:



Evolution design is
assisted
and
predictable



Data migration
scripts are
generated
automatically



Legacy
Queries

(and updates, views, integrity constraints,…)
are
automatically adapted

to fit the new schema

Prism: from 30,000 feet above

Desiderata

PRISM proposed solutions

Support Evolution Design

Schema Modification Operators (SMO)

(a language to express changes to the schema)

Increase Predictability of
evolution

SMO static analysis: foresee impact on schema,
data and queries

Automate data migration

Generate SQL scripts from SMO sequences

Automate query support

Derive from
SMOs

logical mapping between
schema versions



Automatic query rewriting (
more recently work
integrity constraints propagation and updates
rewriting
)


Schema Modification Operators




SMOs
:



Atomic changes



SQL
-
inspired syntax



Operates on both schema and data



Practical completeness (coverage of available evolution scenarios)

Data Migration




SMOs to SQL translation:



Each SMO has a clear semantics that can be represented by a set
of SQL statements (DML + DDL)



Optimization issues (single/multi SMO)





SMO to SQL translation

RENAME TABLE
R

TO
S
;

CREATE TABLE
T
(
x

varchar,
z

varchar);

INSERT INTO
T
(SELECT
x,z
FROM S);

ALTER TABLE
S

DROP COLUMN
z
;

DECOMPOSE TABLE
R

INTO
S
(
x,y
),
T
(
x,z
)

Logical Mapping




SMOs to logical mapping translation:



Language: Disjunctive Embedded Dependencies (DED
)



DED is a subset of First Order Logic with expressions in the
following form:



Used to capture the relationships between (instances of)
consecutive schema versions


SMO to DED translation

JOIN TABLE
R, S
INTO
T

WHERE
cond

Query Rewriting Semantics




Data Migration:



d
2

is equivalent to d
1

migrated
by M and modified by
U
2



d
2

= M(d
1
)
±

U
2




Query Answering:



Goal: answer q1 on d2



q
1
(
M
-
1
(d
2
))





Query Rewriting Semantics:



Executing the rewritten query
q
1


is equivalent to migrating
d
2

back to schema
S
1
, by applying the
M
-
1
,

and executing
q
1



q
1

(d2) =
q
1
(
M
-
1
(d
2
))

Inverting
SMOs




The Invertibility problem (How do we get M
-
1
?)
:



DEDs are not invertible!



We deal with the
inversion at the SMO
-
level
(easier but not free)



Inverting SMOs:



Not
every SMO is
perfectly invertible




in this case we use SMO
quasi
-
inverse
1

(intuitively “the best you
can do”
-
inverse)



Not
every SMO has a
unique inverse




in this case the DBA must disambiguate these (rare) cases



1

we apply the notion of quasi
-
inverse by Fagin,
Kolaitis
,
Popa
, Tan, in
[
fkpt2006
,
fkpt2007b
]

SMO
Invertibility
: tricky cases




Quasi
-
Inverse example:


JOIN TABLE R, S INTO T WHERE
cond
;

a valid (and reasonable) quasi
-
inverse is:


DECOMPOSE TABLE T INTO R(x,y), S(x,z);




Multiple Inverses example:


COPY TABLE R INTO S;


the system provides the following candidates:

1.
DROP TABLE S;

2.
DROP TABLE R; RENAME TABLE S INTO R;


3.
MERGE R,S INTO R;

Query Rewriting



The Rewriting Process:



From the Inverse SMOs we
derive the inverse

DEDs mapping
M
-
1



Rewrite queries by
chase&backchase

(using
M
-
1
) [popa2000,
deutsch2006]



MARS [mars2003] optimized
chase&backchase

implementation


MARS Query Rewriting

Q1: SELECT * FROM R

Q1’: SELECT *


FROM S,T


WHERE
S.x

=
T.x

References

Panta

Rhei

Website


schemaevolution.org


PRISM


Carlo
Curino
, Hyun Jin Moon, Carlo
Zaniolo
: Graceful database schema
evolution: the PRISM workbench. PVLDB 1(1): 761
-
772 (2008
)


Carlo
Curino
, Hyun Jin Moon,
Alin

Deutsch, Carlo
Zaniolo
: Update Rewriting
and Integrity Constraint Maintenance in a Schema Evolution Support System:
PRISM++. PVLDB 4(2): 117
-
128 (2010)

GUAVA, CHANNELS

Slides used with permission by James
Terwilliger
… hey wait, that’s me!

Channels

S

T

M

Concrete
Database

Application
Model

Channels

S

T

M

Relational

(Concrete)

Relational

(Virtual)

“Channel”

Query

Update

Schema
Δ

33

Example:
Unpivot

ID

Attribute

Value

1

Name

Joe

1

Gender

male

1

Exp

1

2

Name

Sue

2

Age

31

2

Exp

4







ID

Name

Age

Gender

Exp

1

Joe

male

1

2

Sue

31

4











Schema
: eliminate Name, Age, … columns

Schema
: add Attribute, Value Columns

Instance
: for each row, produce one row for


each non
-
null attribute value, with the


attribute name placed in the Attribute


column and the value in the Value column.


Employee

Employee’

The unpivot CT


operationally

input DB

input schema

current input instance

input DB

output schema

with output instance

CT

unpivot the Employee table

where ID is the key for the table

(unpivot all remaining attributes)

insert into Employee

delete from Employee

update Employee

add column to Employee

add domain element to


Exp domain

query that uses Employee


input DML/DDL/queries

insert into Employee’

delete from Employee’

update Employee’

add column to Employee’

add domain element to


Exp domain

query that uses Employee’


output DML/DDL/queries

DDL/DML operations accepted by a
channel

VLDB 2010

35

CTs are composable into a channel

(CTs simply pass statements on untouched that
they don’t need to change)

36

Native
Schema

Horiz
.
Merge

Vert.
Part.

Apply

Unpivot

Horiz
.
Merge

Unpivot

Natural
Schema

Query,

Insert,

Update,

Delete,

Create,

Alter, Drop


statements

Query,

Insert,

Update,

Delete,

Create,

Alter, Drop

Error, Loop
statements

List of CTs that we have defined/implemented


VPartition



vertically partition into two tables with a 1:1
relationship


VMerge



vertically merge two tables with a 1:1 relationship


HPartition



horizontally partition based on the value in a column


HMerge



horizontally merge, and introduce a provenance
column


Pivot/Unpivot


move data to schema (pivot) or schema to data
(unpivot) (Move from/to generic/triple form to regular form.)


Apply


apply an invertible function to data in each tuple


Adorn


add
columns,
specify
data to be held in the new
columns, such as environment data


Audit


add
start/end timestamps, and ensure that data is
deprecated rather than
deleted

HMerge

(operationally) working on a query:

Replace
P_Client

to Person appropriately

FName

LName

Age

Bob

Smith

19







FName

LName

T

Age

Cert

Bob

Smith

P_Client

19



Ted

Jones

P_Staff



X











FName

LName

Cert

Ted

Jones

X







HMerge

P_Staff
:

P_Client
:

Person:

Query Q =
P_Client


(SELECT * FROM
P_Client
)

π
FName
,
Lname
, Age

σ
T=
P_Client

Person

FName

LName

Cert

Ted

Jones

X

Gail

Brown

X







HMerge

(operationally) working on DML
-


Insert statement

FName

LName

Age

Bob

Smith

19







FName

LName

T

Age

Cert

Bob

Smith

P_Client

19



Ted

Jones

P_Staff



X











FName

LName

Cert

Ted

Jones







HMerge

P_Staff
:

P_Client
:

Person:

FName

LName

T

Age

Cert

Bob

Smith

P_Client

19



Ted

Jones

P_Staff



X

Gail

Brown

P_Staff

X












I

(
P_Staff
,

{
FName
,
LName
, Cert},


{(Gail, Brown, X)})


I

(Person,


{
FName
,
LName
, T, Age, Cert},


{Gail, Brown,
P_Staff
,
null
, X})

HMerge

(operationally) working on DDL:

Create table (
P_Admin
)


add column and add domain
element statements

FName

LName

Age

Bob

Smith

19







FName

LName

T

Age

Cert

Bob

Smith

P_Client

19



Ted

Jones

P_Staff



X











FName

LName

Cert

Ted

Jones

X







HMerge

P_Staff
:

P_Client
:

Person:

FName

LName

Pay







P_Admin
:

FName

LName

T

Age

Cert

Pay

Bob

Smith

P_Client

19



Ted

Jones

P_Staff



X











Add “
P_Admin
” to the
domain of the column

Add “Pay”

column

Formal definition of a CT


A channel transformation is a 4
-
tuple (
S, I, Q, U
)


A schema transformation
S


An instance transformation
I


A query transformation
Q


An update transformation
U


S

and
I

define
semantics


Never implemented, but define what the CT means


Q

and
U

define
operation


Defines the translation of statements against the
natural schema to statements against the native
schema


d

d

I

q

=
Q
(s, q)

Q

q

q

Query Correctness

Instance transformation
for the channel transform

Q
uery against the virtual
database

T
ransformed query against
physical database

T
he query transformation is correct with
respect to the instance transformation

Query transformation for
the channel transform

s

u(s)

s

u(s)

u

S


u =
U
(s, u)

S


u

Update Correctness for Schema

Schema transformation
for the channel transform

Update
statement
(DML or DDL)

Update transformation for
the channel transform

Transformed update
against the physical
database

T
he update transformation is correct with
respect to the schema transformation

Query:

“SELECT * FROM t”

s

u(s)

s

u(s)

Q

u

S

u =
U
(s, u)

q
t

=
Q
(u(s),
q
t
)

q
t

u

q
t

Update Correctness for Instance

Update
statement
against table
“t”

The update transformation is correct with
respect to the instance transformation

PutGet

References

Overall Architecture (Guava):


James F.
Terwilliger
, Lois M. L.
Delcambre
, Judith
Logan: Querying
through
a user interface. Data
Knowl
. Eng. 63(3): 774
-
794 (2007)


Channels:


James F.
Terwilliger
, Lois M. L.
Delcambre
, David Maier, Jeremy Steinhauer,
Scott
Britell: Updatable
and Evolvable Transforms for Virtual Databases.
PVLDB 3(1): 309
-
319 (2010)

OBJECT
-
RELATIONAL MAPPINGS

Object
-
Relational Mappings: Overview


Applications written in an object
-
oriented
language have object
-
oriented data tiers


Persistence is a relational database



I
mpedance mismatch”


M
ap object constructs to relational constructs

Object
-
Relational Mappings

S

T

M

Concrete
Database

Application
Model

(Arrow Direction Could Go
Either Way)

Object
-
Relational Mappings

S

T

M

Relational

(Concrete)

Object
-
Oriented

(Virtual)


Specification


R
elational
equivalences


Mapping strategies

Query

Update

(Schema
Δ
)

An O
-
R Mapping Is…


… generally an operational specification rather
than a declarative query or set of queries


… tailored more to the purpose of mapping
inheritance and relationships to relations
rather than a general
-
purpose mapping

Mapping Patterns

(TPT)

(TPC)

(TPH)

Mapped to

Mapping Patterns:

TPH Sub
-
Categories

Name (string)

Salary (integer)

Name (string)

Office (integer)

Name1 (string)

Name2 (string)

Salary (integer)

Office (integer)

Name (string)

Salary (integer)

Office (integer)

String1 (string)

Integer1 (integer)

Fully disjoint

Reuse by column

Reuse by domain

Clear column
provenance

Clear name reuse

Maximum data
density

Mapping Patterns: Etc.

Horizontal Partitioning

Vertical Partitioning

Association Join Tables

Origin = ‘A’

Origin = ‘B’

0..1

*

OR

?

ORM Product Space


Ruby on Rails


Hibernate/
NHibernate


SQLAlchemy


Microsoft Entity Framework


Oracle
TopLink



Some major tradeoffs:


Expressiveness


Specification style

Hibernate Example

<hibernate
-
mapping>


<class name="
eg.hibernate.mapping.dataobject.Person
" table="
TB_PERSON
" polymorphism="implicit">


<id name="id" column="ID">


<generator class="assigned"/>


</id>


<set name="rights" lazy="false">


<key column="REF_PERSON_ID"/>


<one
-
to
-
many class="
eg.hibernate.mapping.dataobject.Right
" />


</set>


<
joined
-
subclass

name="
eg.hibernate.mapping.dataobject.Individual
"


table="TB_INDIVIDUAL">


<key column="id"/>


<property name="
firstName
" column="FIRST_NAME" type="
java.lang.String
" />


<property name="
lastName
" column="LAST_NAME" type="
java.lang.String
" />


</joined
-
subclass>


<
joined
-
subclass

name="
eg.hibernate.mapping.dataobject.Corporation
"


table="TB_CORPORATION">


<key column="id"/>


<property name="name" column="NAME" type="string" />


<property name="
registrationNumber
" column="REGISTRATION_NUMBER" type="string" />


</joined
-
subclass>


</class>

</hibernate
-
mapping>

Client Class

Store Table

TPT
-
Style Mapping

XML fragments almost correspond to
individual O
-
to
-
R transformations

TPT
-
Style Mapping

Entity Framework (EF):

A Brief Overview

Client
-
side (Objects):

Store side (Relations):

Classes

Tables

Q1 = Q1’

Q2 = Q2’

Q3 = Q3’




(select
-
project
only)

Query view V
Q

Update view V
U

Merge view V
M

Object Queries
(LINQ)

Object Updates

Mapping specified at
schema level

Mapping compiled to
views

Preserve fidelity of the
source data

56

Person:

id

name

title

EF Simple Example

Client
-
side (Classes):

Store side (Relations):

Person1(


id integer PRIMARY KEY,


name
varchar
(50),

)

Person2(


id integer PRIMARY KEY,


title
varchar
(50),


details
varchar
(2000)

)

π
id, name

Person =
π
id, name

Person1

Person =
π
id, name, title

Person1



Person2



57

π
id, title

Person =
π
id, title

Person2



Entity Framework: Major Results


Validation procedure ensures that a collection of
mapping fragment
roundtrips


Each client state maps to a valid state


Client state travel to store and back is invariant


Guarantees query and update safety



Mapping compilation procedure expressive
enough for common mapping scenarios, and
many uncommon ones


All of the mapping schemes previously noted

PutGet

ORM Challenges


Generally not extensible


Limited mapping expressiveness


Cannot add new mapping capabilities


EF mappings must be validated


Not always easy to write correct mappings


Error detection and correction is hard

References

ORM overview:


Scott Ambler:
http
://
www.agiledata.org/essays/mappingObjects.html


Entity Framework formalism:


Sergey
Melnik
, Atul Adya, Philip A. Bernstein: Compiling mappings to bridge
applications and databases. ACM Trans. Database Syst. 33(4): (2008
)


Entity Framework overview:


Atul Adya, José A. Blakeley, Sergey
Melnik
, S.
Muralidhar
: Anatomy of the
ADO.NET entity framework. SIGMOD Conference 2007:
877
-
888




References

Hibernate:

hibernate.org/


Oracle
TopLink
:
oracle.com/
technetwork
/middleware/
toplink
/overview/index.html


SQLAlchemy
:

sqlalchemy.org/


Entity
Framework:

msdn.microsoft.com/en
-
us/library/bb399572.aspx


Ruby on Rails:

http://rubyonrails.org
/



DATA EXCHANGE

Slides adapted with permission from Jorge Perez

Data Exchange

S

T

M

Concrete
Instance

Concrete
Instance

Data Exchange

S

T

M

Relational

(Concrete)

Relational

(Concrete)

FO Predicate Calculus

(source
-
to
-
target
tuple generating
dependencies)

Query

M
-
1

References

Another
Dagstuhl

Session! All About Data Exchange!


http://
www.tks.informatik.uni
-
frankfurt.de/events/deis10/topics


Data Exchange Tutorial


Marcelo Arenas, Pablo Barceló, Leonid
Libkin
,
Filip

Murlak: Relational and XML
Data Exchange Morgan & Claypool Publishers
2010


Includes a discussion of composition and inversion of mappings


Foundations of Schema Mapping Management


Marcelo Arenas, Jorge Pérez, Juan L. Reutter, Cristian
Riveros
: Foundations of
schema mapping management. PODS 2010:
227
-
238


Details of Maximal Recovery


Marcelo Arenas, Jorge Pérez, Cristian
Riveros
: The recovery of a schema
mapping: Bringing exchanged data back. ACM Trans. Database Syst. 34(4): (2009)