The Rise and Fall and Rise of Dependency Theory
Part II: The Rise from the Ashes
Ronald Fagin
IBM Almaden Research Center
Dependencies were Considered Harmful
Dependencies were undesirable
Except for keys and referential integrity constraints
Database normalization eliminated dependencies
BCNF: each FD is a logical consequence of keys
4NF: each MVD is a logical consequence of keys
5NF: each JD is a logical consequence of keys
But then:
Dependencies took on a new, very positive role!
Data Integration and Data Exchange
Data integration: Describe data in a global schema in terms of data in local schemas
Data exchange: Describe data in a target schema in terms of data in a source schema, and actually produce the target database
Data Integration and Data Exchange
These are old, but recurrent, database problems
Phil Bernstein – 2003: “Data exchange is the oldest database problem”
EXPRESS: IBM San Jose Research Lab – 1977, for transforming data between hierarchical databases
The universal relation model is an early case of data integration
We will focus mainly on data exchange
Schema Mappings & Data Exchange
[Figure: source schema S with instance I, related by Σ to target schema T with instance J]
Schema Mapping M = (S, T, Σ)
Source schema S, Target schema T
High-level, declarative assertions Σ that specify the relationship between S and T
Data Exchange via the schema mapping M = (S, T, Σ):
Transform a given source instance I to a target instance J, so that <I, J> satisfy the specifications Σ of M
Schema Mapping Specification Language
The relationship between source and target is typically given by source-to-target tgds
φ(x) → ∃y ψ(x, y)
where
φ(x) is a conjunction of atoms over the source
ψ(x, y) is a conjunction of atoms over the target
Example: (Student(s) ∧ Enrolls(s,c)) → ∃t ∃g (Teaches(t,c) ∧ Grade(s,c,g))
There may also be target tgds and egds:
(Grade(s,c,g) ∧ Grade(s,c,g’)) → (g = g’)
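As a concrete reading of the example dependencies, here is a minimal Python sketch that checks whether a pair <I, J> satisfies them. It assumes instances are dictionaries from relation names to sets of tuples; that representation and the function names are illustrative choices, not something given in the slides.

```python
def satisfies_st_tgd(source, target):
    """(Student(s) ∧ Enrolls(s,c)) → ∃t ∃g (Teaches(t,c) ∧ Grade(s,c,g))"""
    students = {s for (s,) in source["Student"]}
    for (s, c) in source["Enrolls"]:
        if s not in students:
            continue  # premise does not hold for this pair
        # t and g occur in separate atoms, so each can be witnessed independently
        has_teacher = any(tc == c for (_t, tc) in target["Teaches"])
        has_grade = any((gs, gc) == (s, c) for (gs, gc, _g) in target["Grade"])
        if not (has_teacher and has_grade):
            return False
    return True


def satisfies_egd(target):
    """(Grade(s,c,g) ∧ Grade(s,c,g')) → (g = g')"""
    seen = {}
    for (s, c, g) in target["Grade"]:
        # setdefault records the first grade seen for (s, c)
        if seen.setdefault((s, c), g) != g:
            return False
    return True


source = {"Student": {("ann",)}, "Enrolls": {("ann", "db")}}
good_target = {"Teaches": {("N1", "db")}, "Grade": {("ann", "db", "N2")}}
bad_target = {"Teaches": set(), "Grade": {("ann", "db", "A"), ("ann", "db", "B")}}

print(satisfies_st_tgd(source, good_target), satisfies_egd(good_target))
print(satisfies_st_tgd(source, bad_target), satisfies_egd(bad_target))
```

Here `good_target` satisfies both dependencies, while `bad_target` has no teacher for the course and two different grades for the same student and course.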
New Role of Dependencies
In data exchange, dependencies play a crucial role in
describing how to transform data from one format to another
Solutions in Schema Mappings
Definition: Schema Mapping M = (S, T, Σ)
If I is a source instance, then a solution for I is a target instance J such that <I, J> satisfy Σ
Fact: In general, for a given source instance I, there may be no solutions at all, or there may be multiple solutions; in fact there may be infinitely many solutions
Universal Solutions in Data Exchange
[Fagin, Kolaitis, Miller, Popa – ICDT 2003] introduced universal solutions as the “best” solutions in data exchange
By definition, a solution is universal if it has homomorphisms to all other solutions
Thus, it is a “most general” solution
Constants: entries in source instances
Variables (labeled nulls): entries besides constants in target instances
Homomorphism h: J_1 → J_2 between target instances:
h(c) = c, if c is a constant
If P(a_1,…,a_m) is in J_1, then P(h(a_1),…,h(a_m)) is in J_2
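The homomorphism condition can be tested by brute force on small instances. The sketch below assumes instances are dictionaries from relation names to sets of tuples and that labeled nulls are strings starting with "_"; both conventions and the name `find_homomorphism` are hypothetical. It tries every assignment of the nulls of J_1 to values of J_2, so it is exponential and meant only as an illustration.

```python
from itertools import product


def is_null(value):
    # Assumed convention: labeled nulls are strings starting with "_"
    return isinstance(value, str) and value.startswith("_")


def facts(instance):
    return {(rel, tup) for rel, tuples in instance.items() for tup in tuples}


def find_homomorphism(j1, j2):
    """Return a dict h on the nulls of j1 witnessing a homomorphism j1 → j2, or None."""
    f1, f2 = facts(j1), facts(j2)
    nulls = sorted({v for _, tup in f1 for v in tup if is_null(v)})
    domain = sorted({v for _, tup in f2 for v in tup}, key=str)
    for image in product(domain, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        # h(c) = c for constants; only nulls are looked up in h
        mapped = {(rel, tuple(h.get(v, v) for v in tup)) for rel, tup in f1}
        if mapped <= f2:  # every fact of j1 lands on a fact of j2
            return h
    return None


general = {"Teaches": {("_t", "db")}, "Grade": {("ann", "db", "_g")}}
specific = {"Teaches": {("bob", "db")}, "Grade": {("ann", "db", "A")}}

print(find_homomorphism(general, specific))
print(find_homomorphism(specific, general))
```

The instance with nulls maps into the one with constants, but not the other way around, because the constant `bob` must be mapped to itself.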
How to Obtain a Universal Solution?
Answer: Use our old friend the chase!
Theorem [Fagin, Kolaitis, Miller, Popa – ICDT 2003]:
If there is a solution, then the chase produces a universal solution
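To illustrate the idea, here is a minimal Python sketch of the chase restricted to source-to-target tgds. It deliberately omits target tgds and egds, so it always terminates and never fails. The representation is an assumption: a tgd is a pair (premise atoms, conclusion atoms), each atom is a relation name with a tuple of variable names, and fresh labeled nulls are strings of the form "_N1", "_N2", and so on.

```python
from itertools import count


def matches(atoms, instance, binding=None):
    """Yield every extension of binding that makes all atoms true in instance."""
    binding = binding or {}
    if not atoms:
        yield dict(binding)
        return
    (rel, variables), rest = atoms[0], atoms[1:]
    for tup in instance.get(rel, ()):
        new = dict(binding)
        # A variable must take the same value at every occurrence
        if all(new.setdefault(v, val) == val for v, val in zip(variables, tup)):
            yield from matches(rest, instance, new)


def chase_st_tgds(source, tgds):
    """Build a target instance by firing each s-t tgd on each premise match."""
    target = {}
    fresh = count(1)
    for premise, conclusion in tgds:
        for binding in matches(premise, source):
            if next(matches(conclusion, target, binding), None) is not None:
                continue  # conclusion already satisfied for this binding
            full = dict(binding)
            for rel, variables in conclusion:
                for v in variables:
                    if v not in full:
                        full[v] = f"_N{next(fresh)}"  # fresh labeled null
                target.setdefault(rel, set()).add(tuple(full[v] for v in variables))
    return target


source = {"Student": {("ann",)}, "Enrolls": {("ann", "db")}}
tgd = (
    [("Student", ("s",)), ("Enrolls", ("s", "c"))],
    [("Teaches", ("t", "c")), ("Grade", ("s", "c", "g"))],
)
result = chase_st_tgds(source, [tgd])
print(result)
```

The existential variables t and g are replaced by fresh nulls, which is what makes the result "most general": any other solution can be reached by mapping the nulls to concrete values.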
Standard schema mappings
[Fagin, Kolaitis, Miller, Popa – ICDT 2003] define a weakly acyclic set of tgds
[Deutsch, Tannen – ICDT 2003] have a slightly more restrictive notion
Let a standard schema mapping be one specified by s-t tgds, target egds, and a weakly acyclic set of target tgds.
Theorem [Fagin, Kolaitis, Miller, Popa – ICDT 2003]:
For standard schema mappings, the chase runs in polynomial time (data complexity)
Query Answering in Data Exchange
[Figure: schema S with instance I, related by Σ to schema T with instance J; query q posed over T]
Question: What is the semantics of target query answering?
Definition: The certain answers of a query q over T on I
certain(q, I) = ∩ { q(J) : J is a solution for I }
Note: It is the standard semantics in data integration
Computing the Certain Answers
Theorem [Fagin, Kolaitis, Miller, Popa – ICDT 2003]:
Assume a standard schema mapping. Let q be a union of conjunctive queries over the target.
If I is a source instance and J is a universal solution for I:
certain(q, I) = the set of all “null-free” tuples in q(J).
Hence, certain(q, I) is computable in polynomial time
1. Compute a universal solution J, using the chase, in polynomial time
2. Evaluate q(J) and remove tuples with nulls
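Step 2 can be sketched in a few lines of Python. The sketch assumes a universal solution J is already available as a dictionary from relation names to sets of tuples, that labeled nulls are strings starting with "_", and that a conjunctive query is written as a plain Python function over J. All names here are hypothetical.

```python
def is_null(value):
    # Assumed convention: labeled nulls are strings starting with "_"
    return isinstance(value, str) and value.startswith("_")


def certain_answers(query, universal_solution):
    """Evaluate the query on a universal solution and keep the null-free tuples."""
    return {
        row
        for row in query(universal_solution)
        if not any(is_null(v) for v in row)
    }


def enrolled_with_teacher(j):
    # q(s, c) :- Grade(s, c, g), Teaches(t, c)
    return {(s, c) for (s, c, _g) in j["Grade"] for (_t, tc) in j["Teaches"] if tc == c}


def student_grades(j):
    # q(s, g) :- Grade(s, c, g)
    return {(s, g) for (s, _c, g) in j["Grade"]}


J = {"Teaches": {("_N1", "db")}, "Grade": {("ann", "db", "_N2")}}

print(certain_answers(enrolled_with_teacher, J))
print(certain_answers(student_grades, J))
```

The first query has the certain answer (ann, db), since every solution must contain some teacher and some grade for that enrollment. The second has none, because the grade itself is a null and differs from solution to solution.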
Composing Schema Mappings
Given M_12 = (S_1, S_2, Σ_12) and M_23 = (S_2, S_3, Σ_23), derive a schema mapping M_13 = (S_1, S_3, Σ_13) that is “equivalent” to the sequence M_12 and M_23
[Figure: Schema S_1 mapped to Schema S_2 by M_12, Schema S_2 mapped to Schema S_3 by M_23, and Schema S_1 mapped directly to Schema S_3 by M_13]
What does it mean for M_13 to be “equivalent” to the composition of M_12 and M_23?
Semantics of Composition
Σ_13 has to have the property that:
<I_1, I_3> ⊨ Σ_13 if and only if there exists I_2 such that <I_1, I_2> ⊨ Σ_12 and <I_2, I_3> ⊨ Σ_23
Result of the composition
Question: If M_12 and M_23 are each specified by s-t tgds, what language is needed for specifying the composition of M_12 and M_23?
Answer: [Fagin, Kolaitis, Popa, Tan – PODS 2004]: second-order tgds
Second-Order Tgds
Definition: Let S be a source schema and T a target schema.
A second-order tuple-generating dependency (SO tgd) is a formula of the form:
∃f_1 … ∃f_m ( (∀x_1 (φ_1 → ψ_1)) ∧ … ∧ (∀x_n (φ_n → ψ_n)) ),
where
f_i is a function symbol
φ_i is a conjunction of atoms over S and equalities of terms
ψ_i is a conjunction of atoms from T
Example:
∃f ( ∀e ( Emp(e) → Mgr(e, f(e)) ) ∧ ∀e ( Emp(e) ∧ (e = f(e)) → SelfMgr(e) ) )
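The existential quantifier over the function f is what makes this dependency second-order: a pair <I, J> satisfies it if some choice of f works. The Python sketch below checks this for the Emp/Mgr/SelfMgr example only. It relies on an observation specific to this example: both clauses constrain f(e) separately for each employee e, so it is enough to look for a suitable value of f(e) one employee at a time. The dictionary-of-sets representation and the function name are assumptions for illustration.

```python
def satisfies_so_tgd(source, target):
    """∃f ( ∀e (Emp(e) → Mgr(e, f(e))) ∧ ∀e (Emp(e) ∧ e = f(e) → SelfMgr(e)) )"""
    for (e,) in source["Emp"]:
        # Clause 1: f(e) must be some m with Mgr(e, m) in the target
        candidates = [m for (x, m) in target["Mgr"] if x == e]
        # Clause 2: choosing f(e) = e is allowed only if SelfMgr(e) holds
        if not any(m != e or (e,) in target["SelfMgr"] for m in candidates):
            return False
    return True


source = {"Emp": {("alice",), ("bob",)}}

with_self_mgr = {"Mgr": {("alice", "carol"), ("bob", "bob")}, "SelfMgr": {("bob",)}}
missing_self_mgr = {"Mgr": {("alice", "carol"), ("bob", "bob")}, "SelfMgr": set()}
other_choice = {
    "Mgr": {("alice", "carol"), ("bob", "bob"), ("bob", "carol")},
    "SelfMgr": set(),
}

print(satisfies_so_tgd(source, with_self_mgr))
print(satisfies_so_tgd(source, missing_self_mgr))
print(satisfies_so_tgd(source, other_choice))
```

In the third instance bob is listed as his own manager without being in SelfMgr, yet the dependency still holds, because f(bob) can be chosen to be carol instead.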
Composition and SO Tgds
Theorem [Fagin, Kolaitis, Popa, Tan – PODS 2004]:
The composition of any finite sequence of schema mappings specified by s-t tgds can be specified by an SO tgd
Conversely, every SO tgd specifies the composition of a finite sequence of mappings that are each specified by s-t tgds.
Recently [Arenas, Fagin, Nash – ICDT 2010] showed that the sequence need only be of size 2
Composition with Target Constraints
[Arenas, Fagin, Nash – ICDT 2010] defined s-t SO dependencies, which generalize SO tgds by allowing not only target atoms but also equalities in the conclusion
Theorem [Arenas, Fagin, Nash – ICDT 2010]:
• The composition of any finite sequence of standard schema mappings can be specified by an s-t SO dependency (along with target egds and target tgds)
• Conversely, every s-t SO dependency specifies the composition of a finite sequence of standard schema mappings
– In fact, again, the sequence need only be of size 2
The chase procedure can be extended to schema mappings specified by s-t SO dependencies, so that it produces universal solutions in polynomial time (data complexity)
Conclusions
Dependencies now play a crucial role in data integration and
data exchange
We even have second-order dependencies, which have in fact been implemented in IBM InfoSphere Data Architect.
Dependency theory is alive and well!
Extra slides
The Smallest Universal Solution
Fact: Universal solutions need not be unique
Question: Is there a “best” universal solution?
Answer: [Fagin, Kolaitis, Popa – PODS 2003] took a “small is beautiful” approach:
There is a smallest universal solution (if solutions exist); hence, the most compact one to materialize
Definition: The core of an instance J is the smallest subinstance J’ that is homomorphically equivalent to J
Fact:
Every finite relational structure has a core
The core is unique up to isomorphism
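For small instances, the core can be found by brute force directly from the definition. The sketch below is not the polynomial-time algorithm from the cited papers; it is an exponential illustration. It repeatedly looks for a homomorphism from the instance into itself (fixing constants) whose image is a proper subinstance, and shrinks to that image until no such homomorphism exists. An instance is assumed to be a set of (relation, tuple) facts, and labeled nulls are assumed to be strings starting with "_".

```python
from itertools import product


def is_null(value):
    # Assumed convention: labeled nulls are strings starting with "_"
    return isinstance(value, str) and value.startswith("_")


def core(facts):
    """Shrink a set of (relation, tuple) facts to its core by brute force."""
    facts = set(facts)
    while True:
        nulls = sorted({v for _, tup in facts for v in tup if is_null(v)})
        domain = sorted({v for _, tup in facts for v in tup}, key=str)
        for image in product(domain, repeat=len(nulls)):
            h = dict(zip(nulls, image))
            mapped = {(rel, tuple(h.get(v, v) for v in tup)) for rel, tup in facts}
            if mapped < facts:  # homomorphism onto a proper subinstance
                facts = mapped
                break
        else:
            return facts  # no shrinking homomorphism left


J = {
    ("Teaches", ("_N1", "db")),
    ("Teaches", ("bob", "db")),
    ("Grade", ("ann", "db", "_N2")),
}
print(core(J))
```

The fact Teaches(_N1, db) is redundant because _N1 can be mapped to bob, so it disappears from the core. The null _N2 stays, since no other Grade fact can absorb it.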
Core: The smallest universal solution
Theorem [Fagin, Kolaitis, Popa – PODS 2003]:
All universal solutions have the same core
The core of the universal solutions is the smallest universal solution
If the target constraints are egds, then the core is polynomial-time computable (data complexity)
Theorem [Gottlob and Nash – PODS 2006]:
If the target constraints are egds and a weakly acyclic set of tgds, then the core is polynomial-time computable
Old Conclusions
Dependencies now play a crucial role in data integration and
data exchange
We even have second-order dependencies, which have in fact been implemented in practice!
Lately, even probabilistic dependencies have been studied
[Dong, Halevy, Yu – VLDB 2007]
[Das Sarma, Dong, Halevy – SIGMOD 2008]
[Fagin, Kimelfeld, Kolaitis – ICDT 2010]
Probabilistic dependencies on probabilistic databases
Dependency theory is alive and well!