Relational Algebra
SQL specifies what to retrieve but not
how to retrieve it.
Need a process to translate a
descriptive statement into a collection
of activities.
This is “behind the scenes”, but
important to know nonetheless.
If a CS student doesn’t know it
–
who
will?
SQL
Statement
High level logic
involving table
(relation)
manipulation
Mid level logic
involving loops
and comparisons
Low level logic
involving B

Tree,
index, hash table
access, etc.
Result
SQL queries relations and generates
a new relation.
Need something that manipulates
relations in order to create other
relations.
Need to do it efficiently.
Relational Algebra:
a collection of operations (activities) to
construct new relations from given
relations.
Typical process:
SQL
relational algebra operations (or
something similar)
mid/low level
logic
result
Reference: First few slides of
[
http://www.cs.wayne.edu/~shiyong/c
sc6710/slides/kiferComp_348761_ppt
11.ppt
], though the notation may be
intimidating.
Background:
Tuple
(n

tuple
): collection of
n
things
written as (
a
1
,
a
2
,…,
a
n
).
Typically a
tuple
represents an entity
in a relation (a row in a table).
Definition:
2 relations are
Union

compatible
if they
have same degree (number of attributes)
and the
i
th
attributes of each are defined
on same domains.
The first three definitions that follow
apply only to Union

compatible
relations.
Set operations:
Union
A
B is the set of elements
belonging to either A or B.
Venn Diagram
A
B
SQL Server:
SELECT *
FROM S
WHERE status
=
20
UNION
SELECT *
FROM S
WHERE status
= 50
Difference (Minus)
:
A
–
B is the set of elements
belonging to A but not to B.
Venn Diagram.
A
–
B
SQL Server uses
Except
instead of
Minus
Supplier table: Create a view defined
by that below (all suppliers not in
Paris).
SELECT *
FROM S
EXCEPT
SELECT *
FROM S
WHERE City = 'Paris'
Intersection
A
B is the set of elements belonging
to both A and B.
Venn Diagram
A
B
SQL Server:
SELECT *
FROM S
WHERE status > 20
INTERSECT
SELECT *
FROM S
WHERE status < 50
A
×
B
is the set of all elements of the
form (
x, y
) where
x
belongs to
A
and
y
belongs to
B
.
Cartesian Product
:
A = {a, b}
B = {1, 2, 3}
A
×
B =
{(a, 1), (a, 2), (a, 3), (b, 1),
(b, 2), (b, 3) }
Supplier table: Create a view defined
by
SELECT * FROM S CROSS JOIN SP
Selection (Restriction)
R where
predicate
where the
predicate is some condition that
evaluates to true or false.
A row subset of relation R.
Select * from R where
predicate
.
Let R be a relation and X, Y, …, Z be
among R’s attributes.
Then R[X, Y, …, Z] is the set of
elements from R restricted to
attributes from X, Y, … , Z with
duplicates removed.
Column subset of R
Select distinct
attribute list
from R
Projection
Join
Cartesian Product followed by a
selection
A
×
B where
predicate
.
SELECT * FROM S CROSS JOIN SP
where S.S# = SP.S#
Or
SELECT * FROM S, SP where
S.S#=SP.S#
A
join
B
Let
C
i
, … ,
C
j
be attribute names
common to A and B. Then A JOIN B
is
(A
×
B)[all attributes with duplicates
removed] where
A.C
i
=
B.C
i
and
:
:
A.C
j
=
B.C
j
Natural Join
:
Example, Consider
S join SP
.
Suppose S9 is a supplier but S9
supplies no parts, yet. An entry for S9
will NOT appear in
S join SP
.
S left
outerjoin
SP
will include every
row from the left table (S) with NULLS
filled in if there’s no matching S# in
the SP table.
Outer Join:
Inner Join
Conventional join operation but the
phrase is used if there’s the potential
for confusion with outer join.
Examples
select * from S inner join SP on
S.S#=SP.S#
select * from S left outer join SP on
S.S#=SP.S#
Get supplier names for suppliers who
supply part P2.
( ( SP where P# = ‘P2) join S) [
SName
]
1.
Find a row subset (Selection) of SP
where P#=‘P2’
2.
Do a natural join with S
3.
Find a column subset of the result
(project on
Sname
)
Get supplier names for suppliers who
supply at least one red part.
( ( ( P where color = ‘Red’) Join SP) [S#] Join S)
[
Sname
]
1.
Find a row subset (Selection) of P where
color=‘Red’
2.
Do a natural join with SP
3.
Project on S#
4.
Do a natural join with S
5.
Project on
SName
Get supplier names for suppliers who
do not supply part P2.
S[
Sname
] Minus ( ( SP where P# = ‘P2) join S)
[
Sname
]
1.
Project S on
SName
2.
Find a row subset of SP where P#=‘P2’
3.
Join the result of 2 with S
4.
Project the result of 3 on
Sname
5.
Calculate the difference between the result of 1 and the
result of 2
Let A be a relation of degree
m+n
and B be a
relation of degree
n
Visualize an
m+n
tuple
(entry) of A as a pair
(x, y) where
x
is an m

tuple
(1
st
m
attributes) and
y
is an
n

tuple
(last
n
attributes)
Also suppose the
n
attributes of B are
defined on the same domains as the last
n
attributes of A.
Then
C = A
dividedby
B
is a relation
consisting of m

tuples
x
where for all
y
in B
there is a pair (x, y) in A.
Division
:
NOTE
: only 5 operations: restriction,
projection, product, union, and
difference are primitive.
A
B = A
–
(A
–
B)
A
dividedby
B = A[x]
–
(A[x]
×
B
–
A)[x].
SP
dividedby
B
B is: 1) P1 or 2) P2 and P4 or 3) P1
through P6
Supplier numbers that supply all parts in B
SP
Dividedby
B is: 1) S1 and S2 or 2) S1
and S4 or 3) S1
Example:
S#
P#
S1
P1
S1
P2
S1
P3
S1
P4
S1
P5
S1
P6
S2
P1
S2
P2
S2
P3
S3
P2
S3
P3
S4
P2
S4
P3
S4
P4
A
dividedby
B = A[x]
–
(A[x]
×
B
–
A)[x].
Apply to previous example where A =
SP.
B = P1
B = P2 and P4
B = P1 thru P6
A[x]
S1 thru S4
S1 thru S4
S1 thru S4
A[x]
×
B
(S
i
, P1) i = 1…4
(S
i
, P2) i = 1…4
(S
i
, P4) i = 1…4
(S1, Pi) i = 1…6
(S2, Pi) i = 1…6
(S3, Pi) i = 1…6
(S4, Pi) i = 1…6
=
A[x]
×
B

A
(S3, P1)
(S4, P1)
(S2, P4)
(S3, P4)
(S2, P
i
) i=3,4,5,6
(S3, P
i
) i=1,3,4,5,6
(S4, P
i
) i=1,3,6
(A[x]
×
B
–
A)[x]
S3 and S4
S2 and S3
S2, S3, S4
A[x]

(A[x]
×
B
–
A)[x]
S1 and S2
S1 and S4
S1
Get supplier names for suppliers who
supply all parts.
( ( SP
dividedby
P[P#]) Join S)[
Sname
]
or
( ( SP[S#]

(SP[S#] times P[P#]
–
SP)[S#])
Join S)[
Sname
]
Recall
previous n
otes in SQL where finding
ALL of something required double “not
exist”
subqueries
.
Example
Select S.name
From S, SP
Where S.S# = SP.S#
And SP.P# = ‘P2’
Suppose 100 suppliers and 10,000 shipments
Consider
(S times SP) where S.S#=SP.S# and SP.P#=’P2’
vs
(S times (SP where SP.P#=’P2’) where S.S# = SP.S#
First option creates a VERY large intermediate table.
Query Optimization
: Introduction to
Database Systems by C. J. Date 8
th
ed
Slide 4 of
[
http://www.cs.wayne.edu/~shiyong/csc6710/
slides/kiferComp_348761_ppt11.ppt
]
Internal representation of a relational algebra
expression (e.g. a query tree)
Stages of query optimization:
S
SP
SP
S
Where SP.P# = ‘P2
join
Project on name
join
Where SP.P# = ‘P2
Project on name
(S join (SP where P#=’P2’))[name]
((S join SP) where P#=’P2) [name]
Canonical forms
Set of queries, called C, such that for
every possible query there is a query in
C that is equivalent.
Reason for this?
Get suppliers who supply P2 can be
expressed in 8 different ways. Ideally,
the efficiency should not depend on the
query form.
Where p or (q and r)
where (p or q)
and (p or r)

Conjunctive normal form
(A where p) where q
A where (p and q)
(A[attributes]) where p
(A where
p)[attributes]
A.F1 > B.F2 and B.F2 = 3
A.F1 > 3
Examples:
Consider: Get suppliers not in London
or Paris.
SELECT
S#,
Sname
, Status, City
FROM
dbo.S
WHERE (City <> 'Paris') OR
(City <> 'London')
Does this yield the expected result?
NOT (p and q)
NOT p or NOT q
(
DeMorgan’s
laws
NOT (p or q)
NOT p and NOT q
(
DeMorgan’s
laws
If system knew that SP.P# is a foreign key
matching a primary key P.P# in P then
(SP join P) [S#]
SP[S#]
Query optimizers do not optimize
–
just try to find “reasonably good”
evaluation strategies. Might take
longer to find optimal strategy than to
do brute force method.
[
http://www.cs.wayne.edu/~shiyong/c
sc6710/slides/kiferComp_348761_ppt
11.ppt
]
Summaries and References:
“
MySQL's
query optimization engine
isn't always the most efficient when it
comes to
subqueries
.
”
often a good idea to convert a
subquery
to a join
[
http://www.databasejournal.com/features/mysq
l/article.php/3813821/Five

Query

Optimizations

in

MySQL.htm
]
Some additional information
[
http://avid.cs.umass.edu/courses/645/s2
009/lectures/Lec12

QueryOptimizer

x6.pdf
]
SQL Server: Query Analysis:
Select a database
Create a query window (right click on
the database and select
new
query)
and enter an query
Query
Display Estimated
Execution Plan
Query
Include Actual Execution
Plan.
When the query is executed there will be
an execution plan tab on the result pane.
Some example queries to analyze:
select * from S
select * from S order by S#
select * from S order by status
select * from SP where Qty = 200
select * from S, SP where S.S#=SP.S# and P#='P2'
select * from S where S# in (select S# from SP where P#='P2')
S2 Replacement View
Any Red part
View.
Only Red Parts1
view.
Do some views from the genealogy database (see next slide for family
tree)
Mike
Sarah
Tom
Ellen
Jeanne
Joseph
Tim
Daniel
Ira
Brian
Kathy
Jim
Alice
John
Joanne
Jason
Sally
Linda
Intro to SQL Server Optimization
[
http://www.c

sharpcorner.com/UploadFile/john_cha
rles/QueryoptimizationinSQLServer20
0512112007154303PM/Queryoptimiz
ationinSQLServer2005.aspx
]
Some useful terms:
[
http://technet.microsoft.com/en

us/library/cc917672.aspx
]
Clustered index
A clustered index is organized as a B

tree, where the
nonleaf
nodes are index
pages and the leaf nodes are data
pages.
A table can have only 1.
Nonclustered
index
A
nonclustered
index is organized as a
B

tree. Unlike a clustered index, a
nonclustered
index consists of only
index pages. The leaf nodes in a
nonclustered
index are not data pages,
but contain row locators for individual
rows in the data pages.
Hash match inner join
:
[
http://blogs.msdn.com/craigfr/archive/2006
/08/10/687630.aspx
].
Left Semi Join
“
returns rows from one table that would
join with another table without
performing a complete join
”
[
http://blogs.msdn.com/craigfr/archive/20
06/07/19/671712.aspx
]
Left Anti Semi Join
Similar to above but returns those that
would NOT join.
Stream Aggregates
:
[
http://blogs.msdn.com/craigfr/archive/
2006/09/13/752728.aspx
]
Index scan
vs
index seek
:
[
http://blogs.msdn.com/craigfr/archive/
2006/06/26/647852.aspx
Spooling:
Temporary “caching” of data for use
by another activity:
[
http://technet.microsoft.com/en

us/library/ms191221.aspx
]
Summaries at
[
http://blogs.msdn.com/craigfr/attach
ment/8508493.ashx
] and
[
http://technet.microsoft.com/en

us/library/ms191158.aspx
]
Construct query plans from low level
routines and make some attempt at
optimizing.
True optimization is sometimes more
costly than the savings.
Generating
low level routines
Brute Force
:
Cardinality of R is m, cardinality of S is n
For
i
= 1 to m do
For j = 1 to n do
If R[
i
].C = S[j].C then
Add joined
tuple
, R[
i
]:S[j]
to result table
Join of relations R and S (on attribute
C): Intro to Database Systems by C. J.
Date
Time is proportional to m*n
Worse
depending on many factors, number of
disk I/Os could be proportional to m*n
Potentially large if tens or hundreds of
thousands of
tuples
in each table or if
tuples
are very large and require a lot
of disk I/Os.
Assume index on S.C
For
i
= 1 to m do
{
Find all records S[j], in S, where S[j].C=R[
i
].C
//requires a search through the index which is
//far less work than looking through all records
in S
Join R[
i
] with corresponding
tuples
and add to
result table.
}
Indexed method
Assume (or build) a hash table on S.C
For
i
= 1 to m do
{
k = hash(R[
i
].C)
search hash list at H[k] looking for
matches to R[
i
].C
if found, add joined
tuple
to result
table.
}
Hash Table:
R… relation; A…
attributes
; R[i]
tuple
i of R;
Chosen
is
a
boolean
vector
, one bit for
each
R[i];
initially
all False.
Nested Loop
:
For
i
= 1 to m do
//look for alternative
If Chosen[
i
] is true then
{
add R[
i
].A to result table
for j = i+1 to n
if R[j].A = R[
i
].A then set Chosen[j]=true
}
Projection:
For
i
= 1 to m do
{
dup = false
k = hash(R[
i
].A)
search all
tuples
at
H[k], looking for R[
i
].A;
if not found add R[
i
].A to
tuples
at H[k]
}
Gather all
tuples
from hash table to store
into result table.
Hashing: using a hash table H
CreateView
P’
As Select * from P where Color = ‘Red’ and Weight < 20
CreateView
SP’
As Select * from SP where Qty > 200
CreateView
S’
As Select * from S where City = ‘London’
Select name from S’, SP’, P’
Where S’.S# = SP’.S# and
SP’.P# = P’.P#
Could create these three views concurrently with multiprocessors or
multi

core processors.
Find London suppliers who supply red
parts with weight < 20 in quantities of
200
Comments 0
Log in to post a comment