A×B

nostrilshumorousInternet και Εφαρμογές Web

18 Νοε 2013 (πριν από 3 χρόνια και 6 μήνες)

85 εμφανίσεις

Relational Algebra

SQL specifies what to retrieve but not
how to retrieve it.

Need a process to translate a
descriptive statement into a collection
of activities.

This is “behind the scenes”, but
important to know nonetheless.

If a CS student doesn’t know it


who
will?

SQL
Statement

High level logic
involving table
(relation)
manipulation

Mid level logic
involving loops
and comparisons

Low level logic
involving B
-
Tree,
index, hash table
access, etc.

Result

SQL queries relations and generates
a new relation.

Need something that manipulates
relations in order to create other
relations.

Need to do it efficiently.

Relational Algebra:

a collection of operations (activities) to
construct new relations from given
relations.

Typical process:

SQL


relational algebra operations (or
something similar)


mid/low level
logic


result

Reference: First few slides of
[
http://www.cs.wayne.edu/~shiyong/c
sc6710/slides/kiferComp_348761_ppt
11.ppt
], though the notation may be
intimidating.


Background:

Tuple

(n
-
tuple
): collection of
n

things
written as (
a
1
,
a
2
,…,
a
n
).

Typically a
tuple

represents an entity
in a relation (a row in a table).


Definition:

2 relations are
Union
-
compatible

if they
have same degree (number of attributes)
and the
i

th

attributes of each are defined
on same domains.

The first three definitions that follow
apply only to Union
-
compatible
relations.





Set operations:

Union

A


B is the set of elements
belonging to either A or B.




Venn Diagram




A


B

SQL Server:

SELECT *

FROM S

WHERE status
=
20

UNION


SELECT *


FROM S


WHERE status
= 50


Difference (Minus)
:

A


B is the set of elements
belonging to A but not to B.

Venn Diagram.




A


B

SQL Server uses
Except

instead of
Minus

Supplier table: Create a view defined
by that below (all suppliers not in
Paris).

SELECT *


FROM S


EXCEPT



SELECT *



FROM S



WHERE City = 'Paris'

Intersection

A


B is the set of elements belonging
to both A and B.

Venn Diagram




A


B

SQL Server:

SELECT *

FROM S

WHERE status > 20

INTERSECT


SELECT *


FROM S


WHERE status < 50


A
×
B

is the set of all elements of the
form (
x, y
) where
x

belongs to
A

and
y

belongs to
B
.


Cartesian Product
:

A = {a, b}

B = {1, 2, 3}

A
×
B =
{(a, 1), (a, 2), (a, 3), (b, 1),
(b, 2), (b, 3) }

Supplier table: Create a view defined
by

SELECT * FROM S CROSS JOIN SP

Selection (Restriction)

R where
predicate

where the
predicate is some condition that
evaluates to true or false.

A row subset of relation R.

Select * from R where
predicate
.


Let R be a relation and X, Y, …, Z be
among R’s attributes.

Then R[X, Y, …, Z] is the set of
elements from R restricted to
attributes from X, Y, … , Z with
duplicates removed.

Column subset of R

Select distinct
attribute list

from R



Projection

Join

Cartesian Product followed by a
selection

A
×
B where
predicate
.

SELECT * FROM S CROSS JOIN SP
where S.S# = SP.S#



Or

SELECT * FROM S, SP where
S.S#=SP.S#



A
join

B

Let
C
i
, … ,
C
j

be attribute names
common to A and B. Then A JOIN B
is


(A
×
B)[all attributes with duplicates

removed] where



A.C
i

=
B.C
i

and



:



:



A.C
j

=
B.C
j


Natural Join
:

Example, Consider
S join SP
.
Suppose S9 is a supplier but S9
supplies no parts, yet. An entry for S9
will NOT appear in
S join SP
.

S left
outerjoin

SP

will include every
row from the left table (S) with NULLS
filled in if there’s no matching S# in
the SP table.



Outer Join:


Inner Join

Conventional join operation but the
phrase is used if there’s the potential
for confusion with outer join.


Examples


select * from S inner join SP on
S.S#=SP.S#


select * from S left outer join SP on
S.S#=SP.S#

Get supplier names for suppliers who
supply part P2.


( ( SP where P# = ‘P2) join S) [
SName
]


1.
Find a row subset (Selection) of SP
where P#=‘P2’

2.
Do a natural join with S

3.
Find a column subset of the result
(project on
Sname
)

Get supplier names for suppliers who
supply at least one red part.


( ( ( P where color = ‘Red’) Join SP) [S#] Join S)
[
Sname
]


1.
Find a row subset (Selection) of P where
color=‘Red’

2.
Do a natural join with SP

3.
Project on S#

4.
Do a natural join with S

5.
Project on
SName



Get supplier names for suppliers who
do not supply part P2.


S[
Sname
] Minus ( ( SP where P# = ‘P2) join S)
[
Sname
]


1.
Project S on
SName

2.
Find a row subset of SP where P#=‘P2’

3.
Join the result of 2 with S

4.
Project the result of 3 on
Sname

5.
Calculate the difference between the result of 1 and the
result of 2


Let A be a relation of degree
m+n

and B be a
relation of degree
n

Visualize an
m+n

tuple

(entry) of A as a pair
(x, y) where

x

is an m
-
tuple

(1
st

m

attributes) and
y

is an
n
-
tuple

(last
n

attributes)

Also suppose the
n

attributes of B are
defined on the same domains as the last
n

attributes of A.

Then
C = A
dividedby

B

is a relation
consisting of m
-
tuples

x

where for all
y

in B
there is a pair (x, y) in A.


Division
:

NOTE
: only 5 operations: restriction,
projection, product, union, and
difference are primitive.



A

B = A


(A


B)

A
dividedby

B = A[x]


(A[x]
×
B


A)[x].


SP
dividedby

B

B is: 1) P1 or 2) P2 and P4 or 3) P1
through P6

Supplier numbers that supply all parts in B

SP
Dividedby

B is: 1) S1 and S2 or 2) S1
and S4 or 3) S1


Example:

S#

P#

S1

P1

S1

P2

S1

P3

S1

P4

S1

P5

S1

P6

S2

P1

S2

P2

S2

P3

S3

P2

S3

P3

S4

P2

S4

P3

S4

P4

A
dividedby

B = A[x]


(A[x]
×
B


A)[x].

Apply to previous example where A =
SP.

B = P1

B = P2 and P4

B = P1 thru P6

A[x]

S1 thru S4

S1 thru S4

S1 thru S4

A[x]
×
B

(S
i
, P1) i = 1…4

(S
i
, P2) i = 1…4

(S
i
, P4) i = 1…4

(S1, Pi) i = 1…6

(S2, Pi) i = 1…6

(S3, Pi) i = 1…6

(S4, Pi) i = 1…6
=
A[x]
×
B
-

A

(S3, P1)

(S4, P1)

(S2, P4)

(S3, P4)

(S2, P
i
) i=3,4,5,6

(S3, P
i
) i=1,3,4,5,6

(S4, P
i
) i=1,3,6

(A[x]
×
B


A)[x]

S3 and S4

S2 and S3

S2, S3, S4

A[x]
-
(A[x]
×

B


A)[x]

S1 and S2

S1 and S4

S1

Get supplier names for suppliers who
supply all parts.

( ( SP
dividedby

P[P#]) Join S)[
Sname
]

or

( ( SP[S#]
-

(SP[S#] times P[P#]


SP)[S#])
Join S)[
Sname
]


Recall
previous n
otes in SQL where finding
ALL of something required double “not
exist”
subqueries
.


Example

Select S.name

From S, SP

Where S.S# = SP.S#

And SP.P# = ‘P2’


Suppose 100 suppliers and 10,000 shipments

Consider


(S times SP) where S.S#=SP.S# and SP.P#=’P2’

vs

(S times (SP where SP.P#=’P2’) where S.S# = SP.S#


First option creates a VERY large intermediate table.


Query Optimization
: Introduction to
Database Systems by C. J. Date 8
th

ed

Slide 4 of
[
http://www.cs.wayne.edu/~shiyong/csc6710/
slides/kiferComp_348761_ppt11.ppt
]

Internal representation of a relational algebra
expression (e.g. a query tree)


Stages of query optimization:

S

SP

SP

S

Where SP.P# = ‘P2

join

Project on name

join

Where SP.P# = ‘P2

Project on name


(S join (SP where P#=’P2’))[name]

((S join SP) where P#=’P2) [name]

Canonical forms

Set of queries, called C, such that for
every possible query there is a query in
C that is equivalent.


Reason for this?

Get suppliers who supply P2 can be
expressed in 8 different ways. Ideally,
the efficiency should not depend on the
query form.



Where p or (q and r)


where (p or q)
and (p or r)
--

Conjunctive normal form


(A where p) where q


A where (p and q)


(A[attributes]) where p


(A where
p)[attributes]


A.F1 > B.F2 and B.F2 = 3


A.F1 > 3



Examples:

Consider: Get suppliers not in London
or Paris.


SELECT
S#,
Sname
, Status, City

FROM
dbo.S

WHERE (City <> 'Paris') OR


(City <> 'London')

Does this yield the expected result?




NOT (p and q)


NOT p or NOT q
(
DeMorgan’s

laws


NOT (p or q)


NOT p and NOT q
(
DeMorgan’s

laws


If system knew that SP.P# is a foreign key
matching a primary key P.P# in P then


(SP join P) [S#]


SP[S#]


Query optimizers do not optimize


just try to find “reasonably good”
evaluation strategies. Might take
longer to find optimal strategy than to
do brute force method.

[
http://www.cs.wayne.edu/~shiyong/c
sc6710/slides/kiferComp_348761_ppt
11.ppt
]


Summaries and References:


MySQL's

query optimization engine
isn't always the most efficient when it
comes to
subqueries
.


often a good idea to convert a
subquery

to a join

[
http://www.databasejournal.com/features/mysq
l/article.php/3813821/Five
-
Query
-
Optimizations
-
in
-
MySQL.htm
]


Some additional information

[
http://avid.cs.umass.edu/courses/645/s2
009/lectures/Lec12
-
QueryOptimizer
-
x6.pdf
]



SQL Server: Query Analysis:

Select a database

Create a query window (right click on
the database and select
new
query)
and enter an query

Query


Display Estimated
Execution Plan

Query


Include Actual Execution
Plan.

When the query is executed there will be
an execution plan tab on the result pane.

Some example queries to analyze:

select * from S

select * from S order by S#

select * from S order by status

select * from SP where Qty = 200

select * from S, SP where S.S#=SP.S# and P#='P2'

select * from S where S# in (select S# from SP where P#='P2')

S2 Replacement View

Any Red part

View.

Only Red Parts1

view.

Do some views from the genealogy database (see next slide for family
tree)

Mike

Sarah

Tom

Ellen

Jeanne

Joseph

Tim

Daniel

Ira

Brian

Kathy

Jim

Alice

John

Joanne

Jason

Sally

Linda

Intro to SQL Server Optimization

[
http://www.c
-
sharpcorner.com/UploadFile/john_cha
rles/QueryoptimizationinSQLServer20
0512112007154303PM/Queryoptimiz
ationinSQLServer2005.aspx
]

Some useful terms:

[
http://technet.microsoft.com/en
-
us/library/cc917672.aspx
]

Clustered index

A clustered index is organized as a B
-
tree, where the
nonleaf

nodes are index
pages and the leaf nodes are data
pages.

A table can have only 1.


Nonclustered

index

A
nonclustered

index is organized as a
B
-
tree. Unlike a clustered index, a
nonclustered

index consists of only
index pages. The leaf nodes in a
nonclustered

index are not data pages,
but contain row locators for individual
rows in the data pages.

Hash match inner join
:
[
http://blogs.msdn.com/craigfr/archive/2006
/08/10/687630.aspx
].

Left Semi Join


returns rows from one table that would
join with another table without
performing a complete join

[
http://blogs.msdn.com/craigfr/archive/20
06/07/19/671712.aspx
]

Left Anti Semi Join

Similar to above but returns those that
would NOT join.


Stream Aggregates
:

[
http://blogs.msdn.com/craigfr/archive/
2006/09/13/752728.aspx
]

Index scan
vs

index seek
:
[
http://blogs.msdn.com/craigfr/archive/
2006/06/26/647852.aspx

Spooling:

Temporary “caching” of data for use
by another activity:
[
http://technet.microsoft.com/en
-
us/library/ms191221.aspx
]

Summaries at
[
http://blogs.msdn.com/craigfr/attach
ment/8508493.ashx
] and
[
http://technet.microsoft.com/en
-
us/library/ms191158.aspx
]

Construct query plans from low level
routines and make some attempt at
optimizing.

True optimization is sometimes more
costly than the savings.


Generating
low level routines

Brute Force
:

Cardinality of R is m, cardinality of S is n


For
i

= 1 to m do



For j = 1 to n do




If R[
i
].C = S[j].C then





Add joined
tuple
, R[
i
]:S[j]



to result table



Join of relations R and S (on attribute
C): Intro to Database Systems by C. J.
Date

Time is proportional to m*n

Worse

depending on many factors, number of
disk I/Os could be proportional to m*n

Potentially large if tens or hundreds of
thousands of
tuples

in each table or if
tuples

are very large and require a lot
of disk I/Os.

Assume index on S.C

For
i

= 1 to m do

{


Find all records S[j], in S, where S[j].C=R[
i
].C


//requires a search through the index which is


//far less work than looking through all records
in S


Join R[
i
] with corresponding
tuples

and add to
result table.

}


Indexed method

Assume (or build) a hash table on S.C

For
i

= 1 to m do

{


k = hash(R[
i
].C)


search hash list at H[k] looking for
matches to R[
i
].C


if found, add joined
tuple

to result
table.

}


Hash Table:


R… relation; A…
attributes
; R[i]
tuple

i of R;

Chosen

is

a
boolean

vector
, one bit for
each

R[i];
initially

all False.


Nested Loop

:

For
i

= 1 to m do


//look for alternative


If Chosen[
i
] is true then


{



add R[
i
].A to result table



for j = i+1 to n




if R[j].A = R[
i
].A then set Chosen[j]=true


}


Projection:

For
i

= 1 to m do

{


dup = false


k = hash(R[
i
].A)


search all
tuples

at


H[k], looking for R[
i
].A;

if not found add R[
i
].A to
tuples

at H[k]

}



Gather all
tuples

from hash table to store
into result table.


Hashing: using a hash table H

CreateView

P’


As Select * from P where Color = ‘Red’ and Weight < 20



CreateView

SP’


As Select * from SP where Qty > 200



CreateView

S’


As Select * from S where City = ‘London’



Select name from S’, SP’, P’


Where S’.S# = SP’.S# and



SP’.P# = P’.P#



Could create these three views concurrently with multiprocessors or
multi
-
core processors.


Find London suppliers who supply red
parts with weight < 20 in quantities of
200