סיכום קורס ערוך ( 4 MB)

flameluxuriantΔιαχείριση Δεδομένων

16 Δεκ 2012 (πριν από 4 χρόνια και 10 μήνες)

701 εμφανίσεις


Data Base
-
67506

(2010)

Introductio
n

Database:
A collection of data

Properties of the DB

-
ACID
:



Atomicity:
all of the tasks of a user transaction are performed or
none of them are



Consistency:
the database remains in a consistent state



Isolation:
a datab
ase user appears to be using the system as a
single user



Durability:
once the user has been notified of success, the
transaction will persist, and not be undone

Entity
-
Relationship Diagrams

The
Goal

of modeling is to translate informal requirements
of the
DB
to a
precise diagram

(Conceptual Database Design)
. This diagram can then be
translated into the desired
data model

(
Logical Database Design
)
, to
allow
data to be stored in a database
.

ER
-
diagrams
formal semantics
:


Entity

(
תושי
): An object in the world
that can be distinguished from
other objects
.


Entity set
(
תויושי תצובק
):


A set of similar entities


Attributes
(
תונוכת
):
An

attribute contains a single piece of
information

about the entity set and is true to all entities it
describe.


Key
:
A minimal set
of attributes that uniquely identify an
entity.


Relationship Set


(
םירשק תצובק
): Set of similar relationships

that
a
ssociat
e

two or more entities
. An entity can participate more then
once in a relationship.

A relationship set can have an attribute bu
t
it
cannot be used for identification!




n
-
ary Relationship
:
An
n
-
ary relationship set
R
involves
exactly
n
entity sets: E
1
, …, E
n
,

where
R


E
1
x …x E
n
.

In other word in a relationship set of N entities, each one of
the N entities participating
must

appear i
n the n

ary.



M
any

to
many relationship
:
A member of E may be
connected by R to any number of members from F, and vice
versa
.




One to many

relationship
:
E may be connected to
at most
one

members from F
, but F can be connected to
any number
of members from

E
.




One to one

relationship
:

E may be connected to at most one
member from F
, and vice versa
.



Multiplicities in multi
-
way Relationships
:


For each any tuple of relationship
e
1
,…,e
n
,f
1
,…f
i
-
1
,f
i+1
,...,f
m
,
there is at most one
f
i

connected to them by R.



D
egree Constraints
:
set limit

on the number of entities that
can be connected to a single entity via a relationship set
.

In
th
is example
, a mov
i
e has at most 10 stars
.




Referential Integrity
:

each entity in E must participate
exactly one time

in a

relation
ship with an entity in F
.

Equal to the constraint: =1




Subclass:

Defines a hierarchy between entity sets
.

Every
entity in B or

C belongs to A
, but there
may be entities in A
that do not belong to B or to C
.
There may be entities that
belong to both B a
nd C


Weak Entity Set:

A weak entity set E (
הגולפ
)
is

sub
-
units

of elements of F

(
דודג
)
and
is not unique without
taking F into consideration!

We say t
he relationship set
ל תכייש

is the
supporting relationship
for
הגולפ
,
notice


it's always connected with a round arrow to the entity
.

Weak
entity set
can participate in additional (non
-
supporting)

relationships
.

ER Diagrams to Relational Schemas
:



Create a relation with the name of
each

Entity.



Create a table with the name of
each
relationship set

(without
constraints)
.



Translating one to many relations
hip:


o

Film(
title,

year
, id
)

o

Director(
id
, name)



Translating weak entity set:

o

The

relation for weak entity set includes its own keys and
the keys of its supporting entity sets
.

o

No relation is created for the supporting relationship sets



Translating ISA:



E
/R Style Conversion:

A relation for each entity set.
Child
relation has parent's primary key.



Object
-
Oriented Approach
:
A relation for each possible
combinations of how entities may appear in the entity sets.

For example:

MoviePerson(
id
, address, name)



M
oviePersonActor(
id
, address, name, picture)



MoviePersonDirector(
id
, address, name)




MoviePersonActorDirector(
id
, address, name, picture)



Null Value Approach
:
A single relation for containing all
values

(
Possible only if NULL values are allowed


not
in the
pure relational model
)




The Relational Model
:

Logical Structure of the Data:

o

Relation
:

a relation is a table
, it
has two parts: schema and
instance
. For example:

Movies
.

o

Schema
:
these are the column in the table. For example:
Movies(title, year, le
ngth, genre)

o

Attributes
:
title

(primary key)
, year, length, genre

o

I
nstance
/tuple

)המושר(
:

these are the rows in the table, there
can

be

no duplicate

and
no null values
. For example:
(“Follow that
Bird”, 1985, 90, children)

Relational algebra (RA)

RA

is a (mathematical) query language for
the Relational Model.

Basic operators:

-

Project
ion (
הלטה
)
:


A1,…,An
(R)

Projection returns a new relation that contains only the columns

A
1
,…,A
n

from the original relation

R

, duplicate are deleted
. In a
projection an attribute can appear more then once and it's
indifferent to the order
of attributes.


A1,…,An
(
)
.

If there are
n

instances in the table then:
min |

A1,…,An
(R)|=1, max|

A1,…,An
(R)|=
n

-

Selection (
הריחב
):

C
(R)

C is a Boolea
n condition

(can contain
)

over a
single

tuple
.

min
|

C
(R)|=0, max|

C
(R)|=
n
.


-

Union (
דוחיא
) : R


S

Union can only be performed between
compatible
relations
, hence
same number of fields

with the same name and type.

Result contains
all tuples in at least one of the relations

and contains no dup. if |R|=n,
|S|=m then:
min|R


S|=max(n,m), max|R


S|=n+m.


-

Set Difference
(
שרפה
):

R


S

It c
an only be performed between relations that are compatible
.
R
esult contains the tuples from R, not in S
.

Notice:


I
f |R|=n, |S|=m then:
min|R
-
S
|=n
-
m, max|R
-
S|=n.

-

Cartesian Produc
t (
תיזטרק הלפכמ
): RxS

In the r
esult
each tuple of R is joined to all the tuples of S.

The
result size is always
.

If R and S has the same attribute A then
it will appear twice (R.A, S.A).

Notice:
.

Some extra operators:

-

Renaming (
םש יוניש
):

R(A1,…,An)
(E)

Takes the relational
-
algebra expression
E
, and returns a relation
called
R

with renamed attributes A
1
,…, A
n

(all attribute of E have to
appear in the original order).

-

Intersection (
ךותיח
): R


S

=R
-
(R
-
S)

It's p
erformed between com
patible relations
.
Result contains the
tuples that appear in R
and

S
. min| R


S|=0, max| R


S|= min(n,m).

-

Join (
ףוריצ
):
Join is commutative and associative
, but Cross Product is not
.


o

Condition
al
Join (
יאנת לע ףוריצ
):


o

Equijoin

(
ויוויש ףוריצ
ן
)
:
special conditional join

where C must be
an
equality

condition (with only
)
, hence
and in the
result the second attribute (
) is dropped
automatically
.

o

Natural Join (
יעבט ףוריצ
)
:

This is a special equ
ijoin

that
join only the field with the same name:
. If there are no
common attributes between R and S then

R


. if all

attributes are exactly the

same then

.






-

Division
(
קוליח
):

R


S

Division c
an be

preformed only if all the attributes

in S

appear in R.

The
result is the result of

w
ith all the attributes

in R that are

not

in S
.
F
or example:








Equivalences Among RA Expressions
:


We say that expressions E1 and E2 are
equivalent

if these expressions
always return the same answers
.


For example:


A
R



A
S


A
(R


S)

easy to see two way containment.

On the other hand,

A
R



A
S

A
(R


S)
, because

A
(R


S)



A
R



A
S

but

A
R



A
S



A
(R


S)
.

E1 and E2

can

be equivalent
even
if they
have

different relations
, for
example (R
-
R)UT

T.

But if the E1

contain a Join

then E2 must contain
the same relation
s

to be equivalent.


Equivalences for Optimiz
ation
-

To optimize our query we would like to
a
pply selection and projection as
soon as possible

and d
etermine t
he best order of computing the joins
.

Projection rules:

for
and y that is in common to R and S.

(R


S)


(R)



(S)


(R


S)



(

(R)



(S))

Selection rules:


C1


(E)



C1
(

C2
(E))


If
c

contains attributes only

of the relations

R

we can push it in

C
(R


S)



C
(R)


S
.

Ex
ample:

A

B<4

C>5

A,B,C,D
(R(A,B,C)


S(C,D)


T(D,E))




A
(

A,D
(

A,C

B<4

C>5
R(A,B,C)



C>5
S(C,D))


D
T(D,E))

Independence of RA Operators
-

The 5 basic operators

are
independent
. It

means that
they cannot be
expressed by other operators.

Proof:

Let E be a
n expression that contain all 5 operators. Let us
there does not
exist an expression E’ that does not use

one of the operator and E

E'.

Projection
: the only operator that can reduce that number of columns.

Union
: let R = <0> and S=

<1>

the RUS= <0,1>, any
other expression with no
union with return only one tuple.

Difference:

A query is monotonic

if adding tuples to the relations can
only add

tuples to the result (and not decrease the size of the result)

hence because all operators are monotonic except diffe
rence we can
deduce that it's interdependent.

Cross Product and Selection are missing…










SQL
-
postgresQL

postgresQL is a
Data Base Management System (DBMS):
A software
system that facilitates the creation, maintenance and use of an electronic
databas
e
.

SQL

command:

Creating a table:



C
onstraints

type on a tables
:



Not Null



Default Values

, for example
:
Gender CHAR(1) default(‘F’)
;



Unique
-

meaning n
o duplicate of values

(not including Null values!), this
constrain can also be applied on more than on
e attribute

and then the
set

of attribute cannot be duplicated
. N
otice that a Unique const
rain
on a pair is weaker

than a Unique constrain on each attribute
separately.




Primary Key
= Unique

+

Not Null.

There can be only on primary key per
table but more
then one attribute can be the primary key, hence a set
of attributes can be Unique and not null.



Foreign Key



pointer to

an attribute in

different
(existing)
table
, the
attribute we point on must be a primary key in the other table.

Syntax:

Foreign Key (
field1) References T2(field2);

If field1 in T1=field2 in T2 then we can drop (field2).

We can add "
On delete cascade
" if we want the tuples in T1 with
field1 to be delete on the delete of field1 in T2. If no such constrain
added then action of deleting fie
ld1 in T2 will be denied.

Warning:

be carful not to create cyclic constrain with foreign keys

and
then we won't be able to add new data to both tables
.



Check Constraints

-

A check condition is a Boolean
condition on
either a specific column or multiply col
umn but not on specific row.

Deleting a table:

DROP TABLE
table
;


Modifying Tables:

ALTER TABLE

table

ADD

(attribute name+type) /
DROP

(
attribute name
).

Inserting a row:

INSERT INTO

table

VALUES
(

);

The values
are specified for all columns and in the order
defined by the
table
.

Deleting a row:

DELETE FROM
Table

[
WHERE

Condition
] (optional);

Update a row:

UPDATE
Table

SET
Field1=value1,,,FieldN=valueN

WHERE

Condition

View
:

A view is a
virtual table

containg the result of

a query
. C
hangin
g a
table

automaticall
y changes the view
.

Once
a view is defined
, you can use
it in a query

(as if it's a relation).

CREATE VIEW

<view
-
name> AS <
query
>

View is good for:
Simplifying complex queries
,

Security issues

(
grant
SELECT on ViewTable

to joe;

)

It is possible to make som
e m
odifications
on a view
only when the view is
updatable
, however the modification only affect the underlying table and
not the view itself, hence
when we deleting or updating, only row from the
view table will be modified in the DB
.



SQL query structure

The query structure in SQL is limited to a certain structure which gives
the DB the chance optimize the query.
A query ca
n contain the following
clauses (

Select and from are
obligatory
):


SELECT [Distinct]

Attributes


:

Few select options:



S
elect all columns


SELECT*



Rename selected columns
-

SELECT S.sname
AS

Sailors_Name



Apply funcation
-

SELECT (age
-
5)*2



Can contain subquery
-

the subquery

must return at most one
value for each row
of

the outer query
, the subquery can also

return null. Not
ice that if no name was specified for the
subquery the name of the column will be 'Select …'



Notice

the distinct also delete the rows that have null in the
same column



FROM
relations

[NewName]


Cross product



C
an rename the relation here
.



Ca
n also contain subquery, the query in the FROM clause must
be renamed with a range variable :


FROM Sailors S,(SELECT MAX(S2.age)









FROM Sailors S2)
M;

Notice

that this subquery cannot use S!



Conditional Join
:
R1

JOIN

R2

ON

C

(boolean exp)



Equi
-
J
oin
:
R1

JOIN

R2

USING

attribute



Natural Join
:
R1

NATURAL JOIN

R2




Left Outer Join

:
R1

NATURAL LEFT OUTER JOIN

R2
,

the result wi
ll contain the

join result+
all the tuples in R
1

that
were not joined with R2 with null

values
.




Right Outer Join
:
R1

NATURAL
R
IGHT

OUTER JOIN

R2
,


the result will contain the join result+
all the tuples in R
2

that
were not joined with R1 with null

values
.




Full outer join:
NATURAL
FULL

OUTER JOIN

R2
, the result
will contain the result of left outer join and right outer join.



WHE
RE

condition

, the WHERE clause can contain:



Numerical
and
string comparison
:

between
(
val1

AND

val2
)



Logical components:
AND, OR



Null verification
: IS NULL, IS NOT NULL

(
return true or false
)

Notice
that using null not in these two form will r
eturn unknown
value to any query
, for example: sname=NULL => unknown
.



Checking against a list

o
f values
:


attribute

IN/

NOT IN

(sub
-
query)
.



A patten matching operator :
colname

LIKE

pattern
. The patten
can contain '_' for sigle char , '%' for 0 or more

characters,
for example:

WHERE sname LIKE ‘R_%y’
;



(
comparison operator
)
ALL / ANY

(
sub
-
que
ry
)
, for example
>ALL
means greater than
every value
,
greater than the maximum

value
.

>ANY means greater than
at least one value
, that is,
greater than
the minimu
m.

Notice:

the result of ALL query is
not contained in ANY query and vise versa.




EXIST

(sub
-
query)


is
a test to

check if
the subquery returns
at least one row
, if so return them
.
Vise versa for
NOT EXIST
.


GRO
U
P BY

grouping
-
attributes

: partition the ta
ble into groups by
the values of the groping attribute/s.

One answer in generated for
each group!


HAVING

group
-
condition

: a c
onstrains

that is checked per group.


ORDER BY



this
clause

can be used to sort results of
one or more
columns

by
ASC
(accending
-

default)
or DESC

(decending) order.

Querys Operators
:

These

operators remove duplicates by default!




[Query]
UNION

[Query]

. We can Use
UNION ALL

to avoid
duplicated removal.



[Query]
MINUS

[Query]



[Query]
INTERSECT

[
Query
]



D
IVISION


(not suc
h operation exist we should use not
exist to express dividion)
for example:
Sailors who reserved all boats

=

Sailors for which there does not exist a boat that they did not reserve

Aggregate Operators
:

These are operation that are only allowed in SELECT an
d HAVING.

If the select contain both aggregation operation and attribute we must
apply group by.

Notice: aggregation operation cannot be nested because the result of
each operetion is a single value.

o

COUNT(*)



count the rows in the table including null
values.

o

COUNT([DISTINCT] A)

-

count the num of rows in attrubute A not
including null vaules.

o

SUM([DISTINCT] A)

-

ignore null values

o

AVG([DISTINCT] A)

-

ignore null values


o

MAX(A)

-

ignore null values

o

MIN(A)

-

ignore null values

Null Values

There are diffe
rent type of null value:

1.

Value Unknown

2.

Value Inapplicable


3.

Value Withheld

Two important rules:

(NULL)
arithmetic

operator (value) =NULL

(NULL)
comparison operator

(value) =UNKNOWN

Notice:

Only tuples for which the WHERE clause has value TRUE are used
to c
reate tuples in the result
.


The evaluation order
of a query

1.

Compute the cross product of the tables in
relations
.

2.

Delete all rows that do not satisfy
condition
.

3.

Partition the table by groups if GROUP BY exist.

4.

Delete all
groups

that do not satisfy
group
-
c
ondition
.

5.

Delete all columns that do not appear in
attributes

or if select
contain an aggregestion
operation apply it for each group
.

6.

If Distinct is specified eliminate duplicate rows






First recursion

Second recursion

Recursion


WITH [RECURSIVE]

R1
AS

<definition of R1>

<query involvi
ng R1 >

The value of R
1

is derived by repeatedly evaluating its definition until no
changes are made
. For exmple:

WITH RECURSIVE Reaches(frm,to) AS


(SELECT frm, to FROM Flights)


UNION


(SELECT R1.frm, R2.to



FROM Reaches R1, Reaches R2



WHERE
R1.to = R2.frm)

SELECT to FROM Reaches WHERE frm=‘Tel Aviv’

Mutually Recursive

A
dependency graph

has a node for each relation and an edge if the first
relation
uses the second in its

recursive

definition
.
R and S are
mutually
recursive
, if there is a cyc
le in the graph involving nodes R and S
. To avoid
infinit loops
R can be defined using a mutually recursive relation S only if
R is
monotone

in S,
hence a
dding an arbitrary tuple to
S
never cause a
tuple to be deleted from R
.
Notice

that even though only m
inus is not
monotonic

action the recursive itself can still be non
-
monotonic without
using minus!


Translating RA to SQL
:

We assume that :



E does not use the same relation twice



No two relations have any attributes with the same names

W
e are translat
e

a
relational algebra expression E into SQL

using
inducation:

Base case:

0 operators (E is just a relation)


Induction Step:

Assume that for all E with less than k operators, there is
an SQL expression equivalent to E. We show for k
. Let us look at the “last

operator”
in E:

E=

A1,..,Ak
(E
1
):


E =

C
(E
1
):
E = E
1



E
2

:


E = E
1

-

E
2

:


E = E
1



E
2
:





Notice
that
every

operation is a query
stand on it
s own and
every
relation
is a query on its own
.
Hence e
ve
n a simple RA quer
y can end up as a huge
SQL query.

PL/pgSQL
:

This is a P
rogramming language for SQL.

Basic structure

Create [or replace] function funcName(varName1 varType1,varName2
varType2,…) returns returnVarType AS $$

DECLARE

(optional)


/* Variable declaratio
n*/

name


[CONSTANT]

type
[

NOT NULL] [DEFAULT or :=


expression
]

name
B%TYPE

(this mean that name with have B's type)

name
Table%ROWTYPE

(name will contain a whole row with the same types as in
Table);

name
record

-

A record is similar to row
-
type, but

we don’t have to predefine
its structure.

BEGIN

(mandatory)

Conditioning

IF

boolean
-
expression


THEN

statements



ELSIF

boolean
-
expression


THEN

statements

END IF
;

Loop

[
WHILE

expression

]

[
FOR

var

IN

[ REVERSE ]
stRange

..
endRange
]

-
var
is no
t declared in the declare
section
.

[
FOR

target
IN

query
]
-

looping through query result.

LOOP
statements


Exit
-

causes termination of the loop

Continue



jump to next iteration

END LOOP
;

Select into [strict]


place something into a variable. If the variabl
e is
rowtype and more than one row is returned
, the first row will be taken, or
nulls if no rows were returned
. If we w
ant

an error to be raised we can
add the strict.

If
not found

then

-
Checking if a row was returned
.

Raise Notice/Exception/Warning…
-

to

print messages to the shell.

EXCEPTION

(optional)

WHEN condition [OR condition ...] THEN handler_statements

END;

(mandatory)

$$ language plpgsql;

Calling the function:
Selec
t (funcName(val1, val1..));

Triggers

A trigger defines an acti
on we want to take
place whenever some event has occurred.




Syntax:

CREATE TRIGGER

name

{
BEFORE
|

AFTER
}
{
event

[ OR ... ] }
ON

table

[
FOR EACH ROW
|
STATEMENT
]

EXECUTE PROCEDURE

funcname

(
arguments

)


And then for the function:

CREATE FUNCTION

triggerName

RETURNS trigger AS $$



When a trigger is fired several
vari
ables are automatically created:

OLD

(the old value of the variable),
NEW

(
the

new value) TG_OP (the
operation that triggered the trigger).

Notice

that in insert there is no OLD and in

delete there is no NEW.


Row
-
level

before
triggers are usually used to

modify or check the data
that is changing

(the returned value in the trigger function count).

Row
-
level

after
triggers are usually used to

propagate the effect
of
the changes to other

tables

( the returned value in the trigger
function has no meaning).


Statement

level

neither in before nor in after the return value has a
meaning.



Design Theory

The purpose of design theory:


Some
n
otation:


Letters at the beginning of the a
lphabet (e.g.,
A, B, C

...)
denote a
single

attribute
.



Letters at the end of the alphabet (e.g.,
X, Y, Z

...)
denote
sets of
attributes
.


Use
XY

for
X

U
Y


Use
ABC

for {
A, B, C
}


We use

s, t


to denote rows (tuples)


We use
t[X]

to denote the set of values o
f t in the fields of X

Functional Dependency (FD)
:

X

Y

hold

if f
or every two tuples s and t in an instance of R
,

if s[X]=t[X]
then s[Y]=t[Y]
. In other words
if

every two rows that have the same
X
value, also have the same
Y

value
.

Notice

if X has no two t
uples with the
same value then the FD always hold (
הקיר הרוצב
).

Trivial

FD:

A functional dependency X


Y is
trivial

if it
always holds.

For example:
D


D
,
SD


S
.

X

Y

is trivial

Y

X.

Implication of FDs
:

We write
F


X

Y

)תיגול העיבנ(

i
f, for all instances r, if all the FDs in F
hold
, then also X

Y holds.

I
n order to show that


F

X


Y
we must
find a relation in which F holds

but X

Y does not hold
.

A Proof System: Armstrong's Axioms



Reflexivity: If
Y

X, then X

Y



Augmentation: If X

Y, then XZ

YZ



Transitivity: If X

Y and Y

Z, then X

Z

Some additional axiom (driven from Armstrong Axioms):



Union: If X

Y and X


Z, then X

YZ



Decomposi
tion: If X

YZ, then X

Y and X

Z



Pseudo Transitivity: If X

Y
and YW

Z, then XW


Z


If we can prove that X

Y follows from F using Armstrong's axioms, we
write F
X

Y
)תוחיכי(
.


Armstrong axioms are complete and sound hence:

F
|=

X

Y

F

X

Y

The Closure of
Attributes
:

X
+
F
= {A |

F
X

A

}.

T
he closure lemma
:
Y

X
+
F

F
X

Y

Showing that F |=
X

Y

is an exponential problem
however

there is a
n

efficient algorithm to find
F

X

Y

and since the two are equivalent

it'
s
possible
to solve both problems. How to do it? We will calculate the
closure of X with respect to F
,

X
+
F

and then check if
Y

X
+
F

, by the
closure lemma we get
F

X

Y
.

Computing

X
+
F
:

To compute the closure of V
we
add to V
all the attribu
tes (X) that can be implied
from

the attributes in V
by using the
FDs in F.

Run time is O(
X+
F).

Example:
R=ABCDEGKH
,
F = {
A


C
,
B


D
, DE


H,
DK


G
}

Does F|=ABK

G?
let us check:
ABK+

= ABK
C
D
G
.

G

ABK+


F

ABK

G


F|=ABK

G.

Claim:
V
W


Closure of FDs
:

F
+

= {X


Y | F
X

Y}

Computing
F
+

is e
xponential
, but to check if FD
X

Y

is in
F
+

all we need to
do is check if Y

X
+
.


Keys and Superkeys
:

W
e say that X is a
superkey
in
relation
R if F
X

R
. Hence X implied all
other attributes in R.

We say that X is a
key

if X is a superkey and X is minimal
, hence
X
-
{A} is
not a superkey
.


Finding a Key of R= (A
1
,...,A
n
)
:

K := A
1
,...,A
n
;


for
(i=0
; i<n; i++){

if R
(K
-
{A
i
})
+
F

then K:=K
-
{A
i
}

}
Return K
;

Starting from all R we one by one try to remove one of the attribute and
check if it's still a superkey.

Notice:

attributes that

never appear on the right side of any

FD must be
part of a

key.

Finding all keys Algorithm:

Keys
:= Find a Key with R,F


For

each K


Keys
do

For

each X

A


F do

S:=K
-
{A}


X

If S does not contain any J


Keys

then

S’ :=
Find a key with
(S,F)

Add S’ to Keys

Return

Keys

For example: R = ABCDEG,
F = {A


B, BC


E,

ED


A}
.

first we find one key CDEG
.

Keys:=
CDEG
, let's look at all the FDs that C,D,G or E on the right side.

Using BC


E : CDEG
-
E


BC=

CDGB, this is a key.


Keys=

CD
G
E,
CDGB
.

Using
A


B
: CDGB
-
B

A=CDGA this is a key.

Keys=

CD
G
E,
CDGB
,
CDBA
.

Using
ED


A
: CDGA
-
A

ED=CDGE already exist.
So we got 3 keys!


Normal Forms
:

Boyce
-
Codd Normal Form (BCNF)
:

R is in BCNF
if for every X

Y

F
+

one of the following holds
:


X

Y is trivial
(Y

X)
or



X is a superkey

of R
.

BCNF prevent
all
redundancy because if X is a super key then there is no
two tuples that are the same (for if there was then we will get two exact
s
ame line and hence once will be deleted).

Notice:

That if no FDs are
given there is no redundancy.

Third Normal Form (3NF)
:

R is in Third Normal Form (3NF) if for every

X

A

F
+

one of the
following holds
:


X

A

is trivial (
A

X)
or


X is a superkey

of R
or



A is an attribute in some key of R
.

3NF prevents some redundancy
. Any BCNF is also 3NF.

Decompositions
:

Whe
n given a "bad" schema that has

a lot of redundan
cy we would like to
decompose it

into sub schema. The following properties
must be
preserved:



Lossless



D
ependency preserving



Be

BCNF or at least 3NF.

Testing for Lossless
:


We say decomposition is lossless if the natural join between then bring us
back to the original relation (no less and no more either!).

1.

Testing for Lossless w
ith 2 relations R
1
, R
2
: Decomposition

is
lossless

F
|=

R
1

R
2


R
1

or F
|=

R
1

R
2

R
2
.Test this by
computing
R
1

R
2

and check if either R
1

or R
2

is contained in this
set.

2.


General

Lossless algorithm
: create
a table
with the attribute as
columns. For each row place an 'a' at each
attribute of the
decomposition. Now for each FD X

Y in F check if there are
places in X that have an 'a' while Y doesn't, if s
o place 'a' in Y.

for example: R=ABCD

F = {
B


C
,
C


D
}






At the end, if there is row containing only “a”

values, then the
decomposition is lossless
.

Dependency Preservation
:

When

a tuple is inserted into a relation, we immediately test whether the
tu
ple satisfies the FDs.

If the decomposition is dependences preserving
then
it is sufficient to test that each of the
FD

defined locally, on the
decompositions hold, in order to ensure that all dependencies hold on the
entire relation
.

Testing for dependenc
ies preservation:

Let R be a relation, decomposed into R
1
, R
2
,…,R
n
. F
or each X

Y in F

we
run:



D

C

B

A

a

a

a

a

a

a

a


a

a





Notice: if
R
i

F then for sure
R
i
is preserved.

For example:
R=ABCD
,
F = {A

B, B


C, C


D, D


A}
,
R
1
=AB, R
2
=BC,
R
3
=CD

It is clear that A
-
>B, B
-
>C and C
-
>D are preserved, since the
y

are
contained in
F
. It remains to show that

D
-
>A is preserved:

-

Z=D.

-

Using R3, we get Z=D U ((D
CD)+
CD) = DC

-

Using R2, we get Z=DC U ((DC

BC)+

BC) = BDC

-

Using R1, we get Z=BDC U ((BDC

AB)+
AB
) = ABDC

From here on Z canno
t change hence we exit the loop and check if
A
ABCD
, yes! So the decomposition is preserving.

NOTE:

Even if the decomposition is dependen
cy preserving, we sti
ll

have
to c
ompute a natural join before querying
or
inserting
, otherwise we
might be
losing some tuples!

Normal Form of a Decomposition
:

The
projection of F on X

is the set of FDs in
F
+

that involve only
attributes in X.

Denoted as
F
X
.


We compute

F
X

by computing the
closure
of each subset of attributes in X, in other word for each
Y

X, compute
= {Y,A,B} and then add Y

AB

to

F
X
.


To determine the normal form of a decomposition
R
1
, R
2
,…,R
n

we need to
check the normal form of the
of each
R
i
.

Notice:

relation containing only 2 attributes is always in BCNF
.

If the
decomposition is BCNF it doesn't necessarily
mean the decomposition is
lossless!

Finding a Decomposition

(DE
-
P)

Decomposition into 3NF:

3NF DE
-
P is: lossless, FD preserver and
every relation in the
DE
-
P

is in
3NF
. The algorithm is:




Given a relation R with functional dependencies F



Step 1:

Find a mini
mal cover G of F



Step 2:

For each FD X

A in G, create a schema XA



Step 3:

If no schema created so far contains a key, add a key as a
s
chema



Step 4:

Remove schemas that are contained in other schemas

Finding a Minimal Cover

A
minimal cover

for a set F of FD
s is a set G of FDs such that:



Every dependency is of the form X

A, where A is a single attribute



We can
delete an attribute or a dependency from G and still keep
F
+
=
G
+

, meaning G
only
contains FD that do not follow
from the
others
.

The algorithm

of find
ing a minimal cover:



Create G by d
ecompos
e
each FD so that it has a single attribute on
the right side
.



For each X

A and for each B in X, check whether
A

(
X
-
B
)+

If
so, remove B
.



For each X

A in G, check

if A
X
+

(we compute X
+

with
out
X

A
).

Decomposition into BCNF
:

There always exists a decomposition into BCNF that has a lossless join

but not always can we find algorithm that is also dependence preserving.

Simple Algorithm for a lossless BCNF Decomposition



If R is in BCNF, return R



Otherwise, let X

Y be a BCNF violation



Define R
1

= (X
+
). Define R
2

= X


(R
-
X
+
)



Recursively find a BCNF decomposition of R
1

and R
2



Return the union of these decompositions

For example :

R = ABCDE,
F = {AB

C, DE

C, B

E}

The dependency AB
-
>C is a BCNF viol
ation.

R1 = AB+ = ABCE. R2 = AB U (ABCDE
-
ABCE) = ABD


Consider R1: The dependency B
-
>E is a BCNF violation.

R11 = B+ = BE, R12 = B U (ABCE
-
BE) = ABC

R11, R12, R2 is in BCNF so the result of the decomposition: BE, ABC, ABD

Notice:

the redundant that we p
revent through BCNF decomposition is
only FD redundancy, there are other kind of redundancies that aren’t
prevented (
multivalued dependency
).

Applying Design Theory

i.

Start by identifying all attributes that should be modeled

ii.

Then, identify all functional d
ependencies

iii.

If the functional dependencies form cycles, identify which
implications are wrong, i.e., where there are 2 paths that lead to
the same attribute, but in reality may lead to different values

iv.

From 3, discover which attributes have 2 meanings and
should be
split. Split these

v.

Identify ISA relationships.

vi.

Do not model the functional dependencies implied by the ISA
relationships. Instead, model ISA relationships using foreign key
constraints

Query Processing

The Disk
:

The Disk consist of several
plat
ter
, each one is
divided to equal circular
tracks
.

The

set of
the
same size track

from top to bottom is called a
cylinder
. Each track is divided to
sector

(512B)
and a
block

of memory normally consist of 8
sector = 4KB.

Each block has a header consist of a

fixed part and an array of pointers to the
records
in the block. Records can move feely inside the
block, they'
re
all
packed at the end of the block
.


The delay in transferring data
=

Seek Time

(the time to find the right
cylinder)
+

Rotational Delay

(time

it takes to rotate the track so the head
point to the right block)
+

Transfer Time
.

R
eading
seq
uential

blocks

save us the
S
eek

and
R
otational

time and hence
we would like to arrange

the
data on disk in an efficient way.

Running
-
Time Analysis
: The running

time of an algorithm is the I/O cost
.
We measure the I/O cost in terms of the number of blocks

that are read
or written

while ignoring the CPU time and the final write to the disk
since all queries do that.

Notice:

Updating a block count as two I/O

read

update
-
write.

Read and
write are done with a whole block (even if we need only small part of it).

The File:

Typically, all the tuples in a given table in the

DB

are stored in files
.
Files
are simply sets of blocks

that
may or may not be

continuous in the
disk
memory
.

Files can be organized in three different ways:

Heap File:


In a heap, records are packed into blocks in no particular order
. It's easy
to insert a block but difficult to search for one.

Sorted File:

File is sorted

sequentially on some searc
h key

(attribute)
, and a primary
index is associated with it

(meaning an indexed file is always sorted!)
.

R
ecords are inserted in sorted order
hence i
f the file consist of

N
records
tha
n we will need
logN to search f
or the right place, N for
shifting

recor
d
s to make space for the new record
,

total of logN+N

to
insert a new record
.

Search key

is a field(s) on which we

can search efficiently for a given
k
ey

value

(it's not necessary a key in FD term
).

P
rimary index

-
the primary index
determine the physical

o
rder of the
records on

the disk
.

It
can store either an index or the file itself
. (it's
often also referred as the
primary search key
)


S
econdary index

i
s on an

unsorted

field
.

It

store
s

the records of a dense
index

(not sparse!)
, namely pair of key and poi
nter. It's often also
referred as the
secondary search key
.

Notice:
The index blocks

and the blocks

are
not

necessarily

contiguous,
but they are chained
.

When ordering the indices we can use two type of index:

Dense Index
:



An index record appears for

ever
y

search key
value in file.

Keys can r
epeat

in the file but
will
have only one index
.



A

record contains
a
search key value and a pointer
to the actual record.



When deleting

and inserting always

have to
update the index.

Bucket
-

bucket is an intermediate

level of indices that
are used to make dense index more efficient as in the
case of secondary index with a file with duplicates.


Sparse Index:



Only one index entry for each block (for the

block’s
first

value)
.



In case of duplicates
in the file
i
ndex e
ntry is created

to first instance
of each search key.



To locate a record, we find the index record wi
th
the largest search key value
less than or equal to
the search key value we are looking for.




If the index records get too big we can also use
multi leve
l indices.



When
deleting

a record only have to update the index if we deleted
the search key of the index record.






When
inserting

if there is empty space in the block we are inserting
we don't need to update the index. But if the block is full we will need

to shift the records to make space for the new record and change the
index accordingly. To avoid this we can use overflow blocks.

However
these cause the file

to be
not sequential
.

Clustered index:

stored

the data

contiguously

in the
blocks of the file

ac
cording to a primary search key

(for example all values of A are
clustered together)

This means that there can only be one clustered
index per tabl
e
.

Notice that sorted index is always clustred but the
opposite in not

necessarily true
.

Scattered

index:
is

a secondery search key used to
store

the data

across the file.

B+Trees
:


B+ Trees are always balanced
. Each node in the tree
corresponds

to a
block.

When traveling the in the tree:



A leaf contain
s

a search key
,

a
pointer to the record with the key

an
d
a pointer to the next leaf. A
non
-
leaf contain
s

search keys and
pointers to its

child
-
node. Notice that it always has one pointer more
then than number of keys.


B+ Trees keep similar
-
valued records together on

a disk page, which
takes advantage of locali
ty of

reference.


B+ Trees guarantee that every node will

be full at least
to a certain
minimum percentage. So when a node is full and need to be split each
new node will contain minimum amount of pointers:

Non
-
leaf:



(n+1)/2


pointers

Leaf:


(n+1)/2


po
inters to data




B+

Tree can
be
implement
ed

with
either a primary or a secondary
search ke
y (index
).



Insertion into a B+

Tree:

a)

S
imple case

-
space available in leaf
.


b)

L
eaf overflow



for example:
insert 7

c)

N
on
-
leaf overflow
-

for example insert 160

Notice:

when creating a new node in the same
level we always use the right most key, but when creating a new
node up the tree we take the left most key.

d)

N
ew root
-

for example
inset (45)


Delet
ion from a

B+ Tree:

(a) Simple case


no index update.

(b) Merge with

neighbor (sibling)


delete (50)

(c) Re
-
distribute keys

-

delete (50)

(d) Cases (b) or (c) at non
-
leaf


However, i
n reality the merge part is not implemented since it
'
s
inefficient.

In general B+ Tree is more efficient then Dense & Sparse
Index.

If an in
dex on
A
is implemented as either a B+

tree, sparse or dense
index, then it is a
sorted
index

unless A

is
a

primary search key that is

implemented as a hash file, then the relation is clustered on
A
but not
sorted
. If A

is a secondary search key, then the

relation is

not clustered.

Notice
: sorted index doesn't mean sorted file unless the index is also
primary.

Algorithms for Computing R

S


B(R)

denote
s

the number of blocks
of relation

R
.


T(R)

denotes the number of records in R.


V(R,A)

denotes
the number

of distinct values
A

in the relation
R
,
i.e., the number of distinct records in

A
(
R
)

The optimal analysis:

I/O cost
: B(R)+ B(S)

Memory requirement
: B(R) +2

Nested Loop:

In

each

iteration
we read


from R and join it with S, so

we need to
read S's blocks

times.

I/O cost
:



Memory requirement
:

M+2

Indexed Nested Loop
-

For each record
r

of

R
, we use the index to get only the records of
S

that can be joined with
r
.
Outer loop is repeated
T(R)

times
.

This method
b
eats other join methods if |
R
| is not too big
!
The I/O cost of the inner

loop depend on the type of index:

I/O cost:

Clustered:
T(R)





+
B(R)





Scattered:
T(R)


+
B(R)


Memory requireme
nt:

2 blocks.



Pipelined Nested

Loop
I

This
is used for the
computation of
R


S


T

and it
does not

write any
intermediate result to the disk
.

W
e

read M blocks from R, N blocks from
S and then compute the join of the three of them.
The

buffer should be
divided between
R

and
S

,
where

M
+
N

= total buffer size
-
2 blocks(one
for T one

for the output).

Notice

we are computing
S


T



times!

The I/O cost
:



Memory requirement
: M+N+2

Pipelined Nested

Loop
II

An alternative

algorithm
is to compute the first join just once
. For each
M blocks of R that we read we
compute the join
R


S

and keep the result
of the join in N blocks. Now we compute the join with the result and T. So
we need M+N + 1 block for S+ 1 block for T+ 1 block for output.

The I/O cost
:

W is the size of the
intermediate

result.

Memor
y requirement
: M+N+3

Notice

1
:

unless we use pipeline we always have to add
another

B(W)

to
the I/O cost , that is the cost of writing the
intermediate

result to the
disk.

Notice 2:
if we have enough memory for the intermediate result and R
then we get th
e optimal I/O
.

Hash Join:

First we p
artition R into n

=

buckets

with size n
: R
1
,


,R
n

using a

hash function
h

that has a range of
n

values
.
Similarly, partition
S
:
S
1
,

,
S
n
.
This require reading each block of R and S hashing it into one of
the

buckets and when the bucket

is full write i
t

to the disk.
S
o I/O cost
for partitioning R

and S
is
2B(R) + 2B(S)
.

Now all we need to do is

compute
R
i



S
i

for all
i

, (the hash function made sure that all values in

R
i



and

S
i

are the same)

each

cost
B(R
i

) + B(S
i
)


(since we have memory of
size n)
.

Sum of I/O cost over all
i

is
B(R)

+
B(S)
.


I/O cost
:
3B(R)+ 3B(S)
.

Memory requirement
:

+2

Sort
-
Merge Join
:

First w
e arbitrarily divide R

and S

into
M
1

and M
2

lists

respectively total
of

(B(R)

+ B(S)

1

lists
.

e
ach list is read

fro the disk
, sorted in
the
m
emory and written back to

the

disk
,
that cost

2B(R)
+2B(S).

Now to
merge the lists

we need one block from each list in the memory
:




In each step, advance the pointer that points to the minimal
value
among all the lists
.



Repeat this step until there is a pair of pointers, one for a list of
R

and the other for a list of
S
, such that both of them point to the
same value
v
.



Now, advance the pointers of the other lists, until each one of
them points
to either
v

or a value greater than
v
.



Next, join the records of
R

and
S

that have the value
v
, and then
continue to scan

The
total cost of the join is B(R)+B(S).


I/O cost
:

3B(R)+ 3B(S)
.

Memory requirement in the best case
2
:
M
1

+M
2

+1

Memory requirement i
n worst case
3
:
M
1

+ M
2

+ 1 +
P


P = max
a

A

{

}
, M
1
+ M
2

=

(B(R) + B(S)





1

the idea is that to be able
to

make the
join efficiently we need

enough space in the memory

meaning

:

M >

+

)

2

This is the case when all records are randomly distributed among the lists.

3

This is the case when the same records are all in the same list while the other lists have few or non
of that record.


Estimating
the (average) number of records in the
result
:




T[
(R )
]

=
T(R)/ V(R,A)
.

Notice:

If R has a clustered index on A than
T[
(R )
]

=
B
(R)/ V(R,A)


Notice:

that T(R)

=V(R,A) if A is
a key in terms of FDs.



B[
(R )
]

=

T[
(R )
]
/

|B|
.




T[
(R
)]

=

T(R)/
3
.

Notice:

If R is sorted on A and also have a sorted
index on A than

T[
(R
)]

=


B(R)/ 3


(similarly for A>a)
.



T
[
(R
)]

= T(R)
-
T(R)/V(R,A).



T[

C1


C2
(
R
)
]

=T[

C1
(
R
)
]

T[

C
2

(
R
)
]



if C1 and C1 are disjoined
)םירז(
:
T[

C1


C2
(
R
)
]

= T[

C1
(
R
)
]

T[

C
2

(
R
)
]



if C1 and C2 are independent:
)החדהו הלכהה ןורקיע יפל
)

T[

C1


C2
(
R
)
]

= T[

C1
(
R
)
]

T[

C
2

(
R
)
]
-

T[

C1


C2
(
R
)
]

.

NOTE:

If we push p
rojection and sel
e
ction

down the query then
processing projection and selection

are actually done "on the fly"
,

namely
they are done as part of other operations

(w
hen reading a relation from
the
DB

or

w
hen generating the result of a join
)
, hence they have zero
I/O cost and no intermediate result.




Under the
assumption
.

T[
R

S
]
=


,
that is if R
&

S
share

attribute B
,

if R & S share also attribute A
than:

T[
R

S
]
=
.

And if they share
no attribute
T[
R

S
]

= T(R)
T(S)
)תיזטק הלפכמ(
.



We assume independence of the join equalities
, so if V
1
<V
2
..<V
k
,
V
i
=V
i
(R
i
,A)
than

T [R
1

R
2



R
k
]
=
.

Notice
:

The size
the result of
join sequence is independent

of

the
order

of the join.



T[R
-
S]
=
T(R)
-
T(S)



T[RUS]
=
T(R)+
T(S)




Optimizing R
1
R
2


R
k

An optimal plan consists of
:



Determine

the join order
-

join tree.



Optimal plan for each join operation
.

Left
-
join Tree:

corresponds to a sequence
R
1

R
2



R
k
that is
executed left to right, namely,
((

((R
1

R
2
)


R
3
)


)


R
k
-
1
)

R
k

.

Why Left
-
join tree of all trees? Because i
t drastically cuts down the
search space

and it's

more important to avoid bad plans than to find the
best one

so we assume tha
t there exist one left
-
join tree that is not bad.

So now
we
have a left
-
join tree but we have

permutation of ordering
the join. To reduce this to

we use
d
ynamic
p
rogramming
. The idea is
starting from the inner most brackets we
c
heck for each
i
which join is
the best when using
R
i

as the last relation in the join
.

W
e continue
inductively
to the next brackets
until the best order of join found.

Notice
:

that result is not necessarily the best order since we might need
to join two re
lation that are not the best join in order to get the optimal
solution in the end.

A different approach is "
G
reedily Finding a Join Order
":



Start with the pair that generates the smallest result



Among the relations not yet included in the join, choose the
one that
generates the smallest result when it is joined with the current
sequence







Transaction Management

לדומה
-

תוינכות לש הטשפה
:

לע תולועפ קר תללוכ תינכותה לש )היצקרטסבא( הטשפהה

םינותנה דסמ

(
read / write
)

ו
דסה
ר
ובש

תועצבתמ

תולועפ

הלא
:


הקסע וא( העונת
)
:
הביתכו האירק תולועפ לש הרדס איה )היצקסנרט( העונת
לע תועצבתמש
םיטירפ
(
items
דסמהמ )
.

הקסע

תארוק אל

תבתוכ וא

א
טירפ ותו
תחא םעפמ רתוי

ו
האירק
איה
.הביתכ ינפל דימת


ןומזת
:
ןמזה ריצ לע רדסה תא גציימ ןומזת

לש

ליבקמב תועונת רפסמ
רשאכ ,
לכב
הלועפ תעצבתמ ןמז תדוקנ
תחא

דבלב
.


יתרדס ןומזת
:

וז רחא וז תועצבתמ תועונתה ובש ןומזת אוה יתרדס ןומזת
יתרדס ןפואב
רשאכ ,
תעצבתמ םעפ לכ
הקסע

הפוס דעו התליחתמ דבלב תחא
.
!ןוכנ דימת יתרדס ןומזת ןאכמ


ילבקמ ןומזת
:

ש ושוריפ
תואקסע
תדוקנ לכב רמולכ ,תבלושמ הרוצב תועצבתמ
תמ ןמז
ה תחא לש תדדוב הלועפ תעצב
תואקסע
.


תויליבקמ

Concurrency
:

יתמ
ןומזת
יליבקמ
יתרדס אל /

?ןוכנ אוה

יתרדיס עוציבל לוקש אוה םא

דחא

עמל(
םיעוציבה לכל לוקש אוה דחאל לוקש אוה םא הש
.)םייתרדסה

םילוקש םינומזת
:

םיטבמ תוליקש
:

םינומזת
S
1

ו
-

S
2

םניה
םיטבמ ילוקש
:םא




S
1

ו
-

S
2

תועונת ןתואמ םיבכרומ


ס התוא העונת לכלו(
י
)םינומזתה ינשב תולועפ תרד



םא
T
k

תארוק
ה תא
ךרע
ה
לש יתלחתה
A

ב
-

S
1
זא ,
T
k

ב םג תארוק
-

S
2

ה תא
ךרע
ה
לש יתלחתה
A

.



םא
T
k

לש ךרע תארוק
A

י"ע בתכנש
T
i

ב
-

S
1
זא ,
T
k

ב םג תארוק
-

S
2

לש ךרע
A

י"ע בתכנש
T
i
.



םא
T
i

לש יפוס ךרע תבתוכ
A

ב
-

S
1
זא ,
T
i

ב םג תבתוכ
-

S
2

לש יפוס ךרע
A


בל ומיש
:
העונתש םיכרעל
T
i

סמל םיבתכנש םייפוס םיכרע הלא םא קר העפשה שי תבתוכ
ד
םתוא תארוק תרחא העונתש וא


תרוויע הביתכ
:

ותוא אורקל ילבמ טירפ תבתוכ איה םא תרוויע הביתכ תעצבמ העונת

)!העפשה תרסח תויהל הלוכי תרוויע הביתכ(


תועונת ןיב םיטקילפנוק
:

רדגומ טקילפנוק
תולועפ יתש ןיב

ש תונוש תואקסע לש
תועצבתמ
ע
,טירפ ותוא ל
רשאכ
איה ןהמ תחא תוחפל
הביתכ תלועפ
.

עבוק טקילפנוק

תא
יתש רדס
ה
תולועפ
ש
תונוש תועונת י"ע טירפ ותוא לע תועצבתמ

.


םיטקילפנוק תוליקש

:

ינש

םינומזת

םנה

ילוקש

םיטקילפנוק

םא

םה

םיבכרומ

ןתואמ

תועונת

)
לכלו

העונת

התוא

תרדס

תולועפ

ינשב

םינומזתה
(

ןכ ומכ
לכ

דמצ

לש

תולועפ

שיש

ןהיניב

טקילפנוק

עיפומ

ינשב

םינומזתה

ותואב

רדס
.

:המל

םיטקילפנוק תוליקש

.)ןוכנ אל ךפהה( םיטבמ תוליקש

םינומזת

ירב
-
תויתרדס
serializable)
(

תיטבמ
:

וניה ןומזת
רב
-
אוה םא תיטבמ תויתרדס
והשלכ יתרדס ןומזתל םיטבמ לוקש
.

ירב םינומזת
-
תיטקילפנוק תויתרדס
:

ןומזת
S
1

וניה
רב
-
ס
תיטקילפנוק תויתרד
םייק םא
ןומזת

יתרדס
S
2

םיטקילפנוק ילוקש םנה םינומזתה ינשש ךכ
החנהב תאז .

ש
םיטירפה לכ
ל םישדח םיטירפ סינכהל ןתינ אל רמולכ( םיעובק
-
DB

ןומזתה ךלהמב
)
: אמגודל .





:טפשמ

םא
S

זא ,תורוויע תוביתכ אלל ןומזת וניה
S

רב וניה
-
תיטבמ תויתרדס
ם"םא

S

וניה
רב
-
תיטקילפנוק תויתרדס
.

הרעה
:
רב אוהש ןומזת
-
לש רדסה תא םיפילחמ םא םג הזכ ראשנ תיטקילפנוק תויתרדס
טקילפנוק ןהיניב ןיאש תובקוע תולועפ יתש

(
תובקוע תולועפ

ןהש תולועפ דמצל סחייתמ
תונוש תואקסעל תוכיישו ןמזה ריצ לע תובקוע
)

תקידבל םתירוגלא

רב
-
יטקילפנוק תויתרדס
ת
:

תויומידקה ףרג

ליכמ ןותנ ןומזת רובע

לכל תמוצ
הקסע
,
מ תנווכמ עלצ
-

T
i

ל
-

T
k

שי םא
ןיב טקילפנוק
T
i

ןיבל

T
k

ש עבוקש
-

T
i

ינפל עיפוהל הכירצ
T
k
.

טפשמ
:

ונה ןומזת

רב
-
תיטקילפנוק תויתרדס

םילגעמ רסח ונה ולש תויומידקה ףרג


תוליענ י"ע תוליבקמ תרקב
:

לכל
םילוענמ יגוס ינש שי טירפ
:
( ףתושמ לוענמ
shared
( ידעלב לוענמ ,)
exclusive
)
.



טירפ אורקל ידכ
A

ףתושמ לוענמב ותוא לוענל הליחת ךירצ

תואקסע רפסמ רשאכ(
.)ףתושמ לוענמ ותוא שקבל םילוכי




טירפ בותכל ידכ
A

ידעלב לוענמב ותוא לוענל הליחת ךירצ
.




ףתושמ לוענמ גרדשל הלוכי הקסע

ידעלב לוענמל

תוקיזחמה תואקסע ןיאש יאנתב
לע ףתושמ לוענמ
A

.

תוזאפ יתשב תוליענ לוקוטורפ
2PL

:


לבקל תבייח הקסע
:
האירק ינפל ףתושמ לוענמ
ו
מ
.הביתכ ינפל ידעלב לוענ

הקסעש רחאל
םיפסונ םילוענמ שקבל הלוכי אל איה והשלכ לוענמ תררחשמ
.

:טפשמ

םייקמש ןומזת
2PL

רב אוה
-
דס
תיטקילפנוק תויתר

)ןוכנ אל ךפהה(
.


Deadlock
-

:ןואפיק

.הלש לוענמה תא ררחשת תרחא הקסעש הכחמ הקסע לכ יכ העוקת הנכותה אובש בצמ
:ןואפיק יבצמב לפטל םיכרד יתש


ןואפיק יבצמ תעינמ
:
םינוב
wait
-
for

graph

שי רשאכ
מ עלצ
-

T
1

ל
-

T
2

םא
T
1

עגרכש לוענמל הכחמ
T
2

הקיזחמ
שי םא .

לטבמ םילוענמה להנמ יזא ףרגב לגעמ
.תואקסעה תחא תא


לוטיבו ןואפיק יבצמ יוליג
:
ןמז תמתוח הקסע לכל םינתונ
הלחתהה ןמז תא תנייצמש(
)הקסעה לש

הקיתו הקסע
ידע
הפ

לע
:םילוקוטורפ ינש םנשי רשאכ הריעצ הקסע



Wait
-
Di
e


ש הריעצ הקסע
תשקבמ

לטבתהל הלולע לוענמ

לצא אוה םע
,הקיתו

.הכחת הקיתוה זא הלצא רבכ אוה םא לבא



Wound
-
Wait



ש הריעצ הקסע
הקיזחמ

לטבתהל הלולע לוענמ

םא
.לוענמה תא הכירצ הקיתו הקסע

הבערה
(
starvation
)
:

אוב בצמ
תמייתסמ הניא ןכלו בושו בוש תלטבתמ הקסע
גישהל החילצמ אל םעפ ףא ןכלו
.הכירצ איהש תוחתפמה לכ תא

,הבערה עונמל ידכ

ריאשהל ךירצ שדחמ הקסע םיליחתמשכ
התייהש ןמזה תמתוח תא הל
םדוק הל

.רוקמב

תוליענ אלל תוליבקמ תרקב
:

ל םיבלש השולש
הקסיע עוציב
:

1
)

.םיטירפ תאירק בלש

2
)

:הקידב בלש
תרמוש הקסע לכש םיקדוב

לע

תוליקשה

םע

ןומזתה

יתרדסה

םא

הקסיע

הניא

תרבוע

תא

תוקידבה

םימרוג

הל

לטבתהל

ליחתמ
םי
התוא

שדחמ
.

3
)

:קסידל הביתכ בלש
תואצותה תא הקיתעמ הקסיעה ,החלצהב הרבע הקידבה םא
דסמל יטרפה הדובעה חטשמ תויפוסה
.


ןמז תומתוחב תושמתשמ תורקבה לכ

תואקסעה לע

:ןמסנ




START(T)

הקסיעה הליחתמ ובש ןמזה
T




VAL(T)

רובע הקידבה תעצבתמ ובש ןמזה
T




FIN(T)

ה תמייסמ ובש ןמזה
הקסיע
T




RS(T)

ש םיטירפה תצובק איה
-

T

הארק



WS(T
)

ש םיטירפה תצובק איה
-

T

הבתכ

םנשי
4

תוליבקמ תורקב יגוס

תומיאה בלשב םילדבנה

רצונש ןומזתהו םהלש
:

תרקב

תוליבקמ

ע
"
י

תומיא
:


ןומזתה

רצונש

וניה

לוקש

םיטקילפנוק

ןומזתל

יתרדס

תואקסעה לש

יפל

רדסה

לש

ינמז

קידבה
( ה
תומיא
(
.


הקידבה בלש

יטירק עטק אוה
:)הז בלשב תוליבק ןיא רמולכ(

T

תקדבנ
עגרכ
,

קודבנ
:



םא
START(T) < FIN(U)
ןכ םא ,

ש םיקדוב
-

RS(T)


WS(U) =



ל
לכ
הקסיע
U

הקידבה תא החלצהב הרבעש
.



םא
VAL(T) < FIN(U)
םיקדוב ןכ םא ,
ש
-

WS(T)


WS(U) =


הקסיע לכל
U

בה תא החלצהב הרבעש
הקיד
.

:בל םישנ

הביתכ גוסמ םיטקילפנוק רצווהל םילוכי אל יכ
-
יכ האירק
איהש ירחא קר תבתוכ
הקידבה בלש תא החלצהב תרבוע

ו
-

U

הקדבנש ינפל דסמהמ אורקל המייס
.

תימיטפוא תוליבקמ תרקב

:


ןומזתה

רצונש

וניה

לוקש

םיטקילפנוק

ןומזתל

יתרדס

תואקסעה לש

יפל

רדסה

לש

ינמז

קידבה
( ה
א
תומי
(
.


רמולכ ,יטירקה עטק אוה הביתכהו הקידבה בלש
תקדבנ רתויה לכל תחא הקסיע קר
ןותנ עגר לכב םינותנה דסמל תבתוכ וא
:


T

תקדבנ
עגרכ
,

םא קודבנ
START(T) < FIN(U)
ןכ םא ,
ש םיקדוב
-



RS(T)


WS(U) =



ל
הקסיע לכ
U

הקידבה תא החלצהב הרבעש
.

נוק קודבל ךירצ אל(
הביתכ טקילפ
-
.)יטירק עטק אוה הביתכה בלש יכ הביתכ

:לקוטורפה רופיש

הקסיע רשאכ
T
i

קודבל רשפא ,הקידבה בלש תא החלצהב תרבוע

לכ רובע

הקסיע
T
k

,הקידבה בלש תא הלחה םרטש
איה םא
טירפ הארק
A

ש
-

T
i

ל תדמוע
.בתכש
שי םא
הקסיע
תאזכ

הקידבה בלשב לשכית איהש רורב זא

ןכל
הל רשפא
תא גור
T
k

( דיימ
kill
policy
( הקידבה בלשל עיגתשכ תומל הל תתל וא )
die policy
שי הז ןויער שממל ידכ .)

תואקסעה לכ לע עדימ תרמושה הלבט רומשל
הקידב ורבע םרטש

:וארק םהש םיטירפהו
(transaction id, item id, modified)

הקסיע רשאכ
T
i

טירפ דסמל תבתוכ
A
הנשמ איה ,
לכב

ושרה
טירפה רובע תומ
A

תא
הדשה
modified

ל
-

true

.

kill policy

-

םיגרוה

דיימ
מ ןבומכ ץוח( תואקסעה לכ תא
-

i
T

ןהלש תומושרה תאש ,)המצע

T
i
התניש
.

:בל םישנ

נ םא יכ הטיש לש הקקש הזש בל םישנ
בת איה ,הקידבה בלש דע ץורל ךישמ
אי
ש םיפדה לכ תא ימינפה ןורכיזל
צ איה
הכיר

ו
ןכל

רה
הברה היהת הקסיעה לש האבה הצי
הרצק רתוי
)ןורכיזהמ ופע אל םיפדהש החנהב(
רתוי לודג יוכיס שי ךכיפלו

תא רובעת איהש
חלצהב הקידבה בלש


die policy

-
הקידבה בלשב
קודבנ

והשלכ טירפ רובע ,תומושרה תחאב םא
A
הדשה ,
modified

אוה

true

.הלטבנ ןכ םא


םא
T
i


.הלבטהמ הלש תומושרה לכ תא תקחומ איה החלצהב המייס

הלבטה לע תולועפה
לוענמ תובייחמ

רמולכ
הלבטה לע ידעלב לוענמ הכירצ הקסיעה

.הלבטה לע יוניש לע רובע

ןמז תומתוח לע תססובמה תוליבקמ תרקב
:

תוח םירמוש ונא ןאכ
:הקסיע רפ קר אלו טירפ רפ ןמז תומ



RTS(A)

יצמה ןמז תמתוח הנה
תני
תא
הארקש רתויב הריעצה הקסיעה לש ןמזה
תא
A
הקסעשכ ןכל .
T

טירפ תארוק
A

םא קודבת איה ,
RST
(A)<TS(S)

ןכ םא
R
TS(A)=TS(T)
.



WTS(A)

תנייצמה ןמז תמתוח הנה

תא
הבתכש רתויב הריעצה הקסיעה לש ןמזה
תא
A
.

הקסעשכ ןכל
T

טירפ תבתוכ
A

םא קודבת איה ,
W
ST
(A)<TS(S)

ןכ םא
WTS
(A)=TS(T)
.


ןומזתה
ש
וניה לעופב רצונ
לוקש

םיטבמ

)םיטקילפנוק לוקש אלו(

יתרדסה ןומזתל
לש רדסה יפל תועצבתמ תואקסעה ובש
הלחתהה ינמז
.


האירק וא הביתכ רפ השענ הקידבה בלש

חתפמ תלעב תבתוכש הקסע רשאכ
ידעלב

תבייחתמ איהש דע
:




הקסיעשכ
T

טירפ אורקל הצור

A
םא
TS(T)
WTS(A)

זא

RTS(A) :=

max{ TS(T), RTS(A) }

יכ(
הקסיע
T

רבכש טירפ אורקל הלוכי אל
תרחא הקסיע ידי לע בתכנ

ירחא עצבתהל הרומאש
T
)



רשאכ
T

טירפ בותכל הצור
A

םא
TS(T)
R
TS(A)

םגו
TS(T)


WTS(A)

זא
WTS(A) := TS(T)
.



e
Thomas Write Rul
:

םא
S(A)
RT


TS(T)

םגו
WTS(A)
<
TS(T)

םולכ םישוע אל
רובע שדח ךרע בותכל ילבמ ,ךישמהל הלוכי הקסיעהו
A

תא תונשל וא
WTS(A)
רמולכ ,
לש יונישה
T

תועמשמ רסח
תרחא הקסיע יכ


T
'

ש
ירחא העיפומ לוקשה יתרדסה ןומזתב
T

הביתכ עצבת
לש תרוויע
A

ש המ תא סורדת ןכלו
-
T

הביתכ תאזש םיעדוי .תבתוכ

יכ תרוויע

WTS(A
)= T
'

וליא
T
'

מ התיה
צב
םג זא האירק תע
R
TS(A
)= T'

זא לבא
TS(T) <
RTS(A)

.החנהל דוגינב

תבורמ תוליבקמ תרקב

תואסרג
:


ןומזתה
ש
לוקש וניה לעופב רצונ

םיטבמ

)םיטקילפנוק לוקש אלו(

יתרדסה ןומזתל
לש רדסה יפל תועצבתמ תואקסעה ובש
.הלחתהה ינמז


סמב טירפ לכל
תובר תואסרג םירמוש ד
:

o

הקסיעשכ
T

טירפ תבתוכ
A

השדח הסריג תרציימ השעמל איה
A
'

לש
A
ש ךכ ,
-

WTS(A
'
)
=TS(T)

.

o

תואסרגהמ תחא לכל
A
'

טירפ לש
A

שי
RTS(A
'
)
ל הוושש ,
-

TS

הקסיעה לש
וז הסריג הארקש תרחואמ יכה
.



חתפמ תלעב תבתוכש הקסע רשאכ האירק וא הביתכ רפ השענ הקידבה בלש
:תבייחתמ איהש דע ידעלב




הקסעשכ
T

אורקל הצור
טירפ
A
,
ןיא
םיבוכיע
!

איה

הסריגה תא תארוק
A
'

:םייקמש

max{A'|
WTS(A
'
)
}

:ןכדעמו .



RTS(A
'
):= max{TS(T), RTS(A
'
)}



ל
הקסיע
T

השדח הסריג בותכל רתומ

לש
A

תמייק הסריג ףא ןיא םא
A
'

ש ךכ
-

RTS(A
'
) > TS(T) > WTS
(A
'
)


?תואסריג םיקוחמ יתמ

טירפל םא
A

אוה הלש הביתכה ןמזש הסריג שי
t

תואקסעה לכו
ןמז ירחא וליחתה תכרעמב תוצרש
t
לש תואסרגה לכ תא קוחמל רשפא זא ,
A

ןמזש
ל םדוק ןהלש הביתכה
-

t
.

:םילוקוטורפ ןיב האוושה



אכ רתוי םיבוט )ןמז תומתוח לע םיססובמש( תוליענ אלל םילוקוטורפ
טעמ םייופצ רש
תואקסע ןיב םיטקילפנוק

יכ
,שדחמ ןתלחתהלו תואקסע לוטיבל םימרוג םיטקילפנוק
םילוענמב שומישמ רתוי דוע בכעמ הזש



שיו םיטקילפנוק טעמכ ןיאשכ רתוי םיבוט ימיטפואה לוקוטורפהו תומיאה לוקוטורפ
ימינפ ןורכיז קיפסמ




לוקוטורפ ףא

תילמיסקמ תוליבקמ גישמ וניא




נמה
תוליבקמ תרקבל ןונג

תויבקיעו דודיבה תנוכת יכ חיטבמ )םילוענמ ילב וא םע(
.תורמשנ דסמה לש



תוששואתה

תוליפנמ

תונוכת ןנה תודימעו תוימוטא
ש דסמה לש
תוששואתהל ןונגנמה י"ע תוחטבומ

"נה



ןתיא דדומתהל שיש תויעבה
:

1
.

תולטבתמ תואקסע
ונילע הז הרקמב .
עצבל
rollback

נילע ןכלו
ו
ענמהל
מ
האירק
תכלכולמ
-

( וילע ובייחתה םרטש עדימ לש האירק
commit
.)

תוששואתה רשפאמש ןומזת

)הקסע לוטיבמ(
:

הקסע םא תוששואתה רשפאמ ןומזת
ירחא קר תבייחתמ
ש
סעה לכ
בייחתה הארק איה ןהיכרע תאש תואק
אלש שורדנ( ו
.) תוכלכולמ תואירק ןיאש

לוקוטורפ
2PL

רימחמ
:
ומכ
2PL

ליגרה

השירדה תפסותבו
ש

תא תררחשמ הקסע
הבייחתהש ירחא קר הקיזחמ איהש םילוענמה
.תוששואתה רשפאמ הז לוקוטורפ .

2
.

לופיל לוכי בשחמה
.

עדימ רוזחיש


תוששואתה

וילע ושעש דחא וא( םינותנה דסמ רובע םיקסיד ינש בשחמב


partition
ל אוה דחאה )
-
DB


יוביגל אוה ינשהו
LOG
כנש ייוניש לכ .
ל בת
DB

בתכנ
םדוק

ל
LOG

לכ השעמל ןכל(
)תוביתכ ינש הכירצמ הביתכ
רשאכ
השעמל ונל שי ימינפה ןורכיזב

buffer

הביתכ רובע
ל
-
DB

ו
-
buffer

ל הביתכ רובע
-
LOG
ל בותכל הילע בייחתהל הצור הקסע םא .
-
LOG

תא
טירפ לכ לע ועצובש םייונישה
A

טירפ לכ לש םיירוקמה םיכרעה תא +
A

כל זאו
איהש בות
.תבייחתמ

:תלעיימ השעמל קסידל הלופכה הביתכה


תיתרדס איה גולב הביתכה


גולה ףוסל תושדח תומושר םיפיסומ דימת
עדימו ,
דחא קולב ךירצמ כ"רדב גולל םיפיסומש
הלוז איה גולל הביתכ ןכל ,
.


ה
-
LO
G

רמולכ הפיכא יאב ךמות
ל תבייחתמ איהש תבתוכ הקסעש עגרב
-
LOG

עמה תניחבמ
אל דוע םמצע םיונישהש תורמל תאז םילוענמ ררחשל הלוכי איה תכר
ל ובתכנ
DB
!
עצבל תלוכי תורחא תואקסע תעכ
םיקולב םתוא לע םיינויש

השעמ לכו
!דחא תבב םיבתכינ םייוסמ קולב לע םייונישה לכ



LOG

ה םא רמולכ ,הבינגב ךמות
buffer

דועש עדימ לש הביתכ רשפאמ אוה אלמ
הילע ובייתה אל
ונבתכש עדימהש הדימב .תרחא הקסעל םוקמה תא תונפלו גולל ם
.תכלכולמ הביתכ בשחנ אל הז ןכלו ירוקמ וכרעל ותוא רזחשל ןתינ לטבתה

תוששואתהל ינטשפ םתירוגלא
:


( הזילנאה בלש
Analysis Phase
)
:
מ( גולה לע םירבוע הז בלשב
קסידה תיתחת
)
תומושר גולב שי תואקסע הזיא רובע םיקדובו
co
mmit

הזיא רובעו


ןיא



( שדחמ היישעה בלש
Redo Phase
)
:
תואקסעה לכ תא שדחמ םיעצבמ הז בלשב
)ובייחתה אלש הלא ללוכ(


( היישעה לוטיב בלש
Undo Phase
)

:

לש תוביתכה לכ תא םילטבמ הז בלשב
ובייחתה אלש תואקסע

גולה תליחתמ תוששואתה עצבל ליעי אל הז

םיעצבמ םעפל םעפמ ךכיפל
chec
kpoint

זאו
המ תוששואתה עצבל ךירצ
-

checkpoint

רחאה
םעפ ידמ .ןו
לש םיקסידה תא תובגל ךירצ
םיקסידה סרה לש הרקמב דסמה תא רזחשל היהי ןתינש ידכ ,דסמה
ונעציבש בותכנ .

יוביג

.תאזה 'קנהמ קר הז רוזחש לש הרקמב ןכלו גולל
Fuzzy Checkpoint and

Fuzzy
Backup

עוציב םירשפאמ
chec
kpoint

.בשחמ לע תוצרש תואקסע לכ תריצע אלל יוביגו

הבורמ תויניערג םע םילוענמ

םימוטנאפה תייעב
:

ל םישדח םיטירפ הסינכמ הקסע אוב בצמ
-
A

הלענ תרחא הקסעש ןמזב
תא
A
.

.היוגש האצות לבקנו ןכתייו ןובשחב םיחקלינ אל וסנכוהש םיטירפה השעמל זאו

צרנ תונוש תויצאוטיס תחתש ןאכמ
.םינוש םילוענמב שמשתהל ה
תושרל
םילעונמה להנמ
ץע
םילוענמ לש היכררה
:
לכ תליענ ץעה שורוש
םינותנה דסמ
,תואלבט יכמש
)סחי( הלבט לכ
םיפד הליכמ
,
תומושר ליכמ )קולב( ףד לכ
:תוליענה יגוס .



לוענמ
X
-

קסידל הביתכל ותוא םיכירצ ,ידעלב



לוענמ
IX

-

שוריפ
ו

ל שיש
הקסע

וענל הנווכ
לוענמ י"ע תמוצה לש םיאצאצ ל
X




לוענמ
S
-

shared
קסידהמ אורקל הצור הקסעש ריהצמ ,



לוענמ
IS
-

שוריפ
ו

ל שיש
הקסע

לוענמ י"ע תמוצה לש םיאצאצ לוענל הנווכ
S

קיזחהל הלוכי תחא הקסע
IX

םגו
S

םא לבא
לוענמ שי םא
X
לוענמ ףאל תורשפא ןיא זא ,
ףסונ

הקסע ףא י"ע

.

יענל לוקוטורפה
בר תול
-
תויניערג

o

י"ע שרושה תליענב ליחתהל תבייח הקסע
S
,
X
,
IX

וא
IS


o

י"ע תמוצ לוענל הלוכי הקסע
IS

וא
S

הקיזחמ איה םא
IS

תמוצה לש הרוהה לע

o

י"ע תמוצ לוענל הלוכי הקסע
IX

וא
X

הקיזחמ איה םא
IX

תמוצה לש הרוהה לע

o

לוענמ םא
גוס לכמ

ה תועמשמה זא ,ץעב תמוצ לע קזחומ
לע קזחומ הז לוענמש אי
תמוצה לש םיאצאצה לכ לע
.

o

( אוהש גוס לכמ לוענמ ריסהל רשפא
IS
,
IX
,
S

וא
X
םוש ןיא םא קר ,ןותנ תמוצמ )
תמוצה לש םידליה לע םילוענמ

o

תוזאפ יתשב תוליענ לש לוקוטורפה יפל תולהנתמ תוליענה לכ
.




:םימוטנאפה תיעבל ןורתיפ

המיאתמה תמוצה תא לענת הקסעה
י"ע סחיה לכל
IX

י"עו
S

(
תולוכי תורחא תואקסע
י"ע הז תמוצ לוענל
IS

)
י"ע םיפסונ םיאצאצ לענת הקסעה ךשמהב
IX

וא
X

תנמ לע
.קסידל םתוא בותכל