Unit - 4

orangesvet, Electronics - Devices

8 Nov 2013

Introduction to the Other Databases



Introduction :-

A Distributed Database System (DDBS) is a database physically stored on several computer systems across several sites connected together via a communication network.

Each site is typically managed by a DBMS that is capable of running independently of the other sites.

In other words, each site is a database system in its own right and has its own local users, its own local DBMS, and its own data communication manager.

Each site has its own transaction management software, including its own locking, logging and recovery software.

Although geographically dispersed, a distributed database system manages and controls the entire database as a single collection of data.


The location of the data items and the degree of autonomy of individual sites have a significant impact on all aspects of the system, including query optimization and processing, concurrency control and recovery.

In a DDBS, both data and transaction processing are divided among several computers connected by a network, each computer playing a special role in the system.

The computers in a distributed system communicate with one another via various communication media. They do not share main memory or disk.

A DDBS allows applications to access data from local or remote databases.

A DDBS uses client/server architecture to process information requests.

The computers in a DDBS are referred to by a number of different names, such as sites or nodes.


A distributed database system is located at geographically distributed sites because of the need to use each part of the database locally rather than through remote access.

For example, the local branches of a multinational or national bank, or of a large company, can have their localized databases situated at the different branches.

Advances in communication and networking systems triggered the development of the distributed database approach.

It became possible to allow these distributed systems to communicate among themselves, so that data can be effectively accessed among computer systems in different geographical locations.

As a result, the machines at the different sites are quite likely to be heterogeneous, with entirely different individual architectures.

General Distributed Database Architecture



Desired Properties of DDBS :-

A distributed database should have the following properties :-

Distributed data independence.

Distributed transaction atomicity.

1. Distributed data independence :-

This property enables users to ask queries without specifying where the referenced relations, or copies or fragments of the relations, are located.

This principle is a natural extension of physical and logical data independence.

Further, queries that span multiple sites should be optimized systematically in a cost-based manner, taking into account communication costs and differences in local computation costs.

2. Distributed Transaction Atomicity :-

This property enables users to write transactions that access and update data at several sites just as they would write transactions over purely local data.

In particular, the effects of a transaction across sites should continue to be atomic.

That is, all changes persist if the transaction commits, and none persist if it aborts.
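The all-or-nothing behaviour described above can be illustrated with a small sketch. It is loosely modelled on a two-phase commit (a technique the slides do not name) and ignores crashes and network failures. The `Site` class, the account names and the rule that a balance may not go negative are all assumptions made for the example.

```python
class Site:
    """A toy site holding account balances in memory (hypothetical)."""

    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)
        self._staged = None

    def prepare(self, changes):
        """Stage the changes; vote False if any balance would go negative."""
        staged = dict(self.balances)
        for account, delta in changes.items():
            if account not in staged or staged[account] + delta < 0:
                return False
            staged[account] += delta
        self._staged = staged
        return True

    def commit(self):
        """Make the staged changes permanent."""
        if self._staged is not None:
            self.balances, self._staged = self._staged, None

    def abort(self):
        """Throw the staged changes away."""
        self._staged = None


def run_transaction(work):
    """work maps each Site to its changes; apply all of them or none."""
    sites = list(work)
    if all(site.prepare(work[site]) for site in sites):
        for site in sites:
            site.commit()
        return True
    for site in sites:
        site.abort()
    return False


mumbai = Site("Mumbai", {"A": 100})
london = Site("London", {"B": 50})

# Both sites accept their part, so both changes persist.
ok = run_transaction({mumbai: {"A": -30}, london: {"B": 30}})

# London rejects its part, so the Mumbai change is discarded as well.
failed = run_transaction({mumbai: {"A": -30}, london: {"B": -500}})
```

After the second call neither site has changed, which is the "none persist if it aborts" half of the property.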



Types of Distributed Databases :-

In a distributed database system the data and software are distributed over multiple sites connected by a communication network.

However, the term DDBS can describe various systems that differ from one another in many respects, depending on factors such as the degree of homogeneity, the degree of local autonomy, and so on.

The following two types of distributed database are most commonly used :-

Homogeneous DDBS.

Heterogeneous DDBS.

1. Homogeneous DDBS :-

This is the simplest form of distributed database, in which there are several sites, each running its own applications on the same DBMS software.

All sites have identical software, are aware of one another and agree to cooperate in processing user requests.

The applications can all see the same schema and run the same transactions.

That is, there is location transparency in a homogeneous DDBS.

The provision of location transparency forms the core of distributed database management system (DDBMS) development.

In a homogeneous DDBS, the use of a single DBMS avoids any problem of mismatched database capabilities between nodes, since all the data is managed within a single framework.

Homogeneous Distributed Database

2. Heterogeneous DDBS :-

In this DDBS, different sites run under the control of different DBMSs, essentially autonomously, and are connected in some way to enable access to data from multiple sites.

Different sites may use different schemas and different DBMS software.

The sites may not be aware of one another, and they may provide only limited facilities for cooperation in transaction processing.

In other words, in a heterogeneous DDBS each site is an independent and centralized DBMS that has its own local users, local transactions and database administrator (DBA).


Heterogeneous Distributed Database



Advantages of DDBS :-

Sharing of data: users at one site may be able to access the data residing at other sites while retaining control over the data at their own site.

Increased efficiency of processing, by keeping the data close to the point where it is most frequently used.

Efficient management of distributed data with different levels of transparency.

It enables the structure of the database to mirror the structure of the enterprise: local data can be kept locally, while remote data can still be accessed when necessary.

Increased local autonomy: each site is able to retain a degree of control over the data that is stored locally.




Increased accessibility, by allowing data to be accessed between several sites via the communication network.

Increased availability: if one site fails, the remaining sites may be able to continue operating.

Increased reliability due to greater accessibility.

Improved performance.

Improved scalability.

Easier expansion as the organization grows, in terms of adding more data, increasing the database size and adding more CPUs.

Parallel evaluation, by subdividing a query into sub-queries involving data from several sites.



Disadvantages of DDBS :-

Recovery from failure is more complex.

Increased complexity in system design and implementation.

Increased transparency leads to a compromise between ease of use and the overhead cost of providing that transparency.

Increased software development cost.

Greater potential for bugs.

Increased processing overhead.

Technical problems of connecting dissimilar machines.

Difficulty in database integrity control.

Security concerns over data replicated in multiple locations and over the network.

1. Client / Server Architecture :-

Client/server architectures are those in which the DBMS-related workload is split into two logical components, namely client and server, each of which typically executes on a different system.

The client is the user of the resources, whereas the server is the provider of the resources.

The architecture has one or more client processors and one or more server processors.

The applications and tools are put on the client platforms, and they are connected to the database management system that resides on the server platform.

The applications and tools act as clients of the DBMS, making requests for its services.

The DBMS, in turn, serves these requests and returns the results to the client(s).


Clients are responsible for user interface issues, and servers manage data and execute transactions.

In other words, the client/server architecture can be used to implement a DBMS in which the client is the transaction processor (TP) and the server is the data processor (DP).

A client process could run on a personal computer and send queries to a server running on a mainframe computer.

Most modern information systems are based on client/server architecture.

Client/Server database Architecture


Components of client/server architecture :-

Client, in the form of a workstation, as the user’s contact point.

DBMS server, as a common resource performing specialized tasks for the devices requesting its services.

Communication network connecting the clients and the servers.

Software applications connecting clients, servers and network to create a single logical architecture.

Client applications issue SQL statements for data access, just as they do in a centralized computing environment.

The networking interface enables client applications to connect to the server, send the SQL statements created by the clients to the server, and receive the results or error return code that the server sends back to the client after processing the SQL statement.



Benefits of Client/Server Architecture


Relatively simple to implement because of the centralized server and the clean separation of functionality.

Better adaptability of the computing environment to meet the ever-changing business needs of the organization.

Use of a Graphical User Interface (GUI) on a microcomputer by the user at the client improves functionality and simplicity.

It is less expensive than a minicomputer or mainframe solution.

Expensive server machines are optimally utilized because users interact with the inexpensive client machines.

Overall productivity improvement due to decentralized operations.

Improved performance with more processing power.



Limitations of Client/Server Architecture


The client/server architecture does not allow a single query to span multiple servers, because the client process would have to be capable of breaking such a query into appropriate sub-queries to be executed at the different sites and then putting together the answers to the sub-queries.

An increase in the number of users and processing sites often creates security problems.

2. Collaborating Server Systems :-

In the collaborating server architecture there are several database servers, each capable of running transactions against local data, which cooperatively execute transactions spanning multiple servers.

When a server receives a query that requires access to data at other servers, it generates appropriate sub-queries to be executed by the other servers and puts the results together to compute the answer to the original query.
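The sub-query idea can be sketched with Python's standard `sqlite3` module, using one in-memory database per server. This is only an illustration: the `emp` table layout, the helper names and the choice of a total-salary query are assumptions, and a real system would ship the sub-queries over a network.

```python
import sqlite3


def make_server(rows):
    """Create an in-memory database standing in for one server's local data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id TEXT, dept_id INTEGER, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    return conn


def total_salary(servers):
    """Send one sub-query to every server, then combine the partial results."""
    sub_query = "SELECT COALESCE(SUM(salary), 0) FROM emp"
    partial_sums = [conn.execute(sub_query).fetchone()[0] for conn in servers]
    return sum(partial_sums)


servers = [
    make_server([("E-103", 2, 13000), ("E-106", 2, 15000)]),
    make_server([("E-102", 4, 15000), ("E-105", 4, 12000)]),
]

print(total_salary(servers))  # 55000
```

Each server only computes a sum over its own rows; the server that received the original query adds the partial sums together.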

3. Middleware Systems :-

The middleware database architecture, also called data access middleware, is designed to allow a single query to span multiple servers without requiring all database servers to be capable of managing such multisite execution strategies.

Data access middleware provides users with a consistent interface to multiple DBMSs and file systems in a transparent manner.

Data access middleware simplifies the heterogeneous environment for programmers and provides users with an easier means of accessing live data in multiple sources.

It eliminates the need for programmers to code many environment-specific requests or calls in any application that needs access to current data rather than to copies of the data.



The direct requests or calls for data movement to several DBMSs are handled by the middleware, and hence a major rewrite of the application programs is not required.

The middleware is basically a layer of software that works as a special server and coordinates the execution of queries and transactions across one or more independent data servers.

The middleware server is capable of executing joins and other relational operations on data obtained from the other servers, but typically does not itself maintain any data.

Middleware might be responsible for routing a local request to one or more servers, transporting the request by supporting various networking protocols, and converting data from one format to another.

Middleware System


Data access middleware architecture consists of a middleware application programming interface (API), a middleware engine, drivers and native interfaces.

The API usually consists of a series of available function calls as well as a series of data access statements (dynamic SQL, QBE and so on).

The middleware engine is basically an application programming interface for routing requests to the various drivers and performing other functions. It handles the data access requests that have been issued.

Drivers are used to connect to the various data sources, and they translate the requests received from the API into the proper format understood by the targeted data source.
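A minimal sketch of the engine-and-driver split, under stated assumptions: the class names, the source names `hr_db` and `legacy_file`, and the single lookup-by-id request are all hypothetical, and the "native interfaces" are stood in for by `sqlite3` and a list of text lines.

```python
import sqlite3


class SqlDriver:
    """Translates a lookup request into SQL for a relational source."""

    def __init__(self, conn, table):
        self.conn, self.table = conn, table

    def fetch(self, emp_id):
        # The table name is fixed when the driver is built; the id is a parameter.
        row = self.conn.execute(
            f"SELECT name FROM {self.table} WHERE id = ?", (emp_id,)
        ).fetchone()
        return row[0] if row else None


class FileDriver:
    """Translates the same request into a scan over 'id,name' text lines."""

    def __init__(self, lines):
        self.lines = lines

    def fetch(self, emp_id):
        for line in self.lines:
            key, name = line.split(",")
            if key == emp_id:
                return name
        return None


class Engine:
    """Routes each request to the driver registered for the named source."""

    def __init__(self):
        self.drivers = {}

    def register(self, source, driver):
        self.drivers[source] = driver

    def lookup(self, source, emp_id):
        return self.drivers[source].fetch(emp_id)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id TEXT, name TEXT)")
conn.execute("INSERT INTO emp VALUES ('E-101', 'XYZ')")

engine = Engine()
engine.register("hr_db", SqlDriver(conn, "emp"))
engine.register("legacy_file", FileDriver(["E-102,ABC"]))
```

The application calls `engine.lookup(...)` the same way for both sources; only the drivers know how each source is actually queried.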



1.) Data Fragmentation :-

2.) Data Allocation :-

3.) Data Replication :-







Data Fragmentation :-

Fragmentation is applied to a relational database system to partition its relations among the network sites.

The technique of breaking up the database into logical units, which may be assigned for storage at the various sites, is called Data Fragmentation.

In fragmentation, a relation can be partitioned into several fragments for physical storage purposes, and there may be several replicas of each fragment.

These fragments contain sufficient information to allow reconstruction of the original relation.

All fragments of a given relation are independent: none of the fragments can be derived from the others.


For example, let us consider a relation EMPLOYEE :

Main Relation :- EMPLOYEE

ID      NAME   DEPT_ID   SALARY
E-101   XYZ    3         12,000
E-102   XYZ    4         15,000
E-103   XYZ    2         13,000
E-104   XYZ    3         14,500
E-105   XYZ    4         12,000
E-106   XYZ    2         15,000

Now this relation can be fragmented into three fragments as follows :-

Fragments       At Site     Based on
Mumbai_Emp      Mumbai      Dept_ID = 2
Jamsedpur_Emp   Jamsedpur   Dept_ID = 3
London_Emp      London      Dept_ID = 4


The fragments of the relation can be stored at the various sites as shown in the table: the tuples for Mumbai employees, with Dept_ID = 2, are stored at the Mumbai site; the tuples for Jamsedpur employees, with Dept_ID = 3, are stored at the Jamsedpur site; and the tuples for London employees, with Dept_ID = 4, are stored at the London site.

In this example the fragment names are Mumbai_Emp, Jamsedpur_Emp and London_Emp.


Reconstruction of the original relation is done via suitable JOIN and UNION operations.

A system that supports data fragmentation should also support fragmentation independence, also called fragmentation transparency.

That means the users should not be logically concerned about the fragmentation.

The users should have the feeling that the data is not fragmented at all.

In other words, fragmentation independence implies that the users are presented with a view of the data in which the fragments are logically recombined by means of suitable JOINs and UNIONs.

It is the responsibility of the system optimizer to determine which fragments need to be physically accessed in order to satisfy any given user request.

Following are the three different schemes for fragmenting a relation :

Horizontal Fragmentation

Vertical Fragmentation

Mixed Fragmentation



Horizontal Fragmentation :-

A horizontal fragment of a relation is a subset of the tuples, with all attributes, in that relation.

Horizontal fragmentation splits the relation horizontally by assigning each tuple, or group of tuples, of the relation to one or more fragments, where each tuple or subset has a certain logical meaning.

These fragments can be assigned to different sites in the distributed database system.

A horizontal fragment is produced by specifying a predicate that performs a restriction on the tuples in the relation.

General form : σ <condition> (R)

The horizontal fragmentation can be written in terms of relational algebra as :

MUMBAI_EMP : σ <Dept_ID = 2> (EMPLOYEE)

JAMSEDPUR_EMP : σ <Dept_ID = 3> (EMPLOYEE)

LONDON_EMP : σ <Dept_ID = 4> (EMPLOYEE)

Relation :- Mumbai_Emp

ID      NAME   DEPT_ID   SALARY
E-103   XYZ    2         13,000
E-106   XYZ    2         15,000

Relation :- Jamsedpur_Emp

ID      NAME   DEPT_ID   SALARY
E-101   XYZ    3         12,000
E-104   XYZ    3         14,500

Relation :- London_Emp

ID      NAME   DEPT_ID   SALARY
E-102   XYZ    4         15,000
E-105   XYZ    4         12,000



In horizontal fragmentation, the UNION operation is used to reconstruct the original relation.
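As a rough illustration, the selection and the UNION-based reconstruction can be tried out on the EMPLOYEE rows from the example. The tuple layout `(ID, NAME, DEPT_ID, SALARY)` and the helper `select` are assumptions of this sketch, not part of any particular DBMS.

```python
# Rows of EMPLOYEE as (ID, NAME, DEPT_ID, SALARY) tuples.
EMPLOYEE = [
    ("E-101", "XYZ", 3, 12000),
    ("E-102", "XYZ", 4, 15000),
    ("E-103", "XYZ", 2, 13000),
    ("E-104", "XYZ", 3, 14500),
    ("E-105", "XYZ", 4, 12000),
    ("E-106", "XYZ", 2, 15000),
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One horizontal fragment per department, as in the example.
mumbai_emp = select(EMPLOYEE, lambda row: row[2] == 2)
jamsedpur_emp = select(EMPLOYEE, lambda row: row[2] == 3)
london_emp = select(EMPLOYEE, lambda row: row[2] == 4)

# UNION of the fragments gives back the original relation.
reconstructed = sorted(set(mumbai_emp) | set(jamsedpur_emp) | set(london_emp))
```

Because every tuple satisfies exactly one of the three predicates, the fragments are disjoint and their union loses nothing.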



Vertical Fragmentation :-

Vertical fragmentation splits the relation by decomposing it “vertically”, by columns (attributes).

A vertical fragment of a relation keeps only certain attributes of the relation at a particular site, because each site may not need all the attributes of the relation.

Thus vertical fragmentation groups together the attributes of the relation that are used jointly by the important transactions.

A simple vertical fragmentation is not quite proper when the two fragments are stored separately. Since there is no common attribute between the two fragments, we cannot put the original EMPLOYEE relation back together.

Therefore it is necessary to include the primary key or a candidate key attribute in every vertical fragment.

General form : Π <a1, a2, …, an> (R)

For example, fragment the EMPLOYEE table :

MUMBAI_EMP : (TID, EMP_ID, EMP_NAME)

JAMSEDPUR_EMP : (TID, DEPT_ID)

LONDON_EMP : (TID, EMP_SALARY)

MUMBAI_EMP : Π <TID, EMP_ID, EMP_NAME> (EMPLOYEE)

JAMSEDPUR_EMP : Π <TID, DEPT_ID> (EMPLOYEE)

LONDON_EMP : Π <TID, EMP_SALARY> (EMPLOYEE)

The original relation is obtained by performing a JOIN operation.



Relation :- Mumbai_Emp

TID   EMP_ID    EMP_NAME
T-1   E-10215   XYZ
T-2   E-14587   XYZ
T-3   E-45875   XYZ
T-4   E-87456   XYZ

Relation :- London_Emp

TID   EMP_SALARY
T-1   12,000
T-2   15,000
T-3   16,000
T-4   18,000

Relation :- Jamsedpur_Emp

TID   DEPT_ID
T-1   2
T-2   3
T-3   2
T-4   3
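The projection and the JOIN-based reconstruction can be sketched as follows. The combined EMPLOYEE rows are assembled from the three fragment tables shown here; the helpers `project` and `join` are hypothetical names used only for this illustration.

```python
COLUMNS = ("TID", "EMP_ID", "EMP_NAME", "DEPT_ID", "EMP_SALARY")

# EMPLOYEE rows, one dict per tuple, keyed by attribute name.
EMPLOYEE = [
    dict(zip(COLUMNS, row))
    for row in [
        ("T-1", "E-10215", "XYZ", 2, 12000),
        ("T-2", "E-14587", "XYZ", 3, 15000),
        ("T-3", "E-45875", "XYZ", 2, 16000),
        ("T-4", "E-87456", "XYZ", 3, 18000),
    ]
]


def project(relation, attrs):
    """Relational projection: keep only the named attributes of each tuple."""
    return [{a: row[a] for a in attrs} for row in relation]


def join(left, right, key="TID"):
    """Join two fragments on the shared key attribute."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Every fragment keeps TID so that the pieces can be matched up again.
mumbai_emp = project(EMPLOYEE, ("TID", "EMP_ID", "EMP_NAME"))
jamsedpur_emp = project(EMPLOYEE, ("TID", "DEPT_ID"))
london_emp = project(EMPLOYEE, ("TID", "EMP_SALARY"))

rebuilt = join(join(mumbai_emp, jamsedpur_emp), london_emp)
```

Dropping TID from any one fragment would leave `join` with nothing to match on, which is why the key has to be repeated in every fragment.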



Mixed Fragmentation :-

Sometimes horizontal or vertical fragmentation of a database schema by itself is insufficient to adequately distribute the data for some applications. In that case mixed, or hybrid, fragmentation is required.

Thus, horizontal fragmentation of a relation followed by further vertical fragmentation, or vice versa, is called Mixed Fragmentation.

A mixed fragmentation is defined using the SELECT and PROJECT operations of relational algebra :

Π <a1, a2, …, an> (σ <condition> (R))

σ <condition> (Π <a1, a2, …, an> (R))

The original relation can be obtained by performing JOIN and UNION operations of relational algebra.
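A small sketch of the first form, a selection followed by projections, with the reconstruction done by a JOIN inside each horizontal part and a UNION across the parts. The sample rows, the split by DEPT_ID and the helper names are assumptions made for the example.

```python
COLUMNS = ("TID", "EMP_ID", "EMP_NAME", "DEPT_ID", "EMP_SALARY")
EMPLOYEE = [
    dict(zip(COLUMNS, row))
    for row in [
        ("T-1", "E-10215", "XYZ", 2, 12000),
        ("T-2", "E-14587", "XYZ", 3, 15000),
        ("T-3", "E-45875", "XYZ", 2, 16000),
        ("T-4", "E-87456", "XYZ", 3, 18000),
    ]
]


def select(relation, predicate):
    """Relational selection."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """Relational projection."""
    return [{a: row[a] for a in attrs} for row in relation]


def join(left, right, key="TID"):
    """Join two fragments on the shared key attribute."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Horizontal split by department, then a vertical split of each part.
fragments = {}
for dept in (2, 3):
    rows = select(EMPLOYEE, lambda row, d=dept: row["DEPT_ID"] == d)
    fragments[dept] = (
        project(rows, ("TID", "EMP_ID", "EMP_NAME")),
        project(rows, ("TID", "DEPT_ID", "EMP_SALARY")),
    )

# Reconstruction: JOIN the vertical pieces, then UNION the horizontal parts.
rebuilt = []
for names, pay in fragments.values():
    rebuilt.extend(join(names, pay))
rebuilt.sort(key=lambda row: row["TID"])
```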





Data Allocation :-

Data allocation describes the process of deciding where to locate or place data among the several sites.

Following are the data allocation strategies that are used in a Distributed Database System :

Centralized

Partitioned or fragmented

Replication

1. Centralized Strategies :

In this strategy the entire single database and the DBMS are stored at one site. However, the users are geographically distributed across the network.

Locality of reference is low for every site except the central site, because all data access must go to the central site.

Thus the communication costs are high.

Because the entire database resides at one site, the entire database is lost if that single system fails.

Hence the reliability and availability are low.

2. Partitioned Strategies :

In this strategy the database is divided into several disjoint parts (fragments), which are stored at several sites.

Each data item is located at the site where it is used most frequently.

Since there is no replication, the storage cost is low.

The failure of the system at a particular site results in the loss of only that site's data, not the entire database. Hence the reliability and availability are high.

The communication cost is low and the overall performance is good compared to the centralized strategy.

3. Replication Strategies :

In this strategy copies of one or more database fragments are stored at several sites.

Thus locality of reference, reliability, availability and performance are very high, but the communication cost and storage cost are also very high.

Data replication is the technique that permits storage of certain data at more than one site.

The system maintains several identical copies of a relation and stores each copy at a different site.

Data replication is introduced to increase the availability of the system.

If a copy is not available due to a system failure, it should be possible to access another copy.


Data can be replicated as :

REPLICATE LONDON_EMP AS LONDON-MUMBAI_EMP AT SITE ‘Mumbai’

REPLICATE MUMBAI_EMP AS MUMBAI-LONDON_EMP AT SITE ‘London’


Data replication should also support replication independence, also known as replication transparency.

That means users should be able to behave as if the data were in fact not replicated at all.

Replication independence simplifies user programs and terminal activities.

It is the responsibility of the system optimizer to determine which replicas physically need to be accessed in order to satisfy any given user request.


Advantages of data replication :-

Data replication enhances the performance of read operations by increasing the speed of access at each site. That means that, with data replication, applications can operate on local copies instead of having to communicate with remote sites.

Data replication increases the availability of data to read-only transactions. That means a given replicated object remains available for processing, at least for retrieval, so long as at least one copy is available.


Disadvantages of data replication :-

Increased overhead for update transactions. That means that when a given replicated object is updated, all copies of that object must be updated.

More complexity in controlling concurrent updates by several transactions to replicated data.
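The trade-off between cheap reads and expensive updates can be seen in a toy model. It assumes a simple "read any one copy, write all copies" policy, which is one possible reading of the points above rather than a description of any specific DBMS; the class and method names are hypothetical.

```python
class ReplicatedItem:
    """One data item with an identical copy stored at each site."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.up = set(sites)

    def fail(self, site):
        """Mark a site as unavailable."""
        self.up.discard(site)

    def read(self, preferred):
        """Read the local copy if its site is up, else any surviving copy."""
        if preferred in self.up:
            return self.copies[preferred]
        for site in self.copies:
            if site in self.up:
                return self.copies[site]
        raise RuntimeError("no copy available")

    def write(self, value):
        """Update every copy; refuse if any replica site is down."""
        if self.up != set(self.copies):
            raise RuntimeError("cannot update: a replica site is down")
        for site in self.copies:
            self.copies[site] = value


salary = ReplicatedItem(["Mumbai", "London"], 12000)
salary.write(13000)               # both copies are updated
local_value = salary.read("London")

salary.fail("London")
fallback_value = salary.read("London")  # served from the Mumbai copy
```

With London down, reads still succeed from the surviving copy, while `salary.write(...)` now raises an error: retrieval stays available but updates become harder.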