Distributed DBMSs-Concept and Design


Oct 27, 2013



Jing Luo

CS 157B

Dr. Lee

Fall, 2003


DBMSs

Centralized DBMS


It allows users to access only a single logical database located at one site under its control.

Distributed DBMS


It allows users to access not only the data at their own site but also data stored at remote sites.


Definitions


Distributed database: A logically interrelated collection of shared data (and a description of this data) physically distributed over a computer network.


Distributed DBMS: The software system that permits the management of the distributed database and makes the distribution transparent to users.



Users access the distributed database via applications.


Local applications


Applications that do not require data from other sites.



Global applications


Applications that do require data from other sites.








Characteristics of DDBMS


A collection of logically related shared data;


The data is split into a number of fragments;


Fragments may be replicated;


Fragments/replicas are allocated to sites;


The sites are linked by a communications network;


The data at each site is under the control of a DBMS;


The DBMS at each site can handle local applications, autonomously;


Each DBMS participates in at least one global application.




A DDBMS is required to have at least one global application.

It is not necessary for every site in the system to have its own local database.
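The characteristics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Site` class, the fragment names, and the allocation are hypothetical and do not come from the slides. It shows fragments (some replicated) allocated to sites, a site with no local database, and the test that separates a local application from a global one.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One site in the network, running its own local DBMS."""
    name: str
    # Fragments and replicas allocated to this site (may be empty).
    fragments: set[str] = field(default_factory=set)


def classify(origin: Site, needed: set[str]) -> str:
    """Return 'local' if every needed fragment is stored at the origin site,
    otherwise 'global' (data must come from at least one other site)."""
    return "local" if needed <= origin.fragments else "global"


def sites_holding(fragment: str, sites: list[Site]) -> list[str]:
    """Names of all sites that hold a copy of the given fragment."""
    return [s.name for s in sites if fragment in s.fragments]


# Hypothetical allocation: fragment and site names are made up.
site1 = Site("Site 1", {"staff_london", "branch"})
site2 = Site("Site 2", {"staff_glasgow", "branch"})  # 'branch' is replicated
site3 = Site("Site 3")                                # no local database
sites = [site1, site2, site3]

print(classify(site1, {"staff_london"}))                   # local
print(classify(site1, {"staff_london", "staff_glasgow"}))  # global
print(classify(site3, {"branch"}))                         # global
print(sites_holding("branch", sites))                      # ['Site 1', 'Site 2']
```

An application issued at Site 3 is always global in this sketch, because that site stores no fragments of its own.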

[Figure: DDBMS. Site 1, Site 2, Site 3, and Site 4 are linked by a computer network; three DBs are shown.]

Distributed processing


A centralized database that can be accessed over a computer network.



[Figure: Distributed processing. Site 1, Site 2, Site 3, and Site 4 are linked by a computer network; a single DB is shown.]




Distributed DBMS vs. Distributed Processing

Distributed DBMS


System consists of data that is physically distributed across a number of sites in the network.


Distributed processing


Data is centralized, even though other users may be accessing the data over the network.


Parallel DBMSs


A DBMS running across multiple processors and disks that is designed to execute operations in parallel, whenever possible, in order to improve performance.

Three Main Architectures for Parallel DBMSs

To provide multiple processors with common access to a single database, a parallel DBMS must provide for shared resource management.


Shared memory


Shared disk


Shared nothing

Shared memory is a tightly coupled architecture in which multiple processors within a single system share system memory.


Symmetric multiprocessing (SMP)


This approach has become popular on platforms ranging from personal workstations that support a few microprocessors in parallel, to RISC (Reduced Instruction Set Computer) based machines, all the way up to the largest mainframes.


The architecture provides high-speed data access for a limited number of processors, but it does not scale beyond about 64 processors, at which point the interconnection network becomes a bottleneck.

[Figure: Shared memory. Four CPUs connect through an interconnection network to one shared memory and three DBs.]

Shared disk is a loosely coupled architecture optimized for applications that are inherently centralized and require high availability and performance.


Each processor can access all disks directly, but each has its own private memory.


Shared disk architecture eliminates the shared memory performance bottleneck without introducing the overhead associated with physically partitioned data.

[Figure: Shared disk. Four CPUs, each with its own memory, connect through an interconnection network to three DBs.]

Shared nothing, also known as massively parallel processing (MPP), is a multiple-processor architecture in which each processor is part of a complete system, with its own memory and disk storage.


The database is partitioned among all the disks on each system associated with the database, and data is transparently available to users on all systems.


This architecture can easily support a large number of processors.
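One common way to partition a database across shared-nothing nodes is to hash a key column. The slides do not say which partitioning scheme is meant, so the sketch below is an assumption: hash partitioning with CRC32, with hypothetical row data and function names. Each node's partition is a private list, and a lookup is routed to the single node that owns the key, so the user never has to know where a row lives.

```python
import zlib


def node_for(key: str, num_nodes: int) -> int:
    """Pick the node that owns a key, using a stable CRC32 hash."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(rows: list[dict], key_field: str, num_nodes: int) -> list[list[dict]]:
    """Split rows into one private partition per node."""
    nodes: list[list[dict]] = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(str(row[key_field]), num_nodes)].append(row)
    return nodes


def lookup(nodes: list[list[dict]], key_field: str, key: str) -> list[dict]:
    """Route a lookup to the one node that owns the key and scan only there."""
    owner = node_for(str(key), len(nodes))
    return [row for row in nodes[owner] if row[key_field] == key]


# Hypothetical data: four nodes, three staff rows.
rows = [
    {"staff_no": "S1", "name": "Ann"},
    {"staff_no": "S2", "name": "Bob"},
    {"staff_no": "S3", "name": "Cy"},
]
nodes = partition(rows, "staff_no", 4)

print([len(p) for p in nodes])          # rows held by each node
print(lookup(nodes, "staff_no", "S2"))  # the row for S2, found on its owner node
```

Adding processors in this sketch only means raising `num_nodes` and repartitioning; no memory or disk is shared between partitions.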

[Figure: Shared nothing (SN). Four CPUs, each with its own memory and its own DB, are linked by an interconnection network.]

Homogeneous & Heterogeneous DDBMSs

Homogeneous system


All sites use the same DBMS product.

Heterogeneous system


Sites may run different DBMS products, which need not be based on the same underlying data model, and so the system may be composed of relational, network, hierarchical, and object-oriented DBMSs.

Heterogeneous system problems


In a heterogeneous system, translations are required to allow communication between different DBMSs. The system has the task of locating the data and performing any necessary translation.


Data may be required from another site that has:


Different hardware


Different DBMS products


Different hardware and different DBMS products


If the hardware is different but the DBMS products are the same, the translation involves changing codes and word lengths.

If the DBMS products are different, the translation involves mapping data structures in one data model to the equivalent data structures in another data model.
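The hardware-level case, a change of codes and word length, can be illustrated with Python's standard library. This is a minimal sketch under assumptions not stated in the slides: the source machine is taken to use EBCDIC (code page 037) and 16-bit big-endian integers, and the target ASCII and 32-bit little-endian integers. The byte-order difference and both function names are added for illustration.

```python
import struct


def translate_text(ebcdic_bytes: bytes) -> bytes:
    """Change of codes: re-encode EBCDIC (cp037) text as ASCII."""
    return ebcdic_bytes.decode("cp037").encode("ascii")


def widen_word(source_word: bytes) -> bytes:
    """Change of word length: read a 16-bit big-endian signed integer
    and write it back as a 32-bit little-endian signed integer."""
    (value,) = struct.unpack(">h", source_word)
    return struct.pack("<i", value)


ebcdic = "ABC".encode("cp037")
word16 = struct.pack(">h", 1234)

print(ebcdic.hex())                  # c1c2c3
print(translate_text(ebcdic).hex())  # 414243
print(word16.hex())                  # 04d2
print(widen_word(word16).hex())      # d2040000
```

The value and the text are unchanged by the translation; only their byte-level representation differs between the two machines.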


Heterogeneous system problems (cont’d)

An additional complexity is the provision of a common conceptual schema. The integration of data models can be very difficult owing to semantic heterogeneity.

For example, attributes with the same name in two schemas may represent different things. Equally well, attributes with different names may model the same thing.
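Both kinds of naming conflict can be made concrete with a small mapping table. Everything in this sketch is hypothetical (the two sites, their attribute names, and the global attribute names); it only shows why a hand-written correspondence to a common schema is needed before rows from different sites can be combined.

```python
# Hand-written correspondences from (site, local attribute) to a
# hypothetical common conceptual schema.
GLOBAL_MAPPING = {
    ("A", "name"): "employee_name",    # at site A, 'name' is a person
    ("B", "name"): "department_name",  # at site B, 'name' is a department
    ("A", "salary"): "annual_pay",     # different names,
    ("B", "pay"): "annual_pay",        # same meaning
}


def homonyms(mapping: dict) -> list[str]:
    """Local names that mean different things at different sites."""
    by_name: dict[str, set[str]] = {}
    for (_site, attr), target in mapping.items():
        by_name.setdefault(attr, set()).add(target)
    return sorted(a for a, targets in by_name.items() if len(targets) > 1)


def synonyms(mapping: dict) -> dict[str, list[str]]:
    """Global attributes that are reached from different local names."""
    by_target: dict[str, set[str]] = {}
    for (_site, attr), target in mapping.items():
        by_target.setdefault(target, set()).add(attr)
    return {t: sorted(n) for t, n in by_target.items() if len(n) > 1}


def to_global(site: str, row: dict) -> dict:
    """Rename one local row's attributes to the common schema."""
    return {GLOBAL_MAPPING[(site, attr)]: value for attr, value in row.items()}


print(homonyms(GLOBAL_MAPPING))  # ['name']
print(synonyms(GLOBAL_MAPPING))  # {'annual_pay': ['pay', 'salary']}
print(to_global("B", {"name": "Sales", "pay": 50000}))
# {'department_name': 'Sales', 'annual_pay': 50000}
```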

Solution

Gateways, which convert the language and model of each different DBMS into the language and model of the relational system.


Limitation


It may not support transaction management. The gateway between two systems may be
only a query translator. For example, a system may not coordinate concurrency control
and recovery of transactions that involve updates to the pair of databases.


The gateway approach is concerned only with the problem of translating a query
expressed in one language into an equivalent expression in another language. As such,
generally it does not address the issues of homogenizing the structural and
representational differences between different schemas.
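A gateway that is only a query translator might look like the sketch below. The path-style query language (`branch/<branch_no>/staff`), the table, and the `translate` function are all hypothetical; the relational side is an in-memory SQLite database used purely for illustration. Note what is missing: the translator rewrites one request into SQL, but it has no notion of transactions, concurrency control, or recovery across systems, which is the limitation described above.

```python
import sqlite3


def translate(path: str) -> tuple[str, tuple]:
    """Translate a hypothetical path query 'branch/<branch_no>/staff'
    into an equivalent parameterized SQL query."""
    parts = path.strip("/").split("/")
    if len(parts) == 3 and parts[0] == "branch" and parts[2] == "staff":
        sql = ("SELECT staff_no, name FROM staff "
               "WHERE branch_no = ? ORDER BY staff_no")
        return sql, (parts[1],)
    raise ValueError(f"unsupported query: {path}")


# Relational system behind the gateway (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_no TEXT, name TEXT, branch_no TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("S1", "Ann", "B003"), ("S2", "Bob", "B005"), ("S3", "Cy", "B003")],
)

sql, params = translate("branch/B003/staff")
print(conn.execute(sql, params).fetchall())  # [('S1', 'Ann'), ('S3', 'Cy')]
```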

A multidatabase system (MDBS) is a distributed DBMS in which each site maintains complete autonomy. An MDBS resides transparently on top of existing database and file systems, and presents a single database to its users. It maintains a global schema against which users issue queries and updates; an MDBS maintains only the global schema and the local DBMSs themselves maintain all user data.
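The division of labour in an MDBS, where the global schema sits on top and user data stays in the local DBMSs, can be sketched with two in-memory SQLite databases standing in for autonomous sites. The table names, column names, and the `query_global` function are assumptions made for illustration. The global layer stores no rows; it only records which local query answers each global relation.

```python
import sqlite3

# Two autonomous local DBMSs, each with its own schema (names are made up).
site_a = sqlite3.connect(":memory:")
site_a.execute("CREATE TABLE staff (staff_no TEXT, name TEXT)")
site_a.execute("INSERT INTO staff VALUES ('S1', 'Ann')")

site_b = sqlite3.connect(":memory:")
site_b.execute("CREATE TABLE employee (emp_id TEXT, full_name TEXT)")
site_b.execute("INSERT INTO employee VALUES ('E7', 'Bob')")

# The global schema holds no user data, only the mapping from a global
# relation to the local query that produces its rows at each site.
GLOBAL_SCHEMA = {
    "Staff": [
        (site_a, "SELECT staff_no, name FROM staff"),
        (site_b, "SELECT emp_id, full_name FROM employee"),
    ],
}


def query_global(relation: str) -> list[tuple]:
    """Answer a query on a global relation by asking every local DBMS
    that contributes to it and combining the results."""
    rows: list[tuple] = []
    for conn, local_sql in GLOBAL_SCHEMA[relation]:
        rows.extend(conn.execute(local_sql).fetchall())
    return sorted(rows)


print(query_global("Staff"))  # [('E7', 'Bob'), ('S1', 'Ann')]
```

The user sees one `Staff` relation, while each site keeps its own schema and can still run its local applications unchanged.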

Concepts of Networking

Network

An interconnected collection of autonomous computers that are capable of exchanging information.

For our purposes, the DDBMS is built on top of a network in such a way that the network is hidden from the user.

Classification of network



LAN: a local area network is intended for connecting computers at the same site.



WAN: a wide area network is used when computers or LANs need to be connected over long distances.


A special case of the WAN is a metropolitan area network (MAN), which generally covers a city or suburb.

Summary of WAN and LAN characteristics

WAN


Distances up to thousands of kilometers


Link autonomous computers


Network managed by independent organization (using telephone or satellite links)


Data rate up to 33.6 kbit/s (dial-up via modem), 45 Mbit/s (T3 circuit)


Complex protocol


Use point-to-point routing


Use irregular topology


Error rate about 1:10^5

LAN


Distances up to a few kilometers


Link computers that cooperate in distributed applications


Network managed by users (using privately owned cables)


Data rate up to 2500 Mbit/s (ATM)


Simpler protocol


Use broadcast routing


Use bus or ring topology


Error rate about 1:10^9
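To give the data rates and error rates above a sense of scale, the following sketch computes the ideal transfer time and the expected number of corrupted bits for a payload. The 1 MB payload is an assumed figure, the error rates are read as per-bit rates (the slides do not say what unit they use), and protocol overhead and latency are ignored.

```python
def transfer_seconds(size_bytes: int, rate_bits_per_s: float) -> float:
    """Ideal time to move a payload at the given data rate."""
    return size_bytes * 8 / rate_bits_per_s


def expected_bit_errors(size_bytes: int, error_rate: float) -> float:
    """Expected number of corrupted bits, assuming a per-bit error rate."""
    return size_bytes * 8 * error_rate


SIZE = 1_000_000  # assumed payload: 1 MB

RATES = {
    "WAN dial-up (33.6 kbit/s)": 33_600,
    "WAN T3 (45 Mbit/s)": 45_000_000,
    "LAN ATM (2500 Mbit/s)": 2_500_000_000,
}

for label, rate in RATES.items():
    print(f"{label}: {transfer_seconds(SIZE, rate):.4f} s")
# WAN dial-up (33.6 kbit/s): 238.0952 s
# WAN T3 (45 Mbit/s): 0.1778 s
# LAN ATM (2500 Mbit/s): 0.0032 s

print(f"WAN expected bit errors: {expected_bit_errors(SIZE, 1e-5):.3f}")  # 80.000
print(f"LAN expected bit errors: {expected_bit_errors(SIZE, 1e-9):.3f}")  # 0.008
```

Under these assumptions, the same payload that takes about four minutes over dial-up takes a few milliseconds on the LAN, which is one reason a DDBMS tries to keep data close to the sites that use it most.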

Network protocols

A set of rules that determines how messages between computers are sent, interpreted, and processed.


TCP/IP (Transmission Control Protocol/Internet Protocol)


SPX/IPX (Sequenced Packet Exchange/Internetwork Packet Exchange)


NetBIOS (Network Basic Input/Output System)


APPC (Advanced Program-to-Program Communications)

Network protocol (cont’d)


DECnet


AppleTalk


WAP (Wireless Application Protocol)

