DISTRIBUTED DATABASE SYSTEM

tongueborborygmusElectronics - Devices

Nov 7, 2013 (3 years and 1 month ago)

57 views

INCON13-IT-017


1

DISTRIBUTED DATABASE SYSTEM

Ms.Manjushree S.Yewale
ASMS CSIT, Pune.

Abstract

Distributed database systems (DDBS) pose different problems when accessing distributed
and replicated databases. Particularly, access control and transaction management in DDBS
require different mechanism to monitor data retrieval and update to databases. Current
trends in multi-tier client/server networks make DDBS an appropriated solution to provide
access to and control over localized databases. Oracle, as a leading Database Management
System (DBMS) vendor employs the two-phase commit technique to maintain consistent state
for the database. The objective of this paper is to explain transaction management in DDBS
and how Oracle implements this technique. An example given to demonstrate the step
involved in executing the two-phase commit. By using this feature of Oracle, organizations
will benefit from the use of DDBS to successfully manage the enterprise data resource.
Concurrency control is a database management systems (DBMS) concept this used to
address conflicts with the simultaneous accessing or altering of data that can occur with a
multi-user system. Concurrency control, when applied to a DBMS, is meant to coordinate
simultaneous transactions while preserving data integrity. [1] The Concurrency is about to
control the multi-user access of Database All database systems must be able to respond to
requests for information from the useri.e. process queries. Obtaining the desired
information from database system in a predictable and reliable fashion is scientific art of
Query Processing.

Keywords: 1. Distributed database 2. Concurrency control.
3. Query optimization 4.query processing
Scope

A distributed database is a database in which storage devices are not all attached to a
common processing unit such as the CPU. It may be stored in multiple computers located in
the same physical location, or may be dispersed over a network of interconnected computers.
Distributed database can reside on network servers on the Internet, on corporate intranets or
extranets, or on other company networks. Distributed database replication and fragmentation,
there are many other distributed database design technologies.
Centralized Database
In a centralized database, all the data of an organization is stored in a single place such as a
mainframe computer or a server. The centralized database (the mainframe or the server)
should be able to satisfy all the requests coming to the system, therefore could easily become
a bottleneck. But since all the data reside in a single place it easier to maintain and back up
data. Furthermore, it is easier to maintain data integrity, because once data is stored in a
centralized database, outdated data is no longer available in other places.

Distributed Database
In a distributed database, the data is stored in storage devices that are located in different
physical locations. They are not attached to a common CPU but the database is controlled by
INCON13-IT-017


2

a central DBMS. Users access the data in a distributed database by accessing the WAN. To
keep a distributed database up to date, it uses the replication and duplication processes

Difference between Distributed Database and Central ized Database

While a centralized database keeps its data in storage devices that are in a single location
connected to a single CPU, a distributed database system keeps its data in storage devices that
are possibly located in different geographical locations and managed using a central DBMS.
A centralized database is easier to maintain and keep updated since all the data are stored in a
single location. Furthermore, it is easier to maintain data integrity and avoid the requirement
for data duplication. But, all the requests coming to access data are processed by a single
entity such as a single mainframe, and therefore it could easily become a bottleneck. But with
distributed databases, this bottleneck can be avoided since the databases are parallelized
making the load balanced between several servers. But keeping the data up to date in
distributed database system requires additional work, therefore increases the cost of
maintenance and complexity and also requires additional software for this purpose.
Furthermore, designing databases for a distributed database is more complex than the same
for a centralized database.

Distributed DBMS Architecture



The functions are divided into two classes: server functions and client functions. Providing a
two-level architecture which makes it easier to manage the complexity of DBMSs

Query processing
A given SQL query is translated by the query processor into a low level program called an
execution plan An execution plan is a program in function all language: The physical
relational algebra, specialized for internal storage representation in the DBMS. The physical
INCON13-IT-017


3

relational algebra extends the relational algebra with Primitives to search through the internal
storage
Structure of the DBMS.
Distributed query optimization
In a distributed database the optimization of queries by the DBMS itself is critical to the
efficient performance of the overall system. Query optimization must take into account the
extra communication costs of moving data from site to site, but can use whatever replicated
copies of data are closest, to execute a query. Thus it is a more complex operation than query
optimization in centralized databases
The function of a DBMS query optimization engine i s to finds an evaluation plan that
reduces the overall execution cost of a query. We have seen in the previous sections that the
costs for performing particular operations such as select and join can vary quite dramatically.

All database systems must be able to respond to requests for information from the useri.e.
process queries. Obtaining the desired information from a database system in
A predictable and reliable fashion is the scientific art of Query Processing. Getting these
Results back in a timely manner deals with the technique of Query Optimization.
Query optimization overview
Query optimization is essential if a DBMS is to achieve acceptable performance and
efficiency. Relational database systems based on the relational model and relational algebra
have the strength that their relational expressions are at a sufficiently high level so query
optimization is feasible in the first place; in non-relational systems, user requests are low
level and optimization is done manually by the user - the system cannot help. Hence systems
which implement optimization have several advantages over systems
The optimization process itself involves several stages, which involves the implementation of
the relational operators. A different approach to query optimization, called semantic
optimization has recently been suggested.
This technique may be used in combination with the other optimization techniques and uses
constraints specified on the database schema. Data localization maps calculus query into
algebra operations and applies data distribution information to the algebra operations. Query
decomposition consists of normalization, analysis; elimination of redundancy .Data
localization reduces horizontal fragmentation with join and selection, and vertical
fragmentation with joins, and aims to find empty relations.

Database systems and transaction processing (transaction management) distributed
concurrency control refers primarily to the concurrency control of a distributed database. the
most common distributed concurrency control technique is strong strict two-phase locking
distributed locking typical heavy performance penalty (due to delays, latency) can be saved
by using the atomic commitment protocol, which is needed in a distributed database for
(distributed) transactions' atomicity (e.g., 2PC, or a simpler one in a reliable system), together
with some local commitment ordering variant (e.g., local SS2PL) instead of distributed
locking, to achieve global serial inability in the entire system
INCON13-IT-017


4

Distributed concurrency control
Concurrency control in distributed databases can be done in several ways. Locking and time
stamping are two techniques which can be used, but time stamping is generally preferred.
The problems of concurrency control in a distributed DBMS are more severe than in a
centralized DBMS because of the fact that data may be replicated and partitioned. If a user
wants unique access to a piece of data, for example to perform an update or a read, the
DBMS must be able to guarantee unique access to that data, which is difficult if there are
copies throughout the sites in the distributed database.
Time stamping
Time stamping is a method of concurrency control where basically, all transactions are given
a timestamp or unique date/time/site combination and the database management system uses
one of a number of protocols to schedule transactions which require access to the same piece
of data. It is more complex to implement than locking, time stamping does avoid deadlock
occurring by avoiding it in the first place.
Concurrency control is the activity of coordinating concurrent accesses to a database in a
multiuser database management system (DBMS). Concurrency control permits users to
access a database in a multi programmed fashion while preserving the illusion that each user
is executing alone one dedicated system. The main technical difficulty in attaining this goal is
to prevent database updates performed by one user from interfering with database retrievals
and updates performed by another. The concurrency control problem is exacerbated in a
distributed DBMS (DDBMS) because (1) users may access data stored in many different
computers in a distributed system, and (2) a concurrency control mechanism at one computer
cannot instantaneously know about interactions at other computers Concurrency control has
been actively investigated for the past several years, and the problem for non distributed
DBMSs.
A broad mathematical theory has been developed to analyze the problem, and one approach,
called two phase locking, has been accepted as a standard solution. Current research on non
distributed concunency control is focused on evolutionary improvements to two phase
locking, detailed performance analysis and optimization, and extensions to the mathematical
theory.
Pessimistic Locking: This concurrency control strategy involves keeping an entity in a
database locked the entire time it exists in the database's memory. [2] This limits or prevents
users from altering the data entity that is locked. There are two types of locks that fall under
the category of pessimistic locking: write lock and read lock. [2]
With write lock, everyone but the holder of the lock is prevented from reading, updating, or
deleting the entity. With read lock, other users can read the entity, but no one except for the
lock holder can update or delete it. [2]
Optimistic Locking: This strategy can be used when instances of simultaneous transactions,
or collisions, are expected to be infrequent. [2] In contrast with pessimistic locking,
optimistic locking doesn't try to prevent the collisions from occurring. Instead, it aims to
detect these collisions and resolve them on the chance occasions when they occur. [2]
INCON13-IT-017


5

Distributed database transparency
A distributed DBMS should provide a number of features which make the distributed nature
of the DBMS transparent to the user. These include the following:
· Location transparency
· Replication transparency
· Performance transparency
· Transaction transparency
· Catalog transparency
The distributed database should look like a centralized system to the users. Problems of the
distributed database are at the internal level.
Advantages
· Management of distributed data with different levels of transparency like network
transparency, fragmentation transparency, replication transparency, etc.
· Increase reliability and availability.
· Easier expansion.
· Reflects organizational structure  database fragme nts are located in the departments
they relate to.
· Local autonomy or site autonomy  a department can control the data about them (as
they are the ones familiar with it.)
· Protection of valuable data  if there were ever a catastrophic event such as a fire, all
of the data would not be in one place, but distributed in multiple locations.
· Improved performance  data is located near the sit e of greatest demand, and the
database systems themselves are parallelized, allowing load on the databases to be
balanced among servers. (A high load on one module of the database won't affect
other modules of the database in a distributed database.)
Disadvantages
· Complexity  extra work must be done by the DBAs to ensure that the distributed
nature of the system is transparent. Extra work must also be done to maintain multiple
disparate systems, instead of one big one. Extra database design work must also be
done to account for the disconnected nature of the database  for example, joins
become prohibitively expensive when performed across multiple systems.
· Economics  increased complexity and a more extensi ve infrastructure means extra
labour costs.
· Security  remote database fragments must be secure d, and they are not centralized
so the remote sites must be secured as well. The infrastructure must also be secured
(e.g., by encrypting the network links between remote sites).
· Difficult to maintain integrity  but in a distribu ted database, enforcing integrity over
a network may require too much of the network's resources to be feasible.,
· Inexperience  distributed databases are difficult to work with, and as a young field
there is not much readily available experience on proper practice.