Enterprize database Managment x - Louisiana Tech University

clutteredreverandΔιαχείριση Δεδομένων

31 Οκτ 2013 (πριν από 3 χρόνια και 7 μήνες)

79 εμφανίσεις


Page
1




CSC 532











Enterprise database management

Pranav Raj Sharma



Page
2


D
atabase m
anagement system is the backbone of most
companies

from small size to large size

that
handle their sales, inventory, day to day transaction, order history,
purchase, financial data
and many more.
An enterprise DBMS is designed to manage large number of users,
huge

database
s,

and to integrate
&

run multiple types of software applications. E
nterprise DBMS
offers several
important features including a
multi
-
processor

support, parallel
processing,
distributed transactions
, concurrency control,
data security,
high
scalability,
and high availability
features such as clustering and replications.
E
nterprise database management system can directly
communicate with frontend application such as SAP, Oracle E
-
business, J.D Edwards, Dynamics
AX,
Baan

and so forth to perform
daily
operation of
many small to
large
scale
organization
s
.
Some of largest
DBMS
include Oracle, Microsoft SQL

Server, Informix, DB2,
and
Sybase
.
Today
Oracle and
Microsoft
SQL

Server

are two of the l
arge
st

DBMS
vendors

competing for
market share

o
f medium to large organization. Therefore, database is one of the most crucial
entity of

software engineering.

Enterprise database management ensures database quality, and performance which serves both
internal and external users. Continuous and growing data volumes, coupled with the legal and
regulatory requirements to save data are impacti
ng application performance and straining
resources of the organizations.

In a typical database management system actual data
base

is stored in a disk.
In order to ensure
fault tolerance and higher performance various
Redundant Array of Independent Disks
-

R
AID
level are used in software

engineering

industry. RAID can be defined as redundant array of
independent disk or redundant array of inexpensive disk. Some of the basic Raid level that is
used in EDBMS is listed below.

1.

Raid 0: It offers block striping
or data striping
meaning data will be spread across various
block in a given disk which promotes faster reads
/write due to parallel processing. But

Page
3


there is no fault tolerance due to lack of redundancy meaning if a disk which contains
database fails, data
will be lost permanently.

2.

Raid 1: It offers redundancy of disk or mirroring but there is no data striping. Due to
redundancy feature, it offers faster read but slower write because writing involves 2
writing in 2 disks at the same time. On a positive note

it offers fault tolerance due to
mirroring or redundant disk feature.

3.

Raid 0+1

or Raid 10
: It offers fault tolerance and mirroring. It offers excellent
performance due to parallel reads and writes. It also offers better durability due to its
mirroring fe
ature.

4.

Raid5: Just like Raid 0+1, Raid5 offers both fault tolerance and
faster read/write access
.
It involves
parity blocks distributed over all disk.

Most industry EDBMS recommends that we use RAID 10 or at least Raid 5 for parallel
transaction processi
n
g and fault tolerance purposes especially for production environment.

In a typical
DBMS, main memory/ buffer
stores current data in use. Likewise
, the actual

databases are stored in disk sub system

and

t
he archived data & backup files are stored in offsi
te
tape.

Most Enterprise level Database Management system can do disk space and buffer
management. Data must be in RAM (buffer) for DBMS to operate on it.
Most transactions
retrieve their data from main memory or buffer pool where the data pages are
cach
ed & flushed
out on periodic basis
. Buffer pool acts as a me
diator between disk data and transaction manager
.
Because reading and writing data to & from disk can be very expensive, data pages stays in main
memory or buffer pool for better access.
Therefore
, the larger the server memory, the better the
faster the transaction will be. The size of main memory/ Ram is dictated by size of databases on
the server, the amount of transactions
throughput
, and the size of data pages that gets stored on
the buffer poo
l. The
amount of
data
pages which get

stored in buffer pool is called buffer cache.
Therefore if the buffer cache is larger than main memory or the server RAM, data pages will get

Page
4


flushed out of memory very quickly

which will lead to expensive query and lo
wer page life
expectancy of database pages
.
Low P
age life expectancy

is a good indicator of
memory pressure.
Page life expectancy can be defined as the amount of time data page stays on buffer pool before
it gets flushed from the memory. In Microsoft SQL S
erver, if the
average
Page life Expectancy
(PLE) is
consistently
less than 300,
they recommend that we either increase the server memory
or increase query efficiency
in order to alleviate memory pressure
.


All Enterprise DBMS have database object called Index which
is
basically
an
underlining
structure in

a database that determines how data is stored and retrieved
in a database
thereby
expediting the query processing. Indexes are one of the most powerful too
ls for database
administrator when it comes to troubleshooting slowness and performance tuning. When rows on
indexes are not sorted such tables are called heap. The problem with heap table structure is it
involves table scan for query processing.

If a tabl
e is small, table scan is not very costly.
However, if a table is substantially large, table scan will lead to high query wait time, blocking
locks and higher disk I/O which will all lead to system slowness.

What indexing does is it not
only sort the data
base table but also avoid table scan
by searching/ filtering records in a table
based on user’s query.


Most commercial Database Management Systems have
two major types of index: Clustered
Index and Non
-
Clustered Index. A database table can have only one
Clustered index. This is
because in clustered index, the leaf node contains the actual physical data. So, the Clustered
index will map to the physical data pages in tables. In non
-
clustered index, the logical structure
of index is different from the actual

physical structure of the data on tables. Therefore, each table
can have several non
-
clustered indexes as opposed to single clustered index per table. In
enterprise Database Management system, most indexes come with databases that are generally

Page
5


created by

ERP applications. However, during a course of time,
as
more and more users are
connected to databases and they run different queries/reports, new indexes must be added to
satisfy user’s query.


Clustered indexes are typically created on one key column or

column used frequently such as

one that is referenced by where clause. Generally speaking clustered indexes are suitable for

range queries where large amount of data is queried. Also query that uses order by or group by

clause can largely benefit from clu
stered indexes. Moreover, clustered index can be more

effective when data is accessed sequentially/frequently or when data needs to be sorted

because data is automatically sorted when clustered index is added to a table and doesn’t have

to be re
-
sorted. Qu
eries that involve small data set will benefit from creation of non
-
clustered
index. Nonclustered indexes can be added to address queries that are not covered by clustered
index. For a given table, you can have only one clustered index at most. As a good p
ractice,
clustered index should be created before adding non
-
clustered indexes on a table. This is because
Nonclustered index relies on location of index data within the clustered table

(Source: Database
Management Systems
-

Third Edition: Author RamaKrishn
an, Gehrke)
.


In
a database world
,
transaction is an atomic unit of work.
Transactions are called queries when
they request read only access. In other words, read only transaction is called query. Multiple
concurrent
transaction can interfere each other when they are both connected to the same
database resource performing either read or write operation. Generally speaking when 2 or more
concurrent
transactions involve read operation from the same table; it should not r
esult in
conflict. However, when there is at least one read transaction and multiple write transaction
trying to access same resource or multiple transactions trying to
modify

the

same table, it result
s

in conflict. In order to maintain isolation level and

eliminate transaction conflict
,

each database

Page
6


management system has feature called
lock escalation and
concurrency control
which basically
regulate

transactions in order to reduce conflicts.
When multiple
concurrent
read transactions
access
same

table in
a database they will
use Shared lock (SLOCK). Likewise, when there is
multiple read/write transactions, they will each transaction will use exclusive lock in order to
maintain integrity of database.

Concurrency control can be applied at various
levels tha
t include

filed level, record level, page level, extent level, file
level,

and database level. In SQL Server
there is 3 type of row versioning to reduce transaction conflicts which are: 1. Read committed
isolation level 2. Read committed snapshot isolation

level 3. Snapshot isolation level.


Regardless of which
vendors

you
use for managing your data
,
any Enterprise Database
Management system must guarantee the

following
four
fundamental properties
of transactions
known as
ACID properties

ad defined below
.

1.

Atomicity: Transaction is all or nothing. Transaction will both succeed to completion and
commit the transaction. Else it will rollback if the transaction is incomplete or
unsuccessful. The atomicity property of transaction is to ensure that integrity of a

database.


2.

Consistency: Database Management System must take a transaction from one consistent
state to another. Transaction should not be in intermediate state. To ensure consistency
and integrity of a database, there are various constraint such as prima
ry key constraint,
not null constraint, check constraint, referential constraint that includes primary &
foreign key.



Page
7


3.

Isolation: EDBMS should give illusion to its user as being sole user of a database. In an
enterprise environment, hundreds of thousands
of user are connected at the same time.
These users should not interfere with one another.


4.

Durability: In Enterprise database Management system, data once committed and written
to disk must stay permanent unless somebody deletes them on purpose. In othe
r words,
data in database should stay permanent unless somebody deletes them.

Most
Enterprise

Database Management vendors such as IBM, Informix, Oracle, SQL Server uses

a standard relational query language called SQL. SQL is a structured query language or
iginally
developed by IBM in the 1970’s which currently uses the latest SQL
-
99 standard which
supports procedural construct (such as if, then, else) and Object Oriented constructs such as
inheritance, polymorphism.

A typical database management system ha
s following layers described below.

1.

Query optimization & execution layer

Produces

most

efficient query execution plan based on user queries.

2.

Database Operator layer

Implements physical data model operators such as relational operators; select, project,
jo
in.

3.

Buffer management

Deals with buffer and manages how databases access it. So basically buffer management
layer partitions the main memory allocated to the DBMS into buffer page frame

and
brings data pages to and from disk as requested by file manager.


4.

Disk Space Management


Page
8


Supports the file concept to higher layers and supports access paths to the data in those
files such as indexes.


In today’s commercial world the colorful front end by itself wouldn't make any sense
without the robust backend

database
.
The advancement in database has made our lives
easier.


References:

1. RamaKrishnan, Gehrke, “Database Management Systems,” Third Edition, 2009.

2.
Microsoft Corporation

http://msdn.microsoft.com/en
-
us/library/ms174173.aspx


3.
University of Pennsylvania

http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf


4.
Microsoft Corporation

http://www.microsoft.com/sqlserver/en/us/default.aspx


6. Wikipedia

http://en.wikipedia.org/wiki/Enterprise_database_management

http://en.wikipedia.org/wiki/RAID


5.
Database Systems: The Complete Book, Stanford University:

http://infolab.stanford.edu/~ullman/dscb.html