
Chapter 23: Advanced Application Development




Chapter 1: Introduction

Part 1: Relational databases
  Chapter 2: Relational Model
  Chapter 3: SQL
  Chapter 4: Advanced SQL
  Chapter 5: Other Relational Languages

Part 2: Database Design
  Chapter 6: Database Design and the E-R Model
  Chapter 7: Relational Database Design
  Chapter 8: Application Design and Development

Part 3: Object-based databases and XML
  Chapter 9: Object-Based Databases
  Chapter 10: XML

Part 4: Data storage and querying
  Chapter 11: Storage and File Structure
  Chapter 12: Indexing and Hashing
  Chapter 13: Query Processing
  Chapter 14: Query Optimization

Part 5: Transaction management
  Chapter 15: Transactions
  Chapter 16: Concurrency Control
  Chapter 17: Recovery System

Part 6: Data Mining and Information Retrieval
  Chapter 18: Data Analysis and Mining
  Chapter 19: Information Retrieval

Part 7: Database system architecture
  Chapter 20: Database-System Architecture
  Chapter 21: Parallel Databases
  Chapter 22: Distributed Databases

Part 8: Other topics
  Chapter 23: Advanced Application Development
  Chapter 24: Advanced Data Types and New Applications
  Chapter 25: Advanced Transaction Processing

Part 9: Case studies
  Chapter 26: PostgreSQL
  Chapter 27: Oracle
  Chapter 28: IBM DB2
  Chapter 29: Microsoft SQL Server

Online Appendices
  Appendix A: Network Model
  Appendix B: Hierarchical Model
  Appendix C: Advanced Relational Database Model





Part 8: Other topics (Chapters 23 through 25)

Chapter 23: Advanced Application Development covers performance benchmarks, performance tuning, and standardization.

Chapter 24: Advanced Data Types and New Applications covers advanced data types and new applications, including temporal data, spatial and geographic data, multimedia data, and issues in the management of mobile and personal databases.

Chapter 25: Advanced Transaction Processing deals with advanced transaction processing. We discuss transaction-processing monitors, high-performance transaction systems, real-time transaction systems, and transactional workflows.


Table of Contents


23.1 Performance Tuning


23.2 Performance Benchmarks


23.3 Standardization


23.4 Application Migration


23.5 Summary



[Figure: overview of a database application. Users reach the DBMS from a Web browser through a Web interface (servlets, JSP); the DBMS is monitored using triggers, access is limited by authorization, and the DBMS is protected by security mechanisms. Ch 8 (Application Design & Development) covers that material; Ch 23 (Advanced Application Development) covers performance, standardization, and application migration.]


Performance Tuning

Adjusting various parameters and design choices to improve system performance for a specific application.

Tuning is best done by identifying bottlenecks and eliminating them.

Can tune a database system at 3 levels:
  Hardware
    add disks to speed up I/O
    add memory to increase buffer hits
    move to a faster processor
  Database system parameters
    set buffer size to avoid paging of the buffer
    set checkpointing intervals to limit log size
    the system may have automatic tuning
  Higher-level database design
    such as the schema, indices, and transactions (more later)


Bottlenecks

The performance of most systems (at least before they are tuned) is usually limited by the performance of one or a few components, called bottlenecks.
  E.g., 80% of the code may take up 20% of the time, while 20% of the code takes up 80% of the time.
  It is worth spending most tuning effort on the 20% of the code that takes up 80% of the time.

Finding bottlenecks is very important.
  Bottlenecks may be in hardware or in software (e.g., the disks are very busy while the CPU is idle).

Removing one bottleneck often exposes another.
  De-bottlenecking consists of repeatedly finding bottlenecks and removing them.
  This is a heuristic process.


Identifying Bottlenecks

Transactions request a sequence of services, e.g., CPU, disk I/O, locks.

With concurrent transactions, a transaction may have to wait for a requested service while other transactions are being served.

Can model the database as a queueing system with a queue for each service:
  transactions repeatedly do the following:
    request a service
    wait in the queue for that service
    get serviced

Bottlenecks in a database system typically show up as very high utilization of a particular service, and hence very long queues for it.
  E.g., disk utilization vs. CPU utilization.

100% utilization leads to very long waiting times (illustrated in the sketch below). Rule of thumb:
  design a system for about 70% utilization at peak load
  utilization over 90% should be avoided
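To see why, here is a minimal Python sketch of the waiting-time effect. It assumes an M/M/1 queue (a single server with random arrivals); the slides only say "queueing system", so the specific model and the 10 ms service time are illustrative assumptions, not figures from the book.

# Minimal sketch: mean response time in an M/M/1 queue as utilization grows.
# The model choice and the service time are assumptions made for illustration.
service_time_ms = 10.0  # hypothetical mean service time per request

for utilization in (0.5, 0.7, 0.9, 0.95, 0.99):
    # M/M/1 mean response time = service time / (1 - utilization)
    response_ms = service_time_ms / (1.0 - utilization)
    print(f"utilization {utilization:4.0%}: mean response time {response_ms:8.1f} ms")

# At 70% utilization the response time is ~33 ms; at 99% it is ~1000 ms,
# which is why designing for about 70% utilization at peak load is sensible.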


Finding Bottlenecks: Queues in a Database System [figure]


Tunable Parameters

The lowest level is the hardware level:
  adding disks or using a RAID system
  adding more memory
  moving to a faster processor

The second level consists of the DBMS parameters:
  buffer size and checkpointing intervals

The third level is the highest level:
  schema and transactions
  indices and materialized views


Tuning of Hardware

Even well-tuned transactions typically require a few I/O operations.
  A typical disk supports about 100 random I/O operations per second.
  Suppose each transaction requires just 2 random I/O operations. Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew).
  If 1000 transactions per second are required, 1000/50 = 20 disks are needed.

The number of I/O operations per transaction can be reduced by keeping more data in memory.
  If all data is in memory, I/O would be needed only for writes.
  Keeping frequently used data in memory reduces disk accesses, reducing the number of disks required, but has a memory cost.
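A quick Python sketch of the disk-count arithmetic above; the 100 I/Os per second per disk and 2 I/Os per transaction are the slide's own assumptions.

# Disks needed to sustain a target transaction rate (ignoring skew).
ios_per_disk_per_sec = 100   # a typical disk supports ~100 random I/Os per second
ios_per_transaction = 2      # assume each transaction needs 2 random I/Os

def disks_needed(tps):
    # total random I/Os per second divided by what one disk can deliver
    return (tps * ios_per_transaction) / ios_per_disk_per_sec

for tps in (500, 1000, 5000):
    print(f"{tps:5d} TPS -> {disks_needed(tps):6.0f} disks (n/50)")
# 1000 TPS -> 20 disks, matching the example above.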



Hardware Tuning: Memory

Question: which data should be kept in memory?

The disk cost of supporting 1 I/O operation per second is
  price-per-disk-drive / accesses-per-second-per-disk
Reducing the load by 1 I/O per second saves this cost.

If a page is accessed n times per second, the value of keeping the page in memory can be thought of as the disk cost for n I/Os per second:
  n * price-per-disk-drive / accesses-per-second-per-disk

The cost of keeping a page in memory (by buying more memory) is
  price-per-megabyte-of-memory / pages-per-megabyte-of-memory

Break-even point: the value of n for which the above two costs are equal.
  If the page is accessed more than n times per second, it is better to keep the page in memory (by buying more memory): the saving from avoided disk I/O is greater than the cost of the memory.
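The break-even point can be computed directly from the two cost expressions above. The Python sketch below uses purely illustrative prices (chosen so the answer lands near the 5-minute rule on the next slide); they are assumptions, not figures from the book.

# Break-even access rate n at which keeping a page in memory pays for itself.
# All prices are hypothetical, for illustration only.
price_per_disk_drive = 200.0          # dollars per disk, assumed
accesses_per_second_per_disk = 100.0  # random I/Os per second per disk
price_per_mb_of_memory = 1.70         # dollars per MB of memory, assumed
pages_per_mb_of_memory = 256          # 4 KB pages

cost_per_io_per_sec = price_per_disk_drive / accesses_per_second_per_disk
cost_per_page_in_memory = price_per_mb_of_memory / pages_per_mb_of_memory

# Memory wins when n * cost_per_io_per_sec exceeds the cost of holding the page:
break_even = cost_per_page_in_memory / cost_per_io_per_sec
print(f"break-even rate: {break_even:.5f} accesses/second")
print(f"i.e. roughly once every {1 / break_even:.0f} seconds")  # ~300 s = 5 minutes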



Hardware Tuning: 5-Min & 1-Min Rules

Solving the above equation with current disk and memory prices leads to:

5-minute rule: if a page that is randomly accessed is used more frequently than once in 5 minutes, it should be kept in memory (by buying sufficient memory!).

For sequentially accessed data, many more pages can be read per second. Assuming sequential reads of 1 MB of data at a time:

1-minute rule: sequentially accessed data that is accessed once or more in a minute should be kept in memory (by buying sufficient memory!).

Prices of disk and memory have changed greatly over the years, but the ratios have not changed much.
  The rules remain 5-minute and 1-minute rules, not 1-hour or 1-second rules!



Hardware Tuning: Choice of RAID Level

Use RAID 1 or RAID 5?
  Depends on the ratio of reads to writes.
  RAID 5 requires 2 block reads and 2 block writes to write out one data block.

If an application requires r reads and w writes per second:
  RAID 1 requires r + 2w I/O operations per second.
  RAID 5 requires r + 4w I/O operations per second.

For reasonably large r and w, this requires many disks to handle the workload.
  RAID 5 may require more disks than RAID 1 to handle the load!
  The apparent saving in the number of disks with RAID 5 (by using parity, as opposed to the mirroring done by RAID 1) may be illusory!

Rule of thumb: RAID 5 is fine when writes are rare and the data is very large, but RAID 1 is preferable otherwise.
  If you need more disks to handle the I/O load, just mirror them, since disk capacities these days are enormous!
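A small Python sketch of the r + 2w vs. r + 4w comparison above; the read/write mixes are made up for illustration.

# Disks needed under RAID 1 vs RAID 5 for a given read/write workload.
ios_per_disk = 100  # random I/Os per second one disk can serve (as above)

def disks(io_per_sec):
    return -(-io_per_sec // ios_per_disk)  # ceiling division

for r, w in ((900, 100), (500, 500), (100, 900)):
    raid1 = r + 2 * w   # reads + mirrored writes
    raid5 = r + 4 * w   # reads + 2 block reads and 2 block writes per logical write
    print(f"r={r:4d} w={w:4d}:  RAID 1 needs {disks(raid1):3d} disks, "
          f"RAID 5 needs {disks(raid5):3d} disks")
# As writes come to dominate, RAID 5's apparent disk saving disappears.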


Tuning of the Schema

Vertically partition relations to isolate the data that is accessed most often -- only fetch needed information.
  E.g., split account(account-number, branch-name, balance) into two relations: (account-number, branch-name) and (account-number, balance).
  branch-name need not be fetched unless required.

Improve performance by storing a denormalized relation.
  E.g., store the join of the account and depositor relations; branch-name and balance information is repeated for each holder of an account, but the join need not be computed repeatedly.
  Price paid: more space, and more work for the programmer to keep the relation consistent on updates.
  Better to use materialized views (more on this later).

Cluster together on the same disk page records that would match in a frequently required join, to compute the join very efficiently when required.
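As a concrete illustration of vertical partitioning, here is a minimal Python/sqlite3 sketch; the table and column names follow the slide's account example, and the data values are made up.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Vertically partition account(account_number, branch_name, balance)
# into two relations that share the key account_number.
cur.execute("CREATE TABLE account_branch (account_number TEXT PRIMARY KEY, branch_name TEXT)")
cur.execute("CREATE TABLE account_balance (account_number TEXT PRIMARY KEY, balance REAL)")
cur.execute("INSERT INTO account_branch VALUES ('A-101', 'Downtown')")
cur.execute("INSERT INTO account_balance VALUES ('A-101', 500.0)")

# Queries that only need the balance never touch branch_name:
print(cur.execute(
    "SELECT balance FROM account_balance WHERE account_number = 'A-101'").fetchone())

# When both attributes are needed, join on the shared key:
print(cur.execute(
    "SELECT b.branch_name, bal.balance FROM account_branch b "
    "JOIN account_balance bal USING (account_number)").fetchone())
conn.close()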


Tuning of Indices

Index tuning:
  Trade-off between queries and updates.
  Create appropriate indices to speed up slow queries/updates.
  Speed up slow updates by removing excess indices.
  Choose the type of index (B-tree / hash) appropriate for the most frequent queries.
  Choose which index to make clustered.

Index tuning wizards (software tools):
  Look at the past history of queries and updates (the workload).
  Recommend which indices would be best for the workload.
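A small sqlite3 sketch of index tuning: create an index for a frequent lookup and ask the optimizer whether it is used. The schema reuses the hypothetical account example.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance REAL)")

# Frequent query: look up all accounts of one branch. Without an index this is
# a full table scan; the index turns it into an index lookup.
cur.execute("CREATE INDEX idx_account_branch ON account (branch_name)")

plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM account WHERE branch_name = 'Downtown'"
).fetchall()
print(plan)  # the plan should mention idx_account_branch

# If updates become the bottleneck, the excess index can simply be dropped:
cur.execute("DROP INDEX idx_account_branch")
conn.close()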


Tuning using Materialized Views

Materialized views can help speed up certain queries, particularly aggregate queries.

Overheads:
  Space.
  Time for view maintenance:
    Immediate view maintenance: done as part of the update transaction; the time overhead is paid by the update transaction.
    Deferred view maintenance: done only when required; the update transaction is not affected, but system time is spent on view maintenance, and until it is done the view may be out of date.

Preferable to a denormalized schema, since view maintenance is the system's responsibility, not the programmer's.
  Avoids inconsistencies caused by errors in update programs.
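SQLite has no materialized views, so the Python sketch below simulates one with a summary table that is refreshed on demand; this mimics deferred view maintenance. All table names and data are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance REAL)")
cur.executemany("INSERT INTO account VALUES (?, ?, ?)",
                [("A-101", "Downtown", 500.0), ("A-102", "Downtown", 300.0),
                 ("A-201", "Brighton", 900.0)])

# "Materialized view": per-branch totals stored as an ordinary table.
cur.execute("CREATE TABLE branch_totals AS "
            "SELECT branch_name, SUM(balance) AS total FROM account GROUP BY branch_name")

cur.execute("INSERT INTO account VALUES ('A-103', 'Downtown', 200.0)")
# Deferred maintenance: the stored totals are stale until explicitly refreshed.
cur.execute("DELETE FROM branch_totals")
cur.execute("INSERT INTO branch_totals "
            "SELECT branch_name, SUM(balance) FROM account GROUP BY branch_name")
print(cur.execute("SELECT * FROM branch_totals ORDER BY branch_name").fetchall())
conn.close()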


Tuning using Materialized Views (Cont.)

How to choose a set of materialized views:
  Helping one transaction type by introducing a materialized view may hurt others.
  The choice of materialized views depends on costs.
  Users often have no idea of the actual cost of operations.
  Overall, manual selection of materialized views is tedious.

Some database systems provide tools to help the DBA choose views to materialize: materialized view selection wizards.




Automated Tuning of Physical Design

Automated tools for:
  index selection
  materialized view selection
  partitioning data in a parallel database system

Users specify information about the size of the database and related statistics.

Microsoft's Database Tuning Assistant:
  allows users to pose "what-if" questions for selecting materialized views.

Workload compression:
  generate a representative workload using a small number of queries and updates.
  greedy heuristics and various other techniques.


Tuning of Transactions

Basic approaches to tuning of transactions:
  Improve set orientation.
  Reduce lock contention.
  Split large transactions.

Rewriting queries to improve performance was important in the past, but smart optimizers have made this less important.

Improving set orientation:
  The communication and query-handling overhead of each call is significant.
  Set orientation means fewer calls to the database.
  Combine multiple embedded SQL/ODBC/JDBC queries into a single set-oriented query.
    E.g., instead of a program that computes the total salary for each department using a separate SQL query per department, use a single query that computes total salaries for all departments at once (using group by); see the sketch below.
  Use stored procedures: avoids re-parsing and re-optimization of the query.
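The sketch referenced above, in Python with sqlite3: one query per department versus a single set-oriented group-by query. The schema and data are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [("Ann", "Sales", 50.0), ("Bob", "Sales", 60.0), ("Cho", "IT", 70.0)])

# Not set-oriented: one round trip to the database per department.
depts = [row[0] for row in cur.execute("SELECT DISTINCT dept FROM employee")]
totals_slow = {d: cur.execute("SELECT SUM(salary) FROM employee WHERE dept = ?",
                              (d,)).fetchone()[0]
               for d in depts}

# Set-oriented: a single query computes all the totals at once.
totals_fast = dict(cur.execute(
    "SELECT dept, SUM(salary) FROM employee GROUP BY dept").fetchall())

print(totals_slow == totals_fast)  # same result, with far fewer calls
conn.close()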


Tuning of Transactions (Cont.)

Reducing lock contention:
  Long transactions (typically read-only) that examine large parts of a relation result in lock contention with update transactions.
    E.g., a large query to compute bank statistics vs. regular bank transactions.
  Use multi-version concurrency control.
    E.g., Oracle "snapshots", which support multi-version 2PL.
  Use degree-two consistency (cursor stability) for long transactions.
    Drawback: the result may be approximate.


Tuning of Transactions (Cont.)

Long update transactions cause several problems:
  They exhaust lock space.
  They exhaust log space.
  They also greatly increase recovery time after a crash, and may exhaust log space even more during recovery if the recovery algorithm is badly designed!

If a single large transaction updates every record of a very large relation, the log may grow too big.
  Split the large transaction into a batch of "mini-transactions"; each mini-transaction performs a limited number of the updates (see the sketch below).
  Hold locks across the transactions in a mini-batch to ensure serializability.
    If lock table size is a problem, locks can be released, but at the cost of serializability.
  In case of failure during a mini-batch, its remaining portion must be completed on recovery to ensure atomicity.
    This requires support for the concept of nested transactions.
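The sketch referenced above: splitting one huge update into mini-transactions that each commit a bounded number of changes. Chunk size, schema, and data are made up; note that committing between chunks releases locks and therefore gives up serializability, as the slide warns.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE account (account_number INTEGER PRIMARY KEY, balance REAL)")
cur.executemany("INSERT INTO account VALUES (?, ?)",
                [(i, 100.0) for i in range(10_000)])
conn.commit()

CHUNK = 1_000  # updates per mini-transaction (an arbitrary illustrative size)
keys = [row[0] for row in cur.execute("SELECT account_number FROM account")]

for start in range(0, len(keys), CHUNK):
    batch = keys[start:start + CHUNK]
    # Each mini-transaction credits interest to a limited slice of the relation,
    # keeping the amount of log and the number of held locks bounded.
    cur.executemany("UPDATE account SET balance = balance * 1.01 "
                    "WHERE account_number = ?", [(k,) for k in batch])
    conn.commit()  # end of one mini-transaction

print(cur.execute("SELECT COUNT(*) FROM account WHERE balance > 100").fetchone())
conn.close()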


Performance Simulation

Performance simulation using a queueing model is useful to predict bottlenecks as well as the effects of tuning changes, even without access to the real system.

Queueing model, as we saw earlier:
  models activities that go on in parallel.

The simulation model is quite detailed, but usually omits some low-level details.
  Model service time, but disregard details of the service.
    E.g., approximate disk read time by an average disk read time.

Experiments can be run on the model, and provide estimates of measures such as average throughput and response time.

After simulation, parameters can be tuned in the model and then replicated in the real system.
  E.g., number of disks, memory, algorithms, etc.



Table of Contents


23.1 Performance Tuning


23.2 Performance Benchmarks


23.3 Standardization


23.4 Application Migration


23.5 Summary



Performance Benchmarks

Suites of tasks used to quantify the performance of software systems.
  A single task is not enough for complex systems.
  Important in comparing database systems, especially as systems become more standards compliant.

Commonly used performance measures:
  Throughput (transactions per second, or TPS)
  Response time (delay from submission of a transaction to return of its result)
  Availability, or mean time to failure (MTTF)

Beware when computing the average throughput of different transaction types.
  Suppose a system A runs transaction type T1 at 99 TPS and transaction type T2 at 1 TPS, and another system B runs both T1 and T2 at 50 TPS.
  Simply averaging the TPS figures is wrong: it suggests equal performance for systems A and B (50 TPS each)!
  If we ran 50 transactions of each type:
    System A: T1 takes 0.01 sec each, T2 takes 1 sec each, so the run needs about 50.5 sec.
    System B: T1 and T2 each take 0.02 sec, so the run needs about 2 sec.


Performance Benchmarks (Cont.)

Instead, use the harmonic mean of the n throughputs t1, t2, ..., tn:

  n / (1/t1 + 1/t2 + ... + 1/tn)

For system A (T1: 99 TPS, T2: 1 TPS) and system B (T1: 50 TPS, T2: 50 TPS):
  The harmonic mean for system A is 1.98 ( 2 / ((1/99) + (1/1)) ), while for system B it is 50 ( 2 / ((1/50) + (1/50)) ).
  System B is about 25 times faster than system A on this transaction mix.

Interference (e.g., lock contention) makes even this incorrect if the different transaction types run concurrently.
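A quick Python check of the harmonic-mean arithmetic above:

def harmonic_mean_throughput(throughputs):
    # n / (1/t1 + 1/t2 + ... + 1/tn)
    return len(throughputs) / sum(1.0 / t for t in throughputs)

system_a = (99.0, 1.0)    # T1 at 99 TPS, T2 at 1 TPS
system_b = (50.0, 50.0)   # T1 and T2 both at 50 TPS

print(harmonic_mean_throughput(system_a))  # ~1.98 TPS
print(harmonic_mean_throughput(system_b))  # 50.0 TPS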


Database Application Classes

Online transaction processing (OLTP):
  requires high concurrency and clever techniques to speed up commit processing, to support a high rate of update transactions.

Decision support applications:
  including online analytical processing (OLAP) applications.
  require good query evaluation algorithms and query optimization.

The architecture of some database systems is tuned to one of the two classes:
  E.g., the Teradata DBMS is tuned for decision support.
Others try to balance the two requirements:
  E.g., Oracle, with snapshot support for long read-only transactions.



Transaction Benchmark Suites

The Transaction Processing Performance Council (TPC) benchmark suites are widely used.

TPC-A and TPC-B: simple OLTP applications modeling a bank teller application, with and without communication.
  Not used anymore.

TPC-C: a complex OLTP application modeling an inventory system.
  The current standard for OLTP benchmarking.

TPC-D: a complex decision support application.
  Superseded by TPC-H and TPC-R.

TPC-H (H for ad hoc): based on TPC-D, with some extra queries.
  Models ad hoc queries, which are not known beforehand.
  A total of 22 queries, with emphasis on aggregation.
  Prohibits materialized views.
  Permits indices only on primary and foreign keys.

TPC-R (R for reporting): same as TPC-H, but without any restrictions on materialized views and indices.

TPC-W (W for Web): an end-to-end Web service benchmark modeling a Web bookstore, with a combination of static and dynamically generated pages.


TPC Performance Measures

TPC performance measures:
  transactions-per-second, with specified constraints on response time.
  transactions-per-second-per-dollar, which accounts for the cost of owning the system.

The TPC benchmarks require database sizes to be scaled up with increasing transactions-per-second.
  This reflects real-world applications, where more customers means a larger database and more transactions per second.

External audit of TPC performance numbers is mandatory.
  TPC performance claims can be trusted.


Sample TPC Performance Measures

Two types of tests for TPC-H (ad hoc) and TPC-R (reporting):
  Power metric test:
    runs queries and updates sequentially,
    then takes the mean to find queries per hour.
  Throughput metric test:
    runs queries and updates concurrently,
    with multiple streams running in parallel (each stream generates queries, plus one parallel update stream),
    measured over the total time of the entire run.

Composite "queries per hour" metric: the overall metric.
  square root of (power metric x throughput metric)

Composite "price/performance" metric:
  system price / composite metric
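The composite metrics above reduce to two lines of arithmetic; the power, throughput, and price figures in this Python sketch are invented for illustration.

import math

power_qph = 1200.0        # hypothetical power-test result, queries per hour
throughput_qph = 800.0    # hypothetical throughput-test result, queries per hour
system_price = 500_000.0  # hypothetical total system price, dollars

composite_qph = math.sqrt(power_qph * throughput_qph)  # geometric combination
price_performance = system_price / composite_qph       # dollars per query-per-hour

print(f"composite queries-per-hour metric: {composite_qph:.1f}")
print(f"price/performance: ${price_performance:.2f} per query-per-hour")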


Other Benchmarks

OODB transactions require a different set of benchmarks.
  Reason: it is hard to define what a typical OODB application is.

The object operations benchmark, version 1: the OO1 benchmark.

The OO7 benchmark has several different operations and provides a separate benchmark number for each kind of operation, e.g.:
  traversing connected objects
  retrieving all objects in a class

Benchmarks for XML databases are now being discussed.



Table of Contents


23.1 Performance Tuning


23.2 Performance Benchmarks


23.3 Standardization


23.4 Application Migration


23.5 Summary



Standardization

The complexity of contemporary software systems and the need for their interoperation require a variety of standards:
  syntax and semantics of programming languages
  functions in application program interfaces
  data models (e.g., object-oriented / object-relational databases)

Kinds of standards:
  Formal standards
  De facto standards
  Anticipatory standards
  Reactionary standards




Standardization (Cont.)

Formal standards are standards developed by a standards organization (ANSI, ISO), or by industry groups, through a public process.

De facto standards are generally accepted as standards without any formal process of recognition.
  Standards defined by dominant vendors (IBM, Microsoft) often become de facto standards.
  De facto standards often go through a formal process of recognition and become formal standards.

Anticipatory standards lead the marketplace, defining features that vendors then implement.
  Ensure compatibility of future products.
  But at times they become very large and unwieldy, since standards bodies may not pay enough attention to ease of implementation (e.g., SQL-92 or SQL:1999).

Reactionary standards attempt to standardize features that vendors have already implemented, possibly in different ways.
  It can be hard to convince vendors to change already implemented features, e.g., OODB systems.


SQL Standards History

SQL was developed by IBM in the late 70s / early 80s.

SQL-86: the first formal standard.

IBM SAA standard for SQL in 1987.

SQL-89 added features to SQL-86 that were already implemented in many systems.
  SQL-89 was a reactionary standard.

SQL-92 added many new features to SQL-89 (an anticipatory standard).
  Defines levels of compliance (entry, intermediate, and full).
  Even now, few database vendors have a full SQL-92 implementation.

SQL:1999:
  Adds a variety of new features: extended data types, object orientation, procedures, triggers, etc.

SQL:2003: covered on the next slide.



SQL Standards History (Cont.)

The SQL:2003 standard:
  Part 1: SQL/Framework; overview.
  Part 2: SQL/Foundation; types, schemas, DDL, DML, security, etc.
  Part 3: SQL/CLI (Call-Level Interface); API interface.
  Part 4: SQL/PSM (Persistent Stored Modules); procedural extensions.
  Part 9: SQL/MED (Management of External Data); interfacing of the database to external data sources -- other databases and even files can be viewed as part of the database.
  Part 10: SQL/OLB (Object Language Bindings); embedding SQL in Java.
  Part 11: SQL/Schemata (Information and Definition Schemas); defines a standard catalog interface.
  Part 13: SQL/JRT (Java Routines and Types); defines standards for accessing routines and types in Java.
  Part 14: SQL/XML; defines XML-related specifications.

The missing parts 5, 6, 7, 8, and 12 cover features that are not near standardization yet:
  SQL/Bindings, temporal data, distributed transactions, multimedia data.


Database Connectivity Standards

Open Database Connectivity (ODBC): standard for database interconnectivity.
  Based on the Call Level Interface (CLI) developed by the X/Open consortium and the SQL Access Group.
  Defines an application programming interface (API), and the SQL features that must be supported at different levels of compliance.

JDBC: the standard used for interconnectivity between Java and databases.

X/Open XA:
  standards by the X/Open consortium.
  define transaction management standards for supporting distributed 2PC.

OLE-DB, by Microsoft:
  an API like ODBC, but intended to also support non-database sources such as flat files or email stores.
  an OLE-DB program can negotiate with a data source to find out what features it supports.
  the interface language may be a subset of SQL.

ADO (Active Data Objects), by Microsoft:
  an easy-to-use interface to OLE-DB functionality.
  can be called from scripting languages such as VBScript and JScript.
  the ADO.NET API is for applications in the .NET languages such as C# and Visual Basic.NET.


X/Open XA in operation [example figure]



OLE-DB & ADO in operation [example figure]



Object-Oriented Database Standards

Object Database Management Group (ODMG): standard for OODBs.
  Version 1 in 1993, version 2 in 1997, version 3 in 2000.
  Provides a language-independent Object Definition Language (ODL) as well as several language-specific bindings.

Object Management Group (OMG): standard for distributed software based on objects.
  The Object Request Broker (ORB) provides transparent message dispatch to distributed objects.
  The Interface Definition Language (IDL) is used for defining language-independent data types.
  The Common Object Request Broker Architecture (CORBA) defines the specifications of the ORB and IDL.



CORBA in operation [example figure]



XML-Based Application Standards

Several XML-based standards exist for e-commerce.
  E.g., RosettaNet (supply chain), BizTalk.
  They define catalogs, service descriptions, invoices, purchase orders, etc.
  XML wrappers are used to export information from relational databases to XML.

Simple Object Access Protocol (SOAP):
  an XML-based remote procedure call (RPC) standard.
  uses XML to encode data (both parameters and results).
  uses HTTP as the transport protocol (a procedure call becomes an HTTP request).
  endorsed by W3C.
  popular in B2B e-commerce.
  standards based on SOAP exist for specific applications, e.g., OLAP and data mining standards from Microsoft.

The DOM and SAX APIs let general programming languages connect to XML data.



SOAP in operation [example figure]



Table of Contents


23.1 Performance Tuning


23.2 Performance Benchmarks


23.3 Standardization


23.4 Application Migration


23.5 Summary



Legacy Systems

Legacy systems are older-generation systems that are incompatible with current-generation standards and systems but are still in production use.
  E.g., applications written in COBOL that run on mainframes.
  Today's hot new system is tomorrow's legacy system!

Porting legacy-system applications to a more modern environment is problematic.
  Rewriting code is very expensive, since a legacy system may involve millions of lines of code written over decades.
  The original programmers are usually no longer available.
  Switching over from the old system to the new system is itself a problem:
    Big-bang approach
    Chicken-little approach

One approach: build a wrapper layer on top of the legacy application to allow interoperation between newer systems and the legacy application.
  E.g., use ODBC or OLE-DB as the wrapper.


Legacy Systems (Cont.)

Rewriting a legacy application requires a first phase of understanding what it does.
  Often the legacy code has no documentation, or outdated documentation.
  Reverse engineering: the process of going over the legacy code to
    come up with schema designs in the E-R or OO model, and
    find out what procedures and processes are implemented, to get a high-level view of the system.

Re-engineering: reverse engineering followed by design of the new system.
  Improvements are made on the existing system design in this process.


Migrating to a New System

Switching over from the old system to the new one is a major problem.
  Production systems are in use every day, generating new data.
  Stopping the system may bring all of a company's activities to a halt, causing enormous losses.

Big-bang approach:
  1. Implement the complete new system.
  2. Populate it with data from the old system.
     No transactions are run while this step is executed; scripts are created to do it quickly.
  3. Shut down the old system and start using the new system.

Danger with this approach: what if the new code has bugs or performance problems, or is missing features?
  The company may be brought to a halt.



Migrating to a New System (Cont.)

Chicken-little approach:
  Replace the legacy system one piece at a time.
  Use wrappers to interoperate between legacy and new code.
    E.g., replace the front end first, with wrappers on the legacy back end.
      The old front end can continue working during this phase in case of problems with the new front end.
    Then replace the back end, one functional unit at a time.
      All parts that share a database may have to be replaced together, or a wrapper is needed on the database as well.

Drawback: significant extra development effort to build the wrappers and ensure smooth interoperation.
  Still worth it if the company's life depends on the system.




Table of Contents


23.1 Performance Tuning


23.2 Performance Benchmarks


23.3 Standardization


23.4 Application Migration


23.5 Summary



Ch 21: Summary (1)

The Web browser has emerged as the most widely used user interface to databases.
  HTML provides the ability to define interfaces that combine hyperlinks with forms facilities.
  Web browsers communicate with Web servers by the HTTP protocol.
  Web servers can pass requests on to application programs and return the results to the browser.

There are several client-side scripting languages -- JavaScript is the most widely used -- that provide richer user interaction at the browser end.

Web servers execute application programs to implement the desired functionality.
  Servlets are a widely used mechanism for writing application programs that run as part of the Web server process, in order to reduce overheads.
  There are also many server-side scripting languages that are interpreted by the Web server and provide application-program functionality as part of the Web server.


Ch 21: Summary (2)

Tuning of the database (system parameters, as well as the higher-level database design, such as the schema, indices, and transactions) is important for good performance.
  Tuning is best done by identifying bottlenecks and eliminating them.

Performance benchmarks play an important role in comparisons of database systems, especially as systems become more standards compliant.
  The TPC benchmark suites are widely used, and the different TPC benchmarks are useful for comparing the performance of databases under different workloads.

Standards are important because of the complexity of database systems and their need for interoperation.
  Formal standards exist for SQL. De facto standards, such as ODBC and JDBC, and standards adopted by industry groups, such as CORBA, have played an important role in the growth of client-server database systems.
  Standards for object-oriented databases, such as ODMG, are being developed by industry groups.



Ch 21: Summary (3)

E-commerce systems are fast becoming a core part of how commerce is performed.
  There are several database issues in e-commerce systems. Catalog management, especially personalization of the catalog, is done with databases.
  Electronic marketplaces help in the pricing of products through auctions, reverse auctions, or exchanges.
  High-performance database systems are needed to handle such trading.
  Orders are settled by electronic payment systems, which also need high-performance database systems to handle very high transaction rates.

Legacy systems are systems based on older-generation technologies such as nonrelational databases or even directly on file systems.
  Interfacing legacy systems with new-generation systems is often important when they run mission-critical systems.
  Migrating from legacy systems to new-generation systems must be done carefully to avoid disruptions, which can be very expensive.



Ch 21: Bibliographical Notes (1)

Information about servlets, including tutorials, standard specifications, and software, is available on java.sun.com/products/servlet.

Information about JSP is available at java.sun.com/products/jsp.

An early proposal for a database-system benchmark (the Wisconsin benchmark) was made by Bitton et al. [1983].

The TPC-A, -B, and -C benchmarks are described in Gray [1991].

An online version of all the TPC benchmark descriptions, as well as benchmark results, is available on the World Wide Web at www.tpc.org; the site also contains up-to-date information about new benchmark proposals.

O'Neil and O'Neil [2000] provides very good textbook coverage of performance measurement and tuning.

The five-minute and one-minute rules are described in Gray and Putzolu [1987] and Gray and Graefe [1997].

Brown et al. [1994] describes an approach to automated tuning.


Ch 21: Bibliographical Notes (2)

Index selection and materialized view selection are addressed by Ross et al. [1996], Labio et al. [1997], Gupta [1997], Chaudhuri and Narasayya [1997], Agrawal et al. [2000], and Mistry et al. [2001].

Poess and Floyd [2000] gives an overview of the TPC-H, TPC-R, and TPC-W benchmarks.

The OO1 benchmark for OODBs is described in Cattell and Skeen [1992]; the OO7 benchmark is described in Carey et al. [1993].

Kleinrock [1975] and Kleinrock [1976] is a popular two-volume textbook on queueing theory.

Shasha [1992] provides a good overview of database tuning.

The American National Standard SQL-86 is described in ANSI [1986].

The IBM Systems Application Architecture definition of SQL is specified by IBM [1987].

The standards for SQL-89 and SQL-92 are available as ANSI [1989] and ANSI [1992], respectively.

For references on the SQL:1999 standard, see the bibliographical notes of Chapter 9.


Ch 21: Bibliographical Notes (3)

The X/Open SQL call-level interface is defined in X/Open [1993]; the ODBC API is described in Microsoft [1997] and Sanders [1998].

The X/Open XA interface is defined in X/Open [1991]. More information about ODBC, OLE-DB, and ADO can be found on the Web site www.microsoft.com/data, and in a number of books on the subject that can be found through www.amazon.com.

The ODMG 3.0 standard is defined in Cattell [2000]. ACM SIGMOD Record, which is published quarterly, has a regular section on standards in databases, including benchmark standards.

A wealth of information on XML-based standards is available online.
  You can use a Web search engine such as Google to find more up-to-date information about XML and other standards.

Loeb [1998] provides a detailed description of secure electronic transactions.

Business process reengineering is covered by Cook [1996]. Kirchmer [1999] describes application implementation using standard software such as Enterprise Resource Planning (ERP) packages.

Umar [1997] covers reengineering and issues in dealing with legacy systems.


Ch 21: Tools

There are many Web development tools that support database connectivity through servlets, JSP, JavaScript, or other mechanisms. We list a few of the better-known ones here:
  Java SDK from Sun (java.sun.com)
  Apache's Tomcat (jakarta.apache.org) and Web server (apache.org)
  IBM WebSphere (www.software.ibm.com)
  Microsoft's ASP tools (www.microsoft.com)
  Allaire's ColdFusion and JRun products (www.allaire.com)
  Caucho's Resin (www.caucho.com)
  Zope (www.zope.org)

A few of these, such as Apache, are free for any use; some are free for noncommercial or personal use; others must be paid for.
  See the respective Web sites for more information.


Table of Contents



23.1 Performance Tuning


23.2 Performance Benchmarks


23.3 Standardization


23.4 Application Migration


23.5 Summary


Database System Concepts, 5th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use

End of Chapter