Relational Database Design and Performance Tuning for DB2 Database Servers







Document Version: 1.0
Document Status: Draft







IBM SWG - Tivoli Software
3901 S. Miami Blvd.
Durham, NC 27703

+1 (919) 224-1598
+1 (919) 224-2560

Issue Date:
Author: Edward Bernal

Template version 4

NOTE: The hard copy version of this document is FOR REFERENCE ONLY. The online version is the master.
It is the responsibility of the user to ensure that they have the current version.
Any outdated hard copy is invalid and must be removed from possible use.
It is also the responsibility of the user to ensure the completeness of this document prior to use.






Table of Contents

1. Document Control
   1.1 Summary of Changes
2. Introduction
   2.1 Definitions
   2.2 Performance Factors
       2.2.1 Database Design
       2.2.2 Application Design
       2.2.3 Hardware Design and Operating System Usage
3. Database Design Details
   3.1 Key Factors
       3.1.1 Tablespaces
       3.1.2 Tables
       3.1.3 Bufferpools
       3.1.4 Logging
       3.1.5 Indexing
   3.2 Database Maintenance
       3.2.1 REORG
       3.2.2 RUNSTATS
       3.2.3 REBIND
4. Application Design Details
   4.1 Key Factors
       4.1.1 Review application SQL for efficiencies *** IMPORTANT ***
       4.1.2 Concurrency Control and Isolation Level
       4.1.3 Locking
       4.1.4 Fetch needed columns only
       4.1.5 Reuse resources
       4.1.6 SQL Statement Preparation
       4.1.7 Query Tuning
5. Hardware Design and Operating System Usage
   5.1 Key Factors
       5.1.1 Memory
       5.1.2 CPU
       5.1.3 I/O
       5.1.4 Network
6. Tuning Opportunities
   6.1 Insert/Delete Performance
   6.2 Database Manager Configuration Tuning
   6.3 Database Configuration Tuning
   6.4 Bufferpools
   6.5 Registry Variables
7. Monitoring Tools
8. Tuning Methodology
   8.1 Tuning Approach
   8.2 Skills Needed
   8.3 General Process
   8.4 DB2 Specific Tuning
       8.4.1 SQL Reviews
       8.4.2 Database Statistics
       8.4.3 SNAPSHOT and EVENT Monitors
       8.4.4 DB2BATCH
9. Publications & Online Help
10. Sizing
11. Reference
12. Addendum
1. Document Control

1.1 Summary of Changes

The table below contains the summary of changes:

Version   Date        Description of changes
1.0                   Distribution of 1st draft










2. Introduction

Any performance tuning requires some knowledge of the application involved. Relational Database systems are no different. In fact, the more knowledge available about an application, the better one can make decisions that will positively affect the performance of that application. While it is possible to perform some amount of tuning of Relational Database Systems after the fact, paying attention to the overall design of your entire system up front will generally yield the best results.

While much of this material is applicable to DB2 on all available platforms, this paper specifically deals only with DB2 on distributed platforms, i.e. Windows, UNIX, and Linux. The material was obtained from readily available publications on DB2 and from over 18 years of experience with designing, developing, and tuning DB2 systems. While DB2 specific, many of the concepts are applicable to Relational Databases in general, such as Oracle and Microsoft SQL Server.

This document is not intended to replace the detailed information that is available in the various manuals, Redbooks, etc., that deal specifically with DB2 performance. The intent is to point out some of the major factors that affect DB2 performance and, hopefully, make this an easy to use and understand reference so that the user will not have to read through and understand in detail all of the previously mentioned reference material. The detailed material can, and should, be referenced when addressing a specific area of concern. Refer to Section 9 for a listing of some of these publications. It is important to note that all of these factors should be addressed for each application developed.



2.1 Definitions

•  Throughput - The amount of data transferred from one place to another or processed in a specified amount of time. Data transfer rates for disk drives and networks are measured in terms of throughput. Typically, throughputs are measured in kbps, Mbps and Gbps.

•  Optimizer - When an SQL statement needs to be executed, the SQL compiler needs to determine the access plan to the database tables. The optimizer creates this access plan, using information about the distribution of data in specific columns of tables and indexes if these columns are used to select rows or join tables. The optimizer uses this information to estimate the costs of alternative access plans for each query. Its decisions are heavily influenced by statistical information about the size of the database tables and available indexes.

•  Clustered Index - An index whose sequence of key values closely corresponds to the sequence of rows stored in a table. The degree to which this correspondence exists is measured by statistics that are used by the optimizer.

•  Cardinality - With respect to tables, the number of rows in the table. With respect to indexed columns, the number of distinct values of that column in a table.







•  Prefetch - An operation in which data is read before, and in anticipation of, its use. DB2 supports the following mechanisms:

   o  Sequential prefetch - A mechanism that reads consecutive pages into the buffer pool before the pages are required by the application.

   o  List prefetch - Sometimes called list sequential prefetch. Prefetches a set of non-consecutive data pages efficiently.


2.2 Performance Factors

There are a number of areas that factor into the overall performance of any application. Below are the general areas of concern, along with an explanation of each.

2.2.1 Database Design

The term "Database Design" can mean a lot of things to a lot of people. There are two main types of data models: a logical model and a physical model. A "Logical" model is a representation, often graphical in nature, of the information requirements of a business area; it is not a database. Its main purpose is to ensure that its structure and content can represent and support the business requirements of an area of interest. It is independent of any database technology.

After completing your logical database design, there are a number of issues you should consider about the physical environment in which your database and tables will reside. These include understanding the files that will be created to support and manage your database, understanding how much space will be required to store your data, and determining how you should use the tablespaces that are required to store your data.

This document deals only with the aspects of physical database design.

2.2.2 Application Design

For the purposes of this document, application design deals with aspects of how you access your database system. There are a number of techniques that will be discussed that, if used, can positively influence the performance of your application.

2.2.3 Hardware Design and Operating System Usage

For any database system, there are a number of common areas that need to be addressed and sized appropriately in order to support your application workload. This section will discuss common, and platform specific, hardware and operating system components.

3. Database Design Details





3.1 Key Factors

3.1.1 Tablespaces

A tablespace is a physical storage object that provides a level of indirection between a database and the tables stored within the database. It is made up of a collection of containers into which database objects are stored. A container is an allocation of space to a table space. Depending on the table space type, the container can be a directory, device, or file. The data, index, long field, and LOB portions of a table can be stored in the same table space, or can be individually broken out into separate table spaces.

When working with database systems, the main objective is to be able to store and retrieve data as quickly and efficiently as possible. One important consideration when designing your database or analyzing a performance problem on an existing database is the physical layout of the database itself. DB2 provides support for two types of tablespaces:

•  System Managed Space (SMS) - stores data in operating system files. SMS tablespaces are an excellent choice for general purpose use. They provide good performance with little administration cost.

•  Database Managed Space (DMS) - with database-managed space (DMS) table spaces, the database manager controls the storage space. A list of devices or files is selected to belong to a tablespace when it is defined. The space on those devices or files is managed by the DB2 database manager. There is some additional administration cost with DMS tablespaces, primarily due to monitoring and adjusting the size of the pre-allocated files. A DMS tablespace can be easily increased in size by either ALTERing an existing container or adding a new container to it.


3.1.1.1 Recommendations

•  DMS tablespaces usually perform better than SMS tablespaces because they are pre-allocated and do not have to spend time extending files when new rows are added. DMS tablespaces can be either raw devices or file system files. DMS table spaces in raw device containers provide the best performance because double buffering does not occur. Double buffering, which occurs when data is buffered first at the database manager level and then at the file system level, might be an additional cost for file containers or SMS table spaces.

•  If you use SMS tablespaces, consider using the db2empfa command on your database. The db2empfa (Enable Multipage File Allocation) tool enables the use of multipage file allocation for a database. With multipage file allocation enabled for SMS table spaces, disk space is allocated one extent rather than one page at a time, improving INSERT throughput. (A usage sketch follows at the end of this section.)

•  Using DMS tablespaces also allows a single table to store its data, index, and large objects on up to three different DMS tablespaces, thus improving performance through parallel disk I/O operations. Parallel I/O is the process of reading from or writing to two or more I/O devices at the same time to reduce response time. For example, look at the following statements (assume there are no large object data types):

   CREATE TABLE T1
       (COLA ...)
       IN TS1
       INDEX IN TS2






   CREATE INDEX IX1 ON T1 (COLA ASC)

   The table data will be placed in tablespace TS1 and the index data will be placed in tablespace TS2.

   It is important to put the different tablespaces on different disk drives in order to enable the possibility of parallel I/O operations. In fact, each tablespace can have multiple containers, and each of those containers could be on a different disk drive. The ideal configuration has to consider a number of factors, such as the number of disks available, RAID level, etc.
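A rough sketch of these recommendations follows; the container paths, sizes (in pages), and database name are hypothetical and must be adapted to your environment. The first two statements create DMS tablespaces whose containers sit on different disks, so that T1's data and index I/O can proceed in parallel; the final command, run from the operating system command line, enables multipage file allocation for an SMS-based database.

   CREATE TABLESPACE TS1
       MANAGED BY DATABASE
       USING (FILE '/db2disk1/ts1_c1' 25600,
              FILE '/db2disk2/ts1_c2' 25600)

   CREATE TABLESPACE TS2
       MANAGED BY DATABASE
       USING (FILE '/db2disk3/ts2_c1' 12800)

   db2empfa MYDB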


3.1.1.2 Platform Specific Recommendations

Windows

File system caching is performed as follows:

•  For DMS file containers (and all SMS containers), the operating system might cache pages in the file system cache.

•  For DMS device container table spaces, the operating system does not cache pages in the file system cache.

On Windows, the DB2 registry variable DB2NTNOCACHE specifies whether or not DB2 will open database files with the NOCACHE option. If DB2NTNOCACHE=ON, file system caching is eliminated. If DB2NTNOCACHE=OFF, the operating system caches DB2 files. This applies to all data except for files that contain LONG FIELDS or LOBS. Eliminating system caching allows more memory to be available to the database so that the buffer pool or sortheap can be increased.
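A minimal sketch of setting the registry variable with the db2set command; the change is picked up after the instance is recycled:

   db2set DB2NTNOCACHE=ON
   db2stop
   db2start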


3.1.2 Tables

As previously discussed, before creating your physical database tables, you should draw a logical design of your data model. We will briefly discuss aspects of logical database design that affect performance.

3.1.2.1 Normalization

This is the process of restructuring a data model by reducing its relations to their simplest forms. It is a key step in the task of building a logical relational database design. Normalization reduces redundancy in your data and can improve the performance of update and delete statements, since each change only has to be made in one place. By normalizing your data, you try to ensure that all columns in the table depend on the primary key. The disadvantage of a fully normalized data structure is in data retrieval operations, specifically when a query is accessing a large number of related pieces of data from different tables via join operations. For more information about Normalization, author C.J. Date is one of the better resources. Access to his works can be located by searching the internet.

3.1.2.2 Denormalization

Denormalization is the intentional duplication of columns in multiple tables, with the consequence of increased data redundancy. It is sometimes necessary to minimize performance problems and is a key step in designing a physical relational database.




The decision about whether or not to denormalize should be based on the following:

•  Can you use some of the other methods described in this document to tune your database and improve performance to an acceptable level without denormalizing?

•  Can you quantify the likely performance gains from denormalizing, and are they a reasonable trade-off against the added update overhead?

3.1.2.3 Other considerations

•  NULLs - In general, columns defined as NOT NULL perform better than nullable columns due to the path length reduction. The database manager does not have to check for null values in a NOT NULL column. Also, every nullable column requires one extra byte per column value. Use NULLs where appropriate and not as a default.

•  Column lengths - You should define your column lengths, particularly VARCHAR columns, as small as possible for your application. This saves space, which may lead to a reduced number of used table and index pages, and fewer index levels, which can improve query performance.

   o  If you create an index on a column defined as VARCHAR, all index entries for that column take the maximum length of the VARCHAR definition, even if the actual data length in the column is smaller than the maximum.

•  Identity Columns - A significant performance enhancement can be realized by using DB2-generated identity values compared to those implemented by an application. They are typically used for generating unique primary key values. (A brief example follows this list.)

•  Put frequently updated columns together and at the end of the row. This has an effect on update performance due to the following logging considerations:

   o  For fixed length row updates, DB2 logs from the first changed column to the last changed column.

   o  For variable length row updates, DB2 logs from the first changed byte to the end of the row. If the length of a variable length column changes, this will result in a change to the row header (which includes the row length), and thus the entire row will be logged.
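A minimal sketch of a DB2-generated identity column; the table and column names are hypothetical:

   CREATE TABLE ORDERS
       (ORDER_ID   INTEGER NOT NULL
                   GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1),
        CUST_ID    INTEGER NOT NULL,
        ORDER_DATE DATE    NOT NULL,
        PRIMARY KEY (ORDER_ID))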

3.1.3 Bufferpools

A bufferpool is an area of memory into which database pages are read, modified, and held during processing. On any system, accessing memory is faster than disk I/O. DB2 uses database buffer pools to attempt to minimize disk I/O.

There is no definitive answer to the question of how much memory you should dedicate to the buffer pool. Generally, more is better. A good rule of thumb would be to start with about 75% of your system's main memory devoted to buffer pool(s), but this rule is applicable only if the machine is a dedicated database server. Since it is a memory resource, its use has to be considered along with all other applications and processes running on a server.







A spreadsheet will be provided later in this document that can be used to estimate DB2 memory usage.

If your tablespaces have multiple page sizes, then you should create only one buffer pool for each page size.

There are some cases where defining multiple buffer pools of the same size can improve performance, but, if badly configured, multiple buffer pools can have a huge negative impact on performance. Consider the following when deciding whether to create multiple buffer pools:

•  You create tables which reside in table spaces using a page size other than the 4 KB default. This is required (as mentioned above).

•  You have tables which are accessed frequently and quickly by many short update transaction applications. Dedicated buffer pool(s) for these tables may improve response times.

•  You have tables larger than main memory which are always fully scanned. These could have their own dedicated buffer pool.
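A minimal sketch of a dedicated buffer pool and a tablespace assigned to it; the names, sizes (in pages), and container path are hypothetical:

   CREATE BUFFERPOOL BP8K SIZE 25000 PAGESIZE 8K

   CREATE TABLESPACE TS_HOT
       PAGESIZE 8K
       MANAGED BY DATABASE
       USING (FILE '/db2disk1/ts_hot_c1' 12800)
       BUFFERPOOL BP8K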

3.1.4 Logging

One of the main purposes of any database system is to maintain the integrity of your data. All databases maintain log files that keep records of database changes. DB2 logging consists of a set of primary and secondary log files that contain log records recording all changes to a database. The database log is used to roll back changes for units of work that are not committed and to recover a database to a consistent state. DB2 provides two logging strategy choices.

3.1.4.1 Circular logging

This is the default log mode. With circular logging, the log records fill the log files and then overwrite the initial log records in the initial log file. The overwritten log records are not recoverable. This type of logging is typically not suited for a production application.

3.1.4.2 Log Retain logging

Each log file is archived when it fills with log records. New log files are made available for log records. Retaining log files enables roll-forward recovery. Roll-forward recovery reapplies changes to the database based on completed units of work (transactions) that are recorded in the log. You can specify that roll-forward recovery is to the end of the logs, or to a particular point in time before the end of the logs. Archived log files are never directly deleted by DB2; therefore, it is the application's responsibility to maintain them, i.e. archive, purge, etc.

3.1.4.3 Log Performance

Ignoring the performance of your database in relation to its logging can be a costly mistake, the main cost being time. Placement of the log files needs to be optimized, not only for write performance, but also for read performance, because the database manager will need to read the log files during database recovery.

3.1.4.4 Recommendations

•  Use the fastest disks available for your log files





   o  Use a separate array and/or channel if possible

•  Use Log Retain logging

•  Mirror your log files

•  Increase the size of the Log Buffer database configuration parameter (logbufsz)

   o  This parameter specifies the amount of the database heap to use as a buffer for log records before writing these records to disk. The log records are written to disk when one of the following occurs:

      -  A transaction commits, or a group of transactions commit, as defined by the mincommit configuration parameter.

      -  The log buffer is full.

      -  As a result of some other internal database manager event.

   o  Buffering the log records will result in more efficient log file I/O because the log records will be written to disk less frequently, and more log records will be written each time.

•  Tune the Log File Size (logfilsiz) database configuration parameter so that you are not creating excessive log files. (A configuration sketch follows this list.)
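A hedged sketch of applying these recommendations from the DB2 command line; the database name, mirror path, and values are hypothetical and must be sized for your workload, and the LOGRETAIN and MIRRORLOGPATH parameters as shown assume a DB2 V8-era configuration:

   db2 UPDATE DB CFG FOR MYDB USING LOGRETAIN RECOVERY
   db2 UPDATE DB CFG FOR MYDB USING MIRRORLOGPATH /db2mirror/MYDB
   db2 UPDATE DB CFG FOR MYDB USING LOGBUFSZ 256
   db2 UPDATE DB CFG FOR MYDB USING LOGFILSIZ 10000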


3.1.5 Indexing

An index is a set of keys, each pointing to a row, or rows, in a table. An index serves two primary purposes:

1. To ensure uniqueness, as in the case of a Primary Key. Unique indexes can be created to ensure uniqueness of the index key. An index key is a column or an ordered collection of columns on which an index is defined. Using a unique index will ensure that the value of each index key in the indexed column or columns is unique.

2. To allow more efficient access to rows in a table by creating a direct path to the data through pointers. The SQL optimizer automatically chooses the most efficient way to access data in tables. The optimizer takes indexes into consideration when determining the fastest access path to data.

The main advantages of indexes were pointed out above. Creating an index to ensure uniqueness is typically driven by a business requirement of the application, and such indexes are thus absolutely necessary to create. Beyond that, you should be very careful about the number and size of the indexes that you create on your database tables. Each additional index will have an impact on the following:

•  Disk storage

•  INSERT and DELETE processing

   o  CPU

   o  I/O





   o  Logging

•  Database maintenance

   o  REORG

   o  RUNSTATS

3.1.5.1 Index Considerations

The creation of any index for the purpose of efficient access should be done based on a review of the actual SQL that has been, or will be, written against the tables. In addition, the following items need to be considered when determining which indexes to create.

•  The size of the index, determined by:

   o  The number and size of the columns in the index

   o  The projected, or actual, volume of data in your tables

   Indexes are implemented as a B-tree structure, with a root level, intermediate level(s), and leaf pages (the lowest level) which contain actual pointers to the data. A typical index on a moderately sized table would be 3 levels deep. That means that if the optimizer decided to use this index to read the data, it would have to do a minimum of 4 I/Os to satisfy the query:

   1. the index root page
   2. one intermediate level page
   3. one leaf page
   4. one data page

   As the size of the index increases, the number of pages needed to store the index entries and/or the number of intermediate levels increases as well.

•  The cardinality of the indexed columns - one of the most often overlooked issues when creating an index. Consider the following example to illustrate the point:

   CREATE TABLE T1
       (COLA  INTEGER   NOT NULL,
        COLB  SMALLINT  NOT NULL,
        COLC  CHAR(10)  NOT NULL)

   CREATE UNIQUE INDEX IX1 ON T1 (COLA ASC) CLUSTER

   CREATE INDEX IX2 ON T1 (COLB ASC)

   For this example, assume:

   1. T1 has 100,000,000 rows
   2. the cardinality of COLB is 4, with an even distribution among the rows





   3. we are not considering Multi-Dimensional Clustering (MDC)
   4. the following query is typically run against this table

   SELECT COLA, COLC
   FROM T1
   WHERE COLB = 3

   On the surface, this looks like it all makes sense. We have a SELECT statement with an "=" predicate on COLB, and we have an index defined on that column; what could be better!

   This table has a clustering index defined on COLA. The intent of a clustering index is that the sequence of key values closely corresponds to the sequence of rows stored in the table. Each table can only have one clustering index, so our index on COLB is not clustered.

   Assuming an equal distribution of rows across the COLB values, our WHERE clause, "COLB = 3", will return 25,000,000 rows to our query: 100,000,000/4.

   It is highly unlikely that the DB2 optimizer would select the COLB index (IX2) to satisfy this query, due to the high I/O cost. It would more likely decide to scan the entire table by taking advantage of sequential prefetch.

   Since this index would not be used to satisfy the query, creating it would only have the negative impacts discussed above; i.e. disk, CPU, INSERT and DELETE processing, etc.




•  The order of the columns in the index - another of the most often overlooked issues when creating an index. Consider the following example to illustrate the point:

   CREATE TABLE T1
       (COLA  INTEGER   NOT NULL,
        COLB  SMALLINT  NOT NULL,
        COLC  CHAR(10)  NOT NULL,
        COLD  INTEGER)

   CREATE UNIQUE INDEX IX1 ON T1 (COLA ASC)

   CREATE INDEX IX2 ON T1
       (COLB ASC,
        COLC ASC,
        COLD ASC) CLUSTER

   For this example, assume:





   1. T1 has 100,000,000 rows
   2. the following query is typically run against this table

   SELECT COLB, COLD
   FROM T1
   WHERE COLC = 'DATA'
     AND COLD > 42

   We have a SELECT statement with an "=" predicate on COLC, a second predicate with ">" on COLD, and we have an index (IX2) defined on those columns, plus COLB, which is also in our SELECT list.

   Will the DB2 optimizer select index IX2 to satisfy this query? The answer is not so clear. Remember that DB2 indexes are created as a B-tree structure. The first column of the IX2 index is COLB, but we have not provided a predicate for that column in our WHERE clause, so DB2 cannot effectively navigate the B-tree since the high order part of the key was not provided.

   This does not, however, eliminate the possibility of the use of this index. Since all of the data referenced (COLB, COLC, and COLD) is in the index, DB2 has two possibilities: either scan the entire table, or scan the entire index. Since there would be fewer pages in the index, and since there is no need to go to the data pages, an index scan would probably be selected. Index scans are also eligible for prefetch operations.

   If this were a typical query used against this table, a better definition of the IX2 index would be to order the columns as follows:

   CREATE INDEX IX2 ON T1
       (COLC ASC,
        COLD ASC,
        COLB ASC) CLUSTER

   This would allow effective use of the B-tree structure by providing matching values on the high order part of the key.

These examples were provided for illustration only. You need to consider all of the database access queries against your tables, not just an individual SELECT statement, in order to determine the best index strategy.


3.1.5.2 Index Recommendations

o  Create as few indexes as possible

o  Consider creating indexes with the "ALLOW REVERSE SCANS" option





o  Pay close attention to the order of the columns in the index

o  Don't create redundant indexes

o  Use DB2 "Explain" facilities to determine the actual usage of the indexes (a sketch follows this list)
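One hedged way to check index usage with the explain facilities; the database name and query are illustrative, and the EXPLAIN.DDL script (shipped in sqllib/misc) only needs to be run once per database to create the explain tables:

   db2 -tf EXPLAIN.DDL
   db2 "EXPLAIN PLAN FOR SELECT COLB, COLD FROM T1 WHERE COLC = 'DATA' AND COLD > 42"
   db2exfmt -d MYDB -1 -o explain_out.txt

The formatted output in explain_out.txt shows whether the optimizer chose an index scan or a table scan for the statement.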



3.2 Database Maintenance

Regular maintenance is a critical factor in the performance of a database environment. This involves running the REORG, RUNSTATS, and REBIND facilities, in that order, on the database tables. A regularly scheduled maintenance plan is essential in order to maintain peak performance of your system.


3.2.1 REORG

After many changes to table data, caused by INSERT, DELETE, and UPDATE of variable length columns activity, logically sequential data may reside on non-sequential physical data pages, so that the database manager must perform additional read operations to access data. You can reorganize DB2 tables to eliminate fragmentation and reclaim space using the REORG command.

•  Significant reductions in elapsed times, due to improved I/O, can result from regularly scheduled REORGs.

•  DB2 provides two types of REORG operation:

   o  Classic REORG

      -  Provides the fastest method of REORG
      -  Indexes are rebuilt during the reorganization
      -  Ensures perfectly ordered data
      -  Access is limited to read-only during the UNLOAD phase, no access during other phases
      -  Is not re-startable

   o  In-Place REORG

      -  Slower than the Classic REORG, takes longer to complete
      -  Does not ensure perfectly ordered data or indexes
      -  Requires more log space
      -  Can be paused and re-started
      -  Can allow applications to access the database during reorganization

3.2.1.1 Recommendations

•  Implement a regularly scheduled maintenance plan







•  If you have an established database maintenance window, use the Classic REORG

•  If you operate a 24 by 7 operation, use the In-Place REORG (command sketches follow)
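Hedged command sketches for both approaches; the schema and table names are hypothetical, and the in-place form assumes a DB2 release that supports it (V8 or later):

   Classic (offline) REORG of a table and its indexes:

   db2 REORG TABLE MYSCHEMA.T1
   db2 REORG INDEXES ALL FOR TABLE MYSCHEMA.T1

   In-Place (online) REORG, which can be paused if needed:

   db2 REORG TABLE MYSCHEMA.T1 INPLACE ALLOW WRITE ACCESS
   db2 REORG TABLE MYSCHEMA.T1 INPLACE PAUSE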


3.2.2 RUNSTATS

It was mentioned earlier that the DB2 optimizer uses information and statistics in the DB2 catalog in order to determine the best access to the database based on the query provided. Statistical information is collected for specific tables and indexes in the local database when you execute the RUNSTATS utility. When significant numbers of table rows are added or removed, or if data in columns for which you collect statistics is updated, execute RUNSTATS again to update the statistics. Use the RUNSTATS utility to collect statistics in the following situations:

•  When data has been loaded into a table and the appropriate indexes have been created.

•  When you create a new index on a table. You need to execute RUNSTATS for only the new index if the table has not been modified since you last ran RUNSTATS on it.

•  When a table has been reorganized with the REORG utility.

•  When the table and its indexes have been extensively updated by data modifications, deletions, and insertions. ("Extensive" in this case may mean that 10 to 20 percent of the table and index data has been affected.)

•  Before binding, or rebinding, application programs whose performance is critical.

•  When you want to compare current and previous statistics. If you update statistics at regular intervals you can discover performance problems early.

•  When the prefetch quantity is changed.

•  When you have used the REDISTRIBUTE DATABASE PARTITION GROUP utility.

There are various formats of the RUNSTATS command, mainly determining the depth and breadth of the statistics collected. The more you collect, the longer the command takes to run. Some of the options are as follows:

•  Collect either SAMPLED or DETAILED index statistics

•  Collect statistics on all columns, or only on columns used in JOIN operations

•  Collect distribution statistics on all, key, or no columns. Distribution statistics are very useful when you have an uneven distribution of data on key columns.

3.2.2.1 Recommendations

•  Care must be taken when running RUNSTATS since the information collected will impact the optimizer's selection of access paths.

•  Implement RUNSTATS as part of a regularly scheduled maintenance plan if some of the above mentioned conditions occur.







•  To ensure that the index statistics are synchronized with the table, execute RUNSTATS to collect both table and index statistics at the same time.

•  Consider some of the following factors when deciding what type of statistics to collect:

   o  Collect statistics only for the columns used to join tables or in the WHERE, GROUP BY, and similar clauses of queries. If these columns are indexed, you can specify the columns with the ONLY ON KEY COLUMNS clause of the RUNSTATS command.

   o  Customize the values for num_freqvalues and num_quantiles for specific tables and specific columns in tables.

   o  Collect DETAILED index statistics with the SAMPLED DETAILED clause to reduce the amount of background calculation performed for detailed index statistics. The SAMPLED DETAILED clause reduces the time required to collect statistics, and produces adequate precision in most cases.

   o  When you create an index for a populated table, add the COLLECT STATISTICS clause to create statistics as the index is created.
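Hedged examples of RUNSTATS invocations reflecting these recommendations; the schema and table names are hypothetical:

   db2 RUNSTATS ON TABLE MYSCHEMA.T1 WITH DISTRIBUTION AND SAMPLED DETAILED INDEXES ALL
   db2 RUNSTATS ON TABLE MYSCHEMA.T1 ON KEY COLUMNS AND INDEXES ALL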

3.2.3 REBIND

After running RUNSTATS on your database tables, you need to rebind your applications to take advantage of those new statistics. This is done to ensure the best access plan is being used for your SQL statements. How that rebind takes place depends on the type of SQL you are running. DB2 provides support for the following:

•  Dynamic SQL - SQL statements that are prepared and executed at run time. In dynamic SQL, the SQL statement is contained as a character string in a host variable and is not precompiled.

•  Static SQL - SQL statements that are embedded within a program, and are prepared during the program preparation process before the program is executed. After being prepared, a static SQL statement does not change, although values of host variables specified by the statement can change. These static statements are stored in a DB2 object called a package.

Both dynamic SQL statements and packages can be stored in one of DB2's caches. Based on the above types of SQL, a rebind will take place under these conditions:

•  Dynamic SQL

   o  If the statement is not in the cache, the SQL optimizer will "bind" the statement and generate a new access plan

   o  If the statement is in the cache, no "rebind" will take place

      -  To clear the contents of the SQL cache, use the FLUSH PACKAGE CACHE SQL statement

•  Static SQL

   o  An explicit REBIND <package> is executed

   o  Implicitly, if the package is marked "invalid"







      -  This can occur if, for example, an index that the package was using has been dropped.

3.2.3.1 Recommendations

o  Perform a REBIND after running RUNSTATS as part of your normal database maintenance procedures. (A command sketch follows.)
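A hedged sketch of the rebind step; the package, database, and log file names are hypothetical, and db2rbind rebinds all packages in the database in one pass:

   db2 REBIND PACKAGE MYSCHEMA.MYPKG
   db2rbind MYDB -l rebind.log all
   db2 "FLUSH PACKAGE CACHE DYNAMIC"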


4. Application Design Details

4.1 Key Factors

4.1.1 Review application SQL for efficiencies *** IMPORTANT ***

If there is any one thing that you should focus on from this entire paper, it is this topic. In a significant majority of cases, probably the single most important factor when it comes to performance with DB2 is how efficiently your SQL statements are written. This topic mainly deals with SQL search criteria, which can be present in SELECT, UPDATE, DELETE, or INSERT (through a subselect) statements.

Reviewing SQL serves the following purposes:

•  Provides the database designer with the necessary information to determine the proper indexes that should be created on your database tables. These statements are essential for the designer to be able to create the optimal indexes to support your database access. All of the considerations mentioned above regarding indexes should be taken into account.

•  Allows an independent review of the SQL for the purpose of utilizing efficient SQL coding techniques

•  Determines if locking strategies are appropriate

•  Assesses the impact of changes in your data model or data content

•  Assesses the impact of the application of service to the database manager

How to review the SQL statements will be discussed in Section 8: Tuning Approach.

4.1.1.1 Recommendations

o  Implement a formal SQL review process for your application(s)

4.1.2 Concurrency Control and Isolation Level

An isolation level determines how data is locked or isolated from other processes while the data is being accessed. The isolation level will be in effect for the duration of the unit of work. DB2 supports the following isolation levels, listed in order of most restrictive to least restrictive:







•  Repeatable Read - An isolation level that locks all the rows in an application that are referenced within a transaction. When a program uses repeatable read protection, rows referenced by the program cannot be changed by other programs until the program ends the current transaction.

•  Read Stability - An isolation level that locks only the rows that an application retrieves within a transaction. Read stability ensures that any qualifying row that is read during a transaction is not changed by other application processes until the transaction is completed, and that any row changed by another application process is not read until the change is committed by that process.

•  Cursor Stability - An isolation level that locks any row accessed by a transaction of an application while the cursor is positioned on the row. The lock remains in effect until the next row is fetched or the transaction is terminated. If any data is changed in a row, the lock is held until the change is committed to the database.

•  Uncommitted Read - An isolation level that allows an application to access uncommitted changes of other transactions. The application does not lock other applications out of the row that it is reading, unless the other application attempts to drop or alter the table. Sometimes referred to as "Dirty Reads".

4.1.2.1 Recommendations

o  Make sure you know the isolation level under which you are running. Do not count on default values, which can change based on how you are accessing the database.

o  Because the isolation level determines how data is locked and isolated from other processes while the data is being accessed, you should select an isolation level that balances the requirements of concurrency and data integrity for your particular application. The isolation level that you specify is in effect for the duration of the unit of work. (A short sketch follows.)
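Two common ways to control the isolation level, sketched with hypothetical names: the BIND option sets the isolation for a static SQL package, while the statement-level clause overrides it for a single query:

   db2 BIND myprog.bnd ISOLATION CS

   SELECT COLA, COLC
   FROM T1
   WHERE COLB = 3
   WITH UR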


4.1.3 Locking

To provide concurrency control and prevent uncontrolled data access, the database manager places locks on tables, table blocks, or table rows. A lock associates a database manager resource with an application, called the lock owner, to control how other applications can access the same resource. Locking is a fundamental process of any database manager and is used to ensure the integrity of the data. But while those locks are held, there is a potential impact on the concurrency and throughput of your application.

There are a number of factors that the database manager uses to determine whether to use row level or table level locking:

•  The different isolation levels described above are used to control access to uncommitted data, prevent lost updates, allow non-repeatable reads of data, and prevent phantom reads. Use the minimum isolation level that satisfies your application needs.

•  The access plan selected by the optimizer. Table scans, index scans, and other methods of data access each require different types of access to the data.







•  The LOCKSIZE attribute for the table. This parameter indicates the granularity of the locks used when the table is accessed. The choices are either ROW for row locks, or TABLE for table locks.

•  The amount of memory devoted to locking, which is controlled by the locklist database configuration parameter.

4.1.3.1 Recommendations

o  *** IMPORTANT *** - COMMIT as frequently as possible and/or practical in order to release any locks your application holds. If possible, design your application so that you can easily vary the commit frequency for large batch operations. This will allow you to optimally balance the throughput and concurrency of your system.

o  Use ALTER TABLE...LOCKSIZE TABLE for read-only tables. This reduces the number of locks required by database activity.

o  If the lock list fills, performance can degrade due to lock escalations and reduced concurrency on shared objects in the database. If lock escalations occur frequently, increase the value of either locklist or maxlocks, or both. (A configuration sketch follows this list.)
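A hedged sketch of the last two recommendations; the table name, database name, and parameter values are hypothetical and must be sized for your workload:

   db2 ALTER TABLE MYSCHEMA.REF_CODES LOCKSIZE TABLE
   db2 UPDATE DB CFG FOR MYDB USING LOCKLIST 400 MAXLOCKS 20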

4.1.4 Fetch needed columns only

There is additional CPU cost associated with each column selected or fetched from the database. Higher I/O cost may also be experienced if sorting is required.

4.1.4.1 Recommendations

•  Select or fetch only the columns that you need

•  Never code "SELECT *" to retrieve all columns in a table

4.1.5 Reuse resources

Consider reuse of the following components:

4.1.5.1 Recommendations

•  Database connections - this can be accomplished using the connection pooling features of DB2. Connection pooling is a process in which DB2 drops the inbound connection with an application that requests disconnection, but keeps the outbound connection to the host in a pool. When a new application requests a connection, DB2 uses one from the existing pool. Using the already-present connection reduces the overall connection time, as well as the high processor connect cost on the host. Connection pooling is implemented using:

   o  DB2 Connect

   o  JDBC, using the WebSphere connection pooling feature


4.1.6 SQL Statement Preparation

Before an SQL statement can be executed, it must be converted from text form to an executable form by submitting it to the SQL compiler. This is referred to as the SQL statement prepare process.




After the statement is prepared, the bind process occurs. This process converts the output from the SQL compiler to a usable control structure, such as an access plan, application plan, or package. During the bind process, access paths to the data are selected and some authorization checking is performed. DB2 supports two types of SQL processing:

•  Static - SQL statements that are embedded within a program, and are prepared during the program preparation process before the program is executed. After being prepared, a static SQL statement does not change, although values of host variables specified by the statement can change.

•  Dynamic - SQL statements that are prepared and executed at run time. In dynamic SQL, the SQL statement is contained as a character string in a host variable and is not precompiled.

Static SQL offers the advantage of executing the statement preparation process only once, thus eliminating that processing step each time the statement is executed.

Dynamic SQL statements, by their definition, are prepared and executed at run time. You can, however, minimize the effect of statement preparation by writing your dynamic SQL statements using parameter markers. Parameter markers act in a similar fashion to host variables in static SQL statements. If you use them in a dynamic SQL statement, you would first issue a single PREPARE statement, followed by multiple EXECUTE statements, which allow you to substitute values for the parameter markers. This savings can be significant for simple SQL statements, like an INSERT, that are executed many times with different values.

4.1.6.1 Recommendations

•  Use static SQL whenever possible

   o  If using Java, SQLJ supports static SQL

•  If you use dynamic SQL

   o  Code the statements using parameter markers (a sketch follows this list)

   o  Increase the size of the database package cache. This cache stores dynamic SQL statements and allows for their reuse.
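A minimal embedded SQL sketch of the prepare-once, execute-many pattern with parameter markers; the statement name, host variables, and table are hypothetical:

   /* :stmt_txt holds: INSERT INTO T1 (COLA, COLB, COLC) VALUES (?, ?, ?) */
   EXEC SQL PREPARE S1 FROM :stmt_txt;

   /* executed repeatedly, substituting new host variable values each time */
   EXEC SQL EXECUTE S1 USING :cola, :colb, :colc;

The package cache size is controlled by the pckcachesz database configuration parameter, e.g. db2 UPDATE DB CFG FOR MYDB USING PCKCACHESZ 1024 (the database name and value are hypothetical).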

4.1.7 Query Tuning

The following SQL statement clauses may improve the performance of your application:

•  Use the FOR UPDATE clause to specify the columns that could be updated by a subsequent positioned UPDATE statement.

•  Use the FOR READ/FETCH ONLY clause to make the returned columns read only.

•  Use the OPTIMIZE FOR n ROWS clause to give priority to retrieving the first n rows in the full result set.

•  Use the FETCH FIRST n ROWS ONLY clause to retrieve only a specified number of rows.

•  Use the DECLARE CURSOR WITH HOLD statement to retrieve rows one at a time and maintain cursor position after a commit.







•  Take advantage of row blocking by specifying the FOR READ ONLY or FOR FETCH ONLY clause, the OPTIMIZE FOR n ROWS clause, or by declaring your cursor as SCROLLing. This will improve performance and, in addition, improve concurrency, because exclusive locks are never held on the rows retrieved.

•  Avoid DISTINCT or ORDER BY if not required. This will help to eliminate any potential sorting that may have to occur.

   o  Proper indexing may also be used to eliminate sorting.
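Illustrative sketches only, combining several of these clauses with hypothetical table and column names:

   SELECT ORDER_ID, STATUS
   FROM ORDERS
   WHERE CUST_ID = ?
   ORDER BY ORDER_ID
   FETCH FIRST 20 ROWS ONLY
   OPTIMIZE FOR 20 ROWS

   DECLARE C1 CURSOR WITH HOLD FOR
       SELECT ORDER_ID, STATUS
       FROM ORDERS
       WHERE CUST_ID = :custid
       FOR READ ONLY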


5. Hardware Design and Operating System Usage

5.1 Key Factors

This section will discuss overall considerations for these factors; it will not discuss detailed calculations for capacity planning purposes.

5.1.1 Memory

Understanding how DB2 organizes memory helps you tune memory use for good performance. Many configuration parameters affect memory usage. Some may affect memory on the server, some on the client, and some on both. Furthermore, memory is allocated and de-allocated at different times and from different areas of the system. While the database server is running, you can increase or decrease the size of memory areas inside the database shared memory. You should understand how memory is divided among the different heaps before tuning to balance overall memory usage on the entire system. Refer to the "DB2 Administration Guide: Performance" for a detailed explanation of DB2's memory model and all of the parameters that affect memory usage.


5.1.2 CPU

The CPU utilization goal should be about 70 to 80% of the total CPU time. Lower utilization means that the CPU can cope better with peak workloads. Workloads between 85% and 90% result in queuing delays for CPU resources, which affect response times. CPU utilization above 90% usually results in unacceptable response times. While running batch jobs, backups, or loading large amounts of data, the CPU may be driven to high percentages, such as 80 to 100%, to maximize throughput.

DB2 supports the following processor configurations:

•  Uni-Processor - a single system that contains only one CPU

•  SMP (Symmetric Multiprocessor) - a single system that can contain multiple CPUs. Scalability is limited to the CPU sockets provided on the motherboard.

•  MPP (Massively Parallel Processors) - a system with multiple nodes connected over a high speed link. Each node has its own CPU(s). Scalability is achieved by adding new nodes.





Things to consider regarding CPU:

 Inefficient data access methods cause high CPU utilization and are a major problem for database systems. Refer back to section 4.1.1.

 Paging and swapping require CPU time. Consider this factor while planning your memory requirements.

5.1.3 I/O

The following are rules of thumb (ROTs) that can be used to calculate the total disk space required by an application. If you have more detailed information, use that instead of the ROTs.

 Calculate the raw data size:

o Add up the column lengths of your database tables.

o Multiply by the number of rows expected.

 Once you have the raw data size, use the following scaling-up ratios to factor in space for indexing, working space, etc. (a short worked example follows the list):

o OLTP ratio: 1:3

o DSS ratio: 1:4

o Data warehouse ratio: 1:5
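
As a worked example (the figures are invented purely for illustration): a table with an average row length of 200 bytes and an expected 1,000,000 rows has a raw data size of roughly 200 MB; applying the OLTP ratio of 1:3 yields a planning estimate of about 600 MB for that table, including its indexes and working space.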

Consider the following to improve disk efficiency:

 Minimize I/O - access to main memory is orders of magnitude faster than access to disk. Provide as much memory as possible to the database bufferpools and the various memory heaps to avoid I/O.

 When I/O is needed, reading simultaneously from several disks is the fastest way. Provide for parallel I/O operations by:

o Using several smaller disks rather than one big disk

o Placing the disk drives on separate controllers

5.1.3.1 Choosing Disk Drives

There are several trends in current disk technology:

 Disks get bigger every year, roughly doubling in capacity every 18 months.

 The cost per GB is lower each year.

 The cost difference of the two smallest drives diminishes until there is little point in continuing with the smaller drive.

 Disk drives improve a little each year in seek time.

 Disk drives get smaller in physical size.





While disk drives continue to increase in capacity within a smaller physical size, the speed improvements (seek time and so on) are small in comparison. A database that would have taken 36 * 1 GB drives a number of years ago can now be placed on one disk. This highlights the database I/O problem: if each 1 GB disk drive can do 80 I/O operations a second, the 36-drive system can do a combined 36 * 80 = 2880 I/O operations per second, but a single 36 GB drive with a seek time of 7 ms can do only about 140 I/O operations per second. While increased disk drive capacity is good news, the smaller number of disks cannot deliver the same I/O throughput.

5.1.3.2 Recommendations

 When determining your I/O requirements, consider that for OLTP systems:

o Reading data will involve reading indexes.

o Inserts and updates require data, index, and log records to be written.

 Provide for parallel I/O operations (see the sketch after this list):

o Use DMS tablespaces and place data and indexes in separate tablespaces.

o Use the smallest disk drives possible, purely on the basis of increasing the number of disks for I/O throughput. If buying larger drives, use only half the space for the database (the middle area, which is the fastest), and the other half for:

- Backups

- Archiving data

- Off-hour test databases

- Extra space used for upgrades
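
A minimal sketch of the DMS data/index separation (the container paths, sizes in pages, and all object names are assumptions):

   CREATE TABLESPACE DATA_TS MANAGED BY DATABASE
     USING (FILE '/disk1/db2/data_ts.dat' 51200)

   CREATE TABLESPACE INDEX_TS MANAGED BY DATABASE
     USING (FILE '/disk2/db2/index_ts.dat' 25600)

   CREATE TABLE APP.ORDERS (ORDER_ID INTEGER NOT NULL, CUST_ID INTEGER, AMOUNT DECIMAL(10,2))
     IN DATA_TS INDEX IN INDEX_TS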

5.1.4 Network

The network can influence the overall performance of your application, but a network problem usually manifests itself as a delay in one of the following situations:

o The time between when a client machine sends a request to the server and the server receives that request

o The time between when the server machine sends data back to the client machine and the client machine receives the data

Once a system is implemented, the network should be monitored to ensure that no more than 50% of its bandwidth is being consumed.

5.1.4.1 Recommendations

The following techniques can be used to improve overall performance and avoid high network consumption:





o Transmit a block of rows to the client machine in a single operation. This is accomplished by using the BLOCKING option of the precompile or bind procedures (a brief sketch follows this list). Refer to section 4.1.7, Query Tuning, for other factors that influence row blocking.

o Use stored procedures to minimize the number of accesses to the database. Stored procedures are programs that reside on the RDBMS server and can be executed as part of a transaction by the client applications. This way, several pre-programmed SQL statements can be executed using only one CALL command from the client machine.

o Note that using stored procedures will typically make it more difficult to run your application on different database platforms, such as Oracle or SQL Server, because of the syntactical differences of their stored procedure implementations. If you need to run your application on multiple database platforms, be aware of this consideration.
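
A brief sketch of the first two techniques (the program, bind file, and procedure names are hypothetical):

   db2 PREP myapp.sqc BINDFILE BLOCKING ALL
   db2 BIND myapp.bnd BLOCKING ALL

   CALL APP.PROCESS_ORDER (?, ?, ?)

The first two commands request row blocking when precompiling and binding an embedded SQL program; the CALL replaces several individual SQL round trips with a single request from the client.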

6. Tuning Opportunities

This section contains some additional tuning considerations not already discussed.

6.1 Insert/Delete Performance

Here are some things to consider about insert and delete performance. The biggest bottlenecks are typically:

o Read and write I/O for index and data

o Active log writes

o CPU time

o Locking


We’ve previously discussed ways to address many of these issues, such as:

o Using DMS tablespaces and placing the table and index data into separate tablespaces to enable parallel I/O

o Providing for efficient logging

o Using parameter markers to prepare an INSERT statement once and execute it many times

o Batching SQL statements, for example via the JDBC batch facility

o Minimizing indexing


Here are a few other suggestions to improve insert performance (a brief sketch follows this list):

o Consider the use of APPEND MODE.

o Insert multiple rows with one INSERT statement.

o Tune the PCTFREE values for the data, clustering index, and non-clustering index components.
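
A minimal sketch of these suggestions (the table and index names are illustrative only):

   -- Append new rows at the end of the table, skipping free-space searches:
   ALTER TABLE APP.AUDIT_LOG APPEND ON

   -- One statement, several rows: fewer round trips and less per-statement overhead:
   INSERT INTO APP.AUDIT_LOG (ID, MSG) VALUES (1, 'start'), (2, 'step'), (3, 'end')

   -- Reserve free space on data and index pages (used when data is loaded or reorganized):
   ALTER TABLE APP.ORDERS PCTFREE 10
   CREATE INDEX APP.ORDERS_IX1 ON APP.ORDERS (CUST_ID) PCTFREE 10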






The following database configuration parameter is important:

o Number of asynchronous page cleaners (NUM_IOCLEANERS) - This parameter controls the number of page cleaners that write changed pages from the buffer pool to disk. You may want to increase this to the number of physical disk drive devices you have. The default is 1.

The following database manager configuration parameter may be important:

o Enable intra-partition parallelism (INTRA_PARALLEL) - If you have a multi-processor SMP system, setting this parameter to YES may improve performance. The default is NO.

The following registry variable can also be used:

o DB2MAXFSCRSEARCH - The setting of this registry variable determines the number of Free Space Control Records (FSCRs) in a table that are searched for an INSERT. The default value is five. If no space is found within the specified number of FSCRs, the inserted record is appended at the end of the table. To optimize INSERT speed, subsequent records are also appended to the end of the table until two extents are filled. After the two extents are filled, the next INSERT resumes searching at the FSCR where the last search ended.

o To optimize for INSERT speed at the possible expense of faster table growth, set the DB2MAXFSCRSEARCH registry variable to a small number. To optimize for space reuse at the possible expense of INSERT speed, set DB2MAXFSCRSEARCH to a larger number.
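
For example (the values are illustrative, SAMPLE is a placeholder database name, and a db2set change generally requires an instance restart before it takes effect):

   db2 UPDATE DATABASE CONFIGURATION FOR SAMPLE USING NUM_IOCLEANERS 4
   db2 UPDATE DATABASE MANAGER CONFIGURATION USING INTRA_PARALLEL YES
   db2set DB2MAXFSCRSEARCH=1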



6.2 Database Manager Configuration Tuning

Each instance of the database manager has a set of database manager configuration parameters (also called database manager parameters). These affect the amount of system resources that will be allocated to a single instance of the database manager. Some of these parameters are used for configuring the setup of the database manager and other, non-performance-related information.

There are numerous database manager configuration parameters. I will only list some of the ones that have a high impact on performance. Refer to the “DB2 Administration Guide: Performance” for a detailed explanation of all the database manager configuration parameters.

o Agentpri - This parameter controls the priority given both to all agents and to other database manager instance processes and threads by the operating system scheduler. Use the default unless you run a benchmark to determine the optimal value.

o Aslheapsz - The application support layer heap represents a communication buffer between the local application and its associated agent. This buffer is allocated as shared memory by each database manager agent that is started. If the request to the database manager, or its associated reply, does not fit into the buffer, it is split into two or more send-and-receive pairs. The size of this buffer should be set to handle the majority of requests using a single send-and-receive pair.





o Intra_parallel - This parameter specifies whether the database manager can use intra-partition parallelism on an SMP machine. Multiple processors can be used to scan and sort data for index creation.

o Java_heap_sz - This parameter determines the maximum size of the heap that is used by the Java interpreter started to service Java DB2 stored procedures and UDFs (User Defined Functions).

o Max_querydegree - This parameter specifies the maximum degree of intra-partition parallelism that is used for any SQL statement executing on this instance of the database manager. An SQL statement will not use more than this number of parallel operations within a partition when the statement is executed. The intra_parallel configuration parameter must be set to “YES” to enable the database partition to use intra-partition parallelism.

o Sheapthres - The sort heap threshold determines the maximum amount of memory available for all the operations that use the sort heap, including sorts, hash joins, dynamic bitmaps (used for index ANDing and star joins), and operations where the table is in memory. Ideally, you should set this parameter to a reasonable multiple of the largest sortheap parameter you have in your database manager instance. This parameter should be at least two times the largest sortheap defined for any database within the instance.
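
A hedged example of setting a few of these from the command line (the values shown are placeholders, not recommendations):

   db2 UPDATE DATABASE MANAGER CONFIGURATION USING ASLHEAPSZ 15 SHEAPTHRES 20000
   db2 GET DATABASE MANAGER CONFIGURATION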


6.3 Database Configuration Tuning

Each database has a set of database configuration parameters (also called database parameters). These affect the amount of system resources that will be allocated to that database. In addition, there are some database configuration parameters that provide descriptive information only and cannot be changed; others are flags that indicate the status of the database.

There are numerous database configuration parameters. I will only list some of the ones that have a high impact on performance. Refer to the “DB2 Administration Guide: Performance” for a detailed explanation of all the database configuration parameters.

o Chngpgs_thresh - Asynchronous page cleaners write changed pages from the buffer pool (or buffer pools) to disk before the space in the buffer pool is required by a database agent. As a result, database agents should not have to wait for changed pages to be written out before they can use the space in the buffer pool. This improves overall performance of the database applications.

o Locklist - This parameter indicates the amount of storage that is allocated to the lock list. It has a high impact on performance if there are frequent lock escalations.

o Maxlocks - The maximum percentage of the lock list an application can hold before lock escalation occurs. Used in conjunction with the locklist parameter to control lock escalations.

o Logbufsz - This parameter allows you to specify the amount of the database heap (defined by the dbheap parameter) to use as a buffer for log records before writing these records to disk. Buffering the log records results in more efficient log file I/O, because the log records are written to disk less frequently and more log records are written each time.





o Num_iocleaners - This parameter allows you to specify the number of asynchronous page cleaners for a database. These page cleaners write changed pages from the buffer pool to disk before the space in the buffer pool is required by a database agent. As a result, database agents should not have to wait for changed pages to be written out before they can use the space in the buffer pool. This improves overall performance of the database applications. Set this parameter to between one and the number of physical storage devices used for the database.

o Num_ioservers - I/O servers are used on behalf of the database agents to perform prefetch I/O and asynchronous I/O by utilities such as backup and restore. This parameter specifies the number of I/O servers for a database. A good value to use is generally one or two more than the number of physical devices on which the database resides.

o Pckcachesz - The package cache is used for caching sections for static and dynamic SQL statements on a database. Caching packages and statements allows the database manager to reduce its internal overhead by eliminating the need to access the system catalogs when reloading a package or, in the case of dynamic SQL, eliminating the need for compilation.

o Sortheap - This parameter defines the maximum number of private memory pages to be used for private sorts, or the maximum number of shared memory pages to be used for shared sorts. Each sort has a separate sort heap that is allocated as needed by the database manager. This sort heap is the area where data is sorted. Increase the size of this parameter when frequent large sorts are required.
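
A minimal sketch (SAMPLE is a placeholder database name and the values are arbitrary examples, not recommendations):

   db2 UPDATE DATABASE CONFIGURATION FOR SAMPLE USING LOCKLIST 1000 MAXLOCKS 20
   db2 UPDATE DATABASE CONFIGURATION FOR SAMPLE USING LOGBUFSZ 256 SORTHEAP 512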


6.4 Bufferpools

The buffer pool is the area of memory where database pages (table rows or indexes) are temporarily read and manipulated. All buffer pools reside in global memory, which is available to all applications using the database. The purpose of the buffer pool is to improve database performance: data can be accessed much faster from memory than from disk. Therefore, the more data (rows and indexes) the database manager is able to read from or write to memory, the better the database performance. The default buffer pool allocation is usually not sufficient for production applications and needs to be monitored and tuned before placing your application in production.
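
For example (the pool name and sizes, in pages, are illustrative; monitor buffer pool hit ratios before and after any change):

   ALTER BUFFERPOOL IBMDEFAULTBP SIZE 50000
   CREATE BUFFERPOOL APPBP SIZE 20000 PAGESIZE 4K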


6.5 Registry Variables

Each instance of the database manager has a set of registry and environment variables. These affect various aspects of DB2 processing.

There are numerous registry and environment variables. I will only list some of the ones that have a high impact on performance. Refer to the “DB2 Administration Guide: Performance” for a detailed explanation of all the registry and environment variables.


o DB2_PARALLEL_IO - While reading or writing data from and to table space containers, DB2 may use parallel I/O for each table space value that you specify.




The degree of parallelism is determined by the prefetch size and extent size for the containers in the table space. For example, if the prefetch size is four times the extent size, then there are four extent-sized prefetch requests. The number of containers in the table space does not affect the number of prefetchers. To enable parallel I/O for all table spaces, use the wildcard character, "*". To enable parallel I/O for a subset of all table spaces, enter the list of table spaces. If there is more than one container, extent-sized pieces of any full prefetch request are broken down into smaller requests executed in parallel, based on the number of prefetchers. When this variable is not enabled, the number of prefetch requests created is based on the number of containers in the table space.
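
For example (the table space IDs shown are placeholders):

   db2set DB2_PARALLEL_IO=*
   db2set DB2_PARALLEL_IO=1,2,4

The first form enables parallel I/O for every table space; the second limits it to the listed table spaces.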

7. Monitoring Tools

DB2 provides several tools that can be used for monitoring or analyzing your database. These monitoring and analysis tools, along with their purposes, are:

o Snapshot Monitor - captures performance information at periodic points in time. Used to determine the current state of the database.

o Event Monitor - provides a summary of activity at the completion of events such as statement execution, transaction completion, or when an application disconnects.

o Explain Facility - provides information about how DB2 will access the data in order to resolve the SQL statements.

o db2batch tool - provides performance information (a benchmarking tool).


8. Tuning Methodology

8.1 Tuning Approach

The objective of tuning an RDBMS is to make sure that the system is delivering good performance. As with most things, the ability to meet that objective depends on the effort and resources you apply to the tuning effort. The following are situations that would cause a tuning effort to be started:

 Regular task - Regular periodic monitoring and tuning is standard practice. Many sites do a review of performance at quarterly, half-yearly, or yearly intervals.

 Generated warning - Automated monitoring of the system warns that performance is degrading and has hit some threshold.







 Emergency - There is an emergency in performance or response time, which has been highlighted by user feedback. The tuning must identify the problem, recommend a solution, and then work out how to avoid this happening again.

 New system - A newly built system requires initial tuning for maximum performance before going into production. In many cases, a new system might be put into production before being optimally tuned because of the difficulty of generating user workloads artificially and not being able to predict real user workloads, work patterns, and data volume and distribution. For this reason, it is critical that the design principles outlined in this document be followed, so as to minimize the effects of the unknown as much as possible.

 System change - If the change is significant, this is similar to a new system, particularly if testing cannot effectively simulate production.

Regardless of the reason, the tuning approach will largely be the same.

8.2 Skills Needed

While this document focuses on database design and tuning, there are many other factors that influence the overall performance of any application. Other skills that may be required to tune the overall system include:

 Hardware experts - for the various hardware platforms you plan to run on

 Operating system experts - for the various operating system platforms you plan to run on. This could include:

o System administration experts

o Operating system performance and tuning experts

 Relational database skills, including:

o DBA skills

o SQL tuning

 Middleware experts - if using middleware products such as WebSphere

 Application experts:

o Functional knowledge of the application

o Technical experts on the products used to build the application:

- Java

- C++

8.3 General Process

The following process is recommended to improve the performance of any system:

1. Establish performance indicators.





2. Define performance objectives.

3. Develop a performance monitoring plan.

4. Carry out the plan.

5. Analyze your measurements to determine whether you have met your objectives. If you have, consider reducing the number of measurements you make, because performance monitoring itself uses system resources. Otherwise, continue with the next step.

6. Determine the major constraints in the system.

7. Decide where you can afford to make trade-offs and which resources can bear additional load. (Nearly all tuning involves trade-offs among system resources and the various elements of performance.)

8. Adjust the configuration of your system. If you think that it is feasible to change more than one tuning option, implement one at a time. If there are no options left at any level, you have reached the limits of your resources and need to upgrade your hardware.

9. Return to step 4 above and continue to monitor your system.

On a periodic basis, or after significant changes to your system:

 Perform the above procedure again from step 1.

 Re-examine your objectives and indicators.

 Refine your monitoring and tuning strategy.


8.4 DB2 Specific Tuning

There are numerous documents and books describing detailed formal tuning methodologies. This document is not intended to repeat, or rewrite, any of that. I will discuss some specific DB2 topics, tools, and techniques that can be used to tune your DB2 database.

8.4.1 SQL Reviews

As mentioned earlier in this document, SQL reviews are essential for a well-performing RDBMS system. SQL reviews in DB2 are generally done using the SQL Explain facility. Explain allows you to capture information about the access plan chosen by the optimizer, as well as performance information that helps you tune queries.

Before you can capture explain information, you need to create the relational tables in which the optimizer stores the explain information, and set the special registers that determine what kind of explain information is captured. The explain tables can be created:

 Automatically, by the DB2 Control Center

 By running the following command from a DB2 command window:

o db2 -tf EXPLAIN.DDL (the file is located in the sqllib/misc directory)






DB2 provides a number of facilities to view the information generated by the EXPLAIN facility. These include:

 Visual Explain - you invoke Visual Explain from the Control Center to see a graphical display of a query access plan. You can analyze both static and dynamic SQL statements.

 db2exfmt - this command-line tool is used to display explain information in preformatted output.

 db2expln and dynexpln - these command-line tools are used to see the access plan information available for one or more packages of static SQL statements. db2expln shows the actual implementation of the chosen access plan; it does not show optimizer information. The dynexpln tool, which uses db2expln within it, provides a quick way to explain dynamic SQL statements that contain no parameter markers. This use of db2expln from within dynexpln is done by transforming the input SQL statement into a static statement within a pseudo-package. When this occurs, the information may not always be completely accurate. If complete accuracy is desired, use the explain facility. The db2expln tool does provide a relatively compact and English-like overview of what operations will occur at run time, by examining the actual access plan generated.

Which tool you use depends on a number of factors, including:

 How complicated the SQL is

 Whether you have access to the actual program/code that will be running

 Your familiarity with the tools
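
A minimal sketch of capturing and formatting explain data from a DB2 command window (SAMPLE, the query, and the output file name are placeholders, and the explain tables must already exist):

   db2 CONNECT TO SAMPLE
   db2 SET CURRENT EXPLAIN MODE EXPLAIN
   db2 "SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE WORKDEPT = 'D11'"
   db2 SET CURRENT EXPLAIN MODE NO
   db2exfmt -d SAMPLE -1 -o explain.txt

With EXPLAIN MODE set to EXPLAIN, the statement is compiled and its access plan captured but not executed; db2exfmt then formats the captured plan.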


Previously, I discussed the functions of the DB2 optimizer. The DB2 optimizer uses information and statistics in the DB2 catalog to determine the best access to the database for the query provided. This information is generally gathered using the RUNSTATS utility. The catalog information includes the following types of information:

 The number of rows in a table

 The indexes defined on the table

 Whether the table needs to be REORGanized
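
For example (the schema and table name are placeholders):

   RUNSTATS ON TABLE APP.CUSTOMER WITH DISTRIBUTION AND DETAILED INDEXES ALL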


8.4.2 Database Statistics

Ideally, you would use the explain facility mentioned above against a database that has production-level volumes loaded into it. The access path chosen may be drastically different if a table has 100 rows in a test database and 10,000,000 rows in a production database. This difference is exacerbated if the SQL query involves a join of multiple tables.





Often it is not easy or practical, particularly early on, to have a database loaded with production-level volumes. DB2 provides a facility for simulating a production database by allowing you to code SQL UPDATE statements that operate against a set of predefined catalog views in the SYSSTAT schema. With SYSSTAT, the database administrator is able to simulate production volumes in a test database. Updating these views allows the optimizer to create different access plans under different conditions.

The following is a sample of one of these UPDATE statements:

UPDATE SYSSTAT.TABLES SET CARD = 850000 WHERE TABNAME = 'CUSTOMER'

Although this facility is provided, it should be used only as an initial facility for updating statistics. There are many inter-relationships among the database tables described in the catalog, and understanding how to update all of them correctly is extremely difficult.


8.4.2.1 Using Database Statistics for SQL Access Plan Analysis

The following sources should be used, in the order listed, when using the EXPLAIN facility to analyze your SQL statements. Make sure any statistics you use are current by running the RUNSTATS utility.

1. A production database (or an image of one)

2. A test database loaded with a significant amount of data

3. A test database updated with production statistics

o The DB2 tool db2look is designed to capture all table DDL and statistics of a production database so they can be replicated on the test system.

4. A test database whose SYSSTAT views have been updated manually

o The UPDATE statements for the SYSSTAT views can be generated by the db2look facility.
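
A hedged sketch of using db2look for this purpose (the database and file names are placeholders; the -e option extracts the DDL and the -m option generates the SYSSTAT UPDATE statements):

   db2look -d PRODDB -e -m -o prod_stats.sql
   db2 -tvf prod_stats.sql

The second command runs the generated statements; it assumes you are connected to the test database.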


8.4.3 SNAPSHOT and EVENT Monitors

DB2 maintains data about its operation, its performance, and the applications using it. This data is maintained as the database manager runs, and can provide important performance and troubleshooting information. For example, you can find out:

 The number of applications connected to a database, their status, and which SQL statements each application is executing, if any.

 Information that shows how well the database manager and database are configured, and that helps you to tune them.

 When deadlocks occurred for a specified database, which applications were involved, and which locks were in contention.







 The list of locks held by an application or a database. If the application cannot proceed because it is waiting for a lock, there is additional information on the lock, including which application is holding it.

Collecting performance data introduces overhead on the operation of the database. DB2 provides monitor switches to control which information is collected. You can turn these switches on by using the following DB2 commands:

 UPDATE MONITOR SWITCHES USING BUFFERPOOL ON ;

 UPDATE MONITOR SWITCHES USING LOCK ON ;

 UPDATE MONITOR SWITCHES USING SORT ON ;

 UPDATE MONITOR SWITCHES USING STATEMENT ON ;

 UPDATE MONITOR SWITCHES USING TABLE ON ;

 UPDATE MONITOR SWITCHES USING UOW ON ;

You can access the data that the database manager maintains either by taking a snapshot or by using an event monitor.

8.4.3.1 SNAPSHOTs

Use the GET SNAPSHOT command to collect status information and format the output for your use. The information returned represents a snapshot of the database manager's operational status at the time the command was issued. There are various formats of this command that are used to obtain different kinds of information; the specific syntax can be obtained from the DB2 Command Reference. Some of the more useful ones are:

 GET SNAPSHOT FOR DATABASE - Provides general statistics for one or more active databases on the current database partition.

 GET SNAPSHOT FOR APPLICATIONS - Provides information about one or more active applications that are connected to a database on the current database partition.

 GET SNAPSHOT FOR DATABASE MANAGER - Provides statistics for the active database manager instance.

 GET SNAPSHOT FOR LOCKS - Provides information about every lock held by one or more applications connected to a specified database.

 GET SNAPSHOT FOR BUFFERPOOLS - Provides information about buffer pool activity for the specified database.

 GET SNAPSHOT FOR DYNAMIC SQL - Returns a point-in-time picture of the contents of the SQL statement cache for the database.






You can create some simple scripts and schedule them to get periodic snapshots during your test cycles.
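
For instance, a simple CLP script of this kind might look like the following (the database and file names are placeholders; it could be run with db2 -tvf snapshots.clp and scheduled with your platform's scheduler):

   CONNECT TO SAMPLE;
   GET SNAPSHOT FOR DATABASE ON SAMPLE;
   GET SNAPSHOT FOR BUFFERPOOLS ON SAMPLE;
   GET SNAPSHOT FOR DYNAMIC SQL ON SAMPLE;
   CONNECT RESET;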


8.4.4 DB2BATCH

A benchmark tool called db2batch is provided in the sqllib/bin subdirectory of your DB2 installation. This tool can read SQL statements from either a flat file or standard input, dynamically describe and prepare the statements, and return an answer set.

You can specify the level of performance-related information supplied, including the elapsed time, CPU and buffer pool usage, locking, and other statistics collected from the database monitor. If you are timing a set of SQL statements, db2batch also summarizes the performance results and provides both arithmetic and geometric means. For syntax and options, type db2batch.
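
A minimal usage sketch (the database and file names are placeholders; check the db2batch help output for the exact options available in your release):

   db2batch -d SAMPLE -f queries.sql -r results.out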

9. Publications & Online Help

The following books serve as reference material.


Information Unit: Administration Guide: Performance (SC09-4821)
Contents: This book contains information about configuring and tuning your database environment to improve performance.

Information Unit: Database Performance Tuning on AIX (SG24-5511-01)
Contents: This Redbook contains hints and tips from experts that work on RDBMS performance every day. It also provides introductions to general database layout concepts from a performance point of view, design and sizing guidelines, tuning recommendations, and performance and tuning information for DB2 UDB, Oracle, and IBM Informix databases.

Information Unit: DB2 UDB V7.1 Performance Tuning Guide (SG24-6012)
Contents: This IBM Redbook provides guidelines for system design, database design, and application design with DB2 UDB for AIX Version 7.1. It also discusses the methods that are available for performance analysis and tuning.



10. Sizing





The following spreadsheets are provided in the Project Database on an “as-is” basis as samples to assist with tablespace and index sizing, and with estimating DB2 memory utilization.

1. TBLDATA-zOS.XLS - Spreadsheet to estimate space requirements for a table on the mainframe

2. INDXDATA-zOS.XLS - Spreadsheet to estimate space requirements for an index on the mainframe

3. TBLDATA-dist.XLS - Spreadsheet to estimate space requirements for a table in the distributed environment

4. INDXDATA-dist.XLS - Spreadsheet to estimate space requirements for an index in the distributed environment

5. DB2UDBMEMORY.XLS - Spreadsheet containing the estimated memory usage of the system


11. Reference

12. Addendum