Deep Compression in DB2 9

31 Oct 2013

Author: Bill Minor, Toronto Lab

Speaker: Christopher Tsounis, Sr. Consulting I/T Specialist



Deep Compression in DB2 9

Platform: LUW

Disclaimer




This information could include technical inaccuracies or typographical errors. Changes are periodically
made to the information herein; these changes will be incorporated in new editions of the publication.
IBM may make improvements and/or changes in the product(s) and/or the program(s) described in
this publication at any time without notice. Any references in this information to non-IBM Web sites
are provided for convenience only and do not in any manner serve as an endorsement of those Web
sites. The materials at those Web sites are not part of the materials for this IBM product and use of
those Web sites is at your own risk.


The licensed program described in this information and all licensed material available for it are
provided by IBM under terms of the IBM Customer Agreement, IBM International Program License
Agreement, or any equivalent agreement between us.


Any performance data contained herein was determined in a controlled environment. Therefore, the
results obtained in the operating environments may vary significantly. Some measurements may have
been made on development-level systems and there is no guarantee that these measurements will be
the same on generally available systems.


Furthermore, some measurements may have been estimated through extrapolation. Actual results
may vary. Users of this document should verify the applicable data for their specific environment.


All statements regarding IBM's future direction or intent are subject to change or withdrawal without
notice, and represent goals and objectives only.


This information may contain examples of data and reports used in daily business operations. To
illustrate them as completely as possible, the examples include the names of individuals, companies,
brands, and products. All of these names are fictitious and any similarity to the names and addresses
used by an actual business enterprise is entirely coincidental.


Agenda


Compression Concepts and Overview


Enablement and Management


Compression Estimation


Use Cases


Implementation considerations


Logging


Limitations


Proof points: Compression Ratios and Performance


Utilities


Summary


White Papers




DB2 ‘Compression’


V8 GA - NULL and Default Value Compression

No disk storage is consumed for NULL column values, zero-length data in variable-length columns, and system default values

V8 GA - Multidimensional Clustering (MDC)

Significant index compression can be achieved through block indexes

One key per thousands of records (vs. one key per record with traditional indexes)

V8 FP4 - Database Backup Compression

Smaller backup images; compresses index and LF/LOB table spaces

DB2 9 - Deep Compression

DB2 Storage Optimization Solution

The Compression Value Proposition:

Use fewer data pages to store table data.

What are the benefits?

Consume less storage

Consume storage at a slower rate

Possibly, improve your performance



Compression Basics


It is substitutional: replaces longer strings with 12-bit symbols, e.g.,

"Once upon a time ………… The End"

"Once" -> [symbol # 0]

Uses a static dictionary: the [string, symbol #] list is pre-built and stored in the table, e.g., a sample dictionary:

{("Once", 0), ("upon a", 1), … ("The End", 4095)}

Compression is just the process of matching a byte string (data row) against the longest sub-strings present in the dictionary and substituting the corresponding symbol #s

Expansion is even easier: given a compressed row

[symbol #][symbol #] …

simply look up and substitute each [symbol #] with the corresponding byte string
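The substitution and expansion steps above can be sketched in a few lines of Python. This is a conceptual illustration only: DB2's actual dictionary format and matching algorithm are internal, and the sample dictionary and greedy longest-match rule here are simplifying assumptions.

```python
# Conceptual sketch of static-dictionary compression (not DB2's real on-disk format).
# The dictionary maps byte strings to symbol numbers (at most 4096 = 2^12 symbols).
dictionary = {"Once": 0, "upon a": 1, "time": 2, "The End": 4095}
reverse = {v: k for k, v in dictionary.items()}

def compress(text, dictionary):
    """Greedily match the longest dictionary substring at each position;
    unmatched bytes pass through as literals."""
    out, i = [], 0
    while i < len(text):
        # The longest match wins, mirroring the longest-substring rule above.
        match = max((s for s in dictionary if text.startswith(s, i)),
                    key=len, default=None)
        if match:
            out.append(dictionary[match])   # 12-bit symbol number
            i += len(match)
        else:
            out.append(text[i])             # literal character
            i += 1
    return out

def expand(symbols, reverse):
    """Expansion is a straight lookup of each symbol number."""
    return "".join(reverse[s] if isinstance(s, int) else s for s in symbols)

row = "Once upon a time"
encoded = compress(row, dictionary)   # [0, " ", 1, " ", 2]
assert expand(encoded, reverse) == row
```

Because the dictionary is static, expansion is a pure table lookup, which is why decompression is described as "even easier" than compression.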



DB2 Deep Compression Concepts


Dictionary based - a symbol table for compressing/decompressing data records

Lempel-Ziv (LZ) based algorithm (static dictionary) utilizing 12-bit symbols

Dictionary per table, stored within the permanent table object (~75KB in size; disk + memory)

Data resides compressed on pages (both on-disk and in the bufferpool)

Significant I/O bandwidth savings

Significant memory (bufferpool) savings

CPU costs

Rows must be decompressed before being processed for evaluation

Log data from compressed records is written in compressed format

Does not compress rows where no storage saving is realized for that row

"red" -> '#1#2#3' where symbol #1="r", symbol #2="e", symbol #3="d"

"red" == 3 bytes; '#1#2#3' == 12 bits + 12 bits + 12 bits (or 4.5 bytes)

More beneficial to store "red" as-is versus the longer compressed format of "red"
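The arithmetic behind the "red" example is easy to check: each symbol number costs 12 bits, so a row is worth compressing only when the symbol encoding comes out shorter than the raw bytes. A quick sanity check (illustrative Python, not DB2 code):

```python
SYMBOL_BITS = 12  # each dictionary symbol number is 12 bits wide

def compressed_bytes(num_symbols):
    """Bytes needed to store a row encoded as 12-bit symbol numbers."""
    return num_symbols * SYMBOL_BITS / 8

# "red" as three one-character symbols: 3 x 12 bits = 4.5 bytes,
# which is larger than the 3 raw bytes, so the row is left uncompressed.
assert compressed_bytes(3) == 4.5
assert compressed_bytes(3) > len("red")
```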

DB2 Data Row Compression


Repeating patterns within the data (and even just within each row) are the key to good compression. Text data tends to compress well because of recurring strings, as does data with lots of repeating characters or leading/trailing blanks.

Logical Table:

Name       | Dept | Salary | City  | State | ZipCode
Fred Smith | 500  | 10000  | Plano | TX    | 24355
John Smith | 500  | 20000  | Plano | TX    | 24355

String inputs seen by the compression algorithm (35 bytes per row):

'Fred Smith 500 10000 Plano TX 24355'
'John Smith 500 20000 Plano TX 24355'

Dictionary:

01 = "Fred"   02 = "Smith"   03 = "500"   04 = "1"
05 = "0000 Plano TX 24355"   06 = "John"  07 = "2"

Symbol patterns (data rows on disk):

Row 1: 01 02 03 04 05
Row 2: 06 02 03 07 05

70 bytes now represented by 10 symbols (1.5 bytes x 10 = 15 bytes)
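Running the slide's dictionary through the same greedy longest-match idea reproduces the symbol patterns. This is a self-contained sketch, not DB2's actual matcher (which works on the binary row image); the dictionary entries here include the trailing blanks, an assumption made so that the two rows split exactly as in the slide.

```python
# Sample dictionary from the slide: symbol number -> byte string
# (trailing blanks included so rows split cleanly -- an assumption).
symbols = {1: "Fred ", 2: "Smith ", 3: "500 ", 4: "1",
           5: "0000 Plano TX 24355", 6: "John ", 7: "2"}

def encode(row, symbols):
    """Greedy longest-match substitution over the row string."""
    out, i = [], 0
    while i < len(row):
        num, s = max(((n, s) for n, s in symbols.items() if row.startswith(s, i)),
                     key=lambda p: len(p[1]))
        out.append(num)
        i += len(s)
    return out

rows = ["Fred Smith 500 10000 Plano TX 24355",
        "John Smith 500 20000 Plano TX 24355"]
patterns = [encode(r, symbols) for r in rows]
assert patterns == [[1, 2, 3, 4, 5], [6, 2, 3, 7, 5]]

# 2 rows x 35 bytes = 70 bytes become 10 symbols x 1.5 bytes = 15 bytes.
raw_bytes = sum(len(r) for r in rows)             # 70
compressed = sum(len(p) for p in patterns) * 1.5  # 15.0
assert (raw_bytes, compressed) == (70, 15.0)
```

Note how "10000" is covered by two symbols ("1" plus "0000 Plano TX 24355"): the dictionary exploits repetition across column boundaries, not just whole column values.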

DB2 Data Row Compression

[Diagram: a data page with uncompressed rows vs. a data page with compressed rows. Common sequences of consecutive bytes in a row are replaced with a 12-bit symbol; e.g., the uncompressed fragment '000 PLANO TX 24355' becomes the single symbol '05' in the compressed row.]

How Do I Compress?

In order to compress data, two prerequisites must be satisfied:

1) The table COMPRESS attribute must be set to YES

2) A compression dictionary must exist for the table object

Once both step 1) and step 2) are satisfied, all data subsequently populated into the table is subject to being compressed.

STEP 1. Enablement - Table DDL

CREATE TABLE <table name> ... COMPRESS {NO | YES}     (NO is the default)

ALTER TABLE <table name> ... COMPRESS {YES | NO}

Compression is enabled at the table level via either the CREATE TABLE or ALTER TABLE statement

Compression will be in effect once a table dictionary is built

STEP 2. Dictionary Building

Classic or 'offline' table REORG

When the compression dictionary is being built, a temporary in-memory buffer of 10MB is required

This memory is allocated from the utilities heap

All the data rows that exist in the table participate in building the compression dictionary

REORG TABLE <table name> [INDEX <index name>]
    [ALLOW READ ACCESS | ALLOW NO ACCESS] [USE <tbspace>] [INDEXSCAN]
    [LONGLOBDATA] [KEEPDICTIONARY | RESETDICTIONARY]

(ALLOW READ ACCESS and KEEPDICTIONARY are the defaults)

Compression Dictionary Build in DB2 9

[Diagram: uncompressed row data enters an empty table via INSERT or LOAD; a table REORG builds the dictionary, which is stored in the table object, and rewrites the data as compressed row data.]

Dictionary Management by Table REORG

RESETDICTIONARY

Table COMPRESS Attr | Dictionary Exists | Result Outcome
YES                 | YES               | Build new dictionary; rows compressed
YES                 | NO                | Build new dictionary; rows compressed
NO                  | YES               | Remove dictionary; all rows uncompressed
NO                  | NO                | No effect

KEEPDICTIONARY

Table COMPRESS Attr | Dictionary Exists | Result Outcome
YES                 | YES               | Preserve dictionary; rows compressed
YES                 | NO                | Build dictionary; rows compressed
NO                  | YES               | Preserve dictionary; all rows uncompressed
NO                  | NO                | No effect

Compression Estimator (Advisor)


DB2 INSPECT

This tool looks at all the rows of the table data and builds a compression dictionary from them. The dictionary is then used to test compression against the records contained in the sample. Results include:

Estimate of compression savings

Dictionary size

Will insert the dictionary if COMPRESS YES is set

Allows for online dictionary creation/insertion

Future rows inserted/updated are compressed

Does not address existing rows (REORG to be used)

INSPECT { <check clause> | <row compression estimate clause> }

Row compression estimate clause:
ROWCOMPESTIMATE TABLE { NAME <table-name> [SCHEMA <schema-name>]
                      | TBSPACEID <tbspc id> OBJECTID <obj id> }





DB2 INSPECT


Compression Evaluation


INSPECT ROWCOMPESTIMATE TABLE NAME <tbname>


SCHEMA <schema> RESULTS KEEP <filename>


In the db2dump directory, you will see <filename>


db2inspf <filename> <outfile>


../sqllib/db2dump/<outfile> will contain the results of compression estimation


Action: ROWCOMPESTIMATE TABLE

Schema name: MIKEW

Table name: EMPLOYEE

Tablespace ID: 2 Object ID: 6

Result file name: emp




Table phase start (ID Signed: 6, Unsigned: 6; Tablespace ID: 2) : MIKEW.EMPLOYEE




Data phase start. Object: 6 Tablespace: 2


Row compression estimate results:


Percentage of pages saved from compression: 46


Percentage of bytes saved from compression: 46


Percentage of rows ineligible for compression due to small row size: 0


Compression dictionary size: 13312 bytes.


Expansion dictionary size: 10240 bytes.



Data phase end.


Table phase end.


DB2 9 INSPECT Compression Dictionary Build

[Diagram: uncompressed row data enters an empty table via INSERT or LOAD; INSPECT ROWCOMPESTIMATE builds and inserts the dictionary online, while the existing row data remains uncompressed.]

INSPECT Online Dictionary Build

The ROWCOMPESTIMATE option always builds a compression dictionary as part of its processing

The dictionary insert is only applicable if no dictionary exists in the table and the table COMPRESS attribute is YES

An IX table lock is required (same as INSERT)

An Alter Table lock is required

Disabling Table Compression

For a table that is already compressed, compression can be disabled by setting the table COMPRESS attribute to NO

The dictionary still exists in the table but is no longer used to compress

Subsequent table population is not subject to compression

UPDATE of an existing compressed record changes the record to uncompressed

The COMPRESS attribute can be toggled back to YES, and as long as the dictionary still exists it will be respected (data is subject to compression again)

The compression dictionary can only be removed during table truncation operations

Dictionary Removal / Characteristics

The dictionary is deleted during table truncation only if the COMPRESS table attribute is turned off

If the COMPRESS attribute is set, the dictionary is preserved

Operations such as table REORG, LOAD REPLACE, or IMPORT with REPLACE drive table truncation - the dictionary can be removed

The dictionary is static - no other alters or updates are performed against the dictionary, or any other dictionary records


Row Compression: New Table

Create a table that is eligible for compression:

CREATE TABLE Sales COMPRESS YES

Get a representative data sample:

LOAD FROM filesmall OF DEL REPLACE INTO Sales

Create the dictionary from the sample data:

REORG TABLE Sales

LOAD respects the dictionary:

LOAD FROM filerest OF DEL INSERT INTO Sales

Row Compression: New Table Partitions

Make the table eligible for compression:

ALTER TABLE Sales COMPRESS YES

Create a new range for the new sales data:

CREATE TABLE NewSales … COMPRESS YES

Load data into the new partition:

LOAD FROM file OF DEL REPLACE INTO NewSales

Data now gets compressed:

REORG TABLE NewSales RESETDICTIONARY

The new partition is added to the table:

ALTER TABLE Sales
  ATTACH PARTITION Mar05
  STARTING '03/01/2005'
  ENDING '03/31/2005'
  FROM TABLE NewSales

Execute SET INTEGRITY

DB2 INSPECT

Partitioned Tables

db2 "create table t_dpart (c1 int, c2 varchar(100)) partition by range(c1) (starting from (1) ending (100) every (25)) compress yes"

db2 "describe data partitions for table bminor.t_dpart show detail"

PartitionId  PartitionName  TableSpId  PartObjId  LongTblSpId  AccessMode  Status
0            PART0          3          260        3            F
1            PART1          3          261        3            F
2            PART2          3          262        3            F
3            PART3          3          263        3            F

db2 inspect rowcompestimate table tbspaceid 3 objectid 260 results keep inspd.out

New Administrative Table Function



db2 describe "select * from table(sysproc.admin_get_tab_info('MIKEW','STAFF')) as t"

sqltype              sqllen  sqlname.data                    sqlname.length
-------------------- ------  ------------------------------  --------------


449 VARCHAR 128 TABSCHEMA 9


449 VARCHAR 128 TABNAME 7


453 CHARACTER 1 TABTYPE 7


501 SMALLINT 2 DBPARTITIONNUM 14


497 INTEGER 4 DATA_PARTITION_ID 17


453 CHARACTER 1 AVAILABLE 9


493 BIGINT 8 DATA_OBJECT_L_SIZE 18


493 BIGINT 8 DATA_OBJECT_P_SIZE 18


493 BIGINT 8 INDEX_OBJECT_L_SIZE 19


493 BIGINT 8 INDEX_OBJECT_P_SIZE 19


493 BIGINT 8 LONG_OBJECT_L_SIZE 18


493 BIGINT 8 LONG_OBJECT_P_SIZE 18


493 BIGINT 8 LOB_OBJECT_L_SIZE 17


493 BIGINT 8 LOB_OBJECT_P_SIZE 17


493 BIGINT 8 XML_OBJECT_L_SIZE 17


493 BIGINT 8 XML_OBJECT_P_SIZE 17


501 SMALLINT 2 INDEX_TYPE 10


453 CHARACTER 1 REORG_PENDING 13


449 VARCHAR 10 INPLACE_REORG_STATUS 20


449 VARCHAR 12 LOAD_STATUS 11


453 CHARACTER 1 READ_ACCESS_ONLY 16


453 CHARACTER 1 NO_LOAD_RESTART 15


501 SMALLINT 2 NUM_REORG_REC_ALTERS 20


453 CHARACTER 1 INDEXES_REQUIRE_REBUILD 23


453 CHARACTER 1 LARGE_RIDS 10


453 CHARACTER 1 LARGE_SLOTS 11


493 BIGINT 8 DICTIONARY_SIZE 15


Compression Catalog Information

syscat.tables

COMPRESSION (CHAR(1)):
  B = Both value and row compression are activated
  N = No compression is activated; a row format that does not support compression is used
  R = Row compression is activated; a row format that supports compression might be used
  V = Value compression is activated; a row format that supports compression is used
  Blank = Not applicable

AVGROWSIZE (SMALLINT):
  Average length (in bytes) of both uncompressed and compressed rows in this table; -1 if statistics are not collected

PCTPAGESSAVED (REAL):
  Approximate percentage of pages saved in the table as a result of row compression. This value includes overhead bytes for each user data row in the table, but does not include the space consumed by dictionary overhead; -1 if statistics are not collected

PCTROWSCOMPRESSED (REAL):
  Compressed rows as a percentage of the total number of rows in the table; -1 if statistics are not collected

AVGROWCOMPRESSIONRATIO (SMALLINT):
  For compressed rows in the table, the average compression ratio by row; that is, the average uncompressed row length divided by the average compressed row length; -1 if statistics are not collected

AVGCOMPRESSEDROWSIZE (SMALLINT):
  Average length (in bytes) of compressed rows in this table; -1 if statistics
are not collected
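How these statistics relate can be sketched with made-up per-row lengths. The numbers below are hypothetical (chosen to resemble the explain-output figures elsewhere in this deck); real catalog values come from RUNSTATS.

```python
# Hypothetical per-row statistics for a compressed table (not real catalog data).
uncompressed_lengths = [44, 40, 48, 44]   # original row sizes in bytes
compressed_lengths   = [27, 25, 29, 27]   # sizes after row compression

# AVGROWCOMPRESSIONRATIO: average uncompressed length / average compressed length.
avg_unc = sum(uncompressed_lengths) / len(uncompressed_lengths)  # 44.0
avg_cmp = sum(compressed_lengths) / len(compressed_lengths)      # 27.0
ratio = avg_unc / avg_cmp

assert round(ratio, 2) == 1.63   # i.e., rows shrink to roughly 1/1.63 of their size
```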


New Larger RIDs

More Records/Data Page

Page Size | REG TBSP Min Rec Length | REG TBSP Max Records | LARGE TBSP Min Rec Length | LARGE TBSP Max Records
4 KB      | 14                      | 251                  | 12                        | 287
8 KB      | 30                      | 253                  | 12                        | 580
16 KB     | 62                      | 254                  | 12                        | 1165
32 KB     | 127                     | 253                  | 12                        | 2335

RID = Record Identifier = (page_number, slot_number)

SQL Access Plans: Explain Tables

db2exfmt

Create the explain tables:
  db2 -tvf $HOME/sqllib/misc/EXPLAIN.DDL

Generate (explain) the access plan:
  db2 explain all for <statement>

Format the output:
  db2exfmt -d <dbname> -n % -e % -s % -w -l -# 0 -o <resultsFile> -g TIC


Access Plan:
-----------
        Total Cost:   50.4599
        Query Degree: 1

              Rows
             RETURN
             (   1)
              Cost
               I/O
                |
               35
             TBSCAN
             (   2)
             50.4599
                6
                |
               35
         TABLE: BMINOR
             STAFF2

Objects Used in Access Plan:
---------------------------
        Schema: BMINOR
        Name:   STAFF2
        Type:   Table
                Time of creation:              2006-11-22-09.34.28.735089
                Last statistics update:        2006-11-22-09.37.21.861142
                Number of columns:             7
                Number of rows:                35
                Width of rows:                 41
                Number of buffer pool pages:   6
                Number of data partitions:     1
                Distinct row values:           No
                Tablespace name:               IBMDB2SAMPLEREL
                Tablespace overhead:           7.500000
                Tablespace transfer rate:      0.060000
                Source for statistics:         Single Node
                Prefetch page count:           32
                Container extent page count:   32
                Table overflow record count:   0
                Table Active Blocks:           -1
                Average Row Compression Ratio: 1.63207
                Percentage Rows Compressed:    100
                Average Compressed Row Size:   27

Compression - Log Records

Data will be stored in log records in compressed form

INSERT and DELETE log records will consume less storage

Although UPDATE log records also contain compressed data, the overall amount of logged data may be more than for the uncompressed version


Compression and UPDATE Logging

There are three different types of UPDATE log records that can be written by DB2:

Full before and after row image logging

  Largest number of bytes written

  Only type of UPDATE logging for DATA CAPTURE CHANGES

Full XOR logging

  XOR differences between before and after row images

  Less logging than full before/after image logging

Partial XOR logging

  XOR differences between before and after images for UPDATEs where there is no row size increase

When a compressed row is updated, even if the uncompressed record lengths do not change, the compressed images can change

  A different number of symbols may be used to represent the updated version of the row, hence the need for a Full XOR log record



Compression and UPDATE Processing

With the delivery of larger RIDs in DB2 9, more data records can be packed onto a data page

If the data page is tightly packed with compressed records and there is no free space on the page for size-increasing UPDATEs, a pointer-overflow record will be created

  A pointer to the new record is created on the existing page, while the actual record resides on another page

For data that is heavily updated, reserve free space within a page via ALTER TABLE ... PCTFREE

Log Consumption

Column Ordering Summary

Columns which are updated frequently (changing value) should be:

  grouped together

  defined towards or at the end of the table definition

These recommendations are actually independent of:

  Row compression

  Row format (default or null/value compression)

The benefits would be:

  better performance

  fewer bytes logged

  fewer log pages written

  a smaller active log requirement for transactions performing a large number of updates
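The effect of grouping changing columns can be illustrated with the XOR of before and after row images: the XOR is non-zero only over the span that actually differs, so clustering volatile columns keeps that span short. This is a simplified model of XOR logging, not DB2's exact log record format.

```python
def xor_span(before, after):
    """Return (first, last+1) bounds of the bytes that differ between two
    equal-length row images; the XOR outside this span is all zeros."""
    diffs = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
    return (diffs[0], diffs[-1] + 1) if diffs else (0, 0)

# Volatile columns scattered across the row: the differing span is wide.
before = b"AAAA" + b"fixed1" + b"BBBB" + b"fixed2" + b"CCCC"
after  = b"AAZZ" + b"fixed1" + b"BBBB" + b"fixed2" + b"CZCC"
lo, hi = xor_span(before, after)
assert hi - lo == 20   # span covers most of the 24-byte row image

# The same changes with volatile columns grouped at the end: a narrow span.
before2 = b"fixed1" + b"fixed2" + b"AAAA" + b"BBBB" + b"CCCC"
after2  = b"fixed1" + b"fixed2" + b"AAZZ" + b"BBBB" + b"CZCC"
lo2, hi2 = xor_span(before2, after2)
assert hi2 - lo2 == 8  # far fewer bytes to log
```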


What Tables Should be Compressed?

Some considerations:

Compression is about storage savings, so look at the largest tables first

Generate potential savings via INSPECT ROWCOMPESTIMATE

Compress all tables where there is a 50% savings or more

What is the typical table activity - read only? Some write? Heavy write?

Characterize the current system load: I/O and CPU

  Largely I/O-bound systems will benefit

  Compress read-only tables

Benchmark a typical workload against the compressed table(s) and compare to the uncompressed case

Determine AVGROWSIZE and the number of rows per page in advance to determine the need for large RIDs

Collect stats and gauge compression effectiveness as data evolves
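The large-RID check in the last bullet can be roughly approximated from the page size and the expected compressed row size, using the per-page record limits from the "New Larger RIDs" table. This is a sizing sketch only; real page capacity also depends on page and slot overhead that is ignored here.

```python
# Maximum records per data page, from the New Larger RIDs table in this deck.
MAX_RECORDS = {"regular": {4: 251, 8: 253, 16: 254, 32: 253},
               "large":   {4: 287, 8: 580, 16: 1165, 32: 2335}}

def needs_large_tablespace(page_kb, avg_compressed_row):
    """True if compressed rows are small enough that the regular table space's
    per-page record limit, not page size, becomes the bottleneck."""
    fit = (page_kb * 1024) // avg_compressed_row   # rows that fit by size alone
    return fit > MAX_RECORDS["regular"][page_kb]

# 27-byte compressed rows on a 4 KB page: ~151 rows fit, under the 251 limit.
assert needs_large_tablespace(4, 27) is False
# The same rows on a 32 KB page would blow past the 253-record regular limit,
# so a LARGE table space (bigger RIDs) is needed to use the page fully.
assert needs_large_tablespace(32, 27) is True
```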


Limitations of Data Row Compression

Row compression is only supported for tables supported by table REORG. The following tables cannot be compressed:

  RCT tables (range-clustered tables)

  Catalog tables

  Declared global temporary tables

  System temporary tables

Row compression is only applicable to rows in the data object of the table!! (the key is whether the data is stored with the table or somewhere else)

  Pure XML is in a separate object - XML data is not compressed

  LOBs are stored in a separate object - LOB data is not compressed

  LONGs are stored in a separate object - LONG data is not compressed

  Indexes are not compressed

Row compression support and table data replication support are not compatible:

  The DATA CAPTURE CHANGES option is not compatible with the COMPRESS YES option. If the COMPRESS attribute is NO for the table but the table has a dictionary, the DATA CAPTURE CHANGES option is also not allowed, since there may still be compressed rows in the table.

Compression is not practical for a table smaller than 100KB, since the space savings may not offset the storage requirements for the dictionary.

Compression Ratios (Customer Financial Data)

Compression Type | 32KB Page Count | Space Required on Disk
No compression   | 5893888         | 179.9GB
Row compression  | 1392446         | 42.5GB

% Pages Saved: 76.4%

Compression Savings

Customer Data Warehouse

Data warehouse estimated reduction in size from 2.873TB to 1.453TB

Table | % of Table Sampled | Number of Rows | Sample Size (GB) | Sample Size Compressed (GB) | Compression Ratio | Total Est. Disk Savings (GB)
T1    | 100                | 159259747      | 71.49            | 13.23                       | 81.5%             | 58
T2    | 10                 | 77390826       | 21.48            | 5.23                        | 75.7%             | 162
T3    | 100                | 191362309      | 141.46           | 36.18                       | 74.4%             | 105
T4    | 20                 | 19474261       | 11.80            | 3.05                        | 74.2%             | 43
T5    | 10                 | 23370458       | 25.78            | 8.54                        | 66.9%             | 172
T6    | 10                 | 39573969       | 23.24            | 6.32                        | 72.8%             | 169

TPCH - Space Savings per Table

TABLE    | Number of 8K Pages | Number of 8K Pages with Compression | Total Space Saved
PART     | 1 118 524          | 401 494                             | 5.9GB
SUPPLIER | 60 201             | 33 250                              | 220MB
PARTSUP  | 4 698 630          | 1 659 084                           | 24.9GB
CUSTOMER | 1 004 915          | 553 667                             | 3.7GB
ORDERS   | 6 550 873          | 2 738 548                           | 31.2GB
LINEITEM | 30 701 523         | 13 532 605                          | 140.6GB
TOTAL    | 44 134 668         | 18 918 650                          | 206.6GB

TPCH - Throughput Comparison

*higher is better

Number of compressed disks = ½ the number of uncompressed disks

Query Performance

Sample Warehouse Query Workload

I/O-bound system: observed overall improvements in end-to-end workload execution time (20-30%)

CPU-bound system: performance neutral, or degradations on the order of 10% in the end-to-end workload

Compression trades I/O time for CPU time

  If the system is CPU bound, or if an agent is already consuming an entire CPU on its own, then compression will likely have a negative impact

  If there is CPU headroom and I/O waits are happening, compression can be beneficial



Backup Compression and Data Row Compression

Test backup compression in addition to tables with row compression

Backup compression can be expensive and may not provide much added value in additional savings to backup image size

Time/size/value depends on the percentage of table space content with row compression. E.g., are all tables compressed? Are indexes or long data stored in the same table space?

Scenario                      | Total User Time (s) | Pages Used | Table Space Size (GB) | Backup Image Size (GB)
No compression                | 468                 | 1510400    | 11.57                 | 12
Backup compression only       | 1028                | 1510400    | 11.57                 | 4.2
Data row compression only     | 198                 | 610816     | 4.68                  | 4.7
Data row + backup compression | 662                 | 610816     | 4.68                  | 4.2

LOAD 10%, REORG, Continue LOADing

TABLE    | Time to Load 10% of Data (s) | Time to Reorg (s) | Time to Load 90% of Data (s) | Compression Ratio | Total Time (s): Load+Reorg
PARTSUPP | 26.5                         | 82.1              | 249.4                        | 64%               | 358
ORDERS   | 36.4                         | 86.4              | 411.9                        | 57%               | 534.7
LINEITEM | 163.5                        | 330.4             | 1877.9                       | 56%               | 2371.8

LOAD 100%, Full Table REORG

TABLE    | Time to Load 100% of Data (s) | Time to Reorg (s) | Compression Ratio | Total Time (s): Load+Reorg
PARTSUPP | 260.9                         | 559.6             | 65%               | 820.5
ORDERS   | 378.6                         | 725.5             | 59%               | 1104
LINEITEM | 1761.4                        | 3572.8            | 57%               | 5334
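The trade-off in the two tables above is easy to quantify: loading a 10% sample, building the dictionary with REORG on the sample, then loading the rest avoids reorganizing the full table. Simple arithmetic on the PARTSUPP figures from the tables above:

```python
# PARTSUPP timings from the tables above, in seconds.
sample_first = 26.5 + 82.1 + 249.4   # load 10%, REORG the sample, load 90%
full_reorg   = 260.9 + 559.6         # load 100%, then REORG the whole table

assert round(sample_first, 1) == 358.0
assert round(full_reorg, 1) == 820.5
# Building the dictionary from a 10% sample cut total time by more than half,
# at a nearly identical compression ratio (64% vs. 65%).
assert sample_first < full_reorg / 2
```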

In Summary


The storage savings and performance impact of data row
compression are intimately tied to the characteristics of the
data within the database, the configuration of the database
(layout and tuning) as well as application workload.



In general terms, the net performance impact of compression
depends on whether the database is CPU or I/O bound.



Data row compression can significantly reduce table size at
the expense of additional CPU utilization.



Test drive DB2 9 to get a sense of what impact you can
expect with respect to both storage and performance.

Additional References


Chris Eaton's blog
http://blogs.ittoolbox.com/database/technology

IBM developerWorks
http://www.ibm.com/developerworks/

Introducing DB2 9: DB2 9 unveiled
http://www-128.ibm.com/developerworks/db2/library/long/dm-0607ahuja2/#download

Introducing DB2 9, Part 1: Data compression in DB2 9
http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0605ahuja/

Row compression in DB2 9 - Analysis of a DSS database environment
http://www-128.ibm.com/developerworks/db2/library/long/dm-0610chang/

Shrink your database using DB2 9 for Linux, UNIX, and Windows row compression - New capabilities for reducing your disk storage in DB2 9
http://www-128.ibm.com/developerworks/edu/dm-dw-dm-0611read-i.html