Key/Value Pair versus hstore Benchmarking Entity ... - HSR-Wiki

disturbedoctopusΔιαχείριση Δεδομένων

27 Νοε 2012 (πριν από 4 χρόνια και 10 μήνες)

576 εμφανίσεις

Key/Value Pair versus hstore

Benchmarking Entity
-
Attribute
-
Value Structures in
PostgreSQL



HSR Hochschule für Technik Rapperswi
l

Institut für Software


Oberseestrasse 10

Postfach 1475

CH
-
8640 Rapperswil


http://www.hsr.ch




Advisor:

Prof. Stefan Keller



Author:

Michel Ott



Rapperswil

May 26
th
, 2011











Table of Figures



Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQL.


University of Applied Science Rapperswil


II

Table

of Contents

Table of Figures

IV

Tables

IV

Fig
ures

IV

Listings

V

List Of Abbreviations

VI

1

Introduction

1

1.1

Project description

1

1.2

Restrictions on the scope of the project

1

2

Overview

2

2.1

PostgreSQL

2

2.2

Key Value Pair

2

2.3

Hstore

3

2.3.1

Functions

4

2.3.2

Working principle

5

2.4

Benchmark Tools

6

2.4.1

Pgbench

6

2.4.2

HSR Texas Geo Database Benchmark

6

3

Benchmark proposal

8

3.1

Terms

8

3.2

Generate / Preprocessing phase

8

3.3

Execution phase

9

3.4

Benchmark / Analysis phase

9

3.5

Performance Benchmark Design

10

3.5.1

Table Schema

10

3.5.2

Statements

11

3.5.3

Datasets

11

3.6

Test Application

13

4

Benchmark Mai 2011

16

4.1

Technical specification

16

4.2

Execution

16

4.2.1

Preparing Database

17

4.2.2

Generating Test Data

17

4.2.3

Executing Benchmark

18

4.3

Results

19

4.4

Findings

25

4.5

Conclusion

31

Bibliography

32

Table of Figures



Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQL.


University of Applied Science Rapperswil


III

Appendix

33


Table of Figures



Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQL.


University of Applied Science Rapperswil


IV

Table of Figures

Tables

Table 1: KVP additional information table

................................
................................
................................
.................

3

Table 2: KVP table

................................
................................
................................
................................
................................
.

3

Table 3: Columns of a test dataset record

................................
................................
................................
...............

12

Table 4: Number of dataset length

................................
................................
................................
.............................

13

Table 5: Input parameters for test data generator

................................
................................
..............................

14

Table 6: Input parameters for benchmark application

................................
................................
.....................

15

Table 7: Hardware speci
fication of system under test

................................
................................
......................

16

Table 8: Software specification of system under test

................................
................................
.........................

16

Table 9: Hstore table abstract

................................
................................
................................
................................
.......

26

Table 10: KVP table abstract

................................
................................
................................
................................
.........

28

Table 11: Hstore tuple examples

................................
................................
................................
................................
.

30


Figures

Figure 1: KVP example SQL statemen
t

................................
................................
................................
.........................

3

Figure 2: hstore create table SQL example

................................
................................
................................
................

3

Figure 3: hstore inse
rt SQL example

................................
................................
................................
.............................

4

Figure 5: Example of a test data file

................................
................................
................................
...........................

12

Figure 6:
Number of test cicle

................................
................................
................................
................................
.......

13

Figure 7: Test / benchmark application incl. test data generator

................................
................................

14

Figure 8: Database setup script

................................
................................
................................
................................
....

17

Figure 9: Test data generation script

................................
................................
................................
........................

18

Figure 10: Benchmark script example

................................
................................
................................
......................

19

Figure 11: Overview KVP vs. hstore benchmark

................................
................................
................................
..

20

Figure 12: Benchmark KVP vs. hstore from 10 to 2.5K

................................
................................
.....................

20

Figure 13: Overview

KVP hstore with combined index

................................
................................
....................

21

Figure 14: Benchmark KVP hstore from 10 to 2.5K with combined index

................................
...............

22

Figure 15: Overview of difference between KVP single and combined index

................................
.........

22

Figure 16: 10 to 2.5K:
Difference between KVP single and combined index

................................
...........

23

Table of Figures



Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQL.


University of Applied Science Rapperswil


V

Figure 17: Overview of KVP with index on ‘key’ and combined index against hstore

........................

23

Figure 18: 10 to 2.5K: KVP with index on ‘key’ and combined index against hstore

...........................

24

Figure 19:
Index size overview

................................
................................
................................
................................
....

24

Figure 20: Index size for 10 to 5’000 records

................................
................................
................................
........

25


Listings

Listing 1: Hstore data type definition

................................
................................
................................
...........................

5

Listing 2: Registering a PostgreSQL operator

................................
................................
................................
...........

5

Listing 3: Defining a PostgreSQL function

................................
................................
................................
..................

5

Listing 4: KVP Benchmark Table

................................
................................
................................
................................
.

10

Listing 5: Hstore Benchmark Table

................................
................................
................................
............................

10

Listing 6: KVP index

................................
................................
................................
................................
..........................

11

Listing 7: Hstore index

................................
................................
................................
................................
.....................

11

Listing 8: KVP select example

................................
................................
................................
................................
.......

11

Listing 9: KVP select example

................................
................................
................................
................................
.......

11

Listing 10: Transformed test data to KVP SQL statement

................................
................................
................

12

Listing 11: Transformed test data to hstore SQL statement

................................
................................
...........

12

Listing 12: Explain Analyze statement

for hstore without index

................................
................................
..

26

Listing 13: Output of Explain Analyze statement for hstore without index

................................
.............

26

Listing 14: Explain Analyze statement for hstore with index

................................
................................
.........

26

Listing 15: Output of Explain Analyze statement for hstore with index

................................
....................

27

Listing 16: Explain Analy
ze statement for KVP

................................
................................
................................
.....

28

Listing 17: Output of Explain Analyze statement for KVP

................................
................................
................

28

Listing 18: Programming example of explain sequences

................................
................................
.................

28

Listing 19: Output of Explain Analyze statement for KVP with index on attribute ‘key’

....................

29

Listing 20: Explain Analyze statement for KVP

................................
................................
................................
.....

29

Listing 21: Output of
Explain Analyze statement for KVP with combined index

................................
...

30

Listing 22: Querying all person how lives in region with zip code ‘8000’

................................
................

30


List Of Abbreviations



Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQL.


University of Applied Science Rapperswil


VI

List Of Abbreviations

Abbreviation

Description

bash

B
ourne
-
a
gain
sh
ell

CPU

C
entral
P
rocessing
U
nit

DB2

Commercial relational database management system developed by IBM

GiST

G
eneral
i
zed
S
earch
T
ree

KVP

K
ey
V
alue
P
air

ms

m
illi
s
econds

MSSQL

Commercial relational database management system produced by Microsoft

OpenFTS

Open

Source
F
ull
T
ext
S
earch engine

Oracle

Commercial object
-
relational database management system produced by
Oracle Corporation

PgSQL

P
ost
g
re
SQL



Open source object relational database

PostGIS

Adds support for geographic objects to the PostgreSQL object
-
relational d
a-
tabase

SQL

S
tructured
Q
uery
L
anguage

URL

U
niform
R
esource
L
ocator



Chapter
1
:
Introduction


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


1

1

Introduction

The following chapter describes the scope of the project and its boundaries and restrictions. In
general the goal is to benchmark the performance of PostgreSQL key
-
value
-
pairs against Pos
t-
greSQL hstore data

type.

1.1

Project description

As part of this term paper a project evolved to benchmark PostgreSQL key
-
value
-
pairs, further
referred to as KVP, versus PostgreSQL in combination with hstore, further referred to as hstore
data type (probably an abbreviation for

hash storage structure à la Perl hash). Hstore is part of
PostgreSQL distribution since

version

8.2
as an additional module

and storage for semi structural
data with GiST index access. The PostgreSQL core distribution does not know of key value pair
(KVP)

information in a single attribute. That means it is not possible to store an associative array
e.g. {surname : ‘John’, name : ‘Smith’} in a attribute and query John’s name. This additional fun
c-
tionality was introduced by Oleg Bartunov and Teodor Sigaev an
d enhanced by Andrew Gierth
under the synonym ‘hstore’. Hstore is an enhancement for PostgreSQL, which provides a new
data type and a bunch of functions to store and query for KVP information.

Dictionaries or ass
o-
ciative arrays are the parent synonym for k
ey value pairs (KVP) or abstract data structures
(ADT). They handle pairs, also known as items, as keys and
their

corresponding values. Most
modern script languages support dictionaries/associative arrays as a primary container type.
KVP is also called ent
ity attribute value model (EAV) or object attribute value mode.

Both techniques, KVP and hstore, store arbitrary data as objects in the database, however, the
design of the database tables and therefore its schema is quite different in a way how the data i
s
stored, referenced and queried. The goal of this document is to find an answer which technique
performs faster.

1.2

Restrictions on the scope of the project

The purpose of this benchmark is not to test the stability or scalability of PostgresSQL. This is
ra
ther a test of KVP and hstore on a given PostgreSQL environment. This implies that the pe
r-
formance tuning of PostgreSQL is not in scope of this document. Nevertheless this document and
all the applications can be the basis for such a test. In addition, ben
chmarking the insertion of the
test data is also out
-
of
-
scope.

Secondly, the semantics of the data, which is used for the test, is not relevant for any outcome of
the benchmark. For this reason the ‘lorem ipsum dolor’
1

dummy text is used in the test data g
e
n-
eration application and a name generator, which is based on combining syllables.





1

‘lorem ipsum dolor’ is a nonsense paragraph which aims at demonstrating a font to a reader without di
s-
tracting him by the ‘gibberish’ of the text

(Walsh, 1996).

Chapter
2
:
Overview


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


2

2

Overview

Today PostgreSQL has a huge community, not only because it is for free, but also due to the fact
that it has a lot of extensions like the geospatial extension Post
GIS or the hstore mentioned b
e-
fore. The following chapter first describes what PostgreSQL is, then It explains the difference b
e-
tween key
-
value
-
pairs (KVP) as a table structure versus KVP using the hstore data type.

By studying subchapter
2.2

Key Value Pair

and
2.3

Hstore

you will recognize that KVP stores the
key and one or more related values in different table columns whereas hstore introduces a new
abstract data type allowing storing an associated array in the form of unique keys and related
values
with
in a single tabl
e column
. Suitable is KVP for easy data storage and data capture, rows
with many attributes that are rarely examined, and semi structured data.

2.1

PostgreSQL

PostgreSQL is an open source relational database, even an object
-
relational database according
(Postg
reSQL Global Development Group)
. Since its life of over 15 years, PostgreSQL has a proven
standing in different applications fields. This, because it implements a set of capabilities that are
well known from proprietary software vendors like Oracle, IBM DB
2 or Microsoft and of course it
provides all the other features such as scalability, maintainability, asynchronous replication, etc.
This and many more brings PostgreSQL in a position of a real competitor for proprietary sof
t-
ware vendors in companies of di
fferent size and as of actual state, PostgreSQL is an enterprise
class database
(PostgreSQL Global Development Group)
.

Additionally, PostgreSQL’s SQL implementation conforms the ANSI
-
SQL:2008 standard and i
m-
plements beside the standard SQL statements selec
t, insert, etc. also primary and foreign keys
with restrictions, check constraints, unique constraints, cascading and many more.

Its highly customizable environment let users and developers easily extend PostgreSQL. Exa
m-
ples are Generalized Search Tree (Gi
ST), Open Source Full Text Search engine (OpenFTS) and
PostGIS. This is done by using one of dozen different programming languages, including Java,
Perl, Python, Ruby, Tcl, C/C++, and its own PL/pgSQL. Developers and users can draw upon hu
n-
dreds of built
-
i
n functions of the standard library from basic math and string operations to cry
p-
tography and Oracle compatibility
(PostgreSQL Global Development Group)
.

2.2

Key Value Pair

Not to be mistaken, key value pair (KVP) in the context of this paper is not the same a
s known in
database context. The normal interpretation is an associated array in an abstract data type, the
value, along a unique identifier, the key. This purpose belongs in this paper to the term ‘hstore’
described in the chapter
below
.

Thus, KVP means the standard way of creating, maintaining, and using tables in PostgreSQL with
exactly two attributes key and value. That means that the KVP table can store differe
nt unique
key value pairs for specific information. That implies that for a given key value pair a unique
identifier is needed that reverses to this information. Consequently that KVP table needed to be
Chapter
2
:
Overview


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


3

enhanced with an identifier and an additional table i
s needed to store additional information.
From this it follows that the base schema of the KVP structure look likes the following.

Table: bench_kvp_info

id : Integer

attribute_1 : Text

attribute_n : Text

Table
1
: KVP additional
information table

Table: bench_kvp

id_fk : Integer

key : Text

value : Text

Table
2
: KVP table

The bench_kvp_info table holds a unique identifier for specific information and the additional
data to it. For example it could hold the

information of a restaurant such as street, postal code,
phone number and so on. The bench_kvp table stores in addition information that is not forese
e-
able. Such as those, that could describe or give more specific information to the restaurant like
the ty
pe of cuisine, a URL to its homepage, and so on.

Key value pairs are information that specify and information more exactly but not necessary
mandatory for all data in the information table. This structure allows adding easily new non
-
mandatory information
without touching the table schema.

In Postgresql it can be setup as follow:

1

CREATE TABLE

bench_kvp_info (


id
integer

PRIMARY KEY
,


attribute_1
text
,


attribute_n
text

);

2

CREATE TABLE

bench_kvp (


id
integer

REFERENCE
bench_kvp_info(id),


ke
y
text
NOT NULL
,


value
text


);

Figure
1
: KVP example SQL statement

2.3

Hstore

Hstore means in this paper an associated array stored in an abstract data type composed of a
unique identifier based on the ‘hstore’ PostgreSQL
enhancement developed by Oleg Bartunov and
Teodor Sigaev

as an additional module
. The hstore enhancement introduces an abstract data
type called ‘hstore’ which can store an associated array. In addition it provides a bunch of diffe
r-
ent functions and operat
ors as well as indexing possibilities on the abstract data type. An index
can be created on the GiST, GIN, BTree, or Hash engine
(Bartunov, Sigaev, & Gierth)
.

A pure hstore table can be created in the following way:

1

CREATE TABLE

bench_hstore (


kvp_hsto
re
hstore

);

2

CREATE INDEX

hidx
ON

bench_hstore
USING GIST
(kvp_hstore);

Figure
2
: hstore create table SQL example

Chapter
2
:
Overview


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


4

Inserting a tuple is as easy as creating an attribute of type hstore:

1

INSERT INTO

bench_hstore(kvp_hstore)
VALUES
(


hstore
(‘


’’
id
’’
=>
1,


‘‘
surname
’’
=>
’’
McNeal
’’
,


‘‘
forename
’’
=>
’’
Bob
’’


’)

);

2

INSERT INTO

bench_hstore(kvp_hstore)
VALUES
(


hstore
(‘


’’
id
’’
=>
2,


‘‘
surname
’’
=>
’’
Gates
’’


’)

);

Figure
3
: hstore insert
SQL example

2.3.1

Functions

As you can see in the

above

example, the length of the array may vary from tuple to tuple. Impo
r-
tant to see is that each line the associated array has a key and a value separated by comma e.g.

hstore
(‘
’’
id
’’
=>
2,
‘‘
surname
’’
=>
’’
Gates
’’
’)

hst
ore
(‘
‘‘
<key 1>
’’
=>
‘‘
<value 1>
’’
, …,
‘‘
<key n>
’’
=>
‘‘
<value n>
’’
’)

means that we have two different unique keys, ‘id’ and ‘surname’, and each unique key has a va
l-
ue, for ‘id’ it is ‘2’ and for ‘surname’ it is ‘Gates’. Unique means that in a tuple a key can only
be
defined once. For example the id ‘surname’ can only appear once in the same tuple; the following
hstore is not allowed:

hstore
(‘
’’
id
’’
=>1,
‘‘
surname
’’
=>
’’
Gates
’’
’,
‘‘
surname
’’
=>
’’
McNeal
’’
)

As mentioned above in the introduction of this chapter hstore provide in a
ddition to the ‘hstore’
data type a lot of PostgreSQL functions and operators, which can be used for querying, manip
u-
lating, and comparing values. At this point only some of them are explained.

The most important one is to query a specific value in a key.
In the KVP methodology you do it
like this:

SELECT

surname, forname
FROM

bench_kvp
WHERE

id = 2;

On a hstore data type it works like as follow:

SELECT

kvp_hstore
-
>’surname’
AS

surname, kvp_hstore
-
>’forname’
AS

forename
FROM

bench_hstore
WHERE

kvp_hstore
-
>
’id’ = ‘2’;

Although the id is in this example an integer, you need query it as if it is a string (see the WHERE
clause).

Because tuples can have hstores of different length and therefore various keys, it is sometimes
important to find out if a key exists
in the hstore attribute:

SELECT

kvp_hstore ? ‘forename’
AS

available
FROM

bench_hstore;

This statement gives you for each tuple a ‘t’ for true or ‘f’ for false if key ‘forename’ is whether
available or not.

Chapter
2
:
Overview


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


5

Or maybe you want to know all possible keys in a

hstore:

SELECT

skeys
(kvp_hstore)
AS

keys
FROM

bench_hstore
GROUP

BY

keys;

To become the key only once in the result list, a ‘GROUP BY’ clause by keys need to be added to
the statement.

2.3.2

Working principle

Hstore is implemented in C as a PostgreSQL add
-
on an
d provides a SQL script to install the data
type and all the PostgreSQL functions. Hstore tries
to build a buffer

over all the keys, which are in
the hstore data type, if they are in alphabetical order. If not in some special functions the array
will be so
rted to have it alphabetical order. Hstore as data type is defined as follow:

1

CREATE TYPE

hstore (


INTERNALLENGTH =
-
1,


INPUT = hstore_in,


OUTPUT = hstore_out,


RECEIVE = hstore_recv,


SEND = hstore_send,


STORAGE = extended

);

Listing

1
: Hstore data type definition

The important parameter is the ‘INPUT’ one, which is linked to a C method. The ‘hstore_in’ m
e-
thod parses the hstore string to a C structure that holds the key, value, and length of the key and
value

as well as the position in the array. The position is needed because the array is not really
stored as an array in the database but rather as string (see
Table
10
: KVP table abstract
). Quer
y-
ing now for a specific key value pair combination, hstore needs only to loop over this buffer and
find the key right key value combination to get the position of the key and value in the array.

Exemplary we take the comparison opera
tor ‘
-
>’. Each operator, such as the hstore operators,
needs to be registered in PostgreSQL. Executing the following statement can do this:

1

CREATE OPERATOR

-
> (


LEFTARG = hstore,


RIGHTARG = text,


PROCEDURE = fetchval

);

Listing
2
: Registering a PostgreSQL operator

The important part is the ‘PROCEDURE’ parameter. This parameter links the operator to a Pos
t-
greSQL function. So that means the ‘fetchval’ function need to be defined as follow:

1

CREATE OR REPLACE FUNCTION

fetc
hval
(hstore,text)

2

RETURNS

text

3

AS

'MODULE_PATHNAME',
'hstore_fetchval'

4

LANGUAGE

C
STRICT

IMMUTABLE
;

Listing
3
: Defining a PostgreSQL function

As we see, the function returns a text. The PostgreSQL function ‘fetchval’ is now linked
to a C
-
implementation called ‘hstore_fetchval’ defined on line 3 in

Listing
3
.


Chapter
2
:
Overview


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


6

For more information please visit the official PostgreSQL hstore documentation
2
.


2.4

Bench
mark Tools

Currently two programs should be mentioned for benchmarking PostgreSQL. Both, pgbench and
HSR Texas Geo Database Benchmark, are running in sequential mode SQL statements to test the
database under test. For the test proposed in this paper an own

benchmark tools has been wri
t-
ten to fulfill the desired hypotheses.

2.4.1

Pgbench

Pgbench

is shipped in the PostgreSQL distribution package and

runs test on a PostgreSQL i
n-
stance in a sequential mode. Sequential mode means that the same SQL statement is run ove
r and
over in possible multiple concurrent database sessions, which fulfill the multi
-
processing arch
i-
tecture. At the end of the benchmark it calculates the average transaction time per seconds.
Pgbench provides an own scripting language to customize the t
est scripts for using own data sets
and test SQL statements. In addition it includes some industry
-
standard test cases, which let you
compare PostgreSQL with other database products
(Smith, 2010, S. 189)
.

Custom scripts in pgbench allow you to create your
own test scripts. It can handle statements
with variable, which are known in Java as prepared statements. It is possible to define a SELECT
statement like this:

1

SELECT

value

FROM

bench_kvp

WHERE

forname

= :
sforname
;

As you can see ‘:sforename’ is a variabl
e that will be replaced on runtime with the variable value.
All your statements need to be wrapped around the statements ‘BEGIN;’ and ‘END;’, which d
e-
fines the beginning and the end of the benchmark.

Variables like ‘:sforename’ can be filled before the tes
t begins that means before the ‘BEGIN;’
-
statement. Each variable assigned need to be set by the function ‘
\
set’:

1

\
set sforname ‘Greg’

Pgbench provides additional functions like ‘
\
setrandom’ for a random integer number, ‘
\
setshell’
to read the result of a s
hell command into a variable or ‘
\
shell’ to run a shell command but igno
r-
ing the result, and ‘
\
sleep’ causes the script execution to sleep for a specific duration.

2.4.2

HSR Texas Geo Database Benchmark

HSR Texas Geo Database Benchmark is another benchmark progr
am written in Python to test
spatial database systems regarding its performance. The benchmark is based predefined set of
queries consisting simple spatial statements. The queries are run on different data sizes to mon
i-



2

PostgreSQL hstore documentation can be found here:
http://www.postgresql.org/docs/current/static/hstore.html

The explenation in this paper are based on the PostgreSQL hstore documentat
ion 9.0:
http://www.postgresql.org/docs/9.0/static/hstore.html

Chapter
2
:
Overview


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


7

tor the behavior. For all this querie
s this program provides the test data that comes from Texas
USA. A test script looks like this:

1

SELECT

count(*)


FROM

{dataset polygons} pg


WHERE

ST_Intersects(@bbox, pg.geo);

This SELECT statement counts all polygons that intersect with a given
bounding box ‘@bbox’.

In general, the benchmark program is based on a cube.
Different queries can be run on different systems by u
s-
ing different data
set. Each dataset will be installed on
each system and all queries are run on all systems times
the number of datasets. This guarantees that the diffe
r-
ent systems can be compared, because all are using the
same data and statements.




Figure
4
: HSR Texas Geo Database
Benchmark Cube

Source:
(Krummenacher, 2009)

Chapter
3
:
Benchmark proposal


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


8

3

Benchmark proposal

Before we can have a look at the benchmarking utility and the result, we need first to consider
what the ingredients of a benchmark are. The term benchmark can be substitute into three di
f-
ferent processes described in the following chapter.

3.1

Terms

The term
benchmark has a high cohesion to the term test. Looking into the Cambridge dictionary
benchmark is defined as follows:

“a level of quality which can be used as a standard when comparing other things”
(Cambridge University Press)
.

“That by which the existen
ce, quality, or genuineness of anything is or may be d
e-
termined; […]”
(Oxford University Press)
.

The definition suggests that two different things of approximately the same topic need to be put
in contrast to each other. That means that the ‘things’ need t
o be converted in a form that makes
them comparable. At this point the term test come in place, which does exactly this transform
a-
tion:

“an act of using something to find out whether it is working correctly or how effe
c-
tive it is”
(Cambridge University Pre
ss)

“To evaluate or check (something) by comparison with an established standard; to
measure against a comparable or equivalent point of reference, esp. in order to a
s-
sess performance or set performance standards.”
(Oxford University Press)

(Cambridge Univ
ersity Press)
.

A test tries to find a form, which makes the ‘things’ comparable. The result of the test act is a
standardized input form for the benchmark. To bring the ‘things’ into this comparable form, an
upstream process needs to provide meaningful dat
a that can be transformed by the test process
and analyzed with accurate techniques by the benchmarking process to find at the end concl
u-
sions and define further activities.

3.2

Generate / Preprocessing phase

The generate / preprocessing phase is the first ste
p in a benchmark. This process tries to find
accurate data and need to be prepared for the testing. The steps in this phase must not be und
e-
restimated, because the impact of a statistical irrelevant result is very high. The better the data
and therefore th
e ground for the test itself is the higher is the probability that the analysis of the
test result gives a significant outcome.

Chapter
3
:
Benchmark proposal


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


9

So some questions need here to be considered:



What exactly I want to test?



What should be the data and can they be transformed
in the test process?



Does the data fit into the given environment?



Etc.

All these questions are very fundamental and often at the first moment very easy to answer.
However, finding the right data that fit into the environment and test process is not that e
asy e.g.



test data has a wrong encoding and cannot be loaded,



data does not cover the whole test design, and no significant result can be achieved,



etc..

3.3

Execution phase

The test execution phase defines the design of the chosen test technique. In general
it can be d
i-
vided into the following fields:

Load Testing
: measures and establishes benchmarks for the system under test by pushing
transactions to the system. It can be incremental, or can be set amount that is proportional to the
values of the system.

Performance Testing
: that is run repeatedly until acceptable performance levels are achieved
through database tuning activities.

Stress Testing
: is the goal to break the system under test to gauge the system limits.

Volume Testing
: is similar to load testi
ng, but involves placing a large amount of requesting o
b-
jects on the system.

The data for the system under test comes from the preprocessing phase and need to be tran
s-
formed into a form that can be loaded into the data source (e.g. inserting data in a data
base). In
the test design need to be defined if the insertion of the data need also to be tested or only the
queries.

3.4

Benchmark / Analysis phase

After all data has been loaded and run on the system under test, the given data from the test e
x-
ecution phase n
eed to be analyzed from different aspects. Here it is important to know in a
d-
vance, how the results could look like to choose the right algorithm and mathematical formulas
for the analysis. Tools can heavily support this process, by automatically trying di
fferent math
e-
matical functions like correlations, regressions, O
-
notation, and probabilities to calculate the
course of a graph.

At the end the user need to interpret the result of the benchmark phase and draw the right co
n-
clusions.

Chapter
3
:
Benchmark proposal


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


10

3.5

Performance Benchmark D
esign

To benchmark the KVP and hstore in PostgreSQL the decision has been taken in favor of a pe
r-
fromance test. In this benchmark it is not important how stable and scalable PostgreSQL is, it is
more interesting how does KVP and hstore perform on given pre
configured PostgreSQL env
i-
ronment.

3.5.1

Table Schema

As described in chapter
2.2

Key Value Pair

and
2.3

Hstore
, the table schemas need to be defined
in su
ch a way that the comparison between KVP and hstore is “fair”. The goal of the schema def
i-
nitions is to have for both an associative array in matters of the data, which need to be stored. It
is not important to have an equal representation of the key value

pairs in the database, however
the philosophy of what information type at its granularity need to be stored and queried is i
m-
portant. It means that the data are note foreseeable in sense of additional information that could
be provided to a specific data

record.

In this benchmark we use the following table schemas to represent the associative array in a d
a-
tabase table.

For KVP

1

CREATE TABLE

bench_kvp_id (bench_id
BIGINT

PRIMARY KEY
);

2

CREATE TABLE

bench_kvp (


bench_id
BIGINT

REFERENCES

bench_kvp_id(benc
h_id),


key
TEXT

NOT NULL
,


value
TEXT

);

Listing
4
: KVP Benchmark Table

and for hstore:

1

CREATE TABLE

bench_hstore (


bench_id
BIGINT

PRIMARY KEY
,


bench_hstore
HSTORE

NOT NULL

);

Listing
5
:
Hstore Benchmark Table

You can imagine that the KVP and hstore table schema provides the same strategy of storing an
associative array. In spite of the strategy, the way how they store the data are quite different. The
KVP table creates for each key value
pair a new tuple and reference it to a unique identifier,
which holds the additional information to the key value pairs.

In contrast to KVP, hstore needs only one tuple for the key value pairs. Hstore saves the key value
pairs in an associative string arra
y, which looks like an array that developers used to use in their
programming language.

For both a test run will be executed once with an index and once without. For the KVP table the
standard PostgreSQL index take place, which is a bitmap index. In additi
on to the table creation
statement in
Listing
4

we need to create an index for the KVP table:


Chapter
3
:
Benchmark proposal


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


11

1

CREATE INDEX

kvpidx1
ON

bench_kvp (key);

2

CREATE INDEX

kvpidx2
ON

bench_kvp (key, value);

Listing
6
: KVP index

Index for KVP shall be tested in two different ways. Firstly with a single index on the key
attribute and secondly a combined index on the attributes key and value.

For hstore an index

can be created as follow:

1

CREATE INDEX

hidx
ON

bench_hstore
USING GIST

(bench_hstore);

Listing
7
: Hstore index


3.5.2

Statements

To query a tuple based on a key value pair we have for each, KVP and hstore, an own SELECT
statement. Beca
use KVP needs for each key value pair a new tuple we have first to find the
unique identifier to the key value pair and then we can select the information we need. This e
x-
ample selects all the information of a person with surname McNeal:

1

SELECT

*
FROM

benc
h_kvp
WHERE

bench_id = (


SELECT

bench_id
FROM

bench_kvp


WHERE

key = 'surname'
AND

value = 'McNeal'

);

Listing
8
: KVP select example

By using hstore we need first to convert the attribute which stores the hstore string
into a hstore
object and then we could query the for a specific key value pair. The following statement does
this:

1

SELECT

*
FROM

bench_hstore


WHERE

hstore
(bench_hstore)
-
>'surname'='McNeal';

Listing
9
: KVP select example


3.5.3

Datasets

For the load process an own test data generator (see also the next chapter
3.6

Test Application
)
has been written in Python that generates based on the ‘lorem ipsum dolor’ dummy tex
t and ra
n-
dom numbers different sets of data. ‘lorem ipsum dolor’ is a nonsense paragraph which aims at
demonstrating a font to a reader without distracting him by the ‘gibberish’ of the text
(Walsh,
1996)
. Each dataset includes at maximum 5 attributes and
the amount of attributes varies from
dataset to dataset whereas the first two columns, id and forename, are mandatory in each record
to guarantee a valid key value pair. This variation guarantees that the hstore does not have the
same length and a more sig
nificant test can be achieved.

A sample record has the following attributes:

Column

Description

id : integer, sequence

Mandatory. A unique sequence identifier.

Chapter
3
:
Benchmark proposal


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


12

surname : Text

Mandatory. A fancy name.

forename : Text

Optional: A fancy name. Can be empty
to have a variable KVP length.

zip : Integer

Optional: A number between 1000 and 9000. Can be empty to have a v
a-
riable KVP length.

comment : Text

Optional: A ‘dummy’ text. Can be empty to have a variable KVP length.

Table
3
:
Columns of a test dataset record

and an abstract of a test file looks as follow:

1

id,forename,surname,zip,comment

2

1,cucyp,,6593,lorem ipsum dolor sit amet consetetur sadipscing
elitr sed diam nonumy eirmod tempor invidunt ut labore et d
o-
lore magna aliquyam
erat sed diam voluptua at vero eos et acc
u-
sam et justo duo dolores et ea rebum stet clita kasd gubergren
no sea takimata sanctus est lorem ipsum dolor sit amet

3

2,kasarzyc,ecnalehad,8463,

4

3,inwa,,,

Figure
5
: Example of a test data f
ile

The test data records needs to be transformed into a valid KVP SQL statement like this:

1

INSERT INTO

bench_kvp_info(id)
VALUES
(1)

2

INSERT INTO

bench_kvp(id, key, value)
VALUES
(1, ‘id’, ‘1’);

3

INSERT INTO

bench_kvp(id, key, value)
VALUES
(1, ‘forname’,
‘cuc
yp’);

4

INSERT INTO

bench_kvp(id, key, value)
VALUES
(1, ‘zip’, ‘6593’);

5

INSERT INTO

bench_kvp(id, key, value)
VALUES
(1, ‘comment’, ‘l
o-
rem ipsum dolor sit amet consetetur sadipscing elitr …’);


6

INSERT INTO

bench_kvp_info(id)
VALUES
(2)

7

INSERT INTO

bench_kvp(id, key, value)
VALUES
(2, ‘id’, ‘2’);

8

INSERT INTO

bench_kvp(id, key, value)
VALUES
(2, ‘forname’, ‘k
a-
sarzyc’);

9

INSERT INTO

bench_kvp(id, key, value)
VALUES
(2, ‘surname, ‘e
c-
nalehad);

10

INSERT INTO

bench_kvp(id, key, value)
VALUES
(2, ‘zip, ‘8463’);


11

INSERT INTO

bench_kvp_info(id)
VALUES
(3)

12

INSERT INTO

bench_kvp(id, key, value)
VALUES
(3, ‘id’, ‘3’);

13

INSERT INTO

bench_kvp(id, key, value)
VALUES
(3, ‘forname’, ‘i
n-
wa’);

Listing
10
: Transformed test data to KVP SQL statement

and
hstore SQL statement like this:

1

INSERT INTO

bench_hstore(bench_hstore)
VA
L-
UES
(hstore(‘
’’
id
’’
=>1,
’’
forename
’’
=>
’’
cucyp
’’
,
’’
zip
’’
=>6593,
’’
com
ment
’’
=>
’’
lorem ipsum dolor …
’’
’))

2

INSERT INTO

bench_hstore(bench_hstore)
VA
L-
UES
(hstore(‘
’’
id
’’
=>2,
’’
forename
’’
=>
’’
kasarzyc
’’
,
’’
surname
’’
=>
’’
e
cnalehad
’’
,
’’
zip
’’
=>8463’))

3

INSERT INTO

bench_hstore(bench_hstore)
VA
L-
UES
(hstore(‘
’’
id
’’
=>3,
’’
forename
’’
=>
’’
inwa
’’
’))

Listing
11
: Transformed test data to hstore SQL statement

The test tool automatically adds an additional unique id of

type ‘String’ respectively of type ‘Text’.

The following dataset length are used to test KVP and hstore:

Chapter
3
:
Benchmark proposal


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


13

Datasets

10 records

100 records

500 records

1’000 records

2’500 records

5’000 records

10’000 records

20’000 records

35’000 records

50’000 records

100’000 records

250’000 records

Table
4
: Number of dataset length

Additionally, for KVP and hstore one test circle, means test phase and benchmark phase, take
place once with database index and once without for each dataset length.

In total 24 different test
cycle were executed:


Figure
6
: Number of test cicle

‘# of length’ defines the number of different datasets that need to be run, ‘# of types’ means the
amount of different data sources that need to be t
estes. In our case it is KVP and hstore. ‘#of i
n-
dices’ means that the test will be run once with indexed data and once without. This results in 20
different test cycles.

3.6

Test Application

For benchmarking KVP and hstore an own application has been written i
n Python. It supports all
three phases: Generate / Preprocessing, Execution, and Bechmark / Analysis. It has been written
in a way that it can be enhanced for other data sources like MySQL, MSSQL, Oracle, etc. by inheri
t-
ing the general adapter object, whic
h ensures the database connection and runs all the benc
h-
mark task (compare with
Figure
7
).

1
2

[
#

o
f

l
e
n
g
t
h
]

2

[
#

o
f

t
y
p
e
s
]

2

[
#

o
f

i
n
d
i
c
e
s
]

3

[
w
a
r
m

s
t
a
r
t
]


1
4
4

[
c
i
c
l
e
s
]
Chapter
3
:
Benchmark proposal


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


14


Figure
7
: Test / benchmark application incl. test data generator

As you can see in the
Figure
7

the test
applications consists of a bundle of two modules. The first
module is accountable for the preprocessing phase, or to be more precise, for test data gener
a-
tion. Based on input parameters it creates a dataset in a format that is readable for the execution
ph
ase. The following input parameters are mandatory:

Parameter

Description

-
a or

a浯畮t

䥮Ie来r㨠䅭潵湴 潦orec潲摳dt漠be create搮

-
琠潲

desc
-
瑹灥

String: Description type can be ‘words’, ‘lines’, or ‘chars’.

-
l 潲

摥sc
-
len杴g

䥮Ie来r㨠:e晩nes 桯h
浡ny 睯r摳d l楮esⰠor c桡rac瑥rs nee搠瑯 be a摤ed
瑯t瑨e a瑴r楢u瑥 c潭oen琠in a 瑥s琠摡瑡 rec潲搮

Table
5
: Input parameters for test data generator

As described in chapter
Error! Reference source not
found.

the test data generator creates a
sequence of unique identifiers (ID), a forname and surname based on the name generator by
Chris Gonnerman, a random zip code between 1000 and 9000, and a

comment based on the l
o-
rem ipsum generator of Per Erik Strandberg.

The name generator creates names by randomly assembling random syllables. No name lists or
other fancy features are included
(Gonnerman, 2003)
. Lorem ipsum generator creates different
length of dummy texts including non
-
ascii characters if needed
(Strandberg, 2007)
.

The second module contains the two other phases, execution and benchmark / analysis, and need
to be run separately. The input param
eters for this module are the following:

n standalone

processes

Processes

(Adapters)

Adapter

< load

Benchmark

Data generator

Log and Graphs

hstore

{name:
‘Joe’, …
}

{name: ‘
Anne

, …
}

{name:’Smith’
, …
}

{name:’Bob’, …}

{name:’Marco’,
...}


KVP

.csv

Data

set >

insert >

insert >

Task queue

Response queue

inh
e-
rit >

< get

< add

< write response

< get

excute task >

excute task >

<
write

write
>

Chapter
3
:
Benchmark pro
posal


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


15

Parameter

Description

-
t or
--
type

String: Type of database test. Currently ‘pgsql’ and ‘pgsqlhstore’ su
p-
ported.

-
x or
--
processes

Integer: The amount of parallel processes that should be allocated. If this
parameter is not set, then the software tries to find out the maximum
parallel processes of the CPU architecture.

-
s or
--
server

String: Server or hostname where the database runs e.g. localhost

-
p or
--
port

Integer: Port of the database server.

-
d or
--
database

String: Database name.

-
u or
--
user

String: A user how has the rights to create tables, do insertions and run
queries.

-
p or
--
password

String: User’s password.

-
a or
--
data

String: File that includes all test records.

-
i or
--
index

Boolean:

Defines whether an index on the tables should be allocated or
not.

-
n or
--
no
-
hot
-
start

Boolean: The flag defines if a hot start is required. If it is not set, it runs
the test 2 times before it measures on the 3 round the transaction time.

-
l or
--
log

Boolean: Whether a log file should be created or not.

-
g or
--
graph

Boolean: Should a Graph be created or not. If it is set, than the
--
log p
a-
rameter need also to be set.

Table
6
: Input parameters for benchmark application

At the
beginning, this module try to find out how many processes the hardware can run in para
l-
lel if the parameter ‘
-
x’ or ‘
--
processes’ is not set. Each process is a separate adapter object, which
is waiting on tasks from the task queue.

In the next steps it tak
es the input data, generates a unique identifier and calls the appropriate
database adapter for insertion. Then it loops over all unique identifier and creates tasks, which it
puts into the task queue. Each adapter takes autonomous a task from the queue an
d executes it.
The time the adapter needs to execute the task flow back over the response queue to the benc
h-
mark object in the module. The benchmark object reads all response, generates the average and
tries to find a suitable regression curve. Additionall
y all this information will be written in a file
and a suitable graphical representation will be created for manual analysis.

Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


16

4

Benchmark Mai 2011

The following chapter specifies the hardware and software used for the system under test and
describes the resu
lts as well as the findings.

4.1

Technical specification

The server on which KVP and hstore need to be tested has the following hardware specification:

Type

Comment

Processor

CPU

Intel(R) Xeon(R) CPU E5520 @ 2.27GHz

Instructions set

64
-
bit

# of cores

4

#
of threads

8

# of CPUs

2

Memory

Total

24’732’476 kB about 24 GB

印eed

㄰㘶⁍䡺

䥤Ie 浯摵d

佮l礠yb畮瑵 an搠偯獴Pre卑L are r畮n楮朠on 瑨e same har搠摩獫 睨wreas
23’677’164 kB about 23 GB RAM is free

Table
7
: Hardware
specification of system under test

The software specification is as follow:

Software

Coment

OS

Ubuntu 10.04.2 LTS

Kernel

2.6.32
-
28
-
server

GCC

4.4.2

Database

PostgreSQL 9.0.4 and PostGIS 1.5.2

Python

Version 2.6.5 and the following add
-
ons:



Numpy



Scipy



Matplotlib

Table
8
: Software specification of system under test

4.2

Execution

Before the test can be started a database and the test data need to be created. The following su
b-
chapters describe which steps are needed to be ready for
executing a benchmark. For all steps,
preparing database, generating test data, and executing benchmark, an example bash script is
provided.

Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


17

4.2.1

Preparing Database

To prepare the database you need first to login as a user that has the privilege to create a dat
a-
base such as user ‘postgres’:

1

sudo su

postgres

As ‘postgres’ user you can run the first script called ‘step_1.sh’ which creates a new database user
‘benchmark’, a database called ‘benchmark’ and runs the hstore script that install the ‘hsotre’
data type a
nd a bunch of PostgreSQL functions.

1

./step_1.sh

The content of the ‘step_1.sh’ script is as follow:

1

#!/bin/bash

2

# first login as user with privilege to create a database e.g.
sudo su postgres

3


4

# create new user benchmark

5

createuser

-
l
-
D
-
R
-
S benchmark

6


7

#

alter user's password

8

psql

-
U postgres
-
c
"ALTER USER benchmark WITH PASSWORD
'benchmark'"

9


10

# create new database benchmark

11

createdb

-
U postgres benchmark

12


13

# create language plpgsql on benchmark database

14

createlang

-
U postgres
-
d benchmark plpgsql

15


16

# load

hstore data type and functions

17

psql
-
U postgres
-
d benchmark
-
f

/usr/share/postgresql/9.0/contrib/hstore.sql

18


19

# grant access to user benchmark

20

psql

-
U postgres
-
d benchmark
-
c

"GRANT ALL PRIVILEGES ON DAT
A-
BASE benchmark TO benchmark"

Figure
8
: Database setup script


4.2.2

Generating Test Data

As described in chapter
3.6

Test Application

the data generator need to be run separately. The
data can be created by executing the ‘step_2.sh’ bash script which creates the defined data sets
from chapter
Error! Reference source not found.

Error! Reference source not found.
.

1

./step_2.sh

The bash script ‘step_2.sh’ has the followin
g content:

1

#!/bin/bash

2

#
-

for 10

3

python

generator.py
-
a 10
-
t words
-
l 50

4

mv

data/testdata.csv data/testdata_10.csv

5


6

# create test data sets

7

#
-

for 100

8

python

generator.py
-
a 100
-
t words
-
l 50

9

mv

data/testdata.csv data/testdata_100.csv

Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


18

10


11

# create test
data sets

12

#
-

for 500

13

python

generator.py
-
a 500
-
t words
-
l 50

14

mv

data/testdata.csv data/testdata_500.csv


# etc.

Figure
9
: Test data generation script


4.2.3

Executing Benchmark

Now all prerequisites are fulfilled and the benchmark can

be started. Also for this step an exa
m-
ple script is available. Run the following command to benchmark KVP and hstore once with index
and once without based on the generated datasets in the previous script.

1

./step_3.sh

The ‘step_3.sh’ script includes the
following statements.

1

#!/bin/bash

2

############

3

#for hstore

4

############

5

#
-

for 10

6

#
-

without index

7

python

benchmark.py
-
t pgsqlhstore

-
s localhost
-
p 5432
-
d
benchmark
-
u benchmark
-
w benchmark
-
a data/testdata_10.csv

-
l
-
g
-
n

8

mv

output/1.png output/hsto
re_10_1.png

9

mv

output/2.png output/hstore_10_2.png

10

mv

output/log.csv output/hstore_10_log.csv

11

mv

output/log_summary.csv output/hstore_10_log_summary.csv

12

psql

-
d benchmark
-
c
"EXPLAIN ANALYZE SELECT * FROM
bench_hstore WHERE hstore(bench_hstore)
-
>'id'='7';"

> ou
t-
put/analyze.log

13

mv

output/analyze.log output/hstore_10_analyze.log

14


15

#
-

with index

16

python

benchmark.py
-
t pgsqlhstore

-
s localhost
-
p 5432
-
d
benchmark
-
u benchmark
-
w benchmark
-
a data/testdata_10.csv

-
i
-
l
-
g
-
n

17

mv

output/1.png output/hstore_10_in
dex_1.png

18

mv

output/2.png output/hstore_10_index_2.png

19

mv

output/log.csv output/hstore_10_index_log.csv

20

mv

output/log_summary.csv ou
t-
put/hstore_10_index_log_summary.csv

21

psql

-
d benchmark
-
c
"EXPLAIN ANALYZE SELECT * FROM
bench_hstore WHERE
hstore(bench_hstore)
-
>'id'='7';"

> ou
t-
put/analyze.log

22

mv

output/analyze.log output/hstore_10_index_analyze.log

23



24

############

25

#for KVP

26

############

27

#
-

for 10

28

#
-

without index

29

python

benchmark.py
-
t pgsql

-
s localhost
-
p 5432
-
d benchmark
-
u benchmark
-
w benchmark
-
a data/testdata_10.csv

-
l
-
g
-
n

30

mv

output/1.png output/kvp_10_1.png

31

mv

output/2.png output/kvp_10_2.png

32

mv

output/log.csv output/kvp_10_log.csv

Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


19

33

mv

output/log_summary.csv output/kvp_10_log_summar
y.csv

34

psql

-
d benchmark
-
c
"EXPLAIN ANALYZE SELECT * FROM bench_kvp
WHERE bench_id = (SELECT bench_id FROM bench_kvp WHERE key =
'id' AND value = '7');"

> output/analyze.log

35

mv

output/analyze.log output/kvp_10_analyze.log

36



37

#
-

with index

38

python

benchmark.
py
-
t pgsql

-
s localhost
-
p 5432
-
d benchmark
-
u benchmark
-
w benchmark
-
a data/testdata_10.csv

-
i
-
l
-
g
-
n

39

mv

output/1.png output/kvp_10_index_1.png

40

mv

output/2.png output/kvp_10_index_2.png

41

mv

output/log.csv output/kvp_10_index_log.csv

42

mv

output/log_summ
ary.csv output/kvp_10_index_log_summary.csv

43

psql

-
d benchmark
-
c
"EXPLAIN ANALYZE SELECT * FROM bench_kvp
WHERE bench_id = (SELECT bench_id FROM bench_kvp WHERE key =
'id' AND value = '7');"

> output/analyze.log

44

mv

output/analyze.log output/kvp_10_index_an
alyze.log


# etc.

Figure
10
: Benchmark script example


4.3

Results

The test has been executed in May 2011 based on the test design described in chapter
Error!
Reference source not found.

and the hardware specification in chapter
4.1
. All the test logs
were aggregated into a single file showing the start and end time as well as the duration and the
average time in seconds per SELECT statement. The detail aggregation and t
he full extent di
a-
grams can be reviewed in the appendix on page
32
.

In general it can be said that the hstore performs much better than a KVP table schema. Neve
r-
t
heless this general conclusion needs to be differentiated and a closer look on the different data
sets gives us a detailed understanding why the first assumption is not accurate enough.

On the first test run each data set has been tested twice, once without an index and once with an
index. For the KVP schema an index on the attribute ‘key’ has been used. This benchmark gave
the result as in
Figure
11
. In general, hstore does a really good job when querying a tuple by a
given key value pair. Specifically when using big data sets, which results in a lot of tuples in the
database table. One reason could b
e that hstore needs for storing arbitrary information only one
attribute on the same tuple as all the other information are stored. But more on that later in
chapter
4.4

Findings
.


Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


20


Figure
11
: Overview KVP vs. hstore benchmark

From 10 to approximately 500 records the KVP is much faster by querying a key

value pair. A
f-
terwards hstore demonstrates its strength especially when having more than 2’500 tuples in the
hstore table, whereas by using an index on the KVP table, hstore is faster by a factor of 4.04 and
without an index by a factor of 7.9. This sound
s like KVP is a performance killer, which is not the
truth, because if we look at the absolute querying time per SELECT statement, than KVP needs in
average for 2’500 records heavy data set 0.01954 seconds (19.54 ms) without and 0.01512
seconds (15.12 ms)
with an index compared to hstore 0.00484 seconds (4.84 ms) without and
0.00191 (1.91 ms) with index. The difference is only around 13 milliseconds per SELECT stat
e-
ment, which is not that much.


Figure
12
: Benchmark KVP vs. hstore
from 10 to 2.5K

Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


21

The circumstance changes, if we use a combined index for the KVP table on the attributes ‘key’
and ‘value’. Hstore is still a tick faster than the KVP schema, but the difference between the ave
r-
age SELECT transaction time for KVP indexed ta
ble shrunk extremely fast. Overall we can say
that the combined index comes very near to the hstore schema. Despite the fact that the co
m-
bined index gives some performance boost, we can still see the problem, that the more tuples we
store in the KVP table
the higher will be the difference between the hstore and the KVP. This co
n-
cludes that the more arbitrary data results need to be store, the faster grows the combined index
and the longer needs PostgreSQL to find the right key value pair combination. But mo
re on that
later in chapter
4.4

Findings
.


Figure
13
: Overview KVP hstore with combined index


For small data stets the KVP schema is still a little bit faster however this changes quickly by an
amount of c
irca 500 and more records / tuples per associative array.


Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


22


Figure
14
: Benchmark KVP hstore from 10 to 2.5K with combined index

Having a nearer look to the data sets between 10 and 2’500 tuples we see that the average tran
s-
action
time on a 2’500 big data set shrunk to 0.00948 (9.48 ms) from 0.01512 seconds (15.12
ms) that leads to the fact that a combined index is faster by a factor of 1.6.


Applying this calculation to the 35’000 big data set we become a factor of 1.72 because it

shrunk
from 194.98 milliseconds to 113.2 milliseconds, which is not that much.


Figure
15
: Overview of difference between KVP single and combined index

1
5
.
1
2
m
s
9
.
4
8
m
s

1
.
6
Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


23

Small data sets show a contrary perspective. From 10 to 500 records the diffe
rence between the
hstore and a KVP with a combined index is negative. This means that the KVP is faster then the
hstore. From more or less 500 records upwards hstore will be faster even only a little bit.


Figure
16
: 10 to 2.5K: D
ifference between KVP single and combined index


The subsequent two diagrams show the KVP without and with and index on attribute ‘key’ as
well as with combined index against hstore with and without index. It depict that KVP with a
combined index comes ver
y near to the hstore data type.


Figure
17
: Overview of KVP with index on ‘key’ and combined index against hstore

Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


24


Figure
18
: 10 to 2.5K: KVP with index on ‘key’ and combined index against hstore


Analyzing the indices of hstore and KVP shows, that KVP needs a lot more size on the disk to
build the index. Both an index on the attribute ‘key’ and a combined index on the attributes ‘key’
and ‘value’ whereat the combined index need more disk size then
the index on the attribute ‘key’.
Surprisingly a GiST index on the hstore nees much less disk size even though GiST creates for
each unique key in the hstore an own index.


Figure
19
: Index size overview


Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


25

Especially in the area of

1’000 tuples and more is the different significant. KVP with an index on
the attribute ‘key’ needs by 1’000 tuples 3.73 times more disk space than the GiST index on a
hstore and 3.55 times for the combined index.


Figure
20
: Inde
x size for 10 to 5’000 records


4.4

Findings

As described in the chapter
0


Results
, hstore is by a factor of 7.9 faster than KVP with an index on the key attribute and 4.95
wit
h a combined index by 2’500 records in an array. This can be traced back to the fact that KVP
needs much more units (cost)
3

of work to get the right tuple, whereupon the size in bytes for a
single tuple is a lot lower. Lets say we want to save 2’500 array
entries in a KVP and hstore
schema. For the hstore we have exactly 2’500 tuples in the database table


as follow:

Database table:

bench_hstore

bench_id : BIGINT

bench_hstore : HSTORE

1

"id"=>"1", "comment"=>"lorem ipsum dolor sit amet consetetur sadip
s
c-
ing elitr sed diam nonumy eirmod tempor invidunt ut labore et dolore
magna aliquyam erat sed diam voluptua at vero eos et accusam et justo



3

The cost factor in the EXPLAIN result defines the cost to read a single database page from disk. It does not
relate to anything and therefore it can only be compared to other ‘cost’ units
(Nasby, 2010)
.

a
r
r
a
y

e
n
t
r
i
e
s


t
u
p
l
e
s

Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


26

duo dolores et ea rebum stet clita kasd gubergren no sea takimata san
c-
tus est lorem ipsum dolor sit amet ", "surnam
e"=>"ebsaveq", "for
e-
name"=>"maeznidus"

2

"id"=>"2", "zip"=>"6489", "comment"=>"lorem ipsum dolor sit amet co
n-
setetur sadipscing elitr sed diam nonumy eirmod tempor invidunt ut l
a-
bore et dolore magna aliquyam erat sed diam voluptua at vero eos et a
c-
cusam
et justo duo dolores et ea rebum stet clita kasd gubergren no sea
takimata sanctus est lorem ipsum dolor sit amet ", "surname"=>"epofod",
"forename"=>"teer"




2500

"id"=>"2500", "comment"=>"lorem ipsum dolor sit amet consetetur s
a-
dipscing elitr sed diam

nonumy eirmod tempor invidunt ut labore et d
o-
lore magna aliquyam erat sed diam voluptua at vero eos et accusam et
justo duo dolores et ea rebum stet clita kasd gubergren no sea takimata
sanctus est lorem ipsum dolor sit amet ", "forename"=>"sorietet"

Tab
le
9
: Hstore table abstract

What we need to do is to analyze an example query. As an example we take this one:

1

EXPLAIN ANALYZE

SELECT

*
FROM

bench_hstore


WHERE

hstore(bench_hstore)
-
>
'id'
=
'1735'
;

Listing
12
: Explain Analyze statement for hstore without index

The analysis says that the cost for getting the first tuple is 0 and all tuples needs 213.72 cost
units. PostgreSQL thinks that he will return 45 rows each of a size of 40 bytes.

1

Seq

Scan

on bench_hst
ore (
cost
=0.00..213.72
rows
=45
width
=40)
(actual time=1.318..1.778 rows=1 loops=1)

2


Filter:
((bench_hstore
-
> 'id'::text) = '1735'::text)

3


Total runtime: 1.822 ms

4

(3 rows)

Listing
13
: Output of Explain Analyze statement for
hstore without index

The actual time shows the effectively milliseconds needed to execute the example statement.
PostgreSQL needed 1 loop, returned 1 row and required 1.822 milliseconds.

Hstore with an index needs to have another SELECT statement in the an
alyze query because the
hstore operator to query an indexed attribute is a little bit different. In hstore it can be done by
using the ‘@>’ operator.

1

EXPLAIN ANALYZE

SELECT

*
FROM

bench_hstore


WHERE

bench_hstore @>
'id=>1735'
;

Listing
14
: Explain Analyze statement for hstore with index

The difference between a hstore without and with an index is, the hstore without an index is
doing a sequential scan for each SELECT statement. That means that it begins at tuple one and
continuou
s until the query is satisfied. When using an index at bitmap heap scan takes place.
PostgreSQL has found a small subset of tuples that can fulfill the query. The smaller set of tuples
to loop over enhances the performance to find the right tuple.

Chapter
4
:
Benchmark Mai 201
1


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


27

1

Bitmap H
eap Scan

on bench_hstore


(
cost
=4.27..11.33
rows
=2
width
=218)


(actual time=0.481..0.534 rows=1 loops=1)

2


Recheck Cond: (bench_hstore @> '"id"=>"1735"'::hstore)

3


-
>
Bitmap Index Scan

on hidx_2_5k


(
cost
=0.00
..4.27
rows
=2
width
=0)


(actual time=0.308..0.308 rows=70 loops=1)

4


Index Cond: (bench_hstore @> '"id"=>"1735"'::hstore)

5


Total runtime: 0.721 ms

6

(5 rows)

Listing
15
: Output of Explain Analyze statement for

hstore with index

Now lets have a look at the KVP tables. Remember that we have in the schema two different
tables but we only need the table with the attributes ‘key’ and ‘value’. For the same amount of
array entries in the KVP table a multiple of tuples

will be stored. Each key value pair entry in the
array needs to be a separate tuple in the table. If we have the following keys ‘id’, ‘surname’, ‘fo
r-
name’, ‘zip’, and ‘comment’ for a 2’500 big array and each key in this array has an assigned value,
then i
t will results to 12’500 tuples in the database table. Compared to hstore it has 5 times more
tuples or to be more precise the sum of all filled keys.

, whereas

The database table includes then for each key value pair an own tuple:

Database table:

bench_kvp

bench_id : BIGINT

key : TEXT NOT NULL

value : TEXT

1

id

1

1

comment

lorem ipsum dolor sit amet consetetur sadip
s
c-
ing elitr sed diam nonumy eirmod tempor inv
i-
dunt ut labore et dolore magna aliquyam erat
sed diam voluptua at vero eos et accusam et ju
s-
to duo dolores et ea rebum stet clita kasd g
u-
bergren no sea takimata sanctus est lorem i
p-
sum dolor sit amet

1

surnam
e

Ebsaveq

1

forename

Maeznidus

2

id

2

2

zip

6489

2

comment

lorem ipsum dolor sit amet consetetur sadips
c-
ing elitr sed diam nonumy eirmod tempor inv
i-
dunt ut labore et dolore magna aliquyam erat
sed diam voluptua at vero eos et accusam et ju
s-
to duo
dolores et ea rebum stet clita kasd g
u-
bergren no sea takimata sanctus est lorem i
p-
sum dolor sit amet

t
u
p
l
e
s


v
a
l
u
e

i
n

t
h
e

a
r
r
a
y

v
a
l
u
e

n
u
l
l
Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


28

2

surname

epofod

2

forename

Teer







2500

id

2500

2500

Comment

lorem ipsum dolor sit amet consetetur sadip
s
c-
ing elitr sed diam nonumy eirmod tempor inv
i-
dunt ut labore et dolore magna aliquyam erat
sed diam voluptua at vero eos et accusam et ju
s-
to duo dolores et ea rebum stet clita kasd g
u-
bergren no sea takimata sanctus est lorem i
p-
sum dolor sit amet

2500

for
ename

sorietet

Table
10
: KVP table abstract

Taking the same key value pair we used for hstore gives the following KVP SQL statement:

1

EXPLAIN ANALYZE

SELECT
*
FROM

bench_kvp


WHERE

bench_id = (


SELECT

bench_id
FROM

bench_kvp


WHERE

key

=
'id'

AND

value

=
'1735'


);

Listing
16
: Explain Analyze statement for KVP

which results in the following EXPLAIN output:

1

Seq Scan

on bench_kvp (
cost
=229.38..437.03
rows
=3
width
=60)



(actual time=3.125..3.579 rows=2 loops=1)

2


Filter: (bench_id = $0)

3


InitPlan 1 (returns $0)

4


-
>
Seq Scan

on bench_kvp


(
cost
=0.00..229.38
rows
=1
width
=8)


(actual time=1.406..2.162 rows=1 loops=1)

5


Filter:
((key = 'id'::text) AND


(value = '1735'::text))

6

Total runtime: 3.607 ms

Listing
17
: Output of Explain Analyze statement for KVP

To get the same result as for hstore two different sequences ar
e needed. The inner sequence
looks for the key value pair and gives the unique identifier back. The sequence cost to get all
tuples back is 229.38 whereas each tuple is 8 bytes big. Then with the given identifier a second
sequence collects all tuples that
relates to the given identifier. The cost for the first sequence b
e-
gins where the inner sequence stops and ends at 437.03.

In a procedural language it works like this:

1

For

each

row
in

input_1

2


For

each

row
in

input_2

3


// Do_something

4


Next

5

Nex
t

Listing
18
: Programming example of explain sequences

Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


29

The difficulty of creating an own hstore like database schema is, that more sequences and ther
e-
fore more reads of pages on the disk are needed which results in higher cost uni
ts. The lower size
of the tuples in bytes does compensate the amount of pages to be read on the disk. Because in the
short run it looks like that at the best
-
case only 60 bytes for the first sequence and 8 bytes for
second sequence are needed. That is not
the truth because if the first read in the second sequence
finds the key value pair, than 8 bytes are consumed. With the found identifier the first sequence
read all the tuples that match the identifier. In our case we have 5 key value pair combinations
th
at means 5 tuples in the KVP table. Each tuple consumes 60 bytes, which is for all 5 tuples 300
bytes, plus the 8 bytes for the second sequence results in a total size of 308 bytes. Compared to
the hstore in uses in the best
-
case only 213 bytes.

Using an i
ndex on the key attribute can enforce finding the unique identifier for a given key value
pair. Analyzing the technique shows that we have an additional sequence.

1

Seq Scan

on bench_kvp (
cost
=199.48..406.88
rows
=3
width
=60)


(actual time=2.268..2.730 rows=2 loops=1)

2


Filter: (bench_id = $0)

3


InitPlan 1 (returns $0)

4


-
>
Bitmap Heap Scan

on bench_kvp


(
cost
=62.99..199.48
rows
=1
width
=8)


(actual time=0.925..1.227

rows=1 loops=1)

5


Recheck Cond:
(key = 'id'::text)

6


Filter:
(value = '1735'::text)

7


-
>
Bitmap Index Scan

on kvpidx


(
cost
=0.00..62.99
rows
=2499
width
=0)


(actual time=0.373..0.373 rows=2500 lo
ops=1)

8


Index Cond:
(key = 'id'::text)

9

Total runtime: 2.770 ms

Listing
19
: Output of Explain Analyze statement for KVP with index on attribute ‘key’

It reads the index to find all the tuples, which have the given ke
y and afterwards in the second
sequence PostgreSQL needs to find the value in a shorter list of key value pairs. This enables to
have an enhancement of factor 1.29, which is calculated by the average SELECT time without an
index divided by the average SELE
CT time with an index on the ‘key’ attribute.


Interesting is what happens if we take a combined index on the attributes ‘key’ and ‘value’. Again
we take the same statement as before

1

EXPLAIN ANALYZE

SELECT
*
FROM

bench_kvp


WHERE

bench_id = (


S
ELECT

bench_id
FROM

bench_kvp


WHERE

key

=
'id'

AND

value

=
'1735'


);

Listing
20
: Explain Analyze statement for KVP

Using a combined index reduces the cost consumption by a factor of 2.06, calculated in the same
way

as before: Average SELECT time without an index divided by the average SELECT time with a
combined index.


1
9
.
5
4
m
s
1
5
.
1
2
m
s

1
.
2
9
1
9
.
5
4
m
s
9
.
4
8
m
s

2
.
0
6
Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


30

This factor results, because no additional sequence is needed, like we had it when using only an
index on the attribute ‘key’. The first index scan

can directly find the unique identifier for a given
key value pair and reading afterward all tuples for that identifier. The bytes, which are needed to
gather the data, are exactly the same, as we needed for the first index alternative. In contrast to
the

first alternative the combined index on the attributes ‘key’ and ‘value’ will grow very fast,
because for each key value pair combination a new entry in the index is needed. It is in the nature
of the key value pair philosophy, that only arbitrary unfores
een information is stored as key va
l-
ue pair and therefore the probability that an equal key value pair appears in the table is very u
n-
likely.

1

Seq Scan

on bench_kvp (
cost
=8.27..215.91
rows
=3
width
=60)


(actual time=1.376..1.954 rows=5 loops=1)

2


Filter: (bench_id = $0)

3


InitPlan 1 (returns $0)

4


-
>
Index Scan

using kvpidx2 on bench_kvp


(
cost
=0.00..8.27
rows
=1
width
=8)


(actual time=0.0
48..0.049 rows=1 loops=1)

5


Index Cond:
((key = 'id'::text) AND


(value = '1735'::text))

6


Total runtime: 2.028 ms

7

(6 rows)

Listing
21
: Output of Explain Analyze statement for KVP with combined index


Lastly we need to have a short look on the hstore implementation.

Please consider for that cha
p-
ter
2.3.2

Working principle
.

Lets take the following hstore example:

bench_id : BIGINT

bench_hstore : HSTORE

1

"zip"=>"8000 ", "surname"=>"ebsaveq", "forename"=>"maeznidus"

2

"zip"=>"6489", "surname"=>"epofod", "forename"=>"teer"

3

"zip"=>"8000 ", "surname"=>"kjuefs", "forename"=>"beer"

Table
11
: Hstore tuple examples

Now we want to find all names how lives in the region with zip code 8000. For that, we need the
following SQL statement:

1

SELECT

hstore(bench_
hstore)
-
>
’surname’

FROM

bench_hstore


WHERE

hstore(bench_hstore)
-
>
'zip’
=
’8000'
;

Listing
22
: Querying all person how lives in region with zip code ‘8000’

In this example PostgreSQL calls six times the ‘fetchval’ function, because it loops over each tuple
and calls the function twice. Once to get the value of the attribute ‘surname’ and once the get the
‘zip’ so that PostgreSQL can compare the returned value

with the value ‘8000’. When PostgreSQL
is calling the ‘fetchval’ function he hands the array over to the C function ‘hstore_fetchval’, which
finds the key in the buffer. Based on the buffer entry it knows on which position begins the value
and how long it

is. For the key ‘zip’ and the first tuple it would be something like start position 8
and length 4. So know hstore can substring the array and return the value ‘8000’ back to Pos
t-
greSQL, which will then compare it with the given value after the equals sig
n.

Chapter
4
:
Benchmark Mai 2011


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Struc
tures in PostgreSQL.


University of Applied Science Rapperswil


31

4.5

Conclusion

As described in the previous chapters hstore perform much faster then a KVP schema described
in chapter
2.2
. The stored data of type hstore are not
lost in the database and can be migrated
with a minimum of effort to another schema, because it is stored as string in the form of an ass
o-
ciative array in the database. In addition, hstore provides PostgreSQL functions to transform the
associative array in
to a column row like table, as it is know in every database management sy
s-
tem. The keys in the associative array are transposed to columns and each row in the array is a
tuple in the column row like table. The values are transformed to the values in a tupl
e. Therefore
the fear for a later migration should not be the criteria of not using hstore.

Also the way how it is implemented nees much less size on the disk for the indices and costs less
performance. This shows the explain analysis in chapter
4.4

and the graphs on page
24

and
25
.
The cost of reading data is much lower then the one of a KVP schema. To remember, the cost d
e-
fines a factor of reading a page from the disk. The higher it is the more it needs to read on the
disk and the
slower it will be. In addition hstore buffers all the keys and values to provide a faster
read and along a single buffer entry it stores the position of the key and value in the string, which
is of type hstore in the database table that represents the asso
ciative array. That means when
hstore found the key it does not need to substring the string because it already knows the pos
i-
tion in the string.

Thus, it is to consider that for small datasets hstore is not the preferable method to store key va
l-
ue pairs.
Especially when having an array size of 1 to more or less 500 records. At this size a KVP
schema is preferable. The problem we face here is at the beginning we do not have a lot of
records in the table. But when the time goes over the 500 records limit can

be easily exceeded
and then switching from the KVP schema to the hstore is not that easy, because not only the data
need to be transformed to the hstore array but also the database tables and all the SQL stat
e-
ments for querying, inserting, modifying and d
eleting records need to be changed to the new
schema. The conclusion is to first think about how many data will be expected during the dat
a-
base table lifetime and then choosing the right schema. If you are unsure which one you should
take, my succession is

to use hstore as default, because the difference of the average querying
time between the KVP and the hstore schema is at 500 records only
-
0.45 milliseconds, which
means that KVP is 0.45 ms faster.


To summarize, hstore provides in general a faster mecha
nism to store key value pairs. It is easy
to use and offers a lot of operations to compare, transform, and search for data. Not only the pe
r-
formance but also the pagings and bytes which are needed to get the data for a given key value
pair is better than s
etting up an own key value pair schema. All in all hstore is the preferable way
of storing arbitrary unforeseen information in a database table.


Bibliography


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQL.


University of Applied Science Rapperswil


32

Bibliography

Bartunov, O., Sigaev, T., &

Gierth, A. (n.d.).
PostgreSQL 9.0: hstore
. Retrieved Mai 1, 2011, from
http://www.postgresql.org/docs/9.0/static/hstore.html

Cambridge University Press. (n.d.).
Cambridge Dictionary Online
. Retrieved April 26, 2011, from
http://dictionary.cambridge.org/

G
onnerman, C. (2003).
Python Name Generators
. Retrieved April 27, 2011, from Alderon's Tower:
http://tower.newcenturycomputers.net/namegen.html

Krummenacher, R. (2009, December 21).
HSR Texas Geo Database Benchmar
. Retrieved June 3,
2011, from Wiki GISpunkt

HSR:
http://www.gis.hsr.ch/wiki/HSR_Texas_Geo_Database_Benchmark

Nasby, J. (2010, May 13).
Introduction to VACUUM, ANALYZE, EXPLAIN, and COUNT
. Retrieved
May 27, 2011, from PostgreSQL wiki:
http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_
EXPLAIN,_and_C
OUNT

Oxford University Press. (n.d.).
Oxford English Dictionary
. Retrieved April 28, 2011, from
http://www.oed.com

PostgreSQL Global Development Group. (n.d.).
PostgreSQL: About
. Retrieved Mai 26, 2011, from
http://www.postgresql.org/About

Sm
ith, G. (2010).
PostgreSQL 9.0. High Performance.

Olton, Birmingham, United Kingdom: Packt
Publishing Ltd.

Strandberg, P. E. (2007).
Lorem Ipsum Generator
.
Retrieved April 28, 2011, from
http://www.pererikstrandberg.se/blog/index.cgi?page=LoremIpsumGenerat
or

Walsh, N. (1996).
What does `lorem ipsum dolor' mean?

Retrieved April 26, 2011, from
http://nwalsh.com/comp.fonts/FAQ/cf_36.htm


Appendix


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQL.


University of Applied Science Rapperswil


33


Appendix

Benchmark with KVP index on attribute ‘key’

Benchmark of hstore and KVP once with index (w) and once without index

(o). For KVP an index
on the attribute ‘key’ has been choosen.


Appendix


Key/Value Pair versus hst
ore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQL.


University of Applied Science Rapperswil



34


Appendix


Key/Value Pair versus hst
ore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQL.


University of Applied Science Rapperswil



35


Appendix


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQL.


University of Applied Science Rapperswil



36

Benchmark with co
mbined KVP index

Benchmark of hstore and KVP once with index (w) and once without index (o). For KVP a co
m-
bined index on the attribute ‘key’ and ‘value’ has been choosen.




Appendix


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQ
L.


University of Applied Science Rapperswil



37


Appendix


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQ
L.


University of Applied Science Rapperswil



38


Appendix


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQ
L.


University of Applied Science Rapperswil



39

Differences between KVP and hstore




Appendix


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQ
L.


University of Applied Science Rapperswil



40






Appendix


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQ
L.


University of Applied Science Rapperswil



41

Average SELECT time for KVP and hstore


Appendix


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQ
L.


University of Applied Science Rapperswil



42


Appendix


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQ
L.


University of Applied Science Rapperswil



43

KVP and hstore index sizes


Appendix


Key/Value Pair versus hstore


Benchmarking Entity
-
Attribute
-
Value Structures in PostgreSQ
L.


University of Applied Science Rapperswil



44