

Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa

Overview of PostgreSQL 9.0

Project for the course Sistemas de Bases de Dados

Ivana Miniarikova 33793
João Rodrigues 25648
Pedro Dias 31228

December 2010



Table of contents

1. Introduction
   1.1. System applicability and Historical introduction
2. SQL syntax coverage
   2.1. DDL
   2.2. DML
   2.3. DCL
3. Storing and File Structure
   3.1. Introduction
   3.2. Pagination
   3.3. Buffer Management
   3.4. Partitioning
   3.5. Clustering
4. Indexing and Hashing
   4.1. Introduction
   4.2. Index types
   4.3. Using indexes for file organization
   4.4. Multi-column indexation
   4.5. Combination of multiple indexes
   4.6. Unique and Partial Indexes
   4.7. Temporary inconsistency of structures
5. Processing and Optimization of queries
   5.1. Introduction
   5.2. Establishing the connection and sending the query
   5.3. The Parser Stage
   5.4. The Planner
   5.5. The Executor
   5.6. Algorithms for basic operations
   5.7. The statistics used by the planner
6. Transactions
   6.1. Introduction
   6.2. Savepoints
   6.3. Explicit table locks
   6.4. Nested Locks
   6.5. Transaction Isolation
   6.6. Write-ahead log
7. Distributed databases
8. Other aspects of PostgreSQL
   8.1. XML
   8.2. Triggers and Procedures
Conclusion






1. Introduction

This project was developed within the scope of the Database Systems course (Sistemas de Bases de Dados) and covers the main features of the PostgreSQL 9.0 database management system.

Each main feature is addressed in a separate chapter, containing sub-chapters for each topic within the corresponding chapter.

The feature descriptions include examples and tests performed on a simple database created by us for this purpose, in which all data was randomly generated in order to give the database enough complexity to efficiently test the algorithms.

The database "teste", on which we performed all tests, has the following structure:





1.1 System applicability and Historical introduction

1.1.1 System applicability

PostgreSQL is an open source object-relational database system that runs on all major operating systems, such as Windows, Linux and the other members of the UNIX family.

The PostgreSQL DBMS SQL implementation strongly conforms to the ANSI SQL:2008 standard and has full support for:


- Data integrity features including (compound) primary keys, foreign keys with restricting and cascading updates/deletes, check constraints, unique constraints, and not null constraints;
- Auto-increment columns through sequences;
- Limit/offset, which allows the return of partial result sets;
- Compound, unique, partial and functional indexes, which can use any of its B-tree, R-tree, hash, or GiST storage methods;
- Joins and views;
- Multiple schemas per database;
- Sub-queries (including sub-selects in the FROM clause);
- Read-committed and serializable transaction isolation levels;
- Table inheritance, which puts an object-oriented slant on table creation, allowing database designers to derive new tables from other tables, treating them as base classes. It supports both single and multiple inheritance in this manner;
- A rules system, also called the query rewrite system, which allows the database designer to create rules that identify specific operations for a given table or view and dynamically transform them into alternate operations when they are processed;
- Database events: the events system is an inter-process communication system in which messages and events can be transmitted between clients using the LISTEN and NOTIFY commands, allowing both simple peer-to-peer communication and advanced coordination on database events. Since notifications can be issued from triggers and stored procedures, PostgreSQL clients can monitor database events such as table updates, inserts, or deletes as they happen (triggers; see the sketch after this list);
- Stored procedures;
- Most of the SQL:2008 data types, such as INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, TIMESTAMP;
- Storage of large binary objects, including pictures, sounds, or video;
- Native programming interfaces for C/C++, Java, Perl, Python, Ruby, Tcl, ODBC, among others.
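As a small illustration of the events system (the channel name and payload below are our own, hypothetical choices), one session can subscribe to a channel while another session, or a trigger, raises an event on it:

LISTEN table_changes;                            -- this session starts listening on channel "table_changes"
NOTIFY table_changes, 'clients table updated';   -- another session (or a trigger) raises the event with a payload
UNLISTEN table_changes;                          -- stop listening on the channel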



PostgreSQL also boasts other advanced features such as MVCC, point-in-time recovery, tablespaces, asynchronous replication, nested transactions (savepoints), online/hot backup and write-ahead logging for fault tolerance, and it supports international character sets, multibyte character encodings and Unicode, being locale-aware for sorting, case-sensitivity, and formatting.

Database system limits:

Limit                           Value
Maximum Database Size           Unlimited
Maximum Table Size              32 TB
Maximum Row Size                1.6 TB
Maximum Field Size              1 GB
Maximum Rows per Table          Unlimited
Maximum Columns per Table       250-1600 (depending on column types)
Maximum Indexes per Table       Unlimited


1.1.2 Historical Introduction

PostgreSQL, like many other open source projects, started at the University of California at Berkeley (UCB). Originally called Postgres, it was developed by a computer science professor named Michael Stonebraker, who went on to become the CTO of Informix Corporation.

Stonebraker started Postgres in 1986 as a follow-up project to its predecessor, Ingres, now owned by Computer Associates. The name Postgres thus plays off of its predecessor (as in "after Ingres"). Ingres, developed from 1977 to 1985, had been an exercise in creating a database system according to classic RDBMS (Relational Database Management System) theory. Postgres, developed between 1986 and 1994, was a project meant to break new ground in database concepts, such as the exploration of "object-relational" technologies.

Stonebraker and his graduate students actively developed Postgres for eight years. During that time, Postgres introduced rules, procedures, time travel, extensible types with indices and object-relational concepts. Postgres was later commercialized to become Illustra, which was later bought by Informix and integrated into its Universal Server. Informix was purchased by IBM in 2001.

In 1995 two Ph.D. students from Stonebraker's lab, Andrew Yu and Jolly Chen, replaced Postgres' POSTQUEL query language with an extended subset of SQL. They renamed the system to Postgres95.

In 1996, Postgres95 departed from academia and started a new life in the open source world, when a group of dedicated developers outside of Berkeley saw the promise of the system and devoted themselves to its continued development over the next eight years. With many new features and enhancements, the database system took its current name: PostgreSQL.



PostgreSQL began at version 6.0. Over the next four years (versions 6.0-7.0), improvements and new features were made, such as:

- MVCC - Multi-Version Concurrency Control (which replaced table-level locking) allows readers to continue reading consistent data during writer activity and enables online (hot) backups while the database is running;
- SQL enhancements, including sub-selects, defaults, constraints, primary keys, foreign keys, quoted identifiers, literal string type coercion, type casting, and binary and hexadecimal integer input, among others;
- New native (built-in) types, including a wide range of date/time types and additional geometric types;
- Major speed and performance increases in the order of 20-40%, with backend start-up time decreased by 80%.

The four years following (versions 7.0 to 7.4) brought:

- WAL - the Write-Ahead Log, a standard approach to transaction logging; the concept is that changes to data files must only be written after those changes have been logged;
- SQL schemas;
- Prepared queries;
- Outer joins;
- Complex queries;
- SQL92 join syntax;
- TOAST;
- IPv6 support;
- SQL-standard information schema;
- Full-text indexing;
- Auto-vacuum;
- Perl/Python/Tcl procedural languages;
- Improved SSL support;
- An optimizer overhaul;
- Database statistics information;
- Added security;
- Table functions;
- Logging enhancements;
- Significant speed improvements.





Version 8.0 was PostgreSQL's long-awaited debut into the enterprise database market, bringing features such as:

- Tablespaces;
- Java stored procedures;
- Point-in-time recovery;
- Nested transactions (savepoints).





2. SQL syntax coverage

2.1 DDL

DDL stands for Data Definition Language and it is used to define tables and views. PostgreSQL supports primary keys, foreign keys, default attribute values and attribute constraints, but doesn't support assertions (yet). With DDL we can create, alter or delete tables:

Create table:

CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE table_name
    OF type_name [ (
      { column_name WITH OPTIONS [ DEFAULT default_expr ] [ column_constraint [ ... ] ]
        | table_constraint }
        [, ... ]
    ) ]
    [ WITH ( storage_parameter [= value ] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
    [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
    [ TABLESPACE tablespace ]

TEMPORARY or TEMP tables will be dropped at the end of the session. GLOBAL or LOCAL can be written before TEMPORARY or TEMP but will make no difference.

For each column we can define the default value and value or type constraints. We can also inherit from another table using the optional INHERITS clause:

INHERITS ( parent_table [, ... ] )

With the LIKE clause we can name a table from which the new table automatically copies all column names, data types and not-null constraints. Unlike INHERITS, the LIKE clause doesn't keep any persistent relation with the parent table.

LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ]




Where column_constraint is:

[ CONSTRAINT constraint_name ]
{ NOT NULL |
  NULL |
  CHECK ( expression ) |
  UNIQUE index_parameters |
  PRIMARY KEY index_parameters |
  REFERENCES reftable [ ( refcolumn ) ] [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ]
    [ ON DELETE action ] [ ON UPDATE action ] }
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]

Where table_constraint is:

[ CONSTRAINT constraint_name ]
{ CHECK ( expression ) |
  UNIQUE ( column_name [, ... ] ) index_parameters |
  PRIMARY KEY ( column_name [, ... ] ) index_parameters |
  EXCLUDE [ USING index_method ] ( exclude_element WITH operator [, ... ] ) index_parameters [ WHERE ( predicate ) ] |
  FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn [, ... ] ) ]
    [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action ] [ ON UPDATE action ] }
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]

Where index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are:

[ WITH ( storage_parameter [= value ] [, ... ] ) ]
[ USING INDEX TABLESPACE tablespace ]

And where exclude_element in an EXCLUDE constraint is:

{ column | ( expression ) } [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ]


Drop table:

DROP TABLE name [, ...] [ CASCADE | RESTRICT ]

DROP TABLE removes a table from the database system; it can only be dropped by its owner. If the table is referenced by another table through a foreign-key constraint we must use CASCADE. CASCADE will only remove the other table's constraint; it will not remove the entire table. RESTRICT is the default.



Example:

CREATE TABLE CLIENTS(
    ID INTEGER PRIMARY KEY DEFAULT nextval('clientid'),
    NAME VARCHAR(200),
    GENDER VARCHAR(1));

In this example we don't specify a name for the PRIMARY KEY constraint, so a default constraint name (tablename_pkey) is generated, in this case clients_pkey. We also use a sequence number, which is very common in tables with sequential ids:

CREATE SEQUENCE clientid START 1;

(By default sequences start at 1.)

Now we create a new table called ACCOUNTS:

CREATE TABLE ACCOUNTS(
    ID INTEGER PRIMARY KEY DEFAULT nextval('accountid'),
    CLIENT_ID INTEGER REFERENCES CLIENTS(ID),
    BALANCE INTEGER);

Now if we try to drop the CLIENTS table we will get an error, because ACCOUNTS refers to a client through a foreign key (CLIENT_ID), so we can use CASCADE; in this case the ACCOUNTS table will lose its foreign-key constraint but the data stays untouched.

We can also create and drop triggers:

CREATE TRIGGER name { BEFORE | AFTER } { event [ OR ... ] }
    ON table [ FOR [ EACH ] { ROW | STATEMENT } ]
    EXECUTE PROCEDURE funcname ( arguments )

(Triggers are associated with one or more events on a table; these events will fire a procedure.)

DROP TRIGGER name ON table [ CASCADE | RESTRICT ]

If we drop a trigger with CASCADE, any objects that depend on the trigger are dropped as well; the table on which the trigger is defined is not dropped.

Alter table:

ALTER TABLE [ ONLY ] name [ * ]
    action [, ... ]
ALTER TABLE [ ONLY ] name [ * ]
    RENAME [ COLUMN ] column TO new_column
ALTER TABLE name
    RENAME TO new_name
ALTER TABLE name
    SET SCHEMA new_schema




Where action is one of:

ADD [ COLUMN ] column type [ column_constraint [ ... ] ]
DROP [ COLUMN ] column [ RESTRICT | CASCADE ]
ALTER [ COLUMN ] column TYPE type [ USING expression ]
ALTER [ COLUMN ] column SET DEFAULT expression
ALTER [ COLUMN ] column DROP DEFAULT
ALTER [ COLUMN ] column { SET | DROP } NOT NULL
ALTER [ COLUMN ] column SET STATISTICS integer
ALTER [ COLUMN ] column SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
ADD table_constraint
DROP CONSTRAINT constraint_name [ RESTRICT | CASCADE ]
DISABLE TRIGGER [ trigger_name | ALL | USER ]
ENABLE TRIGGER [ trigger_name | ALL | USER ]
CLUSTER ON index_name
SET WITHOUT CLUSTER
SET WITHOUT OIDS
OWNER TO new_owner
SET TABLESPACE new_tablespace

With ALTER TABLE we can add, remove or change columns; add or remove constraints; enable or disable associated triggers; and set the owner, the clustering index or the tablespace.
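As a brief illustration on the example tables created above (the BIRTHDATE column is added here purely for the sake of the example):

ALTER TABLE CLIENTS ADD COLUMN BIRTHDATE DATE;
ALTER TABLE CLIENTS ALTER COLUMN NAME SET NOT NULL;
ALTER TABLE ACCOUNTS ALTER COLUMN BALANCE SET DEFAULT 0;
ALTER TABLE CLIENTS DROP COLUMN BIRTHDATE;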






2.2 DML

DML stands for Data Manipulation Language and it is used to retrieve or change data. With DML we can select, insert, update or delete data:

Select data:

SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
    * | expression [ AS output_name ] [, ...]
    [ FROM from_item [, ...] ]
    [ WHERE condition ]
    [ GROUP BY expression [, ...] ]
    [ HAVING condition [, ...] ]
    [ { UNION | INTERSECT | EXCEPT } [ ALL ] select ]
    [ ORDER BY expression [ ASC | DESC | USING operator ] [, ...] ]
    [ LIMIT { count | ALL } ]
    [ OFFSET start ]
    [ FOR { UPDATE | SHARE } [ OF table_name [, ...] ] [ NOWAIT ] [...] ]

Where from_item can be one of:

[ ONLY ] table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
( select ) [ AS ] alias [ ( column_alias [, ...] ) ]
function_name ( [ argument [, ...] ] ) [ AS ] alias [ ( column_alias [, ...] | column_definition [, ...] ) ]
function_name ( [ argument [, ...] ] ) AS ( column_definition [, ...] )
from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]

SELECT retrieves data from from_item (which can be a table, a view, etc.); with LIMIT and OFFSET we can specify how many rows it will return and where to start returning them. SELECT statements are also classified as Data Query Language (DQL).

Insert data:

INSERT INTO table [ ( column [, ...] ) ]
    { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
    [ RETURNING * | output_expression [ AS output_name ] [, ...] ]

(Inserts data into the specified table.)





Update data:

UPDATE [ ONLY ] table SET column = { expression | DEFAULT } [, ...]
    [ FROM fromlist ]
    [ WHERE condition ]

(Updates all the tuples that satisfy the WHERE condition.)

Delete data:

DELETE FROM table WHERE column = x;
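A short sketch of these statements on the CLIENTS and ACCOUNTS tables defined in the DDL section (the values used are arbitrary):

INSERT INTO CLIENTS (NAME, GENDER) VALUES ('Maria Silva', 'F');
UPDATE ACCOUNTS SET BALANCE = BALANCE + 100 WHERE CLIENT_ID = 1;
DELETE FROM ACCOUNTS WHERE BALANCE = 0;
SELECT NAME, BALANCE
FROM CLIENTS JOIN ACCOUNTS ON CLIENTS.ID = ACCOUNTS.CLIENT_ID
ORDER BY BALANCE DESC LIMIT 10;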


2.3 DCL

DCL stands for Data Control Language and it is used to protect data from harm. This language is mostly used by database system administrators. With this language we can CREATE, ALTER or DROP users of the DBMS, we can GRANT or REVOKE privileges to/from users, and we can create schemas to simplify database queries, etc.
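A minimal sketch of typical DCL statements (the role name report_user and its password are hypothetical):

CREATE ROLE report_user LOGIN PASSWORD 'secret';
GRANT SELECT ON ACCOUNTS TO report_user;    -- allow the role to read the table
REVOKE SELECT ON ACCOUNTS FROM report_user; -- take the privilege back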





3. Storing and File Structure

3.1 Introduction

The logical system of PostgreSQL works hierarchically: at the highest point we have the cluster, which is a collection of databases. Each cluster exists within a single directory tree, and the entire cluster is serviced by a single postmaster (the program that listens for connection requests from client applications). A cluster cannot be named; there is no way to refer to a cluster in the PostgreSQL DBMS other than by contacting the postmaster servicing that cluster.

There are three system tables that are shared between all databases in a cluster:

1. pg_group - the list of user groups;
2. pg_database - the list of databases within the cluster;
3. pg_shadow - the list of valid users.

Each cluster contains one or more databases. Database names must be unique within a cluster. Each database is a collection of tables (which can be grouped by schemas), functions, operators, views, indexes, and so on. Contrary to clusters, each database must have a name. Table names need not be unique within a database; to remove this ambiguity we use schemas to group tables, and within a query we can specify which schema to use.

All table, function, type and operator names must be unique within a schema. We can refer to a table in a schema by prefixing the table name with the schema name (schemaName.table):

CREATE TABLE schemaName.table (…);

SELECT * FROM schemaName.table;

PostgreSQL has a defined search path, with which we can define the order in which multiple schemas are searched when the schema is not specified in the query. To show the current path we use:

SHOW SEARCH_PATH;

To set a new path we use the SET command, which changes a run-time parameter:

SET SEARCH_PATH TO schema1, schema2, schema3;

(This command redefines the search priorities between schemas.)

Example:

CREATE SCHEMA schm1;

CREATE TABLE table1(var0 type, var1 type);

CREATE TABLE schm1.table1(var2 type, var3 type);

We have created a new schema, then we created a table without specifying a schema (it goes, by default, to the public schema), and then we created a table with the same name as the first but inside the created schema.



When we run a query over table1, the system goes through the search path and gets the table in the first matching schema, but if we SET the SEARCH_PATH we can give a higher priority to the other schema. Schemas are very suitable for separating data: we can always use the same queries if the different tables have the same name, but we will get different results depending on our SEARCH_PATH.





3.2 Pagination

The data in PostgreSQL is stored in the 'data/base' directory inside the PostgreSQL installation path. Inside 'base' we have multiple directories, each one corresponding to a different database and named after its oid. Inside these directories we have multiple files, each named after the oid of a table. Each table in the database has its own file in the file system.

Each file/table holds up to 1 GB of data; above 1 GB an additional segment file is created, and each file is handled as a series of 8 KB blocks, as described previously.

The catalog pg_class catalogs tables and everything else that has columns or is otherwise similar to a table. This includes indexes, sequences, views, composite types, and TOAST tables.

In this catalog we have a column named relnamespace that stores the oid of the schema (namespace) containing the relation. We also have columns that identify the owner of a specific table; the tablespace in which the relation is stored (reltablespace, where 0 means the database's default tablespace); the size of the table in pages (relpages) and the number of rows in the table (reltuples), which are used by the planner and can be updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX; the relation type (whether it is an ordinary table, an index, a sequence, etc.); and many other very useful columns.

To see the relations and the associated information in the catalog we can simply run:

SELECT relname, reltablespace FROM pg_class;
(or)
SELECT relname, oid FROM pg_class;

In the first query we will see mostly zeros in the reltablespace column, which means those relations are stored in the database's default tablespace.

With reltuples and relpages (in pg_class) we can see how many tuples there are in a relation and how many pages those tuples occupy. If we analyze the file sizes and the number of pages we will see that (size of the file)/(number of pages) is approximately 8 KB (8192 bytes); this value corresponds to the quantity of information read from disk in one disk access (hard disk buffer), which optimizes disk access, a common concern in good database management systems. The page size is hard-coded in the PostgreSQL source code and there is no reason to use a larger or a smaller page size other than new hard disk technology.
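One way to check this ratio directly from the catalog (a small sketch; pg_relation_size reports the on-disk size of a relation):

SELECT relname,
       relpages,
       reltuples,
       pg_relation_size(oid::regclass) / NULLIF(relpages, 0) AS bytes_per_page
FROM pg_class
WHERE relkind = 'r';   -- ordinary tables only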





3.3 Buffer Management

Each page consists of a header followed by a list of fixed-size items that are actually pointers to the tuples in that same page; these tuples can be anywhere in the page and don't have a fixed size. In other words, a page stores variable-size tuples and has an index with fixed-size entries (slotted pages). Each tuple also consists of a header and a set of values corresponding to the various tuple attributes. When we execute a query, the system reads the table (possibly also a TOAST table), through an index or not, and loads the corresponding pages into a shared buffer in shared memory that contains the most used pages. This buffer is indexed by the Buffer Descriptors, which hold a Pin Count (prevents page replacement) and a LWLock (taken when the page is changed). The expiration of pages in the buffer is determined by statistical data (a usage count): every time a page is accessed its usage count is incremented, and a clock decrements the usage count on each cycle; when it reaches zero the page can be replaced/removed, freeing space for new pages. Basically, PostgreSQL doesn't use FIFO (used by the operating system); it uses statistical data, i.e. an LRU (least recently used) style policy with a heap implementation, which in some ways is better.




3.3.1 VACUUM and Space Management

The PostgreSQL DBMS has an internal system called MVCC (Multi-Version Concurrency Control) that guarantees the ACID (atomicity, consistency, isolation, durability) properties. This system keeps multiple versions of each tuple, granting multiple reading and writing accesses and avoiding database locks when someone is querying a table. But there is a downside, which is the garbage created by MVCC: at some point we have to remove the unwanted old versions of the data. PostgreSQL uses a mechanism that does garbage collection over old versions of tuples that were deleted or updated; this mechanism is called VACUUM. It is very convenient when we have a table that is constantly updated and consulted.

VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ table ];

If we don't pass any parameters to the VACUUM command, all the tables in the current database will be processed. VACUUM without the FULL parameter simply reclaims space and makes it available for re-use. VACUUM FULL does more extensive processing, including moving tuples across blocks to try to compact the table to the minimum number of disk blocks, to lower access times. Example: if we have a table with multiple indexes and we insert a new tuple into this table, after the insertion there will be updates (new rows) in all of the table's indexes; these updates create ghost data that must be cleaned with VACUUM.

When we insert or update data in a table in the PostgreSQL DBMS we need to remember which pages in the table have free space available and use one of those. To solve this problem PostgreSQL has a structure called the FSM (Free Space Map), which is associated with a table, has its own file in "base/database-oid" named with table-oid_fsm, and is always in shared memory. The FSM is where PostgreSQL keeps track of the pages that have free space available for insertion. Every time the DBMS needs space in a table it checks the FSM; if it doesn't find any free space it adds the new information to the end of the table.

VACUUM updates the FSM when it runs over a table. The FSM is limited in space (which can be redefined): it only supports a limited number of relations (max_fsm_relations) and each table is only allowed a limited amount of room to store information about pages that have free space (max_fsm_pages). If we have multiple pages with free space, VACUUM gives priority to the pages with more free space and puts them in the FSM; the others, with less free space, are never used for insertion unless later VACUUM operations bring them into the FSM.





The FSM is implemented as a binary tree in which each leaf node corresponds to the amount of free space in a page, and each parent node holds the maximum of the free space of its two children. Example:

If we need 3 slots of free space in the table, we start from any of the leaves (pages), in this case the one with 1 slot of free space, and we go up until we find a node with at least the needed space (3 in this case); then we go down the tree until we find the page with the needed value.

3.3.2 TOAST

It is impossible to insert directly into a table a tuple with very large field values, because of the fixed page size. So recent versions of PostgreSQL have TOAST (The Oversized-Attribute Storage Technique), which compresses and/or breaks large field values into multiple physical rows. This system is transparent to the user. TOAST has some limitations, such as only supporting some data types. When a field is too large, a TOAST table is created corresponding to the table where the tuple was inserted.





3.4 Partitioning

PostgreSQL supports basic table partitioning. Partitioning consists in splitting one larger table into smaller physical pieces. This brings many benefits, such as improved query performance, especially on huge tables, and it allows a table to be split across several database servers if the physical memory of the database server is exceeded. PostgreSQL supports partitioning via table inheritance: each partition must be created as a child table of a single parent table. The parent table is normally empty; it exists only to represent the entire data set. In PostgreSQL there are two types of partitioning:

- Range partitioning, in which the table is partitioned into ranges defined by a key column or a set of columns. For example, we can partition a log table by year; this partitioning helps buffer management and query optimization for queries on the most recent table (which is the most common query);

- List partitioning, in which we specify the key values of each partition, which is very useful for queries on very common values of an attribute. In practice there are no differences between the two partitioning methods; they are only two different definitions.

To implement partitioning:

1st - We have to create the master table;

2nd - We create as many child tables as we want, inheriting from the master table; normally these tables do not have any attributes of their own (partitions);

3rd - We add table constraints to the child tables to define the values allowed in each partition;

4th - For each partition we should create an index on the key column(s), as well as other indexes that might be useful;

5th - Optionally we should create a trigger on the master table that redirects inserted tuples to the appropriate partition, avoiding direct insertion into the wrong partition and check constraint violations (see the sketch after the example below);

6th - We have to make sure that constraint_exclusion (which tells the query planner to use table constraints to optimize queries) is not disabled.
constraints to optimize queries) is not disabled.

Example:

CREATE TABLE log (id int not null,
                  logdate date not null,
                  description varchar(200)
);

CREATE TABLE log_2010(
    CHECK(logdate >= '2010-1-1' AND logdate <= '2010-12-31')
) INHERITS(log);

CREATE TABLE log_2009(
    CHECK(logdate >= '2009-1-1' AND logdate <= '2009-12-31')
) INHERITS(log);


In this case, if we insert multiple tuples into the log hierarchy, a SELECT query over log_2010 will return all the 2010 entries, and the same holds for log_2009.
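A minimal sketch of the optional routing trigger mentioned in step 5, assuming the log, log_2010 and log_2009 tables above (the function and trigger names are our own):

CREATE OR REPLACE FUNCTION log_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= DATE '2010-01-01' AND NEW.logdate <= DATE '2010-12-31' THEN
        INSERT INTO log_2010 VALUES (NEW.*);
    ELSIF NEW.logdate >= DATE '2009-01-01' AND NEW.logdate <= DATE '2009-12-31' THEN
        INSERT INTO log_2009 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'logdate out of range';
    END IF;
    RETURN NULL;  -- prevent the row from also being stored in the master table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_log
    BEFORE INSERT ON log
    FOR EACH ROW EXECUTE PROCEDURE log_insert_trigger();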



3.5 Clustering

PostgreSQL supports clustering. When we cluster a table according to an index, the table is physically reordered. The index used must already have been defined over the table being clustered. Clustering is a one-time operation: once it is done, new insertions or updates will not be reordered according to the index (the changes are not clustered).

CLUSTER indexname ON tablename;

CLUSTER tablename;

Once we have clustered the table according to an index we can re-cluster whenever we want without specifying the index, because the used index is memorized. While the table is being clustered it is locked, and it is impossible to query the table.





4. Indexing and Hashing

4.1 Introduction

The use of indexes is motivated by the need for efficient querying in large database environments where quick query responses are demanded. Indexes work by storing data in a data structure in an organized way, based on user-specified attributes or expressions, thus allowing the database system to know where to look for a particular value or set of values in a table.

Although crucial in many cases, when used inappropriately indexes can worsen system performance. This happens because creating and updating an index demands computation power, which represents overhead for the system; therefore one should make sure the implemented index will bring more benefits than setbacks.

PostgreSQL provides several distinct index implementations using different data structures, multi-column indexes and combinations of multiple indexes. Moreover, PostgreSQL takes care of deciding whether to use an index or not, based on a statistical analysis of the distribution of values in a table, an analysis which must be triggered by the user regularly and explicitly by running the command ANALYZE. It also automatically updates the index every time the targeted table is modified, saving the user from handling it.

4.2 Index types

The index types provided by PostgreSQL are: B+ trees (commonly called B-trees, although they're not the same), Hash, GiST and GIN. The algorithms used by each index type differ and must be applied according to the query type but, by default, PostgreSQL will implement a B-tree index, which is the best choice in most situations.

4.2.1 B-tree indexes

B-trees (which are actually B+ trees, as mentioned earlier) represent sorted data (which implies that the data involved must be sortable in some way) in a way that allows efficient access to tuples, which are identified by a key. In this data structure all values are stored at the bottom level of the tree, also called the leaf level, while keys are stored in interior nodes. On the leaf level, all values are connected to each other in a linked list, as shown in the figure below.

Fig. 1 - B-tree structure example




B-trees can handle queries involving comparisons of an indexed column using the operators <, <=, =, >= and >, or combinations of these operators such as BETWEEN and IN, as well as IS NULL and IS NOT NULL.

Additionally, the optimizer can use B-tree indexes for queries involving pattern-matching operators such as LIKE or ~ (but only if the pattern is a constant and anchored at the beginning of the string), and for ILIKE or ~* (but only for patterns whose first characters are not affected by upper/lower case conversion, i.e. non-alphabetic characters).

PostgreSQL's implementation of B-trees is not an "as is" one, though. These trees have between (n+1)/2 and n sub-nodes for each node that is neither a leaf nor the root, which keeps the tree shallow and therefore provides fast access to the leaf level where the searched values are. However, there are some problems that need to be solved so that certain features can be provided, such as the user's need for a fixed and persistent "image" of the index while performing a query. If user1 is adding data to an indexed table while user2 is doing an index scan, this persistence is jeopardized. To deal with this problem PostgreSQL implements high-concurrency B-trees, based on the Lehman-Yao algorithm, which represent a good solution at the expense of a little extra complexity in the data structure.

The usage of B-tree indexes is simple; explicit selection of B-trees is not necessary because indexes in PostgreSQL are, by default, B-tree indexes. An example on our test database is:

CREATE INDEX bT_cIndex ON COUNTRIES (NAME);
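A few illustrative queries that such an index can serve, assuming the NAME column of the COUNTRIES table in the test database (the literal values are arbitrary):

SELECT * FROM COUNTRIES WHERE NAME = 'Portugal';
SELECT * FROM COUNTRIES WHERE NAME BETWEEN 'A' AND 'F';
SELECT * FROM COUNTRIES WHERE NAME LIKE 'Por%';   -- constant pattern anchored at the beginning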

4.2.2 Hash indexes

Hash indexes use a technique called hashing to evenly distribute keys among a number of hash buckets. Each key value added to a hash index is run through a hashing function. Indexes based on hashing provide fast access to the searched tuple when the query involves finding a specific value within the table. However, hash indexes have the limitation of being useful only for equality comparisons, therefore limiting their use to queries involving the = operator.

The algorithm used by PostgreSQL was developed by W. Litwin for disk-based systems with one processor. This hashing is dynamic and does not require a full rehash that would allow the removal of overflow buckets, which reduces the number of seek requests to the disk. W. Litwin's algorithm divides the buckets when any bucket's capacity is exceeded, and it performs this division following the bucket order.

Hash indexation is not, in any case, better than B-tree indexation: it consumes more resources, it does not allow multi-column indexation and it is not WAL-logged, which means that hash indexes might need to be rebuilt after a database crash. For these reasons, the use of hash indexes is not encouraged. The command for creating a hash index is:

CREATE INDEX hash_cIndex ON COUNTRIES USING hash (NAME);




4.2.3 GiST and GIN

Generalized Search Tree (GiST) and Generalized Inverted Index (GIN) indexes are not a single kind of index, but rather an infrastructure within which many different tree indexing strategies can be implemented, such as B-trees, R-trees, hB-trees or RD-trees. The particular operators with which GiST and GIN indexes can be used vary depending on the indexing strategy, and can be selected by the user through the opclass parameter of the CREATE INDEX command. PostgreSQL provides several GiST and GIN operator classes.

To create a GiST index the operator classes must be defined, including comparison operators, which PostgreSQL provides for several two-dimensional geometric data types (for queries using <<, &<, &>, >>, <<|, &<|, |&>, |>>, @>, <@, ~= and &&), and search strategies, and then included in the CREATE INDEX command. The column can be of tsvector or tsquery type and the syntax is as follows:

CREATE INDEX gist_cIndex ON COUNTRIES USING gist (NAME opclass);

GIN indexes are inverted indexes which can handle values that contain more than one key, arrays for example. PostgreSQL includes GIN operator classes for one-dimensional arrays, which support indexed queries (using the operators <@, @>, = and &&), and many other GIN operator classes are available in the contrib collection or as separate projects. The column must be of tsvector type and the syntax is as follows:

CREATE INDEX gin_cIndex ON COUNTRIES USING gin (NAME opclass);

When comparing, we notice that there are substantial performance differences between these two index types.

A GiST index is lossy, meaning that the index may produce false matches, and it is necessary to check the actual table row to eliminate them, which PostgreSQL does automatically when needed. This causes performance degradation due to unnecessary fetches of table records that turn out to be false matches. Since random access to table records is slow, this limits the usefulness of GiST indexes. GIN indexes are not lossy for standard queries, but their performance depends logarithmically on the number of unique words.

In choosing which index type to use, GiST or GIN, consider these performance differences:

- GIN index lookups are about three times faster than GiST;
- GIN indexes take about three times longer to build than GiST;




- GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled;
- GIN indexes are two-to-three times larger than GiST indexes.

GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update. In particular, GiST indexes are very good for dynamic data and fast if the number of unique words is under 100,000, while GIN indexes handle 100,000+ words better but are slower to update.

However, GIN index build time can often be improved by increasing maintenance_work_mem (which specifies the maximum amount of memory to be used by maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY), while GiST index build time is not sensitive to that parameter.

4.3 Using indexes for file organization

PostgreSQL organizes files as heaps, with no clustering process involved. Nevertheless, it is possible to explicitly perform file organization based on a previously created index. To do so, using our test database, the CLUSTER command can be invoked with the syntax:

CLUSTER [VERBOSE] COUNTRIES [USING bT_cIndex];

The VERBOSE option prints a progress report as each table is being clustered.

By using the CLUSTER command, the table is physically reordered based on the information of the index, as long as the specified table is already indexed with the specified index. The clustering operation is performed only once, meaning that subsequent modifications made to the table will not be clustered. Afterwards, one can re-cluster the table by performing the command again. Also, setting the table's FILLFACTOR storage parameter to less than 100% can help preserve cluster ordering during updates, since updated rows are kept on the same page if enough space is available there.

Because PostgreSQL records the index with which the clustering was performed on a table, it allows re-clustering by just invoking:

CLUSTER [VERBOSE] COUNTRIES;

The same happens with any other clustering that has been performed on any other table, which allows re-clustering of all tables that have been previously clustered, using the same indexes used before for each table, by invoking:

CLUSTER;

It is advisable to run ANALYZE after clustering, so that the planner can update the statistics about the ordering of tables. Otherwise, the planner might make poor choices of query plans.

4.4 Multi-column indexation

PostgreSQL allows the creation of multi-column indexes, but only for B-tree, GiST and GIN indexes; hash indexes do not support this functionality. A multi-column index can cover up to 32 columns (although this limit can be modified when building PostgreSQL) and can be used with queries that involve any subset of the indexed columns.

Multi-column B-tree indexes are most efficient when there are constraints on the leading (leftmost) columns. In particular, considering the index IND(a,b,c), when there is an equality constraint over a and an inequality constraint over b, the index scan will be reduced to the portion of the index limited by those constraints. Additionally, if there is also a constraint over c, since it is to the right of the two leading columns this constraint will be checked in the index, thus avoiding table visits, but it will not reduce the portion of the index that has to be scanned. This index could be used for queries that don't apply constraints on a, but since the index is primarily sorted by a the entire index would have to be scanned, so in most cases the planner would choose a sequential table scan over using the index.
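To make this concrete, a couple of hypothetical query shapes against a table t carrying the index IND(a, b, c):

SELECT * FROM t WHERE a = 5 AND b < 10 AND c = 7;  -- a and b bound the scanned index portion; c is checked inside the index
SELECT * FROM t WHERE b = 4 AND c = 2;             -- no constraint on a: the planner will usually prefer a sequential scan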

When using GiST multi-column indexes the efficiency is highly related to the number of distinct values in the first column: if there are only a few, the efficiency will be low even if there are many distinct values in the remaining columns. Moreover, the constraint on the first column is more important than any other constraints on the remaining columns for determining the portion of the index to be scanned.

Unlike B-tree and GiST multi-column indexes, GIN index search efficiency does not depend on which columns are indexed.

In every case, each column must be used with operators that fit the index type. Also, multi-column indexes should be used with moderation because, in most situations, a single-column index ought to be enough to cover the search efficiency needs one might have, while saving space and time.

An example of multi-column index usage is:

CREATE INDEX trans_Index ON TRANSACTIONS (id, branch_id, value);


4.5 Combination of multiple indexes

Combining multiple indexes arises because there are two major limitations on the use of single indexes:

- Only query clauses that use the index's columns with operators of its operator class can be used;

- Only clauses joined with AND can be used: a query like WHERE a = 5 OR b = 6 could not directly use the index.





To overcome these limitations, PostgreSQL provides the combined usage of multiple indexes, including multiple usages of the same index. In the example above, combining multiple indexes would be performed by operating over the same index twice (once for each column) and retrieving the OR join of both results. In detail, this involves scanning through the needed indexes and creating, for each index, a bitmap containing the location of every row matching that index's conditions, and then joining the acquired bitmaps (one for each index) through OR or AND operations, as needed by the query. Once the bitmaps are adequately merged, the represented table rows are visited in physical order, which means that any previously defined order of the indexes is lost, and consequently any ORDER BY clause in the query will need a separate sort step to take effect. For this reason, and taking into account the extra index scans that might eventually be needed, it may happen that the planner chooses to use a simple index scan despite the existence of additional indexes.

This solution can out-perform multi-column indexation, but each situation deserves a close analysis before making any decision about whether to create multi-column indexes or trust PostgreSQL's combination of multiple indexes.
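A small sketch of the situation described above, assuming a table t with columns a and b (the names are reused from the WHERE a = 5 OR b = 6 example):

CREATE INDEX idx_t_a ON t (a);
CREATE INDEX idx_t_b ON t (b);

SELECT * FROM t WHERE a = 5 OR b = 6;  -- both single-column indexes can be scanned and their bitmaps merged with OR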

4.6 Unique and Partial Indexes

PostgreSQL also allows the creation of Unique Indexes and Partial Indexes.

A Unique Index is used to force the uniqueness of a column's value, or the uniqueness of combined multi-column values, and must be B-tree based. In a multi-column index situation only the rows where ALL indexed columns are equal would be rejected. This kind of index is automatically created whenever there is a UNIQUE constraint over a table attribute or when the table has a primary key defined, so the explicit creation of a Unique Index can lead to a duplicate index situation on a table. The syntax to use this index is:

CREATE UNIQUE INDEX un_Index ON CLIENTS (NAME);


A Partial Index is used to apply an index over a subset of a table, thus indexing only the most relevant part of it. Consider an auction system where both active auctions and already finished auctions are stored in the database table auctions. If the system has been running for a long time, the biggest fraction of the table is the one with the already finished auctions, and yet it is the least accessed one. In this situation, creating an index aiming only at the active auctions would improve the performance of queries for the most accessed fraction (the one with the active auctions) and save time and space by not indexing the heavy and least accessed part of the table (the one with the already finished auctions). Such an index could be created with the following syntax:

CREATE INDEX indexName ON auctions (codAuction)
    WHERE active IS true;





4.7 Temporary inconsistency of structures

PostgreSQL allows indexes to be temporarily inconsistent when there is the need to delay the verification of integrity constraints, which happens mostly in transaction operations. This can be achieved with the DEFERRABLE clause of the CREATE TABLE and ALTER TABLE commands when creating the tables on which the indexes will operate. This topic will be further approached in the Transactions section (Section 6).





5. Processing and Optimization of queries

5.1 Introduction

Processing and optimizing queries is a crucial issue in database management systems. When systems demand a heavy load of complex queries, being able to determine the most efficient way of performing a query is a valuable task; the utility of the system may even depend on it. Therefore, query optimization is given a great deal of attention by DBMSs and is one of the steps of query processing. Additionally, PostgreSQL uses C as an intermediate language.

Processing a query in PostgreSQL until a result is obtained can be briefly described in 5 distinct steps:

1. The inquiring application program establishes a connection with the PostgreSQL server and sends the query.

2. The parser checks the query for correct syntax and creates a query tree.

3. The rewrite system takes the query tree created by the parser and looks for any rules, which are stored in the system catalogs, to apply to the query tree. It then performs the transformations given in the rule bodies.

4. The planner/optimizer takes the rewritten query tree and creates a query plan by estimating the cheapest path to take towards the final result; this plan will be the input to the executor.

5. The executor recursively steps through the plan tree and retrieves rows in the way represented by the plan. The executor makes use of the storage system while scanning relations, performs sorts and joins, evaluates qualifications and finally hands back the derived rows.


Figure 2 represents these steps graphically:

Fig. 2 - The path of a query



5.2 Establishing the connection and sending the query

In PostgreSQL, connections to the server are established on a "process per user" basis. The client process can be any program that understands PostgreSQL's Frontend/Backend protocol. Each client process connects to one exclusive server process, and the system doesn't know beforehand how many connections will be made. To deal with this uncertainty, PostgreSQL uses a master process called postgres to create a new server process every time a new connection request is made, by listening on a specified TCP/IP port for incoming connections. The tasks performed by the server communicate with each other through semaphores and shared memory to ensure data integrity throughout concurrent data access. After establishing a connection, the client process can send a query, which is transmitted as plain text (no parsing is done at this point), to the server process. The server then processes the query and transmits the resulting rows to the client over the same previously established connection.

5.3 The Parser Stage

At the Parser Stage, queries are analyzed syntactically and their tokens are orderly gathered into a parse tree, transformed into a detailed internal representation of a SQL query in the form of a query tree, and then passed to the next stage of the process. These steps can be detailed and divided as follows:

5.3.1 Parsing

The parser starts by verifying the string-form (ASCII text) query in terms of syntax correctness. If the query is correct, a parse tree is generated; otherwise, an error message is returned.

The lexer, defined in the file scan.l, is responsible for recognizing identifiers, the SQL keywords, etc. For every keyword or identifier that is found, a token is generated and handed to the parser, which is defined in the file gram.y and consists of a set of grammar rules and actions that are executed whenever a rule is fired. The code of the actions (in C) is used to build up the parse tree.





5.3.2 Transforming

The parse tree is then passed to the rewriter (which is an implementation of PostgreSQL's rule system), which applies a set of rules in order to transform the parse tree into zero or more query trees that will later be passed to the planner. A query tree is an internal representation of a SQL query in which all elements are stored separately, using the following structure:
stored separately, using the following structure:



- Command type - indicates which command produced the query tree (SELECT, INSERT, UPDATE, DELETE).

- Range table - stores the list of relations used in the query.

- Resulting relation - the index into the range table that identifies the relation on which the query results will take effect, e.g. the affected table or view after a DELETE operation.

- Target list - the list of expressions that define the result of the query, e.g. the rows to be added by an INSERT, or the rows that will replace the old rows after an UPDATE. There is no need for a target list after a DELETE operation, since no new rows are produced by it.

- Qualification - an expression with a Boolean value that tells whether the operation (INSERT, UPDATE, DELETE or SELECT) should be executed for the final result row or not. It is the WHERE clause of an SQL statement.

- Join tree - shows the structure of the FROM clause (JOIN expressions). For a simple query like SELECT FROM t1, t2, t3 the join tree is simply a list of the FROM items, since the join order is irrelevant. But when JOIN expressions are used, we have to respect the order shown by the joins. The restrictions associated with particular JOIN clauses (from ON or USING expressions) are stored as qualification expressions attached to those join tree nodes. It is convenient to store the top-level WHERE expression as a qualification attached to the top-level join tree item, too; so really the join tree represents both the FROM and WHERE clauses of a SELECT.

- Others - the remaining parts of the query tree, such as the ORDER BY clause.

The query tree created by the transformation process is structurally similar to the raw parse tree in most places, but it has many differences in detail. For example, a FuncCall node in the parse tree represents something that looks syntactically like a function call; this might be transformed to either a FuncExpr or an Aggref node, depending on whether the referenced name turns out to be an ordinary function or an aggregate function. Also, information about the actual data types of columns and expression results is added to the query tree, unlike what happened with the parse tree.



It is possible to view these trees in the server's log file by changing the configuration parameters debug_print_parse, debug_print_rewritten and debug_print_plan.
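For example, these parameters can be turned on for the current session (a small sketch; depending on privileges they may instead have to be set in postgresql.conf):

SET debug_print_parse = on;
SET debug_print_rewritten = on;
SET debug_print_plan = on;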

5.4 The Planner

The planner is responsible for elaborating the best possible query execution plan. To do so, the planner must contemplate all possible ways of executing the same query (whenever it is reasonable to do so) and then select the one which is expected to be the fastest.

Whenever it isn't reasonable to examine all possible execution alternatives, PostgreSQL uses a Genetic Query Optimizer to determine an acceptable query plan, for queries whose number of joins exceeds the threshold defined by the parameter geqo_threshold.

5.4.1 Generating possible plans

The generated plans are directly related to the available indexes. If a query has a restriction on an attribute and this attribute matches the key attribute of an index, or if there are ordering clauses in the query, then an execution plan is created for the respective index. In every situation a sequential scan plan is also created, because there is always the possibility of performing a sequential scan on a relation.

If the analyzed query demands joining different relations, plans for joining relations are considered after all doable plans have been determined for scanning single relations. The available join strategies are nested loop join, merge join and hash join.

For testing purposes on our database "teste" we will explicitly disable different PostgreSQL functionalities, depending on the example, so that we can show the exact type of join we want. Thanks to PostgreSQL's graphic explain of queries, we'll be able to better represent these operations.

Nested Loop Join - In a nested loop join the relation on the right is scanned once for every row found in the left relation. This strategy is easy to implement but can take a long time. Nevertheless, if the relation on the right can be scanned with an index scan this can be a good option. It is possible to use values from the current row of the relation on the left as keys for the index scan on the right.

Example:

SELECT * FROM CLIENTS INNER JOIN BRANCHES USING(COUNTRY_ID);
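One way to reproduce this kind of test is to disable the other join methods for the session and then ask for the plan (a sketch under the assumption that settings of this kind were used; the exact settings used in the original tests are not stated):

SET enable_hashjoin = off;   -- discourage hash joins for this session
SET enable_mergejoin = off;  -- discourage merge joins for this session
EXPLAIN SELECT * FROM CLIENTS INNER JOIN BRANCHES USING (COUNTRY_ID);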





Merge Join - In a merge join it is necessary to first sort each table according to the join attributes, either through an explicit sort step or by scanning the relation in the proper order using an index on the join key. Then the tables are joined by scanning each table only once, matching rows on their join attributes.

Example:

SELECT * FROM ACCOUNTS INNER JOIN BRANCHES USING(ID);

Hash join - In a hash join the process starts by scanning the relation on the right and loading it into a hash table, using the join attributes as hash keys. Then the relation on the left is scanned, and the values corresponding to the join attributes are used as hash keys to find the matching rows in the previously generated hash table.

Example:

SELECT * FROM ACCOUNTS INNER JOIN BRANCHES USING(ID);



If the query involves more than two relations, a tree of join steps, each with two inputs, is generated. The planner examines different possible join sequences in order to find the cheapest one.

If there are fewer than geqo_threshold (12 by default) relations, an exhaustive search is performed in order to find the best join sequence. Preferentially, the planner will consider joins between any two relations for which there is a WHERE clause. Join pairs with no join clause are considered only if there are no join clauses to any other relation. All possible plans are generated for every join pair considered by the planner, and the one estimated to be the cheapest is chosen.

The finished plan tree consists of sequential or index scans of the base relations, plus nested-loop, merge, or hash join nodes as needed, plus any auxiliary steps needed, such as sort nodes or aggregate-function calculation nodes. Most of these plan node types have the additional ability to do selection (discarding rows that do not meet a specified Boolean condition) and projection (computation of a derived column set based on given column values, that is, evaluation of scalar expressions where needed). One of the responsibilities of the planner is to attach the selection conditions from the WHERE clause and the computation of required output expressions to the most appropriate nodes of the plan tree.


5.4.2 Genetic Query Optimizer (GeQO)

Among all operations in a database, the most complex to deal with is the join operation. The number of possible plans increases exponentially with the number of tables involved, and the fact that PostgreSQL implements nested loop join, hash join and merge join makes it even harder.

GeQO is the genetic algorithm used by PostgreSQL for query optimization. This algorithm is used to reduce the planning time of complex queries. It uses the standard planner code to generate plans for scans of individual relations, and then join plans are generated using the genetic approach. In this context, the set of possible join sequences for the execution of a given query is considered a population of individuals. The degree of adaptation of an individual to its environment (the cost, in this context) is specified by its fitness.

The implementation of this approach follows certain characteristics of the usual genetic algorithms:

- Replacement of the least fit individuals in a population (not whole-generation replacement);

- Recombination of individuals through crossover;

- Mutation as a genetic operator is deprecated, so that no repair mechanisms are needed.

The GeQO module allows the PostgreSQL query optimizer to support large join queries effectively through a non-exhaustive search.

Whether or not to use the GeQO algorithm is a decision that can be made by the user through the configuration parameter geqo (on|off). Also, it's possible to define the number of joined relations considered enough to justify the use of GeQO, with the configuration parameter geqo_threshold (integer).
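A minimal sketch of adjusting these settings for the current session (the threshold value is chosen arbitrarily for illustration):

-- Disable the genetic optimizer entirely, forcing exhaustive join search:
SET geqo = off;

-- Or keep it enabled and only trigger it for joins of 14 or more relations:
SET geqo = on;
SET geqo_threshold = 14;

-- Inspect the current values:
SHOW geqo;
SHOW geqo_threshold;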



36

|
P a g e


5.4.3 Materialization

In some plan types, such as nested loops and merge joins, the process of planning involves rescanning the inner input of these operation nodes, which means backing it up for further use. What usually happens is: the executor requires the node to retrieve the same output tuples, all of them or just a specific portion, over and over, which can be very expensive, especially if the plan used to retrieve those tuples has a high associated cost. Materializing is useful for these cases. When facing this kind of situation, the planner considers inserting a Materialize node on top of the target inner plan to store the generated tuples in an in-memory buffer or in a temporary file, making it cheap to retrieve these tuples whenever needed.

By default, materialization is enabled in PostgreSQL but, although it is not possible to demand that the query planner completely suppress the use of materialization, the user can limit its use with the configuration parameter enable_material (boolean), so that the planner only materializes nodes when required to ensure correctness.

Example:

SELECT * FROM CLIENTS INNER JOIN ACCOUNTS USING(BRANCH_ID);
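A minimal sketch of limiting materialization for the current session (the query is the example above; the plan actually chosen depends on the data):

-- Discourage Materialize nodes; the planner still adds them when needed for correctness:
SET enable_material = off;
EXPLAIN SELECT * FROM CLIENTS INNER JOIN ACCOUNTS USING(BRANCH_ID);

-- Restore the default behaviour:
SET enable_material = on;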


37

|
P a g e


5.5 The Executor

Finally, the executor recursively processes the plan generated by the planner to get the required set of rows: each time a plan node is called it either delivers a row or reports that there are no more rows to deliver. If the query type is SELECT, the executor simply returns all rows delivered by the plan tree; if the query type is INSERT, the executor will insert each delivered row into the respective table; if the query type is UPDATE, the executor will insert into the respective tables the delivered rows (which are the ones to be updated and come with a row identifier) and mark the identified old rows as deleted; if the query type is DELETE, the executor will simply take the row identifier column (the only one delivered by the plan) and mark the target rows as deleted.

5.6 Algorithms for basic operations

Because there are many available algorithms for the planner to "choose" from (i.e. to determine execution costs and pick the cheapest one), in this section we will describe them, leaving out the join algorithms that were already mentioned.

5.6.1 Algorithms for select operations

- Seq Scan algorithm: the sequential scan algorithm is the simplest one and consists of scanning through all rows, checking whether a given condition is verified. In large tables it will probably not be the most efficient option but in small tables, or in situations where the desired tuple is likely to appear in the first rows, it may be worth using. The query plan for this algorithm is always generated, even though it may end up not being used. For queries with no related indexes, the Seq Scan will probably be chosen by the planner.

Example:

SELECT * FROM CLIENTS;

- Bitmap Scan algorithm: the Bitmap Scan uses the different indexes created on the target table to achieve better performance on complex queries, making use of the multiple-index combination possibility (discussed in Chapter 4). Although it is a sophisticated scan method, it may demand a lot of seek operations, therefore it should be used with few values.

Example (using a previously created index over the column NAME):

SELECT * FROM CLIENTS WHERE NAME > 'P';
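A minimal sketch of the setup this example assumes (the index name is our own choice; whether a bitmap scan is actually chosen depends on the selectivity of the condition):

-- Index over CLIENTS.NAME used by the example above:
CREATE INDEX clients_name_idx ON CLIENTS (NAME);

-- Refresh the statistics and show the chosen plan; a Bitmap Heap Scan fed by a
-- Bitmap Index Scan is expected when the condition matches many, but not all, rows:
ANALYZE CLIENTS;
EXPLAIN SELECT * FROM CLIENTS WHERE NAME > 'P';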




- Index Scan algorithm: the index scan algorithm is based on a search over an index. If a query has a > or < condition (for example: SELECT * FROM TABLE WHERE A > 5), the scan will only start from (or stop at) the value involved in the condition. On the given example, the Index Scan would start from the first row with A > 5 and stop at the end of the table. Unlike the Seq Scan, the Index Scan returns the resulting rows ordered according to the index order (not in the table appearance order), which may lead to extra seek operations on the table.

Example:

SELECT * FROM CLIENTS WHERE ID > 43;


5.6.2 Algorithms for ordering operations

When explicitly requested, a user can obtain an ordered output of a performed query by using the ORDER BY clause. This clause has the optional parameters [ASC|DESC] with which the user can determine whether he wants the results in ascending or descending order, respectively.

The ORDER BY clause can be used as in the following example:

SELECT * FROM table
ORDER BY sortExp1 [ASC|DESC], sortExp2 [ASC|DESC] [, ...]

There are two kinds of sorts that can be performed by PostgreSQL, depending on the memory available for the operation, which is defined by the parameter work_mem. If the sort operation demands more memory than the amount specified by this parameter, then the sort will be a disk-based one called external merge sort, in which the target values are divided into several blocks that are ordered separately and then merged; otherwise, the sort will be a memory-based one, using the well-known quicksort algorithm.
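A minimal sketch of observing which sort is chosen (the work_mem values and the table are illustrative; the reported figures depend on the data):

-- With a very small sort budget, a large sort spills to disk:
SET work_mem = '64kB';
EXPLAIN ANALYZE SELECT * FROM CLIENTS ORDER BY NAME;
-- The output typically contains a line such as "Sort Method: external merge Disk: ..."

-- With a larger budget, the same sort usually stays in memory:
SET work_mem = '64MB';
EXPLAIN ANALYZE SELECT * FROM CLIENTS ORDER BY NAME;
-- The output then typically reports "Sort Method: quicksort Memory: ..."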

Additionally, the sorting can be performed through the use of index properties: B-Tree indexes, for instance, can be used to order a table just by running through the leaves of the B-Tree.

Example (the ordering operation being performed using the previously created index over the column NAME):

SELECT * FROM CLIENTS ORDER BY NAME;




5.7 The statistics used by the planner

The planner has to estimate the number of rows returned by a query in order to make good cost estimates. Therefore, there is a need for statistical information about the system data and operations. One of the main components of these statistics is the number of entries in each table and index, as well as the number of disk blocks occupied by each table and index (each block typically being 8 kB, a size fixed at build time by the constant BLCKSZ). This information is stored in the system table pg_class, in the columns reltuples and relpages.

For each row of the pg_class table there is information about a relation in the system, including tables, indexes, sequences, views, composite types and TOAST tables.

We can access this info with the query:

SELECT relname, relnamespace FROM pg_class;


It is important to remember that the values in this table are not updated on the fly, meaning that sometimes they may be slightly out-of-date. Anyway, the user can explicitly ask the system to update this data by using the commands VACUUM and ANALYZE. The VACUUM command physically removes tuples that have been deleted from tables but are still taking up disk space.
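A minimal sketch of refreshing these values for one of our test tables:

-- Reclaim space left by dead tuples and recompute the planner statistics in one step:
VACUUM ANALYZE CLIENTS;

-- Check the refreshed counters (unquoted identifiers are stored in lower case):
SELECT relname, reltuples, relpages FROM pg_class WHERE relname = 'clients';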

Many queries return only a subset of the rows of a table, mostly when a WHERE clause is involved. Thus, the planner makes an estimate of the selectivity of that clause and stores it in the system catalog pg_statistic, which keeps detailed information about the data in the tables as well as about index expressions. The pg_statistic catalog cannot be accessed directly by common users, though. To work around this, there is a system view called pg_stats that presents the information stored in the pg_statistic catalog.

The most_common_vals column shows the values that appear most frequently in a given column; by default, up to 100 values per column are kept in PostgreSQL 9.0 (older releases kept 10), collected by an ANALYZE operation. This value can be changed manually per column with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS.
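A minimal sketch of raising the statistics target for one column of our test schema (the value 200 is arbitrary):

-- Keep more most-common values and histogram buckets for CLIENTS.NAME:
ALTER TABLE CLIENTS ALTER COLUMN NAME SET STATISTICS 200;

-- The new target only takes effect once the statistics are recomputed:
ANALYZE CLIENTS;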



The view pg_stats also provides the statistical correlation between the physical order of the tuples and the logical order of the column values.

This information can be easily accessed with the query:

SELECT * FROM pg_stats;
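A narrower look at the same view, restricted to one of our test tables (the selected columns are standard pg_stats columns):

SELECT attname, n_distinct, most_common_vals, correlation
FROM pg_stats
WHERE tablename = 'clients';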








As shown earlier, the planner chooses the plan with the lowest cost, among all generated plans, to be executed. The calculation of this cost is based on estimated values stored as statistical data in the system. These statistics are collected and updated every time the command ANALYZE is issued.

These statistics are collected by a subsystem called the statistics collector, which counts accesses to tables and indexes in both disk-block and individual-row terms, tracks the number of rows in each table and records the last time the commands VACUUM and ANALYZE were issued for each table. It can also count calls to user-defined functions, as well as the total time spent in each one.

Since the collection of statistics adds some overhead to query execution, the system can be configured to collect or not collect this information. This is controlled by configuration parameters that are usually set in postgresql.conf:

- The parameter track_counts enables/disables the collection of statistics about table and index accesses.

- The parameter track_functions enables/disables the tracking of usage of user-defined functions.

- The parameter track_activities enables/disables the monitoring of the current command being executed by any server process.

In each session, the user (superusers only) can enable or disable these parameters by using the command SET.
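A minimal sketch of doing so in a session (the parameter values follow the PostgreSQL documentation):

-- Stop tracking the command currently executed by each server process:
SET track_activities = off;

-- Track user-defined function calls ('none', 'pl' or 'all'):
SET track_functions = 'all';

-- Keep counting table and index accesses:
SET track_counts = on;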

As mentioned before, one should keep in mind that this information is not updated on the fly. Each running process transmits its statistics to the collector just before finishing any activity; therefore, a query or transaction that is still taking place will not affect the current statistics. This also allows the system to perform several simultaneous queries on the statistics without these operations interfering with each other. Also, the collector produces new reports in time intervals defined by the parameter pgstat_stat_interval, which by default is set to 500 milliseconds.



PostgreSQL has many predefined statistics views, allowing the user to analyze the collected statistics from different perspectives. Also, there are many predefined functions (used to build these views) that can be used directly to perform custom queries on the statistics.
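A minimal sketch of querying one of these predefined views (pg_stat_user_tables; the filter uses one of our test tables):

-- Compare sequential versus index accesses and see when the table was last vacuumed/analyzed:
SELECT relname, seq_scan, idx_scan, n_live_tup, last_vacuum, last_analyze
FROM pg_stat_user_tables
WHERE relname = 'clients';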


5.7.1 The Explain command

Visualizing the plan chosen by the planner can be done by using the command EXPLAIN (we've been using it to show the examples). Besides showing each step taken to perform a given operation, it also shows the estimated cost of every node (both the start-up cost and the total cost), the number of rows returned by the operation and their average size. Additionally, we can add some parameters to the EXPLAIN command in order to obtain more information. These parameters are:

ANALYZE - makes the EXPLAIN operation actually execute the query instead of only generating and showing the plan for it;

VERBOSE - shows the internal representation of the plan tree instead of only a summary.

Here's an example:

EXPLAIN VERBOSE SELECT * FROM CLIENTS, ACCOUNTS
WHERE CLIENTS.NAME > 'P' AND ACCOUNTS.BRANCH_ID = 534;
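A complementary sketch using the ANALYZE option (note that with ANALYZE the statement really runs, so it should be used with care on data-modifying queries):

EXPLAIN ANALYZE SELECT * FROM CLIENTS WHERE NAME > 'P';

-- PostgreSQL 9.0 also accepts the parenthesised option syntax, which allows combining options:
EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM CLIENTS WHERE NAME > 'P';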


Also, PostgreSQL's pgAdmin application provides a very enlightening visual representation of the sequence of operations.





5.7.2 How the planner uses statistics

To better analyze how statistics are used by the query planner, we'll use the regression-test tables of PostgreSQL's database. Here's a first example:

EXPLAIN SELECT * FROM tenk1;

                        QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)

The number of pages and rows is obtained from the table pg_class, as mentioned earlier.


An example with a WHERE clause is now described:

EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000;

                                  QUERY PLAN
---------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=24.06..394.64 rows=1007 width=244)
   Recheck Cond: (unique1 < 1000)
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..23.80 rows=1007 width=0)
         Index Cond: (unique1 < 1000)


The planner examines the
WHERE

clause and checks the selection function for the
operator
“<”

in the

table
pg_operator
. It then obtains a histo
gram for the