
POSTGRESQL
Under the Hood
Workbook
November 17, 2008

PostgreSQL Basics



TABLE OF CONTENTS

1 UNDER THE HOOD
1.1 DATABASE FILE LAYOUT
1.2 RELATION SIZES
1.3 ANALYZING TABLES
1.4 VACUUMING TABLES
1.5 CLUSTERING TABLES


1 UNDER THE HOOD

1.1 DATABASE FILE LAYOUT

PostgreSQL is organized such that data is stored in tables, tables are grouped into schemas, and schemas into databases. Above this, databases are grouped into clusters, each controlled by a single postgres instance running on a particular port.

Using psql, Postgres’ command-line interactive terminal, you can list all databases in a PostgreSQL instance by issuing the following SQL query:

SQL Query

SELECT oid, datname
FROM pg_database;

Sample Output

  oid  |     datname
-------+------------------
     1 | template1
 11510 | template0
 16425 | postgis_template
 17215 | postgis
 17230 | postgres
(5 rows)

From this list, it’s clear that there are five databases in this cluster or instance. To get a better idea of how things are arranged physically on disk, look in the $PGDATA directory (typically C:\Program Files\PostgreSQL\8.3\data on Windows or /usr/local/pgsql/data on Linux).

Command line

cd $PGDATA
ls -l

Sample Output

drwx------ 6 postgres users  4096 Apr  4 15:50 base
drwx------ 2 postgres users  4096 Apr 30 23:20 global
drwx------ 2 postgres users  4096 Feb  6 20:48 pg_clog
-rw------- 1 postgres users  3429 Feb  6 20:48 pg_hba.conf
-rw------- 1 postgres users  1460 Feb  6 20:48 pg_ident.conf
drwx------ 4 postgres users  4096 Feb  6 20:48 pg_multixact
-rw------- 1 postgres users 17566 Apr 30 23:20 pgsql.log
drwx------ 2 postgres users  4096 Feb  6 20:48 pg_subtrans
drwx------ 2 postgres users  4096 Feb  7 21:41 pg_tblspc
drwx------ 2 postgres users  4096 Feb  6 20:48 pg_twophase
-rw------- 1 postgres users     4 Feb  6 20:48 PG_VERSION
drwx------ 3 postgres users  4096 Feb 12 14:08 pg_xlog
-rw------- 1 postgres users 16654 Feb  7 22:43 postgresql.conf
-rw------- 1 postgres users    50 Apr 28 23:17 postmaster.opts




This listing reveals a directory called base. This is where your databases reside.

Command line

cd $PGDATA/base
ls -l

Sample Output

drwx------ 2 postgres users 4096 May 22 20:46 1
drwx------ 2 postgres users 4096 Feb  6 20:48 11510
drwx------ 2 postgres users 4096 Mar 16 22:32 16425
drwx------ 2 postgres users 4096 May 22 20:45 17215
drwx------ 2 postgres users 4096 May 22 20:46 17230

Notice that in this particular installation, the base subdirectory contains five additional directories with the same names as the oid field from your SQL query earlier. You would be correct to observe that PostgreSQL stores its databases in the file system in the $PGDATA/base directory. In this particular installation, the subdirectory named 1 contains the template1 database, the subdirectory named 11510 contains the template0 database, and the subdirectory named 17230 contains the postgres database.

Let’s dive a little deeper:

Command line

cd $PGDATA/base/17230
ls

Sample Output

11429 11451 1259 2612 2654 2667 2684 2702 2835 3603
11431 11453 2600 2613 2655 2668 2685 2703 2836 3604
11433 11454 2601 2614 2656 2669 2686 2704 2837 3605
11434 11456 2602 2615 2657 2670 2687 2753 2838 3606
11436 11458 2603 2616 2658 2673 2688 2754 2839 3607
11438 11459 2604 2617 2659 2674 2689 2755 2840 3608
11439 11461 2605 2618 2660 2675 2690 2756 2841 3609
11441 11463 2606 2619 2661 2678 2691 2757 3501 3712
11443 1247  2607 2620 2662 2679 2692 2830 3502 3764
11444 1248  2608 2650 2663 2680 2693 2831 3503 3766
11446 1249  2609 2651 2664 2681 2696 2832 3600 3767
11448 1250  2610 2652 2665 2682 2699 2833 3601 pg_internal.init
11449 1255  2611 2653 2666 2683 2701 2834 3602 PG_VERSION

Again we see a directory listing that looks similar to our base directory, with numeric filenames. If you were to assume that these filenames correspond to the oids of tables, you would be mostly correct. In Postgres, all tables and indexes are stored physically on disk using such filenames in their appropriate database directory. When a table (or index) is first created, it is assigned a new oid and stored as a new file. However, if you ALTER a table in a way that forces the database to create a new replacement copy of the table, a new numeric filename is generated and used to store the modified table. All tables in a database are catalogued in the pg_class table, and these filenames are referenced by the pg_class.relfilenode attribute.
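To make the layout concrete, the path to the file backing a relation in the default tablespace can be assembled from the database oid and relfilenode. The helper below is a hypothetical sketch, not part of PostgreSQL; a real lookup must also handle non-default tablespaces (under pg_tblspc) and relations over 1 GB, which are split into numbered segment files.

```python
# Hypothetical helper (not a PostgreSQL API): locate the file backing a
# relation in the default tablespace. Relations in other tablespaces
# live under pg_tblspc, and relations larger than 1 GB are split into
# segment files (e.g. 17231, 17231.1, ...).
def relation_path(pgdata: str, db_oid: int, relfilenode: int) -> str:
    return f"{pgdata}/base/{db_oid}/{relfilenode}"

# pg_attribute (relfilenode 1249) in the postgres database (oid 17230):
print(relation_path("/usr/local/pgsql/data", 17230, 1249))
# /usr/local/pgsql/data/base/17230/1249
```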



To compare the relfilenode attribute in the pg_class system table and our directory listing, enter the following SQL statement:

SQL Query

SELECT relfilenode, relname, reltuples, relpages
FROM pg_class
WHERE reltablespace = 0
ORDER BY relfilenode;

Sample Output

 relfilenode |           relname            | reltuples | relpages
-------------+------------------------------+-----------+----------
        1247 | pg_type                      |       269 |        6
        1248 | pg_autovacuum                |         0 |        0
        1249 | pg_attribute                 |      1774 |       30
        1250 | pg_autovacuum_vacrelid_index |         0 |        1
        1255 | pg_proc                      |      2143 |       50
         ... | ...


1.2 RELATION SIZES

In the previous section we learned that every table is catalogued in the pg_class system table. To find a table in this catalogue, search for a row whose relname equals your tablename (or more precisely, whose oid equals the oid of your table). For example, to find a table called pg_attribute stored in the system schema pg_catalog, enter the following query:

SQL Query

SELECT relfilenode, relname, reltuples, relpages
FROM pg_class
WHERE oid = 'pg_catalog.pg_attribute'::regclass;

Sample Output

 relfilenode |   relname    | reltuples | relpages
-------------+--------------+-----------+----------
        1249 | pg_attribute |      1774 |       30
(1 row)

The reltuples column tells us approximately how many tuples are in each table, and the relpages column tells us how many disk pages are required to hold the table’s tuples. In our previous query, we see that the pg_attribute table has about 1774 tuples spanning 30 pages. Before that, we noted that the pg_type table has about 269 tuples spanning 6 pages.



Let’s look again at the directory system.

Command line

cd $PGDATA/base/17230
ls -l 1247 1249

Sample Output

-rw------- 1 postgres users  49152 May 22 20:46 1247
-rw------- 1 postgres users 245760 May 22 20:46 1249

Doing a little math, we see that 49152 / 6 = 8192 and 245760 / 30 = 8192! Each disk page is 8192 (8K) bytes long.
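The same arithmetic can be sketched in a few lines; this is just an illustration of the numbers above, not a PostgreSQL API:

```python
PAGE_SIZE = 8192  # PostgreSQL's default page size (compile-time BLCKSZ)

def size_from_pages(relpages: int) -> int:
    """Estimate a relation's on-disk size from pg_class.relpages."""
    return relpages * PAGE_SIZE

print(size_from_pages(6))   # 49152, matches file 1247 (pg_type)
print(size_from_pages(30))  # 245760, matches file 1249 (pg_attribute)
```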

You can use the pg_class.relpages column to determine the size of your table or index. For example, let’s determine the table size of pg_catalog.pg_description:

SQL Query

SELECT relfilenode, relname, reltuples, relpages
FROM pg_class
WHERE oid = 'pg_catalog.pg_description'::regclass;

Sample Output

 relfilenode |    relname     | reltuples | relpages
-------------+----------------+-----------+----------
        2609 | pg_description |      1988 |       16
(1 row)

Now that we know pg_description spans 16 disk pages and that each page is 8192 bytes long, we conclude that the table should be 16 * 8192 = 131072 bytes. Let’s find out.

Command line

cd $PGDATA/base/17230
ls -l 2609

Sample Output

-rw------- 1 postgres users 131072 May 22 20:46 2609

Postgres provides a shortcut for this that makes things even easier:

SQL Query

SELECT pg_relation_size('pg_catalog.pg_description');

Sample Output

 pg_relation_size
------------------
           131072
(1 row)




Used with the pg_tables system view, you can quickly retrieve the size of all your tables in the entire database:

SQL Query

SELECT schemaname, tablename,
       pg_relation_size(schemaname || '.' || tablename) AS size
FROM pg_tables
ORDER BY schemaname, tablename;

Sample Output

     schemaname     |        tablename        |  size
--------------------+-------------------------+--------
 information_schema | sql_features            |  40960
 information_schema | sql_implementation_info |   8192
 information_schema | sql_languages           |   8192
 information_schema | sql_packages            |   8192
 information_schema | sql_parts               |   8192
 information_schema | sql_sizing              |   8192
 information_schema | sql_sizing_profiles     |      0
 pg_catalog         | pg_aggregate            |   8192
 pg_catalog         | pg_am                   |   8192
 pg_catalog         | pg_amop                 |  24576
 pg_catalog         | pg_amproc               |  16384
 pg_catalog         | pg_attrdef              |      0
 pg_catalog         | pg_attribute            | 245760
 ...                | ...

Or retrieve the size of all the databases in a cluster:

SQL Query

SELECT datname, pg_database_size(datname) AS size
FROM pg_database
ORDER BY datname;

Sample Output

     datname      |  size
------------------+---------
 postgis          | 7397380
 postgis_template | 7372804
 postgres         | 4324580
 template0        | 4243460
 template1        | 4324580
(5 rows)


1.3 ANALYZING TABLES

Before we continue, let’s create a sample table to work with.

SQL Query

CREATE TABLE sample AS
  SELECT generate_series(1, 1000000)::integer AS id,
         round(random()*1000)::double precision AS data;



This query creates a simple table with an id attribute as an integer and a random data attribute as a double, spanning 1,000,000 tuples. For example:

SQL Query

SELECT * FROM sample LIMIT 5;

Sample Output

 id | data
----+------
  1 |  379
  2 |  985
  3 |  960
  4 |  528
  5 |  725
(5 rows)

Practicing our table-sizing skills, let’s compute the size of our new sample table.

SQL Query

SELECT pg_relation_size('sample');

Sample Output

 pg_relation_size
------------------
         40157184
(1 row)

This shows that our table is 40MB. Let’s verify this through pg_class.

SQL Query

SELECT relfilenode, relname, reltuples, relpages
FROM pg_class
WHERE oid = 'public.sample'::regclass;

Sample Output

 relfilenode | relname | reltuples | relpages
-------------+---------+-----------+----------
       17231 | sample  |         0 |        0
(1 row)

Well, this is interesting. Why doesn’t pg_class have a value for the number of tuples and pages? This is because we forgot to ANALYZE our new table. According to the Postgres documentation, “ANALYZE collects statistics about the contents of tables in the database”.



Let’s try this again.

SQL Query

ANALYZE public.sample;

SELECT relfilenode, relname, reltuples, relpages
FROM pg_class
WHERE oid = 'public.sample'::regclass;

Sample Output

 relfilenode | relname | reltuples | relpages
-------------+---------+-----------+----------
       17231 | sample  |    999995 |     4902
(1 row)

That’s better. Now we know that our sample table consumes 4902 pages, which is equivalent to 40157184 bytes, as noted earlier. As mentioned earlier, the reltuples column contains an estimate of the number of tuples in a table, which is used by the query planner. This explains why the value is 999995 and not 1000000 as expected.
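Why only an estimate? ANALYZE reads a random sample of the table’s pages and extrapolates. The toy sketch below illustrates that idea only; PostgreSQL’s real estimator is considerably more sophisticated, and the page counts here are assumptions chosen to resemble our sample table.

```python
import random

def estimate_reltuples(tuples_per_page: list, sample_size: int) -> float:
    """Toy sketch of ANALYZE: sample some pages, average their
    live-tuple counts, and scale up by the total number of pages."""
    sampled = random.sample(tuples_per_page, sample_size)
    return sum(sampled) / sample_size * len(tuples_per_page)

random.seed(1)
# Assumed layout: 4902 pages holding 1,000,000 tuples,
# about 204 per page with a partially filled last page.
pages = [204] * 4901 + [196]
print(round(estimate_reltuples(pages, 300)))  # close to 1000000, rarely exact
```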


1.4 VACUUMING TABLES

Two great system views are pg_stat_all_tables and pg_statio_all_tables. These views reveal statistics collected by the system. The former view shows information like the number of sequential scans, while the latter view shows information like the number of pages read from disk or cache. For this tutorial, we’ll create a simplified view for our own uses.

SQL Query

CREATE VIEW statview AS
SELECT a.relname, a.seq_tup_read, b.heap_blks_read, b.heap_blks_hit
FROM pg_stat_all_tables a, pg_statio_all_tables b
WHERE a.relid = b.relid;

This view shows the number of sequential tuples read, the total number of heap blocks read, and
the number of those heap blocks that were already in memory.
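These two counters are often combined into a cache hit ratio, the fraction of heap-block requests served from the buffer cache rather than disk. That ratio is a derived metric computed here for illustration; it is not a column of the view. The sample numbers below anticipate results we will see shortly.

```python
def cache_hit_ratio(heap_blks_read: int, heap_blks_hit: int) -> float:
    """Fraction of heap block requests served from the buffer cache."""
    total = heap_blks_read + heap_blks_hit
    return heap_blks_hit / total if total else 0.0

# Cold cache (right after a restart): every block comes from disk.
print(cache_hit_ratio(4902, 0))      # 0.0
# Warm cache: most requests are satisfied from memory.
print(cache_hit_ratio(4902, 19608))  # 0.8
```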



Let’s give it a try. (To ensure we aren’t referencing cached pages, we’ll restart the database cluster. Also, the EXPLAIN ANALYZE is simply used to force a sequential scan through the table so we can view the statistic results.)

SQL Query

SELECT pg_stat_reset();

Command line

/etc/init.d/pgsql restart

SQL Query

EXPLAIN ANALYZE SELECT * FROM sample;

SELECT * FROM statview WHERE relname = 'sample';

Sample Output

 relname | seq_tup_read | heap_blks_read | heap_blks_hit
---------+--------------+----------------+---------------
 sample  |      1000000 |           4902 |             0
(1 row)

This makes sense. We already know that our table has 4902 pages, and we had to read all of them from disk in order to perform the SELECT * query.

Now let’s try something interesting. Let’s update our sample table using a simple UPDATE statement. (Don’t read too much into this query, since all we’re trying to do here is update every row.) Before we do, however, let’s again reset the table’s stats.

SQL Query

UPDATE sample SET data = data + 1;

SELECT pg_stat_reset();

Command line

/etc/init.d/pgsql restart

SQL Query

EXPLAIN ANALYZE SELECT * FROM sample;

SELECT * FROM statview WHERE relname = 'sample';

Sample Output

 relname | seq_tup_read | heap_blks_read | heap_blks_hit
---------+--------------+----------------+---------------
 sample  |      1000000 |           9804 |             0
(1 row)

Well, this is interesting. Why did our table scan suddenly have to load 9804 pages from disk when last time it only had to load 4902?



Let’s look again at pg_class:

SQL Query

SELECT relfilenode, relname, reltuples, relpages
FROM pg_class
WHERE relname = 'sample';

Sample Output

 relfilenode | relname | reltuples | relpages
-------------+---------+-----------+----------
       17231 | sample  |     1e+06 |     4902
(1 row)

No help there: the catalogue still thinks the table has 1000000 rows over 4902 pages.

Did the number of rows change?

SQL Query

SELECT count(*) FROM sample;

Sample Output

  count
---------
 1000000
(1 row)

No clue there either. Did the physical file size change?

Command line

ls -l 17231

Sample Output

-rw------- 1 postgres users 80314368 May 22 23:55 17231

There’s something. Why is the table suddenly twice the size it was earlier?

When you UPDATE a row, PostgreSQL performs the following operations:

1. The new row values are written to the table as a new tuple.
2. The old row is marked as deleted.
3. The deleted row remains in the table, but is no longer accessible.


Now we know the difference between a tuple and a row. A tuple is some version of a row. Our sample table now has 1000000 rows and 2000000 tuples, half of which are dead.

The reason Postgres does things this way has to do with its transactional integrity model. If a client were to begin a transaction, perform an update as we have done, and then cancel or roll back the transaction, the database must be able to roll back to a consistent state for other concurrent users.
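A toy model makes the tuple-versus-row distinction concrete. This is a sketch of the idea only; PostgreSQL’s real tuple headers track transaction ids for visibility, not a simple boolean flag.

```python
# Toy MVCC sketch: an UPDATE writes a new tuple version and leaves the
# old version behind as a dead tuple. Rows stay constant; tuples grow.
class ToyTable:
    def __init__(self, rows):
        self.tuples = [{"data": r, "dead": False} for r in rows]

    def update_all(self, fn):
        for t in [t for t in self.tuples if not t["dead"]]:
            t["dead"] = True  # old version is kept, but marked dead
            self.tuples.append({"data": fn(t["data"]), "dead": False})

    def row_count(self):
        return sum(1 for t in self.tuples if not t["dead"])

table = ToyTable(range(1000))
table.update_all(lambda d: d + 1)
print(table.row_count(), len(table.tuples))  # 1000 2000
```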



But then why does pg_class not show 9804 pages? We again forgot to ANALYZE our table.

SQL Query

ANALYZE sample;

SELECT relfilenode, relname, reltuples, relpages
FROM pg_class
WHERE relname = 'sample';

Sample Output

 relfilenode | relname |  reltuples  | relpages
-------------+---------+-------------+----------
       17231 | sample  | 1.00068e+06 |     9804
(1 row)


Now this is consistent. Whenever you create or modify a table, don’t forget to ANALYZE your table; otherwise the query planner won’t know how large your table is and will more than likely choose an incorrect query plan.


So, now we know why the table is larger after performing the UPDATE, but how do we reduce its size again? The answer here is VACUUM.


VACUUM by itself (or followed by a tablename) will reclaim dead tuples in a table, that is, make them usable as writable space. It will not reduce the size of the table.

VACUUM FULL will both reclaim dead space and try further to compact the table to the minimum number of disk blocks, making our table again 40MB instead of 80.
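The size arithmetic of the two commands can be sketched as follows. This is a toy calculation using this chapter’s numbers; the 204-tuples-per-page density is an assumption that happens to match our sample table.

```python
PAGE_SIZE = 8192
TUPLES_PER_PAGE = 204  # assumed density of our sample table

def vacuum_full_pages(live_tuples: int) -> int:
    """Pages left after VACUUM FULL: just enough to hold the live tuples."""
    return -(-live_tuples // TUPLES_PER_PAGE)  # ceiling division

# After the UPDATE: 1,000,000 live + 1,000,000 dead tuples on 9804 pages.
print(9804 * PAGE_SIZE)                          # 80314368 (~80 MB)
# Plain VACUUM keeps all 9804 pages; the dead space is merely reusable.
# VACUUM FULL compacts back down to 4902 pages:
print(vacuum_full_pages(1_000_000) * PAGE_SIZE)  # 40157184 (~40 MB)
```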


1.5 CLUSTERING TABLES

Let’s now turn our attention to the way tuples are ordered in a table. We’ll start by making sure our sample table is restored to its original size and add an index on the id column by declaring it a primary key.

SQL Query

VACUUM FULL sample;

ALTER TABLE sample ADD PRIMARY KEY (id);

ANALYZE sample;




SQL Query

\d sample

Sample Output

       Table "public.sample"
 Column |       Type       | Modifiers
--------+------------------+-----------
 id     | integer          | not null
 data   | double precision |
Indexes:
    "sample_pkey" PRIMARY KEY, btree (id)

Let’s further create a second sample table, one where the tuples are ordered randomly (to simulate the random order rows are often inserted into a table).

SQL Query

CREATE TABLE sample_random AS
  SELECT * FROM sample ORDER BY random();

ALTER TABLE sample_random ADD PRIMARY KEY (id);

ANALYZE sample_random;



Suppose our daily SQL queries frequently ORDER their results; then the following query has some interesting consequences (after a database restart).

SQL Query

SELECT pg_stat_reset();

EXPLAIN ANALYZE SELECT * FROM sample_random ORDER BY id;

SELECT * FROM statview WHERE relname = 'sample_random';

Sample Output

    relname    | seq_tup_read | heap_blks_read | heap_blks_hit
---------------+--------------+----------------+---------------
 sample_random |            0 |         172504 |        827279
(1 row)

Wow, that was expensive! We read 172504 pages from disk and 827279 pages from memory.
But our table is only 4902 pages long!



Let’s perform the same query on our ordered table:

SQL Query

SELECT pg_stat_reset();

EXPLAIN ANALYZE SELECT * FROM sample ORDER BY id;

SELECT * FROM statview WHERE relname = 'sample';

Sample Output

 relname | seq_tup_read | heap_blks_read | heap_blks_hit
---------+--------------+----------------+---------------
 sample  |            0 |           4902 |         19608
(1 row)

This looks better. Our table is 4902 pages long and we had to load them all from disk in order to
fulfill the query. Further, we had to read 19608 pages from memory to fulfill the ORDER BY
clause.

The difference between the two tables is simply their ordering of tuples physically on disk. Consider the following diagram showing what each page on disk might look like.

Page 1
   id   | data
--------+------
   5007 |
 897123 |
     45 |
 132465 |

Page 2
   id   | data
--------+------
      9 |
 504478 |
    156 |
     12 |

Page 3
   id   | data
--------+------
 789456 |
   7456 |
  16497 |
 812456 |




When performing an index scan and sorting the results, Postgres has to read all the index pages from disk and then, for every index lookup, find the table row in the appropriate table disk page. This is how disk access might look when querying the sample_random table.

Index scan

Page 1 (index)        Page 1 (heap)
     id                  id   | data
      9               --------+------
     12                  5007 |
     45                897123 |
    156                    45 |
                       132465 |

Page 2 (index)        Page 2 (heap)
     id                  id   | data
   5007               --------+------
   7456                     9 |
  16497                504478 |
  50478                   156 |
                           12 |

Page 3 (index)        Page 3 (heap)
     id                  id   | data
 132465               --------+------
 789456                789456 |
 812456                  7456 |
 897123                 16497 |
                       812456 |





Now consider the same index scan on our ordered sample table, where the tuples are already ordered physically within the disk page.

Index scan

Page 1 (index)        Page 1 (heap)
     id                  id   | data
      9               --------+------
     12                     9 |
     45                    12 |
    156                    45 |
                          156 |

Page 2 (index)        Page 2 (heap)
     id                  id   | data
   5007               --------+------
   7456                  5007 |
  16497                  7456 |
  50478                 16497 |
                        50478 |

Page 3 (index)        Page 3 (heap)
     id                  id   | data
 132465               --------+------
 789456                132465 |
 812456                789456 |
 897123                812456 |
                       897123 |

Upon investigation, if we regularly use ORDER in our SQL queries, it’s definitely faster to order the tuples in our tables of interest. This can be accomplished by doing as we’ve done here (CREATE TABLE … AS SELECT * FROM …). Alternatively, you can use the Postgres CLUSTER command.

SQL Query

CLUSTER sample_random_pkey ON sample_random;
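The benefit of clustering can be sketched with a toy simulation; the layout numbers below are hypothetical, not measurements. We walk rows in id order and count how often the scan must fetch a different heap page than the one it just read.

```python
import random

def page_switches(row_pages):
    """Count how often consecutive rows in scan order sit on different
    heap pages, i.e. the minimum number of page fetches the walk needs."""
    switches, current = 0, None
    for page in row_pages:
        if page != current:
            switches += 1
            current = page
    return switches

N, PER_PAGE = 10_000, 100
# Clustered: ids laid out in order, 100 per page (like our sample table).
clustered = [i // PER_PAGE for i in range(N)]
# Random: the same rows scattered across pages (like sample_random).
random.seed(0)
shuffled = clustered[:]
random.shuffle(shuffled)

print(page_switches(clustered))  # 100, one visit per page
print(page_switches(shuffled))   # nearly one page jump per row
```

With clustered storage, consecutive ids share a page and the scan touches each page once; with random storage, nearly every row forces a jump to a different page, which is exactly the blow-up we measured on sample_random.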