Introduction to PostgreSQL Database for Miami Ruby Meetup

Monday, April 20th, 2009
Postgres Overview
  Featureful and Standards Compliant
  Highly Customizable
Installing Postgres from Source (Linux)
  Summary of Steps
  Technical Steps to install
Installing Postgres from Source (MAC)
Installing Postgres on Windows
Editing the pg_hba.conf file
Tweaking/Tuning Postgres
Installing PGADMIN
About MVCC and the need to vacuum databases
About Vacuuming
About PL/PGSQL language
Installing PL/RUBY
Explain Analyse is your friend
About Contributions
About Postgres TSEARCH2
XML Datatype Support
About PGBENCH
About Postgres Replication
About Large Objects
About Partitioning
About Upgrading Databases
Backing up Postgres databases
Commands in psql
Who are big dataset users of Postgres
Other Resources online
Postgres Overview
PostgreSQL is a powerful, open source object-relational database system. It has more than 15
years of active development and a proven architecture that has earned it a strong reputation for
reliability, data integrity, and correctness. It runs on all major operating systems, including
Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows. It is
fully ACID compliant, has full support for foreign keys, joins, views, triggers, and stored
procedures (in multiple languages). It includes most SQL92 and SQL99 data types, including
INTEGER, NUMERIC, BOOLEAN, CHAR, VARCHAR, DATE, INTERVAL, and
TIMESTAMP. It also supports storage of binary large objects, including pictures, sounds, or
video. It has native programming interfaces for C/C++, Java, .Net, Perl, Python, Ruby, Tcl,
ODBC, among others, and exceptional documentation.
An enterprise class database, PostgreSQL boasts sophisticated features such as Multi-Version
Concurrency Control (MVCC), point in time recovery, tablespaces, asynchronous replication,
nested transactions (savepoints), online/hot backups, a sophisticated query planner/optimizer,
and write ahead logging for fault tolerance. It supports international character sets, multibyte
character encodings, Unicode, and it is locale-aware for sorting, case-sensitivity, and formatting.
It is highly scalable both in the sheer quantity of data it can manage and in the number of
concurrent users it can accommodate. There are active PostgreSQL systems in production
environments that manage in excess of 4 terabytes of data. Some general PostgreSQL limits are
included in the table below.
Limit                        Value
Maximum Database Size        Unlimited
Maximum Table Size           32 TB
Maximum Row Size             1.6 TB
Maximum Field Size           1 GB
Maximum Rows per Table       Unlimited
Maximum Columns per Table    250 - 1600 depending on column types
Maximum Indexes per Table    Unlimited
PostgreSQL has won praise from its users and industry recognition, including the Linux New
Media Award for Best Database System, and it is a five-time winner of The Linux Journal
Editors' Choice Award for best DBMS.
Featureful and Standards Compliant
PostgreSQL prides itself on standards compliance. Its SQL implementation strongly conforms to
the ANSI SQL 92/99 standards. It has full support for subqueries (including subselects in the
FROM clause) and for the read-committed and serializable transaction isolation levels. And while
PostgreSQL has a fully relational system catalog which itself supports multiple schemas per
database, its catalog is also accessible through the Information Schema as defined in the SQL
standard.
Data integrity features include (compound) primary keys, foreign keys with restricting and
cascading updates/deletes, check constraints, unique constraints, and not null constraints.
It also has a host of extensions and advanced features. Among the conveniences are
auto-increment columns through sequences, and LIMIT/OFFSET, allowing the return of partial
result sets. PostgreSQL supports compound, unique, partial, and functional indexes, which can
use any of its B-tree, R-tree, hash, or GiST storage methods.
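As a rough sketch of partial and functional indexes (the table and column names here are
invented for illustration, not taken from a real schema):

```sql
-- Partial index: index only the rows a hot query actually touches
CREATE INDEX idx_orders_open ON orders (created_at)
    WHERE status = 'open';

-- Functional (expression) index: supports case-insensitive lookups
CREATE INDEX idx_users_lower_email ON users (lower(email));

-- This query can use the expression index:
SELECT * FROM users WHERE lower(email) = 'bob@example.com';

-- And LIMIT/OFFSET returns just one page of a result set:
SELECT * FROM orders ORDER BY created_at LIMIT 20 OFFSET 40;
```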
GiST (Generalized Search Tree) indexing is an advanced system which brings together a wide
array of different sorting and searching algorithms, including B-tree, B+-tree, R-tree, partial
sum trees, ranked B+-trees, and many others. It also provides an interface which allows both the
creation of custom data types and extensible query methods with which to search them. Thus,
GiST offers the flexibility to specify what you store and how you store it, and the ability to
define new ways to search through it: ways that far exceed those offered by standard B-tree,
R-tree, and other generalized search algorithms.
GiST serves as a foundation for many public projects that use PostgreSQL, such as OpenFTS and
PostGIS. OpenFTS (Open Source Full Text Search engine) provides online indexing of data and
relevance ranking for database searching. PostGIS is a project which adds support for geographic
objects in PostgreSQL, allowing it to be used as a spatial database for geographic information
systems (GIS), much like ESRI's SDE or Oracle's Spatial extension.
Other advanced features include table inheritance, a rules system, and database events. Table
inheritance puts an object oriented slant on table creation, allowing database designers to derive
new tables from other tables, treating them as base classes. Even better, PostgreSQL supports
both single and multiple inheritance in this manner.
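A minimal sketch of table inheritance (this cities/capitals pairing is the classic example from
the PostgreSQL documentation):

```sql
CREATE TABLE cities (
    name       text,
    population real,
    altitude   int
);

-- capitals inherits every column of cities and adds one of its own
CREATE TABLE capitals (
    state char(2)
) INHERITS (cities);

-- Selecting from cities also returns rows stored in capitals;
-- use ONLY to restrict the scan to the parent table itself:
SELECT name, altitude FROM cities;
SELECT name, altitude FROM ONLY cities;
```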
The rules system, also called the query rewrite system, allows the database designer to create
rules which identify specific operations for a given table or view, and dynamically transform
them into alternate operations when they are processed.
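As a hypothetical sketch of a rewrite rule (the table and rule names are invented), turning
deletes into soft deletes:

```sql
CREATE TABLE documents (
    id      serial PRIMARY KEY,
    body    text,
    deleted boolean DEFAULT false
);

-- Rewrite DELETE into an UPDATE that just flags the row
CREATE RULE soft_delete AS ON DELETE TO documents
    DO INSTEAD
    UPDATE documents SET deleted = true WHERE id = OLD.id;
```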
The events system is an interprocess communication system in which messages and events can
be transmitted between clients using the LISTEN and NOTIFY commands, allowing both simple
peer-to-peer communication and advanced coordination on database events. Since notifications
can be issued from triggers and stored procedures, PostgreSQL clients can monitor database
events such as table updates, inserts, or deletes as they happen.
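A minimal sketch of this in two psql sessions (the channel name is invented):

```sql
-- Session 1: subscribe to a notification channel
LISTEN inventory_changed;

-- Session 2 (or a trigger/stored procedure): signal the channel
NOTIFY inventory_changed;

-- Back in session 1, psql reports something like:
--   Asynchronous notification "inventory_changed" received
--   from server process with PID 1234.
```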
Highly Customizable
PostgreSQL runs stored procedures in more than a dozen programming languages, including
Java, Perl, Python, Ruby, Tcl, C/C++, and its own PL/pgSQL, which is similar to Oracle's
PL/SQL. Included with its standard function library are hundreds of built-in functions that range
from basic math and string operations to cryptography and Oracle compatibility. Triggers and
stored procedures can be written in C and loaded into the database as a library, allowing great
flexibility in extending its capabilities. Similarly, PostgreSQL includes a framework that allows
developers to define and create their own custom data types along with supporting functions and
operators that define their behavior. As a result, a host of advanced data types have been created
that range from geometric and spatial primitives to network addresses to even ISBN/ISSN
(International Standard Book Number/International Standard Serial Number) data types, all of
which can be optionally added to the system.
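As a small sketch of a user-defined type via CREATE DOMAIN (the regex check here is a
simplified stand-in, not real ISBN validation):

```sql
-- A domain wraps an existing type with extra constraints
CREATE DOMAIN isbn13 AS char(13)
    CHECK (VALUE ~ '^[0-9]{13}$');

CREATE TABLE books (
    title text,
    isbn  isbn13   -- values that fail the CHECK are rejected
);
```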
Just as there are many procedure languages supported by PostgreSQL, there are also many
library interfaces as well, allowing various languages both compiled and interpreted to interface
with PostgreSQL. There are interfaces for Java (JDBC), ODBC, Perl, Python, Ruby, C, C++,
PHP, Lisp, Scheme, and Qt just to name a few.
Best of all, PostgreSQL's source code is available under the most liberal open source license: the
BSD license. This license gives you the freedom to use, modify, and distribute PostgreSQL
in any form you like, open or closed source. Any modifications, enhancements, or changes you
make are yours to do with as you please. As such, PostgreSQL is not only a powerful database
system capable of running the enterprise, it is a development platform upon which to develop in-
house, web, or commercial software products that require a capable RDBMS.


Installing Postgres from Source (Linux)
Installing postgres from source is a quick and easy way to get the system configured the way you
like it. My most common frustration with Linux has been the PostgreSQL version that ships with
whatever distribution you are running, so these days I just never install the Postgres database
from the distro repositories. Also, I have never installed Postgres on a distro that is running
SELinux.

Another frustration with Linux is that different distros each have their own way of getting
application source and installers, so you need to find the right instructions for your distro. I will
focus on CentOS 5.x, which is similar to Fedora and Red Hat. These instructions will need to be
modified for OpenSUSE 11, Ubuntu, Debian, Solaris, BSD, AIX, etc.

Installing from source will place the postgres files in /usr/local/pgsql (unless you tell it
otherwise). There are several components that we will want to install on the server/workstation.
In this demo we will be installing PostgreSQL version 8.3.7 from
http://www.postgresql.org/ftp/source/v8.3.7/

1. Postgres server
2. psql – interactive command line tool for managing the database
3. PG Admin – GUI database management tool

Summary of Steps:
1. Create a user and group to run the postgres process
2. Get the Source packages
Page 6
3. Compile and install
4. Edit your profile environment variables
5. Initialize the database
6. Edit the pg_hba.conf file (host based access)
7. Edit the postgresql.conf file
8. Modify the startup scripts

Technical Steps to install
(assumes CentOS with GCC installed; steps 7-14 and 21-24 are run as the super user, steps 15-19
as user postgres)

1. Download the current source from http://www.postgresql.org/ftp/source/ to a directory; I will
   be using a directory named “/home/tpackert/downloads”. (Note: we are installing version
   8.3.7.) [1]
2. cd ~/downloads
3. tar xvzf postgresql-8.3.7.tar.gz [2]
4. cd postgresql-8.3.7
5. ./configure --with-libxml --with-openssl --with-libxslt
6. make
7. su -
8. make install
9. vi /etc/profile and add the 4 lines below:
   a. PATH=$PATH:/usr/local/pgsql/bin
   b. export PATH
   c. PGDATA=/usr/local/pgsql/data
   d. export PGDATA
10. Log off the system and log back in to make the profile changes active
11. /usr/sbin/groupadd -r -g 333 postgres
12. /usr/sbin/adduser -r -g 333 -u 333 postgres
13. mkdir /usr/local/pgsql/data
14. chown postgres:postgres /usr/local/pgsql/data
15. su - postgres
16. /usr/local/pgsql/bin/initdb --encoding=UTF8 -D /usr/local/pgsql/data
17. /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data >logfile 2>&1 &
18. /usr/local/pgsql/bin/createdb test
19. /usr/local/pgsql/bin/psql test
20. exit [3]
21. cd /home/tpackert/downloads/postgresql-8.3.7/contrib/start-scripts
22. cp linux /etc/init.d/postgresql
23. chmod a+x /etc/init.d/postgresql
24. /sbin/chkconfig --add postgresql
25. Restart your server to see if the postgres server starts automatically

[1] If you are accessing a remote server through ssh, don’t have a graphical browser, and scp
seems too complicated, then use the text-based browser links with the command
“links www.postgresql.org”.
[2] If you downloaded the bz2 version instead of the gz version, then the tar command is
tar xvjf instead of tar xvzf.
[3] Exit from the postgres user back to super user. (In the original colored handout, steps done
as super user were shown in red and steps done as user postgres in green.)
Installing Postgres from Source (MAC)
There is an excellent blog post by Robbie Russell on installing Postgres on a Mac using
MacPorts, so I won’t revisit how to do it here. Just Google “Installing Postgres MAC Robbie
Russell” or go here:
http://www.robbyonrails.com/articles/2008/01/22/installing-ruby-on-rails-and-postgresql-on-os-x-third-edition



Installing Postgres on Windows
Why bother? But if you must, you can use a one-click installer from EnterpriseDB or PGInstaller
for Windows: http://www.postgresql.org/download/windows

There is even a YouTube video on installing for Windows:
http://www.youtube.com/watch?v=aFZnPw4p4eo



Editing the pg_hba.conf file
The Postgres host-based access configuration file is what allows specific IP addresses or ranges
to connect to the postgres server. In the example below I am allowing access to the postgres
server from three local development machines: 192.168.123.31, .32, and .33. The CIDR address
(Classless Inter-Domain Routing) of 192.168.123.31/32 indicates that all 32 bits are used as a
subnet mask, the same as 255.255.255.255, which means only the address 192.168.123.31 will
gain access to the machine with this line in the pg_hba.conf file.

Note: you will also need to open port 5432 (tcp only, not udp) on your firewall to allow access to
the database. The quick way I test this is with “telnet 192.168.123.31 5432”. If the port is
blocked by a firewall it will say “Connecting To 192.168.123.31...Could not open connection to
the host, on port 5432: Connect failed”. But if the port is open on the firewall you will be able to
connect; you won’t get a response, and after 4 characters it will quietly drop out of telnet without
any messages.

Example pg_hba.conf file from a development machine

# TYPE DATABASE USER CIDR-ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
host all all 192.168.123.31/32 trust
host all all 192.168.123.32/32 trust
host all all 192.168.123.33/32 trust
# IPv6 local connections:
host all all ::1/128 trust


Note, the rules are processed top-down, so you can create confusion/conflicts by allowing some
addresses through larger subnet masks. I prefer to allow only specific IP addresses using the /32
option. In the example below it looks like .32 and .33 should not be able to access postgres, but
the line above them grants access to 192.168.123.0/24, which is 192.168.123.1-254.


# TYPE DATABASE USER CIDR-ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
host all all 192.168.123.0/24 trust
host all all 192.168.123.32/32 reject
host all all 192.168.123.33/32 reject
# IPv6 local connections:
host all all ::1/128 trust

Tweaking/Tuning Postgres
Out of the box, Postgres is configured to run on minimal hardware. By minimal hardware I
think they are thinking of circa-1990 hardware, so it really is minimal and needs to be addressed.
Googling “tuning PostgreSQL” gets lots of hits, but the High Priests of Postgres don’t seem to
speak clearly enough for newbies to actually do the tweaking.

Note, the postgresql.conf file that is used to tune the parameters changes with each database
release, so you need to reapply tuning changes with each version; you should not copy the old
postgresql.conf file into a new version’s directory. A good editor with diff/comparison features
helps here.

In postgresql.conf there is a parameter to tell postgres what to prefix each log line with. I like to
set it to time, remote IP, and database. We have a lot of remote devices, and this makes tailing
and grepping the logs easier. Changing the log line prefix to log_line_prefix =
’<%t%%%r%%%d> ’ makes it include the timestamp, remote host IP, and the database name,
all separated by a percent sign.
On a server with 8GB of RAM, here is what you probably want to change in postgresql.conf, as
compared to the defaults for Postgres 8.3:

1. shared_buffers = 2048MB # min 128kB or max_connections*16kB
2. work_mem = 16MB # min 64kB
3. max_fsm_pages = 204800 # min max_fsm_relations*16, 6 bytes each
4. wal_sync_method = fdatasync # the default is the first option
5. checkpoint_segments = 100 # in logfile segments, min 1, 16MB each
6. effective_cache_size = 6144MB
Comparing: [ 1 ] to [ 2 ]
[ 1 ] R:\my_server\old_postgresql.conf
[ 2 ] R:\my_server\new_postgresql.conf
----------------------------------
56* #listen_addresses = 'localhost' # what IP address(es) to listen on;
* listen_addresses = '*' # what IP address(es) to listen on;
106* shared_buffers = 24MB # min 128kB or max_connections*16kB
* shared_buffers = 2048MB # min 128kB or max_connections*16kB
113* #work_mem = 1MB # min 64kB
* work_mem = 16MB # min 64kB
119* max_fsm_pages = 153600 # min max_fsm_relations*16, 6 bytes each
* max_fsm_pages = 204800 # min max_fsm_relations*16, 6 bytes each
153* #wal_sync_method = fsync # the default is the first option
* wal_sync_method = fdatasync # the default is the first option
161* #wal_buffers = 64kB # min 32kB
* wal_buffers = 100 # min 32kB
170* #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
* checkpoint_segments = 100 # in logfile segments, min 1, 16MB each
207* #effective_cache_size = 128MB
* effective_cache_size = 6144MB
239* #logging_collector = off # Enable capturing of stderr and csvlog
* logging_collector = on # Enable capturing of stderr and csvlog
246* #log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.
* log_filename = 'postgresql-%a.log' # Log file name pattern.
248* #log_truncate_on_rotation = off # If on, any existing log file of the same
* #log_truncate_on_rotation = on # If on, any existing log file of the same
259* #log_rotation_size = 10MB # Automatic rotation of logfiles will
* #log_rotation_size = 0 # Automatic rotation of logfiles will
380* #autovacuum = on # enable autovacuum subprocess?
* autovacuum = on # enable autovacuum subprocess?
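After restarting Postgres with the new settings, a quick way to confirm they took effect (any of
the parameter names from the list above can be substituted in):

```sql
-- Show current values of the tuned parameters
SELECT name, setting, unit
  FROM pg_settings
 WHERE name IN ('shared_buffers', 'work_mem',
                'checkpoint_segments', 'effective_cache_size');

-- Or one at a time:
SHOW shared_buffers;
```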



Installing PGADMIN
PgAdmin is an add-on GUI tool to administer a Postgres database. It does almost everything you
can do in psql, except sometimes you just want to be graphical and lazy instead, especially when
psql is new to you.

You can install PGADMIN for Mac, Linux, or Windows from the postgres.org site, or just
google “PGADMIN DOWNLOAD”.

Installing PGAdmin from source on Linux is not for the faint of heart. I did it on OpenSUSE and
it was not quite straightforward, with lots of windowing component dependencies. The current
stable version is 1.8.4, but version 1.10.0 beta 2 is out now. Once I loaded all the required
dependencies for PGAdmin, installing was easy.
a. download source
b. tar xvzf pgadmin3-1.10.0-beta2.tar.gz
c. cd pgadmin3-1.10.0-beta2
d. ./configure
e. make
f. sudo make install
g. Add Pgadmin to KDE menus using Kpanel

http://wwwmaster.postgresql.org/download/mirrors-ftp/pgadmin3/release/v1.10.0-beta1/win32/pgadmin3-1.10.0-beta1.zip



About MVCC and the need to vacuum databases
Quoted from an ONLamp article: [5]


Anyone who has worked in a large multi-user database environment can relate to the
frustration that "locks" can cause by making users wait. Whether the database system is
using table-level, page-level, column-level, or row-level locking, the same annoying
problem persists: Readers (SELECTs) wait for writers (UPDATEs) to finish, and writers
(UPDATEs) wait for readers (SELECTs) to finish. Wouldn't it be nice to use a database
with a "no-locking" capability?

If you use PostgreSQL, you know that "no-locking" is already a reality. In PostgreSQL,
readers never wait for writers, and writers never wait for readers. Before anyone objects
to the claim that there is "no-locking" in PostgreSQL, let me explain PostgreSQL's
advanced technique called Multi-Version Concurrency Control (MVCC).

While most other database systems use locks to maintain concurrency control and data
consistency, PostgreSQL uses a multi-version model. Think of a version as a data
snapshot at a distinct point in time. When users query a table, the current version of the
data appears. If they run the same query again on the table, a new version appears if any
data has changed. Data changes occur in a database through UPDATE, INSERT, or
DELETE statements.

A simple example of selecting data from one table shows the difference between
traditional row-level locking and PostgreSQL's MVCC.


[5] http://www.onlamp.com/pub/a/onlamp/2001/05/25/postgresql_mvcc.html


SELECT headlines FROM news_items

This statement reads data from a table called news_items and displays all the rows in the
column called headlines. For data systems that use row-level locking, the SELECT
statement will not succeed and will have to wait if another user is concurrently inserting
(INSERT) or updating (UPDATE) data in the table news items. The transaction that
modifies the data holds a lock on the row(s) and therefore all rows from the table cannot
be displayed. Users who have encountered frequent locks when trying to read data know
the frustration this locking scheme can cause, forcing users to wait until the lock releases.

In PostgreSQL, however, users can always view the news_items table. There is no need
to wait for a lock to be released, even if multiple users are inserting and updating data in
the table. When a user issues the SELECT query, PostgreSQL displays a snapshot, or
version, of all the data that was committed before the query began. Any data updates or
inserts that are part of open transactions or were committed after the query began will not
be displayed. Doesn't that make complete sense?

Here is an example you can follow along with: open two terminal sessions and start psql on the
same database in both sessions. (In the original colored handout, session 1 was shown in green
and session 2 in blue.)

Postgres has several hidden columns on all tables and records that allow it to know which tuple
is current for your transaction.
myapp_dev=# select oid, id, code_value, xmin, xmax from my_sample_codes;

oid | id | code_value | xmin | xmax
-------+----+------------+------+------
67701 | 1 | A | 1756 | 0
67702 | 2 | M | 1756 | 0
67703 | 3 | S | 1756 | 0
67704 | 4 | D | 1756 | 0
67705 | 5 | W | 1756 | 0
67706 | 6 | L | 1756 | 0
67707 | 7 | C | 1756 | 0
67708 | 8 | I | 1756 | 0
67709 | 9 | P | 1756 | 0
67710 | 10 | U | 1756 | 0
(10 rows)

myapp_dev=# begin;

myapp_dev-# update my_sample_codes set code_value = 'a' where id = '1';

myapp_dev=# commit;
COMMIT
myapp_dev=# select oid, id, code_value, xmin, xmax from my_sample_codes;
oid | id | code_value | xmin | xmax
-------+----+------------+------+------
67702 | 2 | M | 1756 | 0
67703 | 3 | S | 1756 | 0
67704 | 4 | D | 1756 | 0
67705 | 5 | W | 1756 | 0
67706 | 6 | L | 1756 | 0
67707 | 7 | C | 1756 | 0
67708 | 8 | I | 1756 | 0
67709 | 9 | P | 1756 | 0
67710 | 10 | U | 1756 | 0
67701 | 1 | a | 1850 | 0 <- notice this Xmin value changed
(10 rows)

myapp_dev=# begin;

myapp_dev-# update my_sample_codes set code_value = 'a' where id = '1';


myapp_dev=# select oid, id, code_value, xmin, xmax from my_sample_codes;
oid | id | code_value | xmin | xmax
-------+----+------------+------+------
67702 | 2 | M | 1756 | 0
67703 | 3 | S | 1756 | 0
67704 | 4 | D | 1756 | 0
67705 | 5 | W | 1756 | 0
67706 | 6 | L | 1756 | 0
67707 | 7 | C | 1756 | 0
67708 | 8 | I | 1756 | 0
67709 | 9 | P | 1756 | 0
67710 | 10 | U | 1756 | 0
67701 | 1 | a | 1850 | 1851 <- notice xmax is set while the transaction is uncommitted
(10 rows)

myapp_dev=# commit;
COMMIT

myapp_dev=# select oid, id, code_value, xmin, xmax from my_sample_codes;
oid | id | code_value | xmin | xmax
-------+----+------------+------+------
67702 | 2 | M | 1756 | 0
67703 | 3 | S | 1756 | 0
67704 | 4 | D | 1756 | 0
67705 | 5 | W | 1756 | 0
67706 | 6 | L | 1756 | 0
67707 | 7 | C | 1756 | 0
67708 | 8 | I | 1756 | 0
67709 | 9 | P | 1756 | 0
67710 | 10 | U | 1756 | 0
67701 | 1 | A | 1851 | 0
(10 rows)

About Vacuuming
After all that work updating the my_sample_codes table, the older tuples with previous xmin
values are actually still out in the database taking up space. They are perfectly valid rows in the
database, except postgres won’t let you get to them. These are called ‘dead rows’. Postgres
always writes a new record and leaves the old one right where it was.

At Visible Assets we have small asset tables that are constantly updated by the RuBee readers to
indicate whether they have been seen by the reader. So a table with 500 assets that is updated
6 times per hour, 24 hours a day, will actually have 72,000 tuples in it at the end of the day, even
though a select on the asset table will reveal only 500 assets. This is where vacuuming comes in.
Vacuuming is a necessary task when dealing with MVCC architectures. Postgres allows you to
turn on autovacuum, or you can run it manually at your convenience. Notice that in my backup
script at the end, vacuuming is an option for the nightly backup process. Autovacuum only came
into existence recently.

myapp_dev=# VACUUM VERBOSE my_sample_codes;
INFO: vacuuming "public.my_sample_codes"
INFO: index "my_sample_codes_pkey" now contains 10 row versions in 2 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "my_sample_codes": found 4 removable, 10 nonremovable row versions in 1 pages

DETAIL: 0 dead row versions cannot be removed yet.
There were 4 unused item pointers.
1 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

About PL/PGSQL language
One of the most powerful features of PostgreSQL is its support for user-defined functions written
in various programming languages, including pure SQL, C, Perl, Python, Ruby, and PHP.
Perhaps the most common programming language for PostgreSQL functions, however, is
PL/pgSQL (don't ask me to pronounce it), because it comes with PostgreSQL and is easy to set
up.

PL/pgSQL is a procedural language similar to Oracle's PL/SQL. It's much more powerful than
pure SQL in that it supports variables, conditional expressions, looping constructs, exceptions,
and the like. Because it natively supports all of PostgreSQL's SQL syntax, you can consider it a
superset of PostgreSQL SQL. It also respects all data types and their associated functions and
operators, and is completely safe for use inside of the server. Get all of the details in the
PL/pgSQL documentation.
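A minimal standalone PL/pgSQL function, with invented names, to show the basic shape
(declare block, control flow, return value):

```sql
CREATE OR REPLACE FUNCTION add_tax(price numeric)
RETURNS numeric AS
$BODY$
declare
    tax_rate numeric := 0.07;   -- assumed flat rate, for illustration only
begin
    if price is null or price < 0 then
        return 0;
    end if;
    return price * (1 + tax_rate);
end;
$BODY$
LANGUAGE 'plpgsql' IMMUTABLE;

-- SELECT add_tax(100.00);
```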


http://www.postgresonline.com/journal/index.php?/archives/58-Quick-Guide-to-writing-
PLPGSQL-Functions-Part-1.html


http://www.postgresonline.com/journal/index.php?/archives/58-Quick-Guide-to-writing-
PLPGSQL-Functions-Part-2.html


http://www.postgresonline.com/journal/index.php?/archives/58-Quick-Guide-to-writing-
PLPGSQL-Functions-Part-3.html


Here is an example of a simple trigger function written in PL/pgSQL that is executed whenever
an item in our asset table is updated. This trigger is defined as an after-update trigger on the
asset table, and it creates rows in the events table for each change in the asset’s location.



-- Function: update_event()
-- DROP FUNCTION update_event();
CREATE OR REPLACE FUNCTION update_event()
RETURNS trigger AS
$BODY$ begin
if (NEW.shelf_id != OLD.shelf_id
or (NEW.shelf_id is null and OLD.shelf_id is not null)
or (NEW.shelf_id is not null and OLD.shelf_id is null)) then
insert into event( tracking_id, from_shelf, to_shelf, action
) values
( NEW.tracking_id, OLD.shelf_id, NEW.shelf_id,
case
when OLD.shelf_id is null and NEW.shelf_id is
not null then 1
when OLD.shelf_id is not null and NEW.shelf_id
is null then 2
else 0
end
);
elseif (OLD.state_id='11' and NEW.state_id='12' ) then
if (NEW.missing_from_shelf_id is null and OLD.shelf_id is not
null) then
NEW.missing_from_shelf_id = OLD.shelf_id;
NEW.Shelf_id = null;
NEW.updated = NOW();
insert into event( tracking_id, from_shelf, to_shelf,
action) values (OLD.tracking_id, NEW.missing_from_shelf_id, NEW.shelf_id
,2);
end if;
insert into event( tracking_id, to_procedure, action) values
(OLD.tracking_id, NEW.medprocedure_id,7);
elseif (OLD.state_id='11' and NEW.state_id='16') then
if (NEW.missing_from_shelf_id is null and OLD.shelf_id is not
null) then
NEW.missing_from_shelf_id = OLD.shelf_id;
NEW.Shelf_id = null;
NEW.updated = NOW();
insert into event( tracking_id, from_shelf, to_shelf,
action) values (OLD.tracking_id, NEW.missing_from_shelf_id, NEW.shelf_id
,2);
end if;
insert into event( tracking_id, to_procedure, action) values
(OLD.tracking_id, NEW.medprocedure_id,8);
elseif (OLD.state_id='11' and NEW.state_id='17') then
if (NEW.missing_from_shelf_id is null and OLD.shelf_id is not
null) then
NEW.missing_from_shelf_id = OLD.shelf_id;
NEW.Shelf_id = null;
NEW.updated = NOW();
insert into event( tracking_id, from_shelf, to_shelf,
action) values (OLD.tracking_id, NEW.missing_from_shelf_id, NEW.shelf_id
,2);
end if;
insert into event( tracking_id, to_procedure, action) values
(OLD.tracking_id, NEW.medprocedure_id,12);
end if;
return NEW;
end;$BODY$
LANGUAGE 'plpgsql' VOLATILE
COST 100;
ALTER FUNCTION update_event() OWNER TO helloshelf;
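The CREATE TRIGGER statement that attaches the function above isn’t shown; a hypothetical
binding could look like the following (note that assignments to NEW only take effect in a
BEFORE trigger):

```sql
CREATE TRIGGER asset_update_event
    BEFORE UPDATE ON asset
    FOR EACH ROW
    EXECUTE PROCEDURE update_event();
```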

Here is another PL/pgSQL trigger function that updates some statistics in the shelf_stats table
whenever a shelf starts or stops a read cycle. When these events happen, the counts of items on
the shelf and off the shelf are calculated and stored in a statistics record for quick access by the
applications. Placing business logic like this in a trigger function may start a religious war over
where such logic should be kept. However, triggers like this are very efficient and happen
locally in the database. In our case we don’t directly control the readers that update the database,
so using triggers like this is a way to add functionality to an application, and it is very network
friendly. One short command fired into the database from a remote device can initiate a lot of
table inserts and updates through triggers.

-- Function: update_asset_counts()

-- DROP FUNCTION update_asset_counts();

CREATE OR REPLACE FUNCTION update_asset_counts()
RETURNS "trigger" AS
$BODY$declare
asset_count integer;
missing_count integer;
last_event_dt timestamp;

begin
SELECT count(*) into asset_count from asset
where new.shelf_id = asset.shelf_id
AND state_id = 11;

SELECT count(*) into missing_count from asset
where asset.shelf_id is NULL
AND state_id = 11;

SELECT max(created_at) into last_event_dt from shelfevents
where shelfevents.shelf_id = new.shelf_id;

NEW.numsearch := asset_count;
NEW.nummissing := missing_count;
NEW.last_event_at := last_event_dt;
return NEW;

end;$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION update_asset_counts() OWNER TO visible;
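
As with update_event above, this function takes effect only once a CREATE TRIGGER attaches it to a table. A sketch, assuming it fires on the shelf_stats table (the NEW.numsearch/NEW.nummissing assignments suggest that table, but the name is an assumption):

```sql
CREATE TRIGGER shelf_stats_update_counts
    BEFORE INSERT OR UPDATE ON shelf_stats
    FOR EACH ROW
    EXECUTE PROCEDURE update_asset_counts();
```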
Installing PL/RUBY
Robby Russell has a good article on installing PL/Ruby from source here:
http://www.robbyonrails.com/articles/tag/plruby
The author of the PL/Ruby module for Postgres, Guy Decoux (something of an internet legend),
unfortunately passed away in September 2008, and PL/Ruby apparently is not being maintained
anymore. When you download the source you will find lots of information in the text files
plruby.html, plruby.rd and README.en.

Normally, PL/Ruby is installed as a "trusted" programming language named plruby. In this
setup, certain Ruby operations are disabled to preserve security. In general, the operations that
are restricted are those that interact with the environment, including file handle operations,
require, and use (for external modules). There is no way to access internals of the database
server process or to gain OS-level access with the permissions of the server process, as a C
function can do. Thus, any unprivileged database user can be permitted to use this language.
Sometimes it is desirable to write Ruby functions that are not restricted. For example, one might
want a Ruby function that sends mail. To handle these cases, PL/Ruby can also be installed as an
"untrusted" language (usually called PL/RubyU). In this case the full Ruby language is available.
If the createlang program is used to install the language, the language name plrubyu will
select the untrusted PL/Ruby variant.
The writer of a PL/RubyU function must take care that the function cannot be used to do
anything unwanted, since it will be able to do anything that could be done by a user logged in as
the database administrator. Note that the database system allows only database superusers to
create functions in untrusted languages.
Note: to install both the trusted and untrusted versions, I had to delete the plruby-0.5.3 directory
after configuring the first (untrusted) version so that the second version would be built fresh.
Otherwise, having built one version first, I continually received the error "ERROR: Insecure
operation - require".
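
You can see the sandbox in action by defining a function that calls require in each language. A sketch (digest/md5 is just a stand-in for any external module):

```sql
CREATE FUNCTION md5_of(text) RETURNS text AS '
    require ''digest/md5''
    return Digest::MD5.hexdigest(args[0])
' LANGUAGE 'plrubyu';
```

Defined with LANGUAGE 'plrubyu' this works; defined with LANGUAGE 'plruby', it fails with the "Insecure operation - require" error quoted above.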
Download
1. cd downloads/
2. wget ftp://moulon.inra.fr/pub/ruby/plruby.tar.gz
3. tar zxvf plruby.tar.gz
4. cd plruby-0.5.3/

Install UnTrusted Version
5. sudo ruby extconf.rb --with-pgsql-dir=/usr/local/pgsql --with-safe-level=0 --with-suffix=u
6. make
7. sudo make install

Install Trusted Version
8. cd ..
9. rm -rf plruby-0.5.3
10. tar zxvf plruby.tar.gz
11. cd plruby-0.5.3
12. sudo ruby extconf.rb --with-pgsql-dir=/usr/local/pgsql --with-safe-level=12
13. make
14. sudo make install


psql -U postgres template1

CREATE DATABASE plruby;

\c plruby

create function plrubyu_call_handler() returns language_handler as
'/usr/lib/ruby/site_ruby/1.8/i586-linux/plrubyu.so' language 'C';

create function plruby_call_handler() returns language_handler as
'/usr/lib/ruby/site_ruby/1.8/i586-linux/plruby.so' language 'C';

create language 'plrubyu' handler plrubyu_call_handler lancompiler
'PL/Ruby';

create language 'plruby' handler plruby_call_handler lancompiler 'PL/Ruby';

CREATE FUNCTION ruby_max(int4, int4) RETURNS int4 AS '
if args[0].to_i > args[1].to_i
return args[0]
else
return args[1]
end
' LANGUAGE 'plruby';

plruby=# select ruby_max(10,11);
ruby_max
----------
11
(1 row)

CREATE FUNCTION redcloth(text) RETURNS text AS '
require ''rubygems''
require ''redcloth''
content = args[0]
rc = RedCloth.new(content)
return rc.to_html
' LANGUAGE 'plrubyu';

plruby=# SELECT redcloth('*strong text* and _emphasized text_');
redcloth
------------------------------------------------------------------
<p><strong>strong text</strong> and <em>emphasized text</em></p>
(1 row)

Explain Analyse is your friend
Postgres allows you to view the execution plan for any SQL query. We tend to use this by
cutting and pasting queries from the logs and running them through explain analyze to determine
how they are performing and whether we should optimize them.

I created a new database called miami_ruby_dev and loaded 27,607 first names. A simple select
statement with name1 (the first character of the name) = 'a' ran in 11.145 milliseconds. After I
added an index on name1 and reran the select, it ran in 2.951 milliseconds. When select
commands are complex, explain analyse outputs each step of the query's execution plan and the
relative weight of each step.

miami_ruby_dev=> explain analyse select * from names where name1 = 'a';
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on names (cost=0.00..616.09 rows=1472 width=44) (actual time=0.039..10.389 rows=1534
loops=1)
Filter: ((name1)::text = 'a'::text)
Total runtime: 11.145 ms
(3 rows)

CREATE INDEX name1_idx ON names USING btree (name1);

miami_ruby_dev=> explain analyse select * from names where name1 = 'a';
QUERY PLAN
-------------------------------------------------------------------------------------------------
Index Scan using name1_idx on names (cost=0.00..60.01 rows=1472 width=44) (actual
time=0.540..2.125 rows=1534 loops=1)
Index Cond: ((name1)::text = 'a'::text)
Total runtime: 2.951 ms
(3 rows)



About Contributions
The Contrib directory includes extra components that are not part of the core product but are
useful to add functionality to your Postgres server.

adminpack

adminpack provides a number of support functions which pgAdmin and other administration and
management tools can use to provide additional functionality, such as remote management of server log
files.
btree_gist

btree_gist provides sample GiST operator classes that implement B-Tree equivalent behavior for the
data types int2, int4, int8, float4, float8, numeric, timestamp with time zone,
timestamp without time zone, time with time zone, time without time zone,
date, interval, oid, money, char, varchar, text, bytea, bit, varbit, macaddr, inet,
and cidr.
chkpass

This module implements a data type chkpass that is designed for storing encrypted passwords. Each
password is automatically converted to encrypted form upon entry, and is always stored encrypted. To
compare, simply compare against a clear text password and the comparison function will encrypt it before
comparing.
cube

This module implements a data type cube for representing multidimensional cubes.

dblink

dblink is a module which supports connections to other PostgreSQL databases from within a database
session.
dict_int

dict_int is an example of an add-on dictionary template for full-text search. The motivation for this
example dictionary is to control the indexing of integers (signed and unsigned), allowing such numbers to
be indexed while preventing excessive growth in the number of unique words, which greatly affects the
performance of searching.
dict_xsyn

dict_xsyn (Extended Synonym Dictionary) is an example of an add-on dictionary template for full-
text search. This dictionary type replaces words with groups of their synonyms, and so makes it possible
to search for a word using any of its synonyms.
earthdistance

The earthdistance module provides two different approaches to calculating great circle distances on
the surface of the Earth.
fuzzystrmatch

The fuzzystrmatch module provides several functions to determine similarities and distance between
strings.
hstore

This module implements a data type hstore for storing sets of (key,value) pairs within a single
PostgreSQL data field. This can be useful in various scenarios, such as rows with many attributes that are
rarely examined, or semi-structured data.
intagg

The intagg module provides an integer aggregator and an enumerator.
intarray

The intarray module provides a number of useful functions and operators for manipulating one-
dimensional arrays of integers. There is also support for indexed searches using some of the operators.
isn

The isn module provides data types for the following international product numbering standards:
EAN13, UPC, ISBN (books), ISMN (music), and ISSN (serials). Numbers are validated on input, and
correctly hyphenated on output.
lo

The lo module provides support for managing Large Objects (also called LOs or BLOBs). This includes
a data type lo and a trigger lo_manage.
ltree

This module implements a data type ltree for representing labels of data stored in a hierarchical tree-
like structure. Extensive facilities for searching through label trees are provided.
oid2name

oid2name is a utility program that helps administrators to examine the file structure used by PostgreSQL.
pageinspect

The pageinspect module provides functions that allow you to inspect the contents of database pages
at a low level, which is useful for debugging purposes. All of these functions may be used only by
superusers.
pgbench

pgbench is a simple program for running benchmark tests on PostgreSQL. It runs the same sequence of
SQL commands over and over, possibly in multiple concurrent database sessions, and then calculates the
average transaction rate (transactions per second).
pg_buffercache

The pg_buffercache module provides a means for examining what's happening in the shared buffer
cache in real time.
pgcrypto

The pgcrypto module provides cryptographic functions for PostgreSQL.
pg_freespacemap

The pg_freespacemap module provides a means for examining the free space map (FSM). It
provides two C functions: pg_freespacemap_relations and pg_freespacemap_pages that
each return a set of records, plus two views pg_freespacemap_relations and
pg_freespacemap_pages that wrap the functions for convenient use.
pgrowlocks

The pgrowlocks module provides a function to show row locking information for a specified table.
pg_standby

pg_standby supports creation of a "warm standby" database server. It is designed to be a production-ready
program, as well as a customizable template should you require specific modifications.
pgstattuple

The pgstattuple module provides various functions to obtain tuple-level statistics.
pg_trgm

The pg_trgm module provides functions and operators for determining the similarity of text based on
trigram matching, as well as index operator classes that support fast searching for similar strings.
seg

This module implements a data type seg for representing line segments, or floating point intervals. seg
can represent uncertainty in the interval endpoints, making it especially useful for representing laboratory
measurements.
spi

The contrib/spi module provides several workable examples of using Server Programming Interface
(SPI) and triggers. While these functions are of some value in their own right, they are even more useful
as examples to modify for your own purposes. The functions are general enough to be used with any
table, but you have to specify table and field names (as described below) while creating a trigger.
sslinfo

The sslinfo module provides information about the SSL certificate that the current client provided
when connecting to PostgreSQL. The module is useless (most functions will return NULL) if the current
connection does not use SSL.
tablefunc

The tablefunc module includes various functions that return tables (that is, multiple rows). These
functions are useful both in their own right and as examples of how to write C functions that return
multiple rows.
test_parser

test_parser is an example of a custom parser for full-text search. It doesn't do anything especially
useful, but can serve as a starting point for developing your own parser.
tsearch2

The tsearch2 module provides backwards-compatible text search functionality for applications that
used contrib/tsearch2 before text searching was integrated into core PostgreSQL in release 8.3.
uuid-ossp

The uuid-ossp module provides functions to generate universally unique identifiers (UUIDs) using
one of several standard algorithms. There are also functions to produce certain special UUID constants.
vacuumlo

vacuumlo is a simple utility program that removes any "orphaned" large objects from a PostgreSQL
database, that is, large objects no longer referenced by any data row.

xml2

The xml2 module provides XPath querying and XSLT functionality.
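
To give a taste of what these modules offer, here is a minimal hstore sketch (assuming the hstore contrib module has been loaded into the database; the book table is made up):

```sql
CREATE TABLE book (id serial PRIMARY KEY, attrs hstore);

-- Store arbitrary key/value attributes in a single column
INSERT INTO book (attrs) VALUES ('author => Melville, pages => 378');

-- Pull out a single key with the -> operator
SELECT attrs -> 'author' FROM book;
```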

About Postgres TSEARCH2
Full-text search (FTS) is invaluable whether you’re storing documentation, hosting a Web site or
managing an enterprise resource planning suite. Tsearch2, a search package based on OpenFTS,
has been available as a contrib module to PostgreSQL prior to 8.3. It required a little extra work
to get it compiled and running, but now it’s fully-integrated and requires no special database
configuration to use.

Full-text search in PostgreSQL supports a data type called tsvector. This data type is a sorted list
of lexemes gathered from a source document. A lexeme roughly corresponds to the root of a
word without its prefixes or suffixes. For example, fluent, fluentness, fluently and fluency are all
forms of the lexeme “fluent.”

Using lexemes instead of the actual words in search data types makes the search functions more
flexible and useful than the more familiar SQL search operators such as ~, ~*, LIKE and ILIKE.
Those SQL operators treat search strings literally, and don’t have the linguistic smarts to know
that ‘fluency’ and ‘fluently’ have the same root word.

To use full-text search, you must first convert your document to the tsvector data type with the
to_tsvector() function. Then, to search the document, you create a search query of type tsquery
with the to_tsquery() function. This example searches for the word ‘fluent’ and uses @@ (the
MATCH operator):

CREATE TABLE response (comments TEXT);

SELECT comments FROM response WHERE to_tsvector('english',
comments) @@ to_tsquery('english', 'fluent');

You’ll notice the first argument in both functions is ‘english’. This specifies which dictionary
PostgreSQL will use to parse and search the document. PostgreSQL allows multiple dictionaries,
which are configured at run-time in postgresql.conf.
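
In 8.3, the dictionary configuration used when none is given explicitly is set with a single postgresql.conf parameter, which the two-argument forms of to_tsvector/to_tsquery override per call:

```
# postgresql.conf
default_text_search_config = 'pg_catalog.english'
```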

You can even create your own, special-purpose dictionaries for doing things like normalizing
similar URLs, or searching in domain-specific languages. If you are looking for non-English
dictionaries, they are available for eight languages at http://tinyurl.com/6lljm. The site also
includes several detailed tutorials, including one for creating your own, specialized dictionaries.

Simple queries that use tsvector conversions on text work fine for small amounts of data. As the
number and size of documents grow, you will need to create indexes to speed up searches.
Creating an index is straightforward:

CREATE INDEX comments_idx ON response USING gin(to_tsvector('english', comments));

Generalized Inverted Index (GIN) is an index type designed for composite values such as tsvector
documents. For more information about GIN, see the PostgreSQL documentation.

The comments_idx index we just created will be used on any queries that include
to_tsvector('english', comments).

Creating a separate column to hold the tsvector is another way to speed up full-text search. If you
do this, you need to include triggers for automatically updating the tsvector column when the
related text is updated. You can write your own trigger functions but PostgreSQL conveniently
provides two: tsvector_update_trigger, and tsvector_update_trigger_column.

Assuming your table looks like this:

CREATE TABLE response (comments TEXT, tsv_comments tsvector);

You would define your trigger this way:

CREATE TRIGGER tsvector_update BEFORE INSERT or UPDATE
ON response FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(tsv_comments, 'english', comments);

Another very useful function is ts_headline, which lets you show fragments of the matching
document with the search terms highlighted. The example below wraps all instances of
fluency in HTML bold tags:

SELECT ts_headline(comments, to_tsquery('english', 'fluency'))
FROM (SELECT comments FROM response
WHERE to_tsvector('english',comments)
@@ to_tsquery('english', 'fluency'))
AS FOO;

There are many more functions that allow you to do things like rewrite query search terms
(ts_rewrite), gather document statistics (ts_stat), or examine exactly what the query generator uses
for lexemes (parsetree).

There were some minor syntax changes between the contrib module and integration into version
8.3, so if you were already using tsearch2, have a look in the contrib directory for help with
migrating your existing installation.
XML Datatype Support

The XML data type included in 8.3 supports the ANSI SQL:2003 standard and requires a special
compile-time option for use. It was initially developed through a Google Summer of Code
project in 2006, and is now fully supported with an array of helper functions and supporting
documentation. The xml2 module, which provides additional XPath and XSLT functionality, is
located in the contrib directory.

http://www.sqlx.org


Here is an excellent overview of what can be done with XML at the database level, the examples
below have been taken from this presentation.
http://www.scribd.com/doc/2402063/SQLXML-For-Postgres-Developers



1. cd ~/downloads/postgresql-8.3.7/contrib/xml2
2. make
3. sudo make install

CREATE TABLE EMP
(
LAST_NAME text,
EMP_ID integer NOT NULL,
FIRST_NAME text,
DEPT_ID integer,
SALARY integer,
CONSTRAINT EMP_pkey PRIMARY KEY (EMP_ID)
)
WITH (OIDS=FALSE);

INSERT INTO EMP(
LAST_NAME, EMP_ID, FIRST_NAME, DEPT_ID, SALARY)
VALUES
('Blow', 1, 'Joe', 1, 10000),
('Head', 2, 'Helmut', 1, 12000),
('Jack', 3, 'Noe', 1, 12000),
('Hard', 4, 'Blow', 2, 20000),
('First', 5, 'Hugo', 2, 21000),
('Spaem',6, 'Kingoof', 2, 20000),
('Ventura', 7, 'Ace', 3, 35000),
('Nickleby', 8, 'Nick', 3, 400000),
('Budd', 9, 'Billy', 4, 99000),
('Cleaver', 10, 'Wally', 4, 100000) ;


SELECT XMLElement(name main, last_name) from emp;

SELECT XMLForest(last_name, first_name) FROM emp;

CREATE TABLE xmltab (col1 XML);



Note: use of the XML data type requires PostgreSQL to be built with configure --with-libxml.


INSERT INTO xmltab (
SELECT XMLElement(name main, XMLConcat(
XMLElement(name lastname, last_name),
XMLElement(name firstname, first_name) ) ) FROM emp);

SELECT * FROM xmltab;

SELECT Xpath('/main/firstname/text()', col1) FROM xmltab;

SELECT textcol[1] FROM (
SELECT xpath('/main/firstname/text()', col1) AS textcol FROM xmltab ) AS
xmlsource;

select query_to_xml('select * from emp', TRUE, TRUE, '');

About PGBENCH
pgbench is a simple program for running benchmark tests on PostgreSQL. It runs the same
sequence of SQL commands over and over, possibly in multiple concurrent database sessions,
and then calculates the average transaction rate (transactions per second). By default, pgbench
tests a scenario that is loosely based on TPC-B, involving five SELECT, UPDATE, and INSERT
commands per transaction. However, it is easy to test other cases by writing your own
transaction script files.

1. cd ~/downloads/postgres-8.3.7/contrib/pgbench/
2. make
3. sudo make install
4. createdb -U postgres pgbench
5. pgbench -i pgbench
6. pgbench -t 2000 -c 8 -S pgbench
7. pgbench -i -s 5 pgbench
8. pgbench -t 2000 -c 8 -S pgbench
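
Custom transaction scripts are run with the -f option instead of the built-in TPC-B-like test. A sketch (select_only.sql is a made-up file name; \setrandom is the 8.3-era pgbench script syntax):

```
\setrandom aid 1 100000
SELECT abalance FROM accounts WHERE aid = :aid;
```

Run it with something like pgbench -t 2000 -c 8 -f select_only.sql pgbench.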


About Postgres Replication
Postgres version 8.4, which is due out in Q2 2009, is expected to include replication. Replication
is already available through separate products such as Slony. The question is what kind of
replication the application requires; there is no single replication technique or tool that works for
all problem domains.
About Large Objects
Postgres supports Large Objects through the contrib/lo add-in component. Postgres stores a
large object as a series of chunks, though the chunks are hidden from the user; the user accesses
the large object through its OID. The OID data type is a 4-byte unsigned integer, so there is a
limit of 4,294,967,296 large objects. Also, because the large object is only referenced by the
table, you need to use triggers to avoid creating orphaned images. Large Objects are stored in the
PostgreSQL catalog (pg_catalog) in a table called pg_largeobject, where each large object is
broken into chunks and each chunk is assigned a pageno value.

1. cd ~/downloads/postgres-8.3.7/contrib/lo
2. cp lo.sql /home/postgres
3. sudo make install
4. psql -U postgres postgres
5. \i /home/postgres/lo.sql
6. CREATE TABLE image (name text, raster oid);
7. INSERT INTO image (name, raster) VALUES ('good
picture',lo_import('/usr/local/pgsql/data/myimage.bmp'));


select * from image;
"good picture";70060

Select * from pg_largeobject;
Loid (integer), pageno (integer), data (bytea)
70060;0;"<binary data>"
70060;1;"<binary data>"
70060;2;"<binary data>"

70060;746;"<binary data>"
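
The orphan problem mentioned above is exactly what the lo module's lo_manage trigger addresses. Applied to the image table from the example:

```sql
-- Unlink the old large object when the referencing row changes or disappears
CREATE TRIGGER t_raster
    BEFORE UPDATE OR DELETE ON image
    FOR EACH ROW
    EXECUTE PROCEDURE lo_manage(raster);
```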

About Partitioning
Postgres allows users to partition tables into smaller tables and then access the data either in
aggregate or at the sub-table level. There is a good explanation of partitioning in Postgres here:
http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html
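
In 8.3, partitioning is implemented with table inheritance plus CHECK constraints; a minimal sketch (table and column names are made up):

```sql
CREATE TABLE measurement (
    logdate date NOT NULL,
    reading integer
);

-- One child per month; the CHECK constraint lets constraint_exclusion
-- skip partitions that cannot match a query
CREATE TABLE measurement_y2009m04 (
    CHECK (logdate >= DATE '2009-04-01' AND logdate < DATE '2009-05-01')
) INHERITS (measurement);

-- Route inserts into the right child; a rule per partition is the simplest way
CREATE RULE measurement_ins_y2009m04 AS
    ON INSERT TO measurement
    WHERE (logdate >= DATE '2009-04-01' AND logdate < DATE '2009-05-01')
    DO INSTEAD INSERT INTO measurement_y2009m04 VALUES (NEW.*);
```

With constraint_exclusion = on, a query such as SELECT * FROM measurement WHERE logdate = DATE '2009-04-20' scans only the matching child table.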

About Upgrading Databases
It is entirely possible to install two different versions of Postgres on the same server. To have
both instances running at the same time, modify the new server's postgresql.conf to start it on a
different port, say 5433 instead of the default 5432. Then issue a pg_dumpall command against
the old database, pipe it into a psql command against the new database, and voila, the data is
moved.

One important note: the pg_dumpall command should be the one from the newer version of
Postgres, not the old version, so I like to specify the full path in the commands.

/usr/local/pgsql/bin/pg_dumpall -p 5432 | /usr/local/pgsql/bin/psql -d
postgres -p 5433

Backing up Postgres databases
I have an 11-page backup script that I modified from one I found online. You can get it here:
http://www.bitweaver.org/wiki/pg_backup+PostgreSQL+backup+script


This script produces a backup of each individual database, with options to vacuum and gzip the
backup. Each backup is dumped into a Month-Year directory, which contains a directory for each
day of the month; each database that is backed up is a separate file with the time included in the
file name.

Of course there is a pg_dumpall command included with Postgres that creates one backup file,
but I prefer to have the backups organized this way for daily use, and I use the dumpall command
to perform upgrades.

crontab -e
#################################################################
#minute (0-59), #
#| hour (0-23), #
#| | day of the month (1-31), #
#| | | month of the year (1-12), #
#| | | | day of the week (0-6 with 0=Sun)#
#| | | | | commands #
#################################################################
00 21 * * * /home/postgres/pg_backup.sh bva > /dev/null 2>&1

[root@my_server backup]# ls -la
total 2100
drwxr-xr-x 7 postgres postgres 4096 Mar 26 12:25 .
drwxr-xr-x 3 postgres postgres 4096 Jul 23 2008 ..
drwxr-xr-x 32 postgres postgres 4096 Dec 31 21:00 December-2008
-rw-r--r-- 1 postgres postgres 2114563 Mar 25 21:34 dump.log
drwxr-xr-x 30 postgres postgres 4096 Feb 28 21:00 February-2009
drwxr-xr-x 33 postgres postgres 4096 Jan 31 21:00 January-2009
drwxr-xr-x 27 postgres postgres 4096 Mar 25 21:00 March-2009
drwxr-xr-x 32 postgres postgres 4096 Nov 30 21:00 November-2008

[root@my_server backup]# ls -la March-2009/
total 108
drwxr-xr-x 27 postgres postgres 4096 Mar 25 21:00 .
drwxr-xr-x 7 postgres postgres 4096 Mar 26 12:25 ..
drwxr-xr-x 2 postgres postgres 4096 Mar 1 21:32 2009-03-01
drwxr-xr-x 2 postgres postgres 4096 Mar 2 21:33 2009-03-02
drwxr-xr-x 2 postgres postgres 4096 Mar 3 21:33 2009-03-03
drwxr-xr-x 2 postgres postgres 4096 Mar 4 21:32 2009-03-04
drwxr-xr-x 2 postgres postgres 4096 Mar 5 21:33 2009-03-05
drwxr-xr-x 2 postgres postgres 4096 Mar 6 21:33 2009-03-06
drwxr-xr-x 2 postgres postgres 4096 Mar 7 21:33 2009-03-07
drwxr-xr-x 2 postgres postgres 4096 Mar 8 21:33 2009-03-08
drwxr-xr-x 2 postgres postgres 4096 Mar 9 21:33 2009-03-09
drwxr-xr-x 2 postgres postgres 4096 Mar 10 21:33 2009-03-10
drwxr-xr-x 2 postgres postgres 4096 Mar 11 21:33 2009-03-11
drwxr-xr-x 2 postgres postgres 4096 Mar 12 21:32 2009-03-12
drwxr-xr-x 2 postgres postgres 4096 Mar 13 21:33 2009-03-13
drwxr-xr-x 2 postgres postgres 4096 Mar 14 21:33 2009-03-14
drwxr-xr-x 2 postgres postgres 4096 Mar 15 21:33 2009-03-15
drwxr-xr-x 2 postgres postgres 4096 Mar 16 21:33 2009-03-16
drwxr-xr-x 2 postgres postgres 4096 Mar 17 21:33 2009-03-17
drwxr-xr-x 2 postgres postgres 4096 Mar 18 21:33 2009-03-18
drwxr-xr-x 2 postgres postgres 4096 Mar 19 21:33 2009-03-19
drwxr-xr-x 2 postgres postgres 4096 Mar 20 21:33 2009-03-20
drwxr-xr-x 2 postgres postgres 4096 Mar 21 21:33 2009-03-21
drwxr-xr-x 2 postgres postgres 4096 Mar 22 21:34 2009-03-22
drwxr-xr-x 2 postgres postgres 4096 Mar 23 21:34 2009-03-23
drwxr-xr-x 2 postgres postgres 4096 Mar 24 21:33 2009-03-24
drwxr-xr-x 2 postgres postgres 4096 Mar 25 21:34 2009-03-25

[root@my_server backup]# ls -la March-2009/2009-03-01
total 1242224
drwxr-xr-x 2 postgres postgres 4096 Mar 1 21:32 .
drwxr-xr-x 27 postgres postgres 4096 Mar 25 21:00 ..
-rw-r--r-- 1 postgres postgres 3579620 Mar 1 21:00 21h00m-postgresql_database-cdcdemo-backup.gz
-rw-r--r-- 1 postgres postgres 645 Mar 1 21:00 21h00m-postgresql_database-davec-backup.gz
-rw-r--r-- 1 postgres postgres 5677 Mar 1 21:00 21h00m-postgresql_database-depuy-backup.gz
-rw-r--r-- 1 postgres postgres 736286 Mar 1 21:00 21h00m-postgresql_database-depuy_development-backup.gz
-rw-r--r-- 1 postgres postgres 138230 Mar 1 21:00 21h00m-postgresql_database-depuyioh0318-backup.gz
-rw-r--r-- 1 postgres postgres 70908 Mar 1 21:00 21h00m-postgresql_database-device_manager-backup.gz
-rw-r--r-- 1 postgres postgres 55673 Mar 1 21:00 21h00m-postgresql_database-gallery-backup.gz
-rw-r--r-- 1 postgres postgres 512 Mar 1 21:00 21h00m-postgresql_database-holycross_production-backup.gz
-rw-r--r-- 1 postgres postgres 65616 Mar 1 21:00 21h00m-postgresql_database-jaguar-backup.gz
-rw-r--r-- 1 postgres postgres 4331667 Mar 1 21:00 21h00m-postgresql_database-johns_shelfdemo-backup.gz
-rw-r--r-- 1 postgres postgres 124094 Mar 1 21:00 21h00m-postgresql_database-lftest2-backup.gz
-rw-r--r-- 1 postgres postgres 60961 Mar 1 21:00 21h00m-postgresql_database-lftest-backup.gz
-rw-r--r-- 1 postgres postgres 116434 Mar 1 21:00 21h00m-postgresql_database-Mercy-backup.gz
-rw-r--r-- 1 postgres postgres 51484405 Mar 1 21:01 21h00m-postgresql_database-orcartdemo-backup.gz
-rw-r--r-- 1 postgres postgres 1122198 Mar 1 21:02 21h00m-postgresql_database-orcarttest6-backup.gz
-rw-r--r-- 1 postgres postgres 1381083 Mar 1 21:02 21h00m-postgresql_database-orcarttest7-backup.gz
-rw-r--r-- 1 postgres postgres 1389030 Mar 1 21:02 21h00m-postgresql_database-orcarttest8-backup.gz
-rw-r--r-- 1 postgres postgres 318655 Mar 1 21:02 21h00m-postgresql_database-postgres-backup.gz
-rw-r--r-- 1 postgres postgres 14696 Mar 1 21:02 21h00m-postgresql_database-shelf_demo-backup.gz
-rw-r--r-- 1 postgres postgres 473572 Mar 1 21:02 21h00m-postgresql_database-shelfdemo-backup.gz
-rw-r--r-- 1 postgres postgres 12037163 Mar 1 21:02 21h00m-postgresql_database-testzimmer-backup.gz
-rw-r--r-- 1 postgres postgres 105710 Mar 1 21:02 21h00m-postgresql_database-visible-backup.gz
-rw-r--r-- 1 postgres postgres 35231267 Mar 1 21:03 21h00m-postgresql_database-visibledepuy-backup.gz
-rw-r--r-- 1 postgres postgres 249989576 Mar 1 21:09 21h00m-postgresql_database-visibledepuyioh-backup.gz
-rw-r--r-- 1 postgres postgres 805959 Mar 1 21:10 21h00m-postgresql_database-visibleethicon_boca-backup.gz
-rw-r--r-- 1 postgres postgres 3344345 Mar 1 21:10 21h00m-postgresql_database-visiblefirearms-backup.gz
-rw-r--r-- 1 postgres postgres 7379 Mar 1 21:02 21h00m-postgresql_database-visible_healthcare-backup.gz
-rw-r--r-- 1 postgres postgres 203387 Mar 1 21:02 21h00m-postgresql_database-visible_integris-backup.gz
-rw-r--r-- 1 postgres postgres 397098 Mar 1 21:02 21h00m-postgresql_database-visible_mercy-backup.gz
-rw-r--r-- 1 postgres postgres 170648 Mar 1 21:10 21h00m-postgresql_database-visiblesn_distoffice-backup.gz
-rw-r--r-- 1 postgres postgres 403956 Mar 1 21:10 21h00m-postgresql_database-visiblesn_frye-backup.gz
-rw-r--r-- 1 postgres postgres 26464515 Mar 1 21:11 21h00m-postgresql_database-visiblesn_tampa-backup.gz
-rw-r--r-- 1 postgres postgres 875658010 Mar 1 21:31 21h00m-postgresql_database-visiblezimmer-backup.gz
-rw-r--r-- 1 postgres postgres 318851 Mar 1 21:32 21h00m-postgresql_database-vprdemo-backup.gz
[root@my_server backup]#

Commands in psql
psql -U postgres postgres

The Bold Commands below are ones I use quite often.

General

\c[onnect] [DBNAME|- USER|- HOST|- PORT|-] connect to new
database (currently "myapp_dev")
\cd [DIR] change the current working directory
\copyright show PostgreSQL usage and distribution terms
\encoding [ENCODING] show or set client encoding
\h [NAME] help on syntax of SQL commands, * for all commands
\prompt [TEXT] NAME prompt user to set internal variable
\password [USERNAME] securely change the password for a user
\q quit psql
\set [NAME [VALUE]] set internal variable, or list all if no
parameters
\timing toggle timing of commands (currently off)
\unset NAME unset (delete) internal variable
\! [COMMAND] execute command in shell or start interactive shell
Query Buffer
\e [FILE] edit the query buffer (or file) with external editor
\g [FILE] send query buffer to server (and results to file or
|pipe)
\p show the contents of the query buffer
\r reset (clear) the query buffer
\s [FILE] display history or save it to file
\w FILE write query buffer to file

Input/Output
\echo [STRING] write string to standard output
\i FILE execute commands from file
\o [FILE] send all query results to file or |pipe
\qecho [STRING] write string to query output stream (see \o)

Informational
\d [NAME] describe table, index, sequence, or view
\d{t|i|s|v|S} [PATTERN] (add "+" for more detail) list
tables/indexes/sequences/views/system tables
\da [PATTERN] list aggregate functions
\db [PATTERN] list tablespaces (add "+" for more detail)
\dc [PATTERN] list conversions
\dC list casts
\dd [PATTERN] show comment for object
\dD [PATTERN] list domains
\df [PATTERN] list functions (add "+" for more detail)
\dF [PATTERN] list text search configurations (add "+" for more
detail)
\dFd [PATTERN] list text search dictionaries (add "+" for more
detail)
\dFt [PATTERN] list text search templates
\dFp [PATTERN] list text search parsers (add "+" for more
detail)
\dg [PATTERN] list groups
\dn [PATTERN] list schemas (add "+" for more detail)
\do [NAME] list operators
\dl list large objects, same as \lo_list
\dp [PATTERN] list table, view, and sequence access privileges
\dT [PATTERN] list data types (add "+" for more detail)
\du [PATTERN] list users
\l list all databases (add "+" for more detail)
\z [PATTERN] list table, view, and sequence access privileges (same as \dp)

Formatting
\a toggle between unaligned and aligned output mode
\C [STRING] set table title, or unset if none
\f [STRING] show or set field separator for unaligned query
output
\H toggle HTML output mode (currently off)
\pset NAME [VALUE] set table output option (NAME :=
{format|border|expanded|fieldsep|footer|null|
numericlocale|recordsep|tuples_only|title|tableattr|pager})
\t show only rows (currently off)
\T [STRING] set HTML <table> tag attributes, or unset if none
\x toggle expanded output (currently off)

Copy, Large Object
\copy ... perform SQL COPY with data stream to the client host
\lo_export LOBOID FILE
\lo_import FILE [COMMENT]
\lo_list
\lo_unlink LOBOID large object operations

Who are big dataset users of Postgres
Skype
https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer

https://developer.skype.com/SkypeGarage/DbProjects


Greenplum http://www.greenplum.com/


Hi5 Networks
http://news.cnet.com/8301-13505_3-10048937-16.html


Enterprise DB - http://www.enterprisedb.com/


Other Resources online:
Ten New Features that Make PostgreSQL 8.3 a Must-Have


http://wiki.postgresql.org/images/5/5a/Pgworkshop.pdf


http://wiki.postgresql.org/wiki/MVCC


If you want to see an application that makes heavy use of Postgres, go to
http://www.xtuple.com/postbooks
and download the PostBooks application and study the schema; it will really open your eyes to
what can be done with Postgres.

Fermilab comparison of MySQL, Oracle and Postgres:
http://www-css.fnal.gov/dsg/external/freeware/mysql-vs-pgsql.html


http://blogs.sun.com/jkshah/resource/pgcon_problems.pdf


http://it.toolbox.com/blogs/database-soup/postgresql-publishes-first-real-benchmark-17470