PostgreSQL - Benjamin Arai

needmoreneedmoreΔιαχείριση Δεδομένων

28 Νοε 2012 (πριν από 4 χρόνια και 11 μήνες)

247 εμφανίσεις

PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


1










PostgreSQL



By: Benjamin Arai



Last Update: May 12, 2004

Version: 1.0.2


PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


2

Table of Contents


1.

OVERVIEW

................................
................................
................................
................................
......

4

2.

WHY POSTGRESQL?

................................
................................
................................
....................

4

3.

WHY MYSQL?

................................
................................
................................
................................
.

4

4.

POSTGRESQL VS. MYSQL

................................
................................
................................
...........

4

4.1.

M
Y
SQL

................................
................................
................................
................................
.......

4

4.2.

P
OSTGRE
SQL

................................
................................
................................
..............................

5

4.3.

O
THER
C
OMMERCIAL
&

N
ON
-
C
OMMERCIAL
D
ATABASES

................................
...........................

5

4.4.

W
HICH DO
I

CHOOSE
?

................................
................................
................................
..................

5

5.

THE ACID TEST

................................
................................
................................
..............................

5

5.1.

A
TOMICITY

................................
................................
................................
................................
..

5

5.2.

C
ONSISTENCY

................................
................................
................................
..............................

5

5.3.

I
SOLATION

................................
................................
................................
................................
...

6

5.4.

D
URABILITY

................................
................................
................................
................................
.

6

6.

SETUP / INSTALLATION

................................
................................
................................
..............

6

7.

SYSTEM CONFIGURATION

................................
................................
................................
........

6

7.1.

R
ESOURCES

................................
................................
................................
................................
..

6

7.2.

R
EPLICATION

................................
................................
................................
...............................

7

7.3.

S
CALABILITY

................................
................................
................................
...............................

7

8.

AUTHENTICATION

................................
................................
................................
.......................

7

8.1.

A
UTHENTICATION
M
ETHODS

................................
................................
................................
.......

7

8.2.

A
UTHENTICATION
/P
RIVILEGE
G
RANULARITY

................................
................................
..............

8

9.

MANAGING RESOURCES

................................
................................
................................
............

8

9.1.

R
ESOURCE
L
IMITS

................................
................................
................................
........................

8

9.2.

L
INUX
M
EMORY
O
VERCOMMIT

................................
................................
................................
...

8

10.

SECURING SYSTEMS

................................
................................
................................
................

9

10.1.

S
ECURE
TCP/I
P

C
ONNECTIONS WITH
SSL

................................
................................
...............

9

10.2.

S
ECURE
TCP/IP

C
ONNECTIONS WITH
SSH

................................
................................
..............

9

11.

ENTERPRISE SYSTEMS

................................
................................
................................
...........

9

11.1.

R
EQUIREMENTS
................................
................................
................................
........................

9

11.2.

P
ERFORMANCE

................................
................................
................................
.........................

9

11.3.

C
OST

................................
................................
................................
................................
......

10

12.

BACKUP & RESTORATION

................................
................................
................................
...

10

12.1.

H
IGH
A
VAILABILITY

................................
................................
................................
..............

10

12.2.

R
EPLICATION

................................
................................
................................
.........................

10

13.

CLUSTER & HIGH AVAIL
ABILITY TECHNOLOGIES

................................
...................

10

13.1.

H
IGH
A
VAILABILITY

................................
................................
................................
..............

10

13.2.

R
EPLICATION

................................
................................
................................
.........................

10

PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


3

13.3.

C
LUSTERS

................................
................................
................................
..............................

11

14.

CREDITS

................................
................................
................................
................................
.....

11

14.1.

B
OOK
:

H
IGH
P
ERFORMANCE
M
Y
SQL

................................
................................
....................

11

14.2.

P
OSTGRE
SQL

M
ANUAL

................................
................................
................................
.........

11

14.3.

N
EW
A
RCHITECT
:

P
OSTGRE
SQL

VS
.

M
Y
SQL
................................
................................
........

12


PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


4

PostgreSQL


1.

Overview

The PostgreSQL Global Development
Group is a community of companies and people co
-
operating
to drive the development of PostgreSQL, the worlds most advanced Open Source database software.


The PostgreSQL software itself had its beginnings in 1986 inside the University of California at
Berk
eley as a research prototype, and in the 16 years since has moved to its now globally distributed
development model, with central servers based in Canada.


2.

Why PostgreSQL?

There are SQL database systems, which scale, perform, and offer better reliability t
hen PostgreSQL.
This is obvious given the competitive commercial market for database software. PostgreSQL offers
is complete solution to small to mid
-
sized companies looking for cost effective reliable data storage.
Other solutions such as MySQL are cos
t effective but don’t offer the same level of reliability or
features expected from an enterprise class database system.


PostgreSQL development focuses on a rich set of tools that ensure the productivity and scalability to
the largest audience possible.
PostgreSQL is used in a variety of corporation’s for various uses
including research, storage, online
-
access and rapid
-
access storage.


3.

Why MySQL?

For completeness I felt I should add this section. MySQL is not useless and has many applications
given the
correct circumstances and data requirements. Among the most evident are high read
queries and low insertions. MySQL outperforms most other databases on raw read performance, so
given a high read requirement for a given application; MySQL may be the ideal

choice. For the
most part I have found MySQL is most often used because of its simplicity to setup and run.


4.

PostgreSQL vs. MySQL

The only thing I can say is that for small websites MySQL is much faster, but as you site becomes
larger needs more features

and requires better scalability, PostgreSQL becomes the undeniable
choice. For further, reference refer to:


http://www.databasejournal.com/features/mysql/article.php/32889
51

http://www.phpbuilder.com/columns/tim20000705.php3

http://www.webtechniques.com/archives/2001/09/jepson/

4.1.

MySQL

Fo
r the novice user MySQL is more than adequate for solving most database needs. This type of
tasks MySQL excels in includes logging, form storage and other web
-
based small transactions.
MySQL is also useful for very large databases but only under certain
circumstances. Provided
the database is accessed in a standalone method (no transactions). Since MySQL performs fast
reads for large databases it exceed PostgreSQL for large datasets in terms of read performance
but as soon as a large number of users sta
rt accessing the database (insertions & updates), the
system does not scale at an acceptable rate expected for large companies and organizations.


PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


5

4.2.

PostgreSQL

PostgreSQL is by far the smartest solution for most database tasks. It lies in terms of complete
solution between the scalability of MySQL and other large commercial database vendors. For
moderate to large projects PostgreSQL is almost always the ideal and safest choice. At first
glance both MySQL and PostgreSQL look very similar and perform for the

most part the same in
light load environments but as soon as the requirements for the database increase such as a need
for greater query functionality or scalability, then PostgreSQL quickly becomes the optimal
choice. Unlike MySQL, PostgreSQL enjoys a f
ull feature query language similar in size and
functionality to other commercial databases such as Oracle and DB2. Of course, it goes without
saying that DB2 and Oracle are the premiere contenders in query functional but also have a
$25,000+ price tag.

4.3.

Ot
her Commercial & Non
-
Commercial Databases

It goes without saying that commercial databases for the most part have all of the features
MySQL and PostgreSQL contain but two main points make MySQL and PostgreSQL a viable
solution for both personal and profess
ional use. The first is cost, depending on the size of the
company they may not be willing to pay hundreds or thousands of dollars for a database system,
and this is becoming a strong emphasis for companies because most of the functionality in
PostgreSQL
and MySQL is all that is needed for most user applications. There is still a small
section of the industry that require commercial systems for their functionality, such as spatial
analysis but even that is quickly being implemented into current version of

MySQL and
PostgreSQL. Second is ease of use and cost of long
-
term administration. Every time a company
buys a piece of hardware they have to consider how much its going to cost in five, to ten years to
replace it. If the long
-
term cost out weighs the s
hort
-
term cost there is no reason to buy it at all.
The same principle holds for databases. In the case of commercial database servers it is yet to be
decided which cost less on a long
-
term scale but it seems to be obvious from the recent growth in
free
database servers that free is definitely better.

4.4.

Which do I choose?

That depends on the needs of the company. In my experience MySQL and/or PostgreSQL can
handle just about any task but in the short term the in ability of MySQL to handle a persistent
st
orage model or guarantee consistency, which cripples MySQL’s usability under most business
architectures.


5.

The ACID Test

5.1.

Atomicity

Atomicity is an all
-
or
-
none proposition. This guarantees that for a given request that either the
entire request is competed

or non
-
at
-
all. This is a important feature that PostgreSQL and
MySQL supports.

5.2.

Consistency

Consistency guarantees that a transaction never leaves your database in a half
-
finished state. If a
part of a transaction fails then the portion of the transactio
n already completed is rolled
-
back to
the original state. This is supported by PostgreSQL but not by MySQL.


PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


6

5.3.

Isolation

Isolation keeps transactions separated from each other until they
-
re finished. It ensures that there
is not mixing of transaction data
during execution. This is important for data integrity. This is
supported by PostgreSQL but not by MySQL.

5.4.

Durability

Durability guarantees that the database will keep track of pending changes in such a way that the
server can recover from an abnormal ter
mination. This is supported by PostgreSQL but not by
MySQL.


6.

Setup / Installation

On both Windows and UNIX based systems installation of the application is a straightforward
process only involving the installation of a single package. PostgreSQL database

data is compatible
from version to version, allowing for quick upgrades and installations.


7.

System Configuration

7.1.

Resources

System requirements for PostgreSQL like any other database system are completely dependent
on the size and structure of the data bei
ng stored. Most web databases including web logs and
form data need very little storage or system resources but for larger datasets such as equities
data, spatial information or streaming data large amounts of resources are needed for processing
and the d
ata in an efficient manor. When designing a database system there are three main
sections that need to be addressed.




The first, is
hard disk

performance, hard disks are the single slowest medium on
computers. For databases which are larger, greater th
e one terabyte, they cannot fit all of
the data into main memory, therefore the system they must access the disk several times
in order to traverse the index to find the specified data. These create a requirement for
very fast disks, which take little tim
e seeking and locating information.




The second is
CPU

resources. In most cases this does not become a issue in large
database systems because the processor spends most of its time waiting for data to be
found on the disk but in systems where there is a

high number of transactions i.e. credit
transactions because a majority of the time is spent moving small amounts if data in
memory.




The third and most important resource is main
memory
. The amount of main memory on
a system can make or break the perfor
mance of a system independent of the speed of the
processor or the disks. System memory not only acts a fast medium to process data but
also acts the primary storage units for table indexes. For large databases it is imperative
that as much of the table
index is stored in memory to reduce the number of disk accesses
to the disk.


The combination of these three components makes up most if not all database systems today.
There are other components, which go into designing a database system but none of them

make
an impact as great as CPU resources, hard disk performance, and main memory.

PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


7

7.2.

Replication

Database replication is the corner stone of database management for systems of all sizes. The
ability to backup streaming data and more important survive a hard
ware failure is key in
providing reliable database services. Because standard backups put a high load on the system
(disk I/O), another method is needed to backup and restore data. The method of choice is
replication. Replication is a method of data du
plication via SQL or other proprietary
communication methods.


Once setup, there is a main server and any number of replication servers that act as streaming
mirrors to the main server. Every insertion or update that is made to the main SQL server is
pr
opagated to all of the replication servers to update the data. This ensures upon a failure any of
the replication servers can be used as the new primary server. Other features include bi
-
directional propagation, which allows any of the servers to be acce
ssed and any changes made to
one node will propagate to the other child nodes arbitrarily. This is method creates a true
autonomous system allowing for multiple failures to be handled gracefully, but is also the
hardest backup system to keep running coher
ently due to decrease in data stability. This is
obvious since a single bad drive could start spewing bad data to all of the other replication
servers.

7.3.

Scalability

Scalability is among the hardest obstacles to overcome using commodity hardware and softwar
e
but PostgreSQL offers elegant solutions that allow database to be scaled to virtually and limitless
level of abstraction. This system uses a single thread to carry incoming queries but each of these
queries is handled on there own thread. Even though t
he query receiving thread is alone it still
offers better or equal scalability to MySQL. In terms of multi
-
computer scalability, PostgreSQL
does not scale at all. For large cluster based clustering, commercial database systems are by far
more mature; som
e good implementations include Oracle and DB2.


The most important aspect of SQL server scalability is the ability for a server to scale both based
on the number of users and size of the dataset. In both of these area’s PostgreSQL performs very
well. Pos
tgreSQL scales to large number of users linearly with the hardware it is installed on.
PostgreSQL uses a modified B
-
Tree structure that scale to almost an infinite level of data.




8.

Authentication

8.1.

Authentication Methods

PostgreSQL supports various forms o
f authentication including trust, password, Kerberos, ident
and PAM. These are all handled in the configuration file “pg_hba.conf”.


When trust authentication is specified, PostgreSQL assumes that anyone who can connect to the
server is authorized to acce
ss the database as whatever database user he specifies (including the
database super user). This method should only be used when there is adequate operating system
-
level protection on connections to the server.


The password
-
based authentication methods
are md5, crypt, and password. These methods
operate similarly except for the way that the password is sent across the connection. If you are at
all concerned about password "sniffing" attacks then md5 is preferred, with crypt a second
choice if you must su
pport pre
-
7.2 clients.

PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


8


Kerberos is an industry
-
standard secure authentication system suitable for distributed computing
over a public network. A description of the Kerberos system is far beyond the scope of this
document; in all generality it can be qui
te complex (yet powerful). The Kerberos FAQ or MIT
Project Athena can be a good starting point for exploration. Several sources for Kerberos
distributions exist.


The ident authentication method works by inspecting the client's operating system user name

and
determining the allowed database user names by using a map file that lists the permitted
corresponding user name pairs. The determination of the client's user name is the security
-
critical
point, and it works differently depending on the connection ty
pe.

8.2.

Authentication/Privilege Granularity

One of PostgreSQL most powerful features is the ability to allocate privileges at a fine level.
The is apparent in its ability to not only give privileges to tables and databases but also to give
privilege to creat
e databases on a per user basis, therefore giving each user the ability to create
there own database and administration capabilities. MySQL also provides privilege
-
based access
but on a very simplistic level by offering all of the same controls but many t
imes requiring many
additional settings to achieve the same privileges PostgreSQL can accomplish in a single
command.




9.

Managing Resources

9.1.

Resource Limits

Unix
-
like operating systems enforce various kinds of resource limits that might interfere with the
o
peration of your PostgreSQL server. Of particular importance are limits on the number of
processes per user, the number of open files per process, and the amount of memory available to
each process. Each of these have a "hard" and a "soft" limit. The hard
limit is what actually
counts and the soft limit is not usually enforced. The root user can only change the hard limit.


System resources can also be controlled at the user level by Unix
-
like operating systems but for
windows resource limits built
-
into the

applications are a needed addition to ensure quality of
service for all users. The PostgreSQL server uses one process per connection so you should
provide for at least as many processes as allowed connections, in addition to what you need for
the rest of

your system. This is usually not a problem but if you run several servers on one
-
machine things might get tight.

9.2.

Linux Memory Overcommit

In Linux 2.4 and later, the default virtual memory behavior is not optimal for PostgreSQL.
Because of the way that the

kernel implements memory overcommit, the kernel may terminate
the PostgreSQL server (the postmaster process) if the memory demands of another process cause
the system to run out of virtual memory.


PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


9

10.

Securing Systems

10.1.

Secure TCP/IP Connections with SSL

Postg
reSQL

has native support for using
SSL

connections to encrypt client/server
communications for increased security. This requires that
OpenSSL

is installed on both client
and server systems and that support in
PostgreSQL

is enabled at build time.


With
SSL

support compiled in, the
PostgreSQL

server can be started with
SSL

enabled by
setting the parameter
ssl

to on in
postgresql.conf
. When starting in
SSL

mode, the server will look
for the files
server.key

and
server.crt

in the data directory, which should co
ntain the server private
key and certificate, respectively. These files must be set up correctly before an
SSL
-
enabled
server can start. If the private key is protected with a pass phrase, the server will prompt for the
pass phrase and will not start until

it has been entered.


The server will listen for both standard and
SSL

connections on the same TCP port, and will
negotiate with any connecting client on whether to use
SSL
.

10.2.

Secure TCP/IP Connections with SSH

One can use
SSH

to encrypt the network conne
ction between clients and a
PostgreSQL

server.
Done properly, this provides an adequately secure network connection.


11.

Enterprise Systems

11.1.

Requirements

System requirements for enterprise grade servers vary greatly from the machine a user would use
for say a

web log or DNS server. Enterprise systems must offer both a higher level of reliability
and speed then a standard workstation can offer. Some of these requirements usually include
some sort of
RAID

storage for the data and also some sort
autonomous

syst
em recovery. Any
further information in regards to enterprise level hardware is beyond the scope of this paper.

11.2.

Performance

System performance varies from system to system based on the requirements pre
-
specified by
the owner, company, or application. Co
nfiguration scripts are the key to creating a PostgreSQL
system that is optimized for both the hardware and the software. From a hardware standpoint,
the most important part of optimization is estimating the amount of resources available on the
system and

the required resources by the database and the expected number of users accessing
the system.


In most cases systems are dedicated to running either only the PostgreSQL database or the
database with minimal or no other applications running. In this case
the system can be setup to
allocate most if not all of the available resources to the to the SQL server. For PostgreSQL these
resources are located under “/var/lib/pgsql/” under the files name “postgresql.conf”. This file
contains all of the options for
limit all of the available resources to the PostgreSQL server.



PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


10

11.3.

Cost

PostgreSQL cost per system is approximately linear with the size of the dataset times the number
of users expected to access the database. The cost saving from PostgreSQL lies in the l
icensing
of the software making it essentially free (BSD License). This in
-
tern leaves the cost of the
entire system to only the cost of the hardware.


The cost of hardware per system depends on the required resources as stated but also other
supporting

hardware (systems) for replication and backup, which acts as streaming and static
backups of the current data. This backup server(s) usually cost the same or slightly less then the
main server.


12.

Backup & Restoration

12.1.

High Availability

So what is high avai
lability? High availability is anything this is going to give your SQL
server(s) or any server for that matter a level of service availability greater then that of what
could be achieved through the use of normal installation of a SQL server or etc. This

has
become a buzz word in recent past because the idea of
high availability
has been around for quit
a while through mainframe technologies but only recently has the idea of
high availability

been
brought to commodity hardware.

12.2.

Replication

Database replic
ation has an undeserved reputation for being complex to setup and prone to
failure. The early version of MySQL and PostgreSQL replication were difficult to configure
because the process was inadequately documented. Current versions of replication offer a

more
manageable and complete solution for datareplication, especially for multi
-
point storage.


13.

Cluster & High Availability Technologies

13.1.

High Availability

So what is high availability? High availability is anything this is going to give your SQL servers
or any server for that matter a level of service availability greater then that of what could be
achieved through the use of normal installation of a SQL server or etc. This has become a buzz
word in recent past because the idea of
high availability
has b
een around for quit a while through
mainframe technologies but only recently has the idea of
high availability

been brought to
commodity hardware.

13.2.

Replication

Database replication has an undeserved reputation for being complex to setup and prone to
failure
. The early version of MySQL and PostgreSQL replication were difficult to configure
because the process was inadequately documented. In its most basoc form, replication consists
of two servers: a master and a slave. The master server(s) record all queri
es that modify or add
data to the databases. The slave connects to the master, reads queries from the master’s binary
log (list of changes), and executes them against its local copy of the data.


PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


11



Data distribution: allows data to be copied to multiple lo
cations without any manual
interventention. This can be particularlly helpful where live backups must be kept at
all times.




Load balancing: replication offers a convient method for creating a load balancing
system for select queries. It does not scale
for inserts or updates because they must all
be written to the master node. Multi
-
master master mode is possible but is prone to
errors.




High availability & failover: Like other 24/7 services, SQL servers cannot have
downtime and suffer from backup pre
formance degredation. Replication distributes
backups to live feeds, therefore eliminating/reducing the need for scheduled backups.


Replication as shown solves many problems but still leaves some problems to be solved.
Among the largest of these issues
is scalable writes. Under the current replicaton architecures
writes cannot scale with the number of nodes, therefore creating a severe bottleneck for certain
query circumstatnces. For these problems another systems is used called clustering.

13.3.

Clusters

My
SQL Cluster combines the world's most popular open source database with a fault tolerant
database clustering architecture so you can deliver mission
-
critical database applications with
99.999% availability.


MySQL Cluster enables you to:




Cost
-
effectively
deliver 5 nines availability using parallel server architecture with no
single point of failure.




Deliver the performance and high throughput required to meet the most demanding
enterprise applications.





Incrementally scale your applications in a linear
fashion as your needs grow without
having to invest in expensive hardware.


MySQL Cluster have a flexible distributed architecture, which gives you complete control over
the level of performance, reliability and scalability you need to match any applicatio
ns
requirements.


14.

Credits

14.1.

Book: High Performance MySQL



Abstract: Advanced Tools & Techniques for MySQL Administrators



By: Jeremy D. Zawodny & Derek J. Balling



O’reilly Press 2004

14.2.

PostgreSQL Manual



Abstract: Manual

PostgreSQL Summary & Introduction

The overview is taken from
http://advocacy.postgresql.org/about/
.


12



By:
http://www.postgresql.org/docs/7.4/static/biblio.html



PostgreSQL 2004



Many of the sections in this whitepaper are copied directly from this site. I am not
claiming them as my own in any way shape or form.

14.3.

New Architect: P
ostgreSQL vs. MySQL



Abstract: Comparison based on building a better database



By:
http://www.webtechniques.com/archive/2001/09/jepson/



Most of the stuff used if from the ACID testing sect
ion.