Postgres-XC
Install Manual
Version 0.9.5
NTT Open Source Software Center
June 30th, 2011
© 2010-2011 by Nippon Telegraph and Telephone Corporation
Table of Contents

1. Code Acquisition
2. Postgres-XC Quick Start
   2.1 Recommended environment
   2.2 Install commands
   2.3 Environment initialization and setting
   2.4 Quick Setup
       2.4.1 Configuration file setting
       2.4.2 pg_hba.conf setting
   2.5 Launching Components
3. Postgres-XC detailed installation
   3.1 Configure
   3.2 Make
4. Postgres-XC setting
   4.1 initdb
   4.2 Configuration files
       4.2.1 Datanode configuration
       4.2.2 Coordinator configuration
   4.3 Start up
       4.3.1 GTM
       4.3.2 GTM proxy
       4.3.3 Launching Data Nodes and Coordinators
5. Postgres-XC start/stop functions: pg_ctl and gtm_ctl functionalities
   5.1 pg_ctl module
   5.2 gtm_ctl module
1. Code Acquisition

The code can be downloaded from the SourceForge Postgres-XC development page. It is available from the public git repository on SourceForge. Install at least git 1.6.0 (supported by SourceForge) on your machine, and type the following command:
git clone git://postgres-xc.git.sourceforge.net/gitroot/postgres-xc/postgres-xc
Installing Postgres-XC from source code requires gcc 4.1 or higher and GNU make. You may also need flex 2.5.4 or later and bison 1.875 or later.
2. Postgres-XC Quick Start

This section presents how to quickly install Postgres-XC. Please refer to section 3 for a more detailed procedure.
2.1 Recommended environment

Here is the basic structure used for a Postgres-XC cluster:
[Figure 1: Postgres-XC environment and configuration]
This environment structure has been chosen to make Postgres-XC run as efficiently as possible by dividing the processes across the servers. With (n+1) servers (n nodes and one GTM), GTM is on a dedicated node and each of the other servers runs one Coordinator process and one Datanode process of Postgres-XC. Having the Coordinator and Datanode processes together on each node allows for an even distribution of the workload. Also, reads from replicated tables can happen locally without requiring an additional network hop.
The (n+1) servers are on the same IP segment.
The GTM server may be a different class of machine than the other components; for example, it does not have the storage requirements of the other components.
Note: you are free to spread the components as you wish. The best configuration depends highly on the application layer used. When testing with the DBT-1 benchmark, which is classified as CPU-bound, the Postgres-XC development team observed about 25-30% of CPU time being taken up by the Coordinator processes and the rest by the Datanode processes.
2.2 Install commands

The software has to be installed from the source code, which can be done easily with the following commands, run on each of the (n+1) servers of the cluster:
./configure CFLAGS="-DPGXC"
make
make install
Everything will be installed in the folder /usr/local/pgsql by default, unless overridden with the --prefix option when running configure. "make install" has to be done as user root if installing in /usr/local/pgsql.
Note: Postgres-XC is compiled with the -DPGXC flag in CFLAGS by default. To disable this flag, set -UPGXC (undefine Postgres-XC) in CFLAGS instead.
2.3 Environment initialization and setting

Coordinators, Datanodes and GTM require data directories for storing their data. The Coordinator's storage requirements are modest, as it holds mainly catalog information, but Datanodes may require much more, depending on the database.
An example for data directories appears below. Assume that the desired root data directory is /data.
mkdir /data
mkdir /data/coord /data/datanode /data/gtm
It can also be necessary to grant the initdb process access to the data directories for a proper initialization of the environment. Be sure to launch:
chown username /data
chown username /data/datanode /data/gtm /data/coord
Then launch the following commands to initialize your environment:
initdb -D /data/coord
initdb -D /data/datanode
In each coordinator/datanode folder, a postgresql.conf and pg_hba.conf file will appear.
The next step is to set up those files.
Initialization is not required for GTM, but its data folder has to be specified when launching the process.
2.4 Quick Setup

This part explains how to quickly configure the Postgres-XC cluster.
2.4.1 Configuration file setting

Based on Figure 1, initialize the following values in the postgresql.conf file located in the coordinator folder of server i (i between 1 and n):
- pooler_port = 6667
- num_data_nodes = n
- coordinator_hosts = list of the IP addresses of the n Postgres-XC servers
- coordinator_ports = 5432
- data_node_hosts = list of the IP addresses of the n Postgres-XC servers
- data_node_ports = 15432
- gtm_host = GTM IP address
- gtm_port = GTM process port
- pgxc_node_id = i
It doesn't cause any problems if the above values are the same on all nodes. It is actually encouraged, as it makes maintenance easier.
pgxc_node_id plays an important role for the Coordinator because it permits the local Coordinator to identify itself in the coordinator_hosts and coordinator_ports arrays. For example, with pgxc_node_id set to 2, the Coordinator using this value assumes that its own IP address and port are the second elements in those arrays. This design keeps the cluster configuration simpler, because the same arrays can be kept for the Coordinators on all the Coordinator nodes.
A similar setting has to be done in the postgresql.conf file located in the datanode folder of server i (i between 1 and n):
- gtm_host = GTM IP address
- gtm_port = GTM process port
- pgxc_node_id = i
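As a concrete sketch based on the lists above, here is what the two files might contain on server 1 of a hypothetical two-node cluster (all IP addresses below are assumptions for illustration):

# /data/coord/postgresql.conf on server 1
pooler_port = 6667
num_data_nodes = 2
coordinator_hosts = '192.168.10.1,192.168.10.2'
coordinator_ports = '5432'
data_node_hosts = '192.168.10.1,192.168.10.2'
data_node_ports = '15432'
gtm_host = '192.168.10.100'    # assumed GTM address
gtm_port = 6666
pgxc_node_id = 1               # this is server 1

# /data/datanode/postgresql.conf on server 1
gtm_host = '192.168.10.100'    # assumed GTM address
gtm_port = 6666
pgxc_node_id = 1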
2.4.2 pg_hba.conf setting

Add the following line at the end of each pg_hba.conf file, in both the coordinator and datanode folders:
host all all segment_IP.0/24 trust
This will allow each component to communicate with other components in the cluster.
If you have stricter security requirements, you can be more precise and modify accordingly. Please refer to the PostgreSQL documentation for more details.
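For instance, if the cluster sits on the (assumed) segment 192.168.10.0/24, the line would read:
host all all 192.168.10.0/24 trust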
2.5 Launching Components

The commands below show the minimum options necessary to launch the components of Figure 1 correctly.
On the GTM machine, begin by launching GTM with the following command (it is important to start GTM before the other components for node registration):
gtm -x 628 -p 6666 -D /data/gtm &
The -D option is mandatory for GTM, to make it consistent with the PostgreSQL APIs.
It is also possible to launch an additional component called GTM Proxy so as to limit the network load between Coordinators and GTM. Please refer to section 4.3.2 for detailed explanations.
Then on each of the n Postgres-XC servers, launch the datanodes:
postgres -X -i -p 15432 -D /data/datanode &
Then on each of the n Postgres-XC servers, launch the coordinators:
postgres -C -i -D /data/coordinator &
The cluster should now be running and ready to use. In case a customized installation is necessary, please refer to section 3.
3. Postgres-XC detailed installation

From the repository where the Postgres-XC code has been saved, the installation is made up of two steps: configure and make.
Postgres-XC is based on PostgreSQL code.
3.1 Configure

Postgres-XC can be configured with this simple command:
./configure
To compile Postgres-XC with an optimization option, do the following:
export CFLAGS='-O2'
./configure
Postgres-XC code is easily identifiable in PostgreSQL code thanks to a C flag called PGXC.
Postgres-XC uses the same configuration options as PostgreSQL; here are a few useful ones when setting up your environment:
- --enable-depend, useful if you are doing development work. When a header is changed, the makefiles ensure that impacted objects are rebuilt as well.
- --prefix=/data, to set the directory where you want Postgres-XC to be installed.
- --enable-debug, to allow debug options when running Postgres-XC.
- --disable-rpath, in case binaries need to be moved. This option is particularly useful for Postgres-XC as it is designed to be extended to a server cluster.
- --enable-cassert, activates assertions in the code. May have an impact on performance.
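For illustration only (the prefix path is an assumption), a configure invocation combining several of these options could look like this:
./configure --prefix=/usr/local/pgsql --enable-depend --enable-debug --disable-rpath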
As noted above, the user can also set an optimization option in CFLAGS with:
export CFLAGS='-O2'
This option is highly recommended for the cluster to scale efficiently!
Postgres-XC activates the -DPGXC compilation flag in CFLAGS by default. To disable this flag, set -UPGXC (undefine Postgres-XC) in CFLAGS when launching configure:
./configure CFLAGS='-UPGXC'
or
export CFLAGS='-UPGXC'
3.2 Make

The two following commands have to be entered in the base folder where Postgres-XC has been saved:
make
make install
After installation in the desired folder, Postgres-XC is divided into 4 subdirectories. The layout is the same as a regular PostgreSQL install:
- bin, containing all the executables
- include, for the APIs
- lib, containing the Postgres-XC libraries
- share, for sample files and time zone management
4. Postgres-XC setting

Postgres-XC is divided into the following major components: Coordinator, Datanode, Global Transaction Manager (GTM hereafter) and GTM proxy.
All are available in the bin subdirectory of the installation directory.
4.1 initdb

It is necessary to set up the Coordinator and Datanode environment folders with initdb on all the servers where Postgres-XC will run.
The initialization is done simply with initdb -D folder_name. It can also use the same options as PostgreSQL's initdb.
Depending on the folders where data is located, it can be necessary to grant initdb access to the data folders. Be sure to launch the following commands:
chown username coord_folder
chown username datanode_folder
It is important to keep the Coordinator and Datanode folders separate, as they are separate entities using separate configuration files:
initdb -D coord_folder
initdb -D datanode_folder
For example, for a Datanode:
initdb -D /data/datanode
for a Coordinator:
initdb -D /data/coordinator
Now the basic environment is set up.
Before launching Postgres-XC components, they must be configured to allow the whole system to work properly.
4.2 Configuration files

The next step concerns configuration files. Postgres-XC, as a database cluster product, has to be set up depending on the environment where it is used.
Two files have to be modified for both the Datanode and the Coordinator:
- pg_hba.conf, so as to allow all the Postgres-XC components to interact properly
- postgresql.conf, which is the biggest part to modify.
All the configuration files are found where the coordinator and datanode data directories have been set up with initdb, as in regular PostgreSQL.
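For the example directories initialized in section 2.3, this means the following files (assuming the /data layout used earlier):
/data/coord/postgresql.conf
/data/coord/pg_hba.conf
/data/datanode/postgresql.conf
/data/datanode/pg_hba.conf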
4.2.1 Datanode configuration

4.2.1.1 pg_hba.conf
This configuration is exactly the same as for PostgreSQL. It defines restrictions and authorizations for access to Postgres-XC components. In the case of the Datanode, it controls access from Coordinators.
The user just has to write in this file the list of accesses allowed for this Datanode, with the following syntax:
# TYPE DATABASE USER CIDR-ADDRESS METHOD
If the Postgres-XC architecture is defined locally on a single segment, for instance 84.147.16.*, it is enough to write the following for all users and all databases:
host all all 84.147.16.0/24 trust
This file can also be used to restrict access to a Datanode for one database or one user. For example, if a Datanode's pg_hba.conf file contains the following line:
host flight pilot 84.147.16.16/32 trust
only the Coordinator at address 84.147.16.16, connecting as user pilot to the database flight, can access this Datanode.
This permits tight control over system security.
4.2.1.2 postgresql.conf

The Postgres-XC configuration file is based on the PostgreSQL one. It basically uses the same options for logs or vacuum, for instance. Additional parameters have been added to properly set up the Postgres-XC components. These additional parameters are classified into 2 categories: the first is related to internal Datanode settings, the second to GTM connection settings.
1) Datanode parameter

Parameter name  Description    Default value
--------------  -------------  -------------
port            Datanode port  5432
2) GTM connection parameters

Parameter name  Description                                      Default value
--------------  -----------------------------------------------  -------------
gtm_host        IP address or host name where GTM or the GTM     *
                proxy is located
gtm_port        GTM or GTM proxy port                            6666
pgxc_node_id    Parameter used for node registration. This       1
                number has to be defined uniquely in each
                Datanode so as to avoid confusion at the GTM
                level. A Coordinator and a Datanode can use the
                same value, though. If a Datanode uses the same
                ID as another node already registered, it
                cannot start.

Note about pgxc_node_id: if a Coordinator or a Datanode goes through a GTM proxy to connect to GTM, the Proxy ID used by the node is registered on GTM along with the node ID. If a node connects directly to GTM, the Proxy ID associated with the node when it registers is 0.
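Putting the two tables together, a Datanode's postgresql.conf might contain a minimal sketch like this (the GTM address is an assumption):

port = 15432
gtm_host = '192.168.10.100'   # assumed GTM or GTM proxy address
gtm_port = 6666
pgxc_node_id = 1              # unique among Datanodes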
4.2.2 Coordinator configuration

4.2.2.1 pg_hba.conf
The setup is more or less the same as for the Datanode. pg_hba.conf permits restricting and authorizing access to the Coordinator server from an application server.
The syntax is the same as for PostgreSQL and the Datanode:
# TYPE DATABASE USER CIDR-ADDRESS METHOD
For example, if a Coordinator has the following authorization configuration:
host flight pilot 84.214.18.55/32 trust
this Coordinator will only accept connections from the application server 84.214.18.55, for the database flight and the user pilot.
As for the Datanode, this permits tight control over system security. However, this setting has to be made in accordance with the other Postgres-XC components, or the system will encounter authentication/permission errors.
4.2.2.2 postgresql.conf

The Coordinator's default configuration file is identical to the Datanode's. It uses all the PostgreSQL configuration parameters as a base; logs, for instance, use the same setup as in normal PostgreSQL. However, additional parameters have to be set, as the Coordinator interacts with GTM, the pooler, application servers and Datanodes when running.
They are classified in 4 categories: internal Coordinator configuration, pooler settings, connections to Datanodes, and connection to GTM or GTM proxy.
1) Coordinator configuration:

Parameter name             Description                                    Default value
-------------------------  ---------------------------------------------  -------------
port                       Coordinator port                               5432
preferred_data_nodes       Possibility to set a Datanode number. For      none
                           tables replicated on several Datanodes, the
                           Coordinator will go to this one in priority.
                           It particularly reduces traffic if the
                           Coordinator targets a local Datanode. This
                           should be set to the local Datanode and can
                           affect performance.
strict_statement_checking  Blocks statements like CREATE TRIGGER or       on
                           DECLARE CURSOR. Set to off, it allows those
                           statements to be executed on all Datanodes
                           and the Coordinator.
strict_select_checking     Temporary parameter while Postgres-XC is       off
                           still being developed. It is intended to
                           limit the use of multi-node ORDER BY. GROUP
                           BY and DISTINCT are still checked whether
                           strict_select_checking is on or off.
coordinator_hosts          List of Coordinator hosts used in the          none
                           Postgres-XC system. The total number of host
                           addresses written has to be the same as
                           num_coordinators. The input format is
                           'host(1),host(2),...,host(n)' for n =
                           num_coordinators hosts.
coordinator_ports          List of Coordinator ports. Their number has   none
                           to be the same as num_coordinators. If only
                           one port number is defined, Postgres-XC
                           assumes that all the Coordinators are using
                           the same port number. The input format is
                           'port(1),port(2),...,port(n)'.
num_coordinators           Number of Coordinators used in the            1
                           Postgres-XC cluster

These parameters can also be set easily by connecting to a Coordinator via psql and using a SET query.
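For example, a session might look like this (the port, database name and value syntax are illustrative assumptions):

psql -p 5432 postgres
postgres=# SHOW num_coordinators;
postgres=# SET preferred_data_nodes = 1;  -- assumed value syntax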
2) Pooler parameters

Parameter name                   Description                                   Default value
-------------------------------  --------------------------------------------  -------------
max_pool_size                    Maximum size of the pool. This parameter has  1000
                                 to be high enough so as to have a sufficient
                                 number of pooler connections in the system.
                                 It also depends on the number of backends at
                                 each Coordinator.
min_pool_size                    Minimum size of the pool                      1
pooler_port                      Pooler port to use for communication between  6667
                                 the Coordinator backend processes and the
                                 pooler.
persistent_datanode_connections  Parameter to be set to on or off. If it is    off
                                 set to on, the connection used by a
                                 transaction is not put back in the pool but
                                 held by the session. If it is set to off
                                 (the default), when a transaction is
                                 committed, the Coordinator puts the
                                 connections back in the pool.
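As a sketch, the pooler section of a Coordinator's postgresql.conf that keeps these defaults explicit would read:

max_pool_size = 1000
min_pool_size = 1
pooler_port = 6667
persistent_datanode_connections = off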
3) Data Node settings

By default, the port at position 1 is associated with the host listed at position 1, port(2) with host(2), etc.
Those parameters are linked to the catalog, where Datanodes are identified through integer numbers: host(1) is Datanode 1 in the catalog, ..., host(n) is Datanode n.
If only one port is defined in data_node_ports, Postgres-XC assumes that all the Datanodes are using the same port number (recommended for simplicity if each Datanode is on a separate physical server).
For example, in the following configuration:
num_data_nodes = 3
data_node_hosts = 'datanode1,datanode2,datanode3'
data_node_ports = '12345,51234,45123'
By setting those parameters in a Coordinator configuration file, the Coordinator can connect to 3 servers:
- datanode1, using port 12345. This server is identified in the catalog with number 1.
- datanode2, using port 51234. This server is identified in the catalog with number 2.
- datanode3, using port 45123. This server is identified in the catalog with number 3.
As in PostgreSQL, Postgres-XC provides support for both SET and SHOW, so that GUC parameters can be set across the cluster efficiently and easily.
4) GTM connection parameters

Parameter name  Description                                     Default value
--------------  ----------------------------------------------  -------------
gtm_host        IP address or host name where GTM or the GTM    *
                proxy is located
gtm_port        GTM or GTM proxy port                           6666
pgxc_node_id    Postgres-XC node identifier on GTM, used for    1
                node registration. This value has to be unique
                for each Coordinator in the cluster. The
                Coordinator also uses this parameter to
                determine its position in the
                coordinator_hosts and coordinator_ports
                arrays. If a Coordinator starts with the same
                ID as another Coordinator already registered,
                it cannot start.
Note about pgxc_node_id:
pgxc_node_id is critical for DDL and utility statements to run correctly in the cluster. For example, setting pgxc_node_id to 2 on a Postgres-XC Coordinator makes that Coordinator assume that its own position in the coordinator_* arrays is the second. If this is not set correctly, DDL and utility statements cannot run correctly.
This scheme permits keeping the same coordinator_hosts and coordinator_ports values on each Coordinator node, which facilitates the maintenance of the system.
If a Coordinator or a Datanode goes through a GTM proxy to connect to GTM, the Proxy ID used by the node is registered on GTM along with the node ID. If a node connects directly to GTM, the Proxy ID associated with the node when it registers is 0.
4.3 Start up

The following activation order has to be respected when starting Postgres-XC:
1. GTM
2. GTM proxy (if necessary)
3. Datanodes/Coordinators
4.3.1 GTM

GTM can be activated with the following arguments:
- -x option, first GXID used; 628 by default, so as to correspond to initdb.
- -l option, log file of GTM. Default value is "/tmp/gtmlog".
- -p option, port number of GTM. Default value is 6666.
- -D option, data folder. This folder holds the PID file, the option file and the control file. The control file records the sequences used in GTM as well as the last GXID used at GTM shutdown; when GTM is stopped, it writes the sequence names and numbers at stop time to this file. There is no default value, as it has to be set when launching GTM.
A GTM instance can also be started as a standby; this instance would be in charge of continuing cluster operations (snapshot and GXID feed) in case of a GTM failure.
- -S option, to start the GTM instance as a standby.
- -i option, host name or IP of the active GTM instance to connect to.
- -q option, port of the active GTM instance to connect to.
For example, to start an active GTM instance:
gtm -x 628 -l /data/log/gtmlog -p 6680 -D /data/gtm &
To start a GTM standby instance:
gtm -x 628 -l /data/log/standbylog -p 6681 -D /data/standby -i localhost -q 6680 &
A GTM Standby is not necessary when building the cluster. It may only be used as an HA solution in case of a GTM failure.
4.3.2 GTM proxy

GTM proxy is an optional component placed between GTM and the Coordinators, used to group requests from Coordinators to GTM. It is usually launched as a sublayer of the Coordinator on the same server.
GTM proxy can be activated with the following options:
- -h option, list of IP addresses the GTM proxy listens on. Default value is * (it listens on everything by default).
- -p option, GTM proxy port; this is the port targeted by Coordinators and Datanodes in their postgresql.conf files. Default value is 6666.
- -s option, host of GTM. Default value is *.
- -t option, port of GTM on its host server. Default value is 6666.
- -n option, number of worker threads between GTM and GTM proxy. Default value is 2.
- -l option, log file of GTM proxy. Default value is "/tmp/gtmlog".
- -D option, data directory for GTM proxy. The PID and option files are located in this folder.
- -i option, gtm_proxy node ID, used for GTM proxy registration. It has to be unique in the cluster, or the proxy cannot start properly.
It is normally launched on the same server as a Coordinator, to efficiently group requests together. For example:
gtm_proxy -h localhost -p 6541 -s gtm -t 6680 -n 2 -l /data/log/gtmproxylog -D /data/gtm -i 1 &
In this case GTM is located on a server called "gtm", listening on port 6680, and this gtm_proxy will be registered as "1" on the GTM. The proxy is launched locally with 2 worker threads to GTM, and its log file is written to the directory given above.
4.3.3 Launching Data Nodes and Coordinators

Launching these two components is more or less the same as in original PostgreSQL. It uses the same options plus an additional mode argument: the -C option has to be used when launching a Coordinator, and the -X option when launching a Datanode.
Here is a list of recommended parameters adapted to a Postgres-XC cluster:
- -i option, so as to let the Datanode or Coordinator accept TCP/IP connections, essential in a cluster.
- -D option, so as to use the correctly initialized directory with the matching mode option (for example, do not set -C with a directory initialized for a Datanode!).
Example of Data Node launch:
postgres -X -i -p 15451 -D /data/datanode &
Example of Coordinator launch:
postgres -C -i -p 5451 -D /data/coordinator &

5. Postgres-XC start/stop functions: pg_ctl and gtm_ctl functionalities

5.1 pg_ctl module

pg_ctl offers the same functionality as its PostgreSQL implementation.
The only addition is a startup option, -S coordinator|datanode, which permits choosing whether to start a Postgres-XC Coordinator or a Postgres-XC Datanode. This option is only necessary for the start and restart commands.
It is possible to start, stop, restart, and reload GUC parameters as for a PostgreSQL server.
Examples:
Start/restart a Coordinator or Datanode server:
pg_ctl start/restart -S coordinator/datanode -D datafolder -o 'options wanted'
Stop a Coordinator or Datanode server:
pg_ctl stop -D datafolder
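As a concrete sketch (path and port are assumptions consistent with the earlier examples), a Datanode could be started and stopped like this:

pg_ctl start -S datanode -D /data/datanode -o '-i -p 15432'
pg_ctl stop -D /data/datanode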
5.2 gtm_ctl module

gtm_ctl is a module implemented to manage exclusively the GTM and GTM proxy features of Postgres-XC.
It permits stopping, starting and restarting those components efficiently. Its options are close to PostgreSQL functionalities; they are listed below.
Option  Description
------  -----------
-D      Data folder for GTM or GTM proxy
-l      Log file for gtm_ctl
-m      Shutdown mode; can be smart, fast or immediate
-o      Option method. Permits passing some additional options down to the gtm or
        gtm_proxy applications. Input options are set within single quotation marks.
        In case of customized options (gtm port/IP address, or proxy port/IP
        address/proxy ID), the options have to be passed down like this for
        gtm_proxy:
        -o '-h localhost -p 6541 -s gtm -t 6680 -n 2 -i 1'
        In the case of gtm, options can be passed down like this:
        -o '-x 628 -l log_file -p 6680 -D /data/gtm'
        Please refer to the gtm_proxy and gtm reference pages for all the option
        details.
-p      Set the postgres bin directory
-S      Start/restart/stop option. Has to be set to gtm or gtm_proxy so as to
        launch one of those server processes.
-t      Wait time in seconds
-W/-w   Wait option
Examples:
Start/restart a GTM or GTM proxy server:
gtm_ctl start/restart -S gtm/gtm_proxy -D datafolder -o 'options wanted'
gtm_ctl start/restart -S gtm/gtm_proxy -D datafolder -o '-h proxy_IP -p proxy_port -s GTM_IP -t GTM_port -n worker_thread_number -i id_number'
Stop a GTM or GTM proxy server:
gtm_ctl stop -S gtm/gtm_proxy -D datafolder
When stop is used as a command, the -S option is needed because the GTM and GTM proxy PID files are named differently.
Look at the status of a GTM instance:
gtm_ctl status -S gtm -D datafolder
If the data folder scanned is that of an active GTM instance, status prints 1 as value.
If the data folder scanned is that of a standby GTM instance, status prints 0 as value.
Promote a standby GTM instance to become active:
gtm_ctl promote -S gtm -D datafolder
Reconnect a GTM proxy to another GTM instance; in this case it is mandatory to specify the host and port values of the new GTM instance:
gtm_ctl reconnect -S gtm_proxy -D datafolder_proxy -o '-s standby_host -t standby_port'
[End]