Demonstration of CORBA technology for distrib

pridefulauburnΔιαχείριση Δεδομένων

16 Δεκ 2012 (πριν από 4 χρόνια και 10 μήνες)

251 εμφανίσεις

DST
2
:Demonstration of CORBA technology for distributed database (cont.)
Vojtˇech Kupˇca (
vojtech@hafro.is
)
Marine Research Institute,Reykjavik,Iceland
April 2003
In this paper is discussed the distribution of DST
2
data warehouse (DW) using CORBA technology.Description of
solution and interaction of the query systemand user interface is provided.Detailed diagramof the systemarchitecture is
supplied in Appendices A and B.
1 Introduction
As a demonstration of linking several databases with marine ecosystemdata it was decided that CORBA technology will
be used.Its main advantages were first seen in platformand language independence.However,with better understanding
of the project these started to be not so significant contributions since only one type of language and operating system
were used.Moreover,real problems of the distributed database solutions were encountered that were not thought of at the
beginning.This report describes the results of an attempt to implement the systemdespite the mentioned obstacles.As it
was stated before

1

,more interesting from the point of view of truly usable and useful systemis usage of asynchronous
replication implemented by PostgreSQL Replicator

2

or later by native PostgreSQL facilities currently developed within
the PostgreSQL Replication Project

3

.
2 Solution
CORBA is an object oriented framework (specification) for interconnecting distributed computer systems regardless of
underlying software and hardware differences.Implementation of CORBA called JacORB with Java binding
1
was used.
The systemhas two main parts,one that processes all requests and does all internal data processing (called query system)
and the other which allows end-user to send requests to the query system and receive corresponding results (interface).
Query systemis programmed in Java and is composed of the following parts:
in one site only:

PHP server for communication with the user interface part

CORBA client which controls distribution of queries to all involved query systems and combination in temporary
tables of auxiliary database

CORBA name service for looking up references to objects by names

auxiliary PostgreSQL database for combination of partial results
in all involved sites:

PostgreSQL database systemwith setup DST
2
DWtables

ORB (Object Request Broker) with defined functions for accessing DST
2
DW
Interface part is web-based and is programmed in PHP and HTML.
The idea behind distributed querying can be described in several steps:
1.user fills in the request using interface part
2.request is sent to the CORBA client part for processing
3.CORBA client contacts all involved ORBs to process the request
4.ORBs send queries to (usually local) DST
2
DWs
5.ORBs report back results to the CORBA client
1
Java binding in JacORBis a mechanismthat maps CORBA specification details (e.g.data types) to the corresponding entities in Java programming
language.
1
6.CORBA client stores partial results inside of the temporary table of an auxiliary database and tries to combine all
queries as if only one systemwas queried
7.CORBA client hands over the result to the PHP server
8.PHP server forwards ’raw’ result to the user interface for processing and presentation
Communication between interface and PHP server (points 2 and 8) are realized using a simple communication protocol
described in

1

.In the following sections,some points fromthe above list are described in more detail.
3 ORB functions
What an ORB(the core CORBAcomponent which actually performs requests sent to it) can do is defined in ORBinterface
definition using language called IDL (Interface Definition Language).ORB interface defined in
QR.idl
contains these
functions:

long doQuery (in string theQuery);
execute query on a connected database

long getNumberOfRows();
return number of rows

long getNumberOfColumns();
return number of columns

stringArray getRow(in long rowNumber);
return array with one row specified by its number

stringArray getColumnNames();
return array of columns names

long getTypeOfColumn(in long columnNumber);
return column type code

boolean login (in string username,in string database,in string password);
open database connection and login to the database

void logout();
logoff the database and close connection
where
stringArray
is a container type defined as:
typedef sequence<string> stringArray;
.For a detailed de-
scription it is best to refer directly to the implementation file
QueryRetrieverImpl.java
.
4 Database setup
On all involved sites there has to be installed the PostgreSQL database (traditionally PostgreSQL runs on port 5432).
DST
2
DWtable definitions are stored in
dwC.sql
which is a part of the database setup scripts.In addition,on CORBA
client site there has to be installed auxiliary database system with write access for creation of temporary tables.These
serve for combination of results from different sites.
5 User interface
User can interact with the whole system using the web interface.For demonstration purposes and simplicity it was
sufficient to provide only a subset of functionality and features of DST
2
DWdatabase.User can therefore choose from
the following options:
a) output types:age-length keys (ALK),age-length distributions with average lengths at age (ALD),catches in numbers
by age and total landings (CNR),
b) available databases:Iceland or Celtic Sea,
c) areas:DST
2
divisions covering Iceland and the Celtic Sea,
d) year,month,species (cod or haddock),gearclass and vesselclass selection,
e) lengthcell range,age range and lengthcell interval selection,
2
f) it is possible to generate output in two forms:one is to display result formated as HTML tables,the other is to save
output in an ASCII text file.
On the webpage user can find information on howto use the interface and also the map of DST
2
DWdivisions.
6 Limitations and problems
As it was mentioned in

1

it was not a goal to build a versatile and robust distributed database system;instead a demon-
stration of simple linking of several databases using CORBA was intended.Therefore there exist some limitations to the
design as well as several other ad hoc solutions.

Database definitions have to be identical,the same query is sent to all databases.

CORBA client part is responsible for sending queries to the involved ORBs and retrieving their results.Later it has
to combine partial results from each database into the correct unified result.It tries to do so by storing results in a
temporary PostgreSQL table and rerunning the same query on the whole table.This approach is simple and works
for some queries but for many other it fails and certain modifications to query morphology are required.Let us
show a simple example where this technique fails.If we consider only one database with table
A
as
a
b
1
5
2
10
and we do
SELECT a FROM A WHERE b = 5;
,we obtain the first row and the first column of the table only.Run-
ning the query from temporary table as
SELECT a FROM temp1 WHERE b = 5;
is not correct since the column
b
is not selected in the first query.We can say that problem occurs if
SELECT
and
WHERE
clauses do not coincide
column-wise.Workaround for this is to split the above query into three:one that creates a newtable fromthe result
(using
SELECT INTO
) and returns only boolean value,second that selects everything from this table,and the third
that drops this table.More problems would arise if we needed to use aggregate functions with
GROUP BY
clause.
An example is an average function
AVG()
:if we consider tables
A
and
B
having both one column
a
,where table
A
contains values 1,1,4 (whose average is 2),table
B
contains values 3,5 (average 4).It is then obvious that true
average of 2.8 is not equal to the average of partial results.For more information see reference to the Mariposa
project in

1

.

Performance is dependent on how much data is being moved around the network (typically tens of kilobytes) and
how complicated the queries are.In extreme cases queries can take a couple of minutes to process.This is due to
the fact that many tens of thousands of rows are being selected from the databases and large matrices manipulated
by the web interface program.

Application uses threading for non-blocking execution of queries.From the point of view of program threads are
separate parallel execution paths of program that share common memory.For the operating system,creation of
thread means creation of a new process.During run it may happen that number of threads (processes) increases.
This behaviour was seen in our application and it is was not tested what happens if the application is run for a long
time.
7 Security
SSH
With distributed applications it is reasonable to address the issue of security of data being sent among involved sites.
Normally data are passing the network in a plain unencrypted way so that anyone can potentially eavesdrop the communi-
cation.On an insecure network (like e.g.Internet) it is often needed to protect the content being sent.In our case,JacORB
has built-in set of proxy classes (called Appligator) that provide a client with a ’gateway’ for calling server functions.For
our purpose,it is possible to use Appligator to direct all client-server communication to a specified port and tunnel this
port using
ssh
(secure shell) in a secure way over the network.Section 10.4 of the JacORB Programming Guide

4

provides further details on how to setup this configuration.
SSL
Another possibility to implement a secure data connection is to use the SSL (Secure Socket Layer).SSL encapsulates
higher level network protocols in an encrypted data streamusing symmetric cryptography protocols such as DES or RC4.
In JacORB the SSL is implemented through the Java Secure Socket Extension (JSSE).This solution is not suitable for
3
passing data through fire wall since it still opens the same high ports for communication as in the normal setup (though
this time in a secure way).
Either of the above described ways makes sure that client-to-server communication is secure.Eventually,however,the
data

are sent all the way from the PHP server and interface part to the user’s browser in an open form (see Appendix A).
Therefore,for the part between PHP server and interface the JSSE (for server) and PHP’s SSL (for client) is convenient
to use.For details on how to configure JacORB to use the SSL see section 11 of the JacORB Programming Guide

4

and JSSE Reference Guide

16

.For the part between the web interface and user’s browser HTTPS
2
is needed.These
measures ensure that no third party is able to see the passing data until they are presented to the (authorized) end user.
Connections between servers (or client) and databases are local and are considered as trusted.The same is true for the
connection between client and
ssh
’s tunneling port.Local connections are interceptable only through a loopback device
on a local host.
8 Installation and running
Before it is possible to run the system,there has to be made some installation steps first:

installed latest OpenSSL

5


installed latest Apache

6


for the encryption between web server and the end user it is necessary to install Apache module
mod
ssl

13

and create public/private key pair as described in

14

.Public key is usually distributed in the form of the so
called certificate file which contains some extra authentification information

remark:when signing the Certificate Signing Request (CSR) by the Certifying Authority (CA) using
sign.sh
script,one has to use different ”Organizational Unit Name” or ”Common Name”

installed Java environment SDK-1.4.1

7

,it is necessary to set variable
JAVA
HOME
accordingly (e.g.
export
JAVA
HOME=/usr/lib/j2sdk1.4.1
02
)

installed latest Apache Ant (Java based build-tool)

8


installed latest PostgreSQL database (compiled with
--with-java
option)

9


installed latest JacORB

10


for database access,the
jdbc:postgresql
driver is used,it is therefore necessary to inflate PostgreSQL sub-
directory jar-file
.../share/java/postgresql.jar
into the JacORB subdirectory
classes
and add this di-
rectory to
CLASSPATH
variable which tells JacORB where to look for java classes (e.g.
export CLASSPATH=
$CLASSPATH:/usr/local/JacORB
1
4
1/classes:.

copy template
jacorb.properties
file to your
$HOME
directory and set the initial name reference to where
your name service will be running (for example
ORBInitRef.NameService=corbaloc::130.208.66.191:1060/
StandardNS/NameServer%2DPOA/
root
)

setup PostgreSQL database with DST
2
DWtables (and uploaded data using
dst2up.php
)

installed PHP

11

with shared Apache module (compiled with
--with-apxs --with-openssl
options) and PHP
interface program

extended variable
$PATH
to include directories with executables of Java,JacORB,Ant and PostgreSQL

once application is extracted in some directory it has to be compiled and should be ready for launching:

commands are run from
phpServer
directory

compile all java sources with java compiler:
javac QR/*.java

start CORBA name service:
jaco -DOAPort=1060 org.jacorb.naming.NameServer NS
Ref &

start ORB:
jaco -DOAIAddr=130.208.66.191 QR.Server &

start PHP server/CORBA client:
jaco -DOAIAddr=130.208.66.191 QR.phpServer &
2
HTTPS is a secure HTTP protocol running typically on the port 443.This protocol uses mechanisms of public key cryptography to protect data
content.It is based on the open source implementation of SSL called OpenSSL

5

.To enable the use of SSL by the web server it is necessary to
configure
mod
ssl
module accordingly.
4

connect through browser to the interface webpage
If it is needed to tunnel traffic through fire wall one has to setup Appligator and do the following changes in configuration:

installed OpenSSH

12

with
ssh
client and
sshd
server

in
jacorb.properties
setup Appligator properties
jacorb.ProxyServer.*
:

jacorb.ProxyServer.Name=Appligator

jacorb.ProxyServer.ID=Appligator

jacorb.ProxyServer.URL=http://www.hafro.is/

vojtech/AP
Ref

for bypassing usage of Appligator in particular networks it is possible to set an extra two properties:
jacorb.ProxyServer.Netmask=255.255.255.0
jacorb.ProxyServer.Network=130.208.66.0

enable Appligator in CORBA client by adding ORB property:
ORBProps.put("org.omg.PortableInterceptor.ORBInitializerClass.ForwardInit",
"org.jacorb.proxy.ProxyClientInitializer");

run Appligator on a server (ORB) side on port e.g.22222 like
appligator 22222 AP
Ref &

fix Appligator inter-object reference in the file
AP
Ref
to point to the local system by
fixior 127.0.0.1 11111
AP
Ref

place
AP
Ref
to some client-readable location specified by
jacorb.ProxyServer.URL
property

PHP server/CORBA client (client) will use the file
AP
Ref
to find the Appligator and connect to it to access the
server functions.Patched
AP
Ref
redirects the client to the local port 11111.

configure the client to use the Appligator and tunnel communication fromthe client port 11111 to the server Appli-
gator port 22222 over
ssh
port 22 as described in the section 10.4.4 of the JacORB Programming Guide

4

.
It it necessary to remember that the numbers of ports and the names of databases and reference files are variable and has
to be setup and checked for correctness in appropriate source files or command lines.
9 Current status
Testing:In April 2003 application was successfully tested as a one query systemwithout encryption.
Celtic Sea:It it intended that the system is installed in IFREMER,Nantes later this year.By tunnelling communication
through
ssh
port 22 it should be possible to surpass problems with local fire wall configuration.
10 Credits and changes
Query systemwas developed at the University of Iceland by Oskar Audunsson (
oskara@hi.is
) and Bergsteinn Einarsson
(
bergste@hi.is
).Code clean-up,SSL version and testing was done by Orri Eiriksson (
orroloth@planewalker.com
).
For the documentation of class structure see

15

.
There had to be made certain minor corrections or changes to the source code:a) instead of original
jdbc:odbc
driver,
jdbc:postgresql
driver taken from PostgreSQL installation tree was used,b) problem with rerunning the query de-
scribed in section Limitations and problems was fix ed,c) creation of auxiliary tables changed to creation of real temporary
tables that are dropped automatically on connection close,d) queries containing aggregate functions return one row of
result even when being empty;in
QueryRetrieverImpl.java
it is first necessary to check the content of the first returned
row using
if(rs.getString(1)!= null)
before increasing number of rows by
rowCount++;
.
5
References

1

Demonstration of CORBA technology for distributed database:
http://www.hafro.is/

vojtech/dw/dw-corba02.pdf

2

PostgreSQL Replicator:
http://pgreplicator.sourceforge.net

3

PostgreSQL Replication Project:
http://gborg.postgresql.org/project/pgreplication

4

JacORB Programming Guide:
http://www.jacorb.org/documentation.html

5

Open SSL:
http://www.openssl.org

6

Apache:
http://www.apache.org

7

SDK-1.4.1:
http://java.sun.com

8

Apache Ant:
http://ant.apache.org

9

PostgreSQL:
http://www.postgresql.org

10

JacORB:
http://www.jacorb.org

11

PHP:
http://www.php.net

12

OpenSSH:
http://www.openssh.org

13

mod
ssl
:
http://www.modssl.org

14

mod
ssl
installation:
http://httpd.apache.org/docs-2.0/ssl/ssl
faq.html

15

class documentation
:
http://www.hafro.is/vojtech/dw/qrdoc

16

JSSE Reference Guide
:
http://java.sun.com/j2se/1.4.1/docs/guide/security/jsse/JSSERefGuide.html
6
Appendix A
DST2 DW tables
PostgreSQL database
5432
ORB
PHP server
auxiliary TEMP table
DST2 DW tables
PostgreSQL database
ORB
CORBA name service
5432
5432
1050
HI
HI
HI CORBA client
HI
1060
HI
HI
HI
HI
HI
HI
. . .
Query system 2
Query system 1
HTML/PHP interface
HI
HI
InterfaceInternet
phpServer.java
Server.java
80 / 443
end
user
Figure 1:Systemarchitecture:In the example above CORBA name service is started on port 1060;since then CORBA objects can
be referenced by names bound to the references.Afterwards the server (ORB) is started on one of JacORB’s high ports (marked HI);
first it needs to register with the naming service to be usable by other components.Later its IDL definition says what functions the
server can do (implemented in
QueryRetrieverImpl.java
):e.g.query local DST
2
DWdatabase,count returned rows and columns,
etc.Then PHP server/CORBA client program is run.It has two parts,PHP server listens on agreed port (e.g.1050) for incoming
web-client requests over a defined protocol and submits these requests the the second part,CORBA client.It’s purpose in brief is to
connect to user-specified servers (ORBs),send themappropriate SQL requests,wait for results,combine results using a local auxiliary
database table,and through the PHP server hand over the result to the end-user interface.It can be seen that more ORBs similar to
Query system 2 can be connected in the place of three dots in the above picture.Since all shown components communicate over the
network,their physical location is almost arbitrary;they can all reside on one host or can be scattered in many places.The above
diagramis just one of many possible configurations.
7
Appendix B
PHP server
CORBA client
Query system 1
HI
Firewall
HI
HI
AP_Ref
80
Appligator reference file
Query system 2
ORBHI
HI
SSHSSH
. . .
11111 22
Appligator
22
local network
Internet
HI
HI
22222
Figure 2:Secure system architecture using Appligator and
ssh
:Client (Query system 1) first reads Appligator object reference
fromthe file
AP
Ref
.Because
AP
Ref
is patched to refer to 127.0.0.1,the client thinks that the Appligator is found on the local host on
port 11111.On this port,however,listens
ssh
which tunnels connection in a secure way through port 22 and hands it over to the real
Appligator listening on server (Query system 2) on port 22222.The Appligator serves as a proxy for communication between client
and server.This way it is possible to overcome difficulties with restrictions of high-port communication over fire walls.
8