Introduction to ETL Solutions - JISC

righteousgaggleData Management

Jan 31, 2013 (4 years and 4 months ago)

107 views

ioNode

‘A Framework for Interoperability’



Shell Project
-

ETL Activities


October 15
th

2003

John Bigerstaff

Wynne Rees

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ETL Solutions Ltd Overview


MBO of Prismtech April 2002


Data Transformation Team and Product


Based in Blaenau Ffestiniog Wales, UK


10 member team (8 technical)


Sales and Marketing Office in North East


Standards based solutions


POSC, PPDM, W3C standards (XML, XSLT)


Product and services company


>$4m investment in development of
E
xtract
T
ransform
L
oad (ETL) tools


Principal market is large enterprises in Finance, Oil
and Gas, Education and Health ...

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ETL Solutions Strengths



ETL product, Transformation Manager (TM)


System Integrators of Messaging and ETL solutions


Highly qualified and experienced 10 member company
with IT and Transformation Skills


Requirements Capture and Analysis


Data Modelling using EXPRESS, UML,


Many years of Java and C++ Programming


RDBMS design, development and tuning


Extensive knowledge of W3C standards XML, XSD, XSLT


Project Management


Transformation Manager


ETL Tool Meta Data Code Generator


Java components for easy deployment in client architectures

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioNode


A framework for interoperability to meet the
interoperability challenges of today



Flexible Hardware and Software configuration


Developed in Java to ‘run anywhere’


Software based on


Open Source middleware software


Low cost transformation component


Standards based


SOAP [Simple Object Access Protocol]


XML [eXtensible Mark up Language]


ebXML [electronic business XML]


JMS [Java Messaging Service]



Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioNode Software

Operating System - FreeBSD
JAVA Runtime Environment - JDK 1.3+
Webserver / Servlet Container - Apache Jakarta Tomcat
SOAP Handler - AXIS
Message Handler - OpenJMS
Persistent Storage - PostgreSQL
XML
DOM Handler - JDOM
Parser - Xalan
Validator - Xerces
Transformation Manager - ETL TM
Webserver - Apache
HTTPD
Administration
Active Scripting -
PHP
Storage -
PostgreSQL
Native XML Storage - Apache Xindice
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ETL Role in ioNode


Responsible for the design and implementation of
ioNode software components:


ioAgent


messaging component


ioHub


message routing


ioTransform


data transformation


ioDB


XML persistent data store


Utilise Transformation Manager for design of
transforms


For SHELL project


CSV to IMS
-
LIP


LSC to HESA code lookup


and vice versa



Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Shell Hub and ioAgents

SHELL

HUB

SCR

SR

VLE

MIS

PDP

PDP

SR

VLE

PDP

= IO Agent

SHELL

HUB

SCR

SR

MIS

PDP

PDP

SR

VLE

PDP

VLE

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Scalability of ioNode and SHELL Architecture

Regional
Record
Centre
Regional
Hub
INSTITUTION
VLE
PDP
SR
ioConsort
ioCo
nsor
t
ioCo
nsor
t
Wide
Area
Hub
Learner
Record
Progress
File
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

SHELL Data Flow and Transforms

ioAgent
ioHub
LSC to
IMS-LIP (HESA)
Lookup Lists
Learner
Record
Database
Student
Record
Database
SOAP transport over https
Add, Update, Withdraw, Query
Student records sent as IMS-LIP record
SOAP transport over https
Asynchronous message response
returns results in IMS-LIP record
Admin
Database
Outbound Queue
Send Response
Queue
Transform
CSV to IMS-
LIP XML
Admin
Database
Outbound Queue
Send Response
Queue
Asynchronous message
sent in response to
LRDB update
IMS-LIP message read,
Global Unique ID
generated if new student
record and
LRDB updated
Plug-in accesses the
message
IMS-LIP (HESA)
to LSC
Lookup Lists
Transform
IMS-LIP XML
to CSV
Admin
Database
Message Rx Queue
Inbound Queue
Admin
Database
Plug-in creates
student record as CSV file
entered into DB
or HTPPS protocol
or file stored in directory
Message Rx Queue
Inbound Queue
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Messaging Requirements


Once and once only message delivery


ebXML uniquely identified messages are
acknowledged upon delivery


checks for duplication are carried out


Send and re
-
send


ebXML messages not successfully acknowledged
are re
-
sent a fixed number of times


Persistent storage


Provides a record of the message exchanges and
allows messages to be re
-
sent if required

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioAgent at College


Messages posted using one of the following:


Direct to outbound message queue


A file drop directory


A HTTP Post interface



Send Message


Respond to messages posted in outbound message queue (ionodepost).


Initiate transforms that have been configured


e.g. CSV to IMS
-
LIP and LSC to HESA code conversions


Retry messages preset number of times if synchronous reply not received



Receive Message


Respond to messages in ionodereceive


Initiate transforms that have been configured


e.g. IMS
-
LIP to CSV and HESA to LSC to code conversions


Place message in inbound message queue



Persistence provided by PostgreSQL

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioAgent

Messages sent and
and synchronous reply
ioAgent
Send Channel
Receive Channel
Plug in
Post Messages
Configure Transforms
Delivery Results
Read Messages
Configure Transforms
Message delivery queue
Out bound message queue
Transform
Data validation
Data Handling Rules
Messages received and
and asynchronous replies
ioHub
In bound message queue
Message receive queue
Transform
Data validation
Data Handling Rules
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Synchronous Message Response

ioHub
ioAgent
Admin
Database
Outbound
Queue
Reliable
Messanging
Queue
Send Response
Queue
Synchronous reply
SOAP transport over https
LSC to
IMS-LIP (HESA)
Lookup Lists
Transform
CSV to IMS-
LIP XML
Admin
Database
Message
Receive
Queue
Inbound
Queue
Attempt retries if not received
Transforms
(if required)
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioAgent


ioNodePost

index

Field name

type

Description

1

index

varchar(80)

Unique identifier for entry (Primary Key)

2

rowcount

bigint,

Index of number of rows

3

messageId

varchar(80)

Unique message ID, set by ioNode, used for message identification

4

timestamp

bigint,

Timestamp set when table is read by ioNode

5

status_client

varchar(20),

Flag for success of failure of message use by application

6

status_message

varchar(20),

Flag for success or failure of message delivery

7

sourcedatatype

varchar(20),

Type of source data in data field eg. CSV

8

sourcedatatypeversion

varchar(20),

Version of source data in data field eg. v1

9

targetdatatype

varchar(20),

Type required of target data eg. LIP

10

targetdatatypeversion

varchar(20),

Version required of target data eg v1

11

senders_authorisation

varchar(80),

Authorization field for sender

12

function_type

varchar(80),

Action type of post eg. Add

13

target

varchar(80),

Intended recipient of message eg. UoP.

14

domainref

varchar(80),

The identity of the sender eg. Cornwall

15

localidentifier

varchar(80),

Security identity field for the sender

16

altdomainref

varchar(80),

Alternative identity for the sender

17

altlocalidentifer

varchar(80),

Alternative security field for the sender

18

data

bytea

The data to be sent as payload eg. CSV

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

IoAgent
-

ioNodeReceive

index

Field name

type

Description

1

rowcount

bigint,

Index of number of rows

2

messageId

varchar(80)

Unique message ID, set by ioNode, used for message identification

3

timestamp

bigint,

Timestamp set when table is read by ioNode

4

message_type

varchar(80)

Type of data eg. CSV

5

version

varchar(20)

Version of data eg v1

6

senders_authorisation


varchar(80),

Authorization field for sender

7

function_type

varchar(80),

Action type of post eg. Add

8

fromTarget

varchar(80),

URL of sender of message

9

domainref

varchar(80),

The identity of the sender eg. Cornwall

10

localidentifier

varchar(80),

Security identity field for the sender

11

altdomainref

varchar(80),

Alternative identity for the sender

12

altlocalidentifer

varchar(80),

Alternative security field for the sender

13

data

bytea

The data received

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioHub at University


Implements IoAgent send and receive of messages


Sends synchronous and asynchronous messages replies


May act as message router



Incoming messages received in Message Receive Queue
(ionodereceive)


Acts as router to other ioAgents if required


In this case makes synchronous reply only when successfully
forwarded


May transform incoming messages before placing in inbound queue



Outgoing messages


Responds to asynchronous messages placed in outbound message
queue


May transform outgoing messages before placing in message delivery
queue


Acts as message router for asynchronous replies

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Hub as Router

ioHub
ioAgent
LSC to
IMS-LIP (HESA)
Lookup Lists
Student
Record
Database
Add or Update
Student record
sent as IMS-LIP record
Transform
CSV to IMS-
LIP XML
IMS-LIP (HESA)
to LSC
Lookup Lists
Transform
IMS-LIP XML
to CSV
IMS-LIP (HESA)
to LSC
Lookup Lists
Transform
IMS-LIP XML
to CSV
ioAgent
Student
Record
Database
Application level processes
requests for data and
returns sets of records in XML.
SOAP transport over
https
Admin
Database
Outbound
Queue
Send Response
Queue
Admin
Database
Message Rx
Queue
Inbound Queue
Admin
Database
Message Rx
Queue
Inbound Queue
Route to
Destination
Admin
Database
Message Rx
Queue
Inbound Queue
Admin
Database
Outbound
Queue
Send Response
Queue
Admin
Database
Outbound
Queue
Send Response
Queue
LSC to
IMS-LIP (HESA)
Lookup Lists
Transform
CSV to IMS-
LIP XML
Admin
Database
Message Rx
Queue
Inbound Queue
Admin
Database
Outbound
Queue
Send Response
Queue
SOAP transport over
https
Route to
Destination
ioHub acts as a router
sending messages
to other institutions
SOAP transport
over https
SOAP transport
over https
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Hub as a Router


Seamless Connection

ioHub
message delivery queue
message receive queue
Messages sent and
and synchronous reply
Messages received and
and asynchronous replies
ioAgent
Message delivery queue
Out bound message queue
Transform
Data validation
Data Handling Rules
In bound message queue
Message receive queue
Transform
Data validation
Data Handling Rules
Message delivery queue
Out bound message queue
Transform
Data validation
Data Handling Rules
In bound message queue
Message receive queue
Transform
Data validation
Data Handling Rules
ioAgent
Send Channel
Receive Channel
ioHub
message receive queue
Messages sent and
and synchronous reply
Messages received and
and asynchronous replies
ioAgent
Message delivery queue
Out bound message queue
Transform
Data validation
Data Handling Rules
In bound message queue
Message receive queue
Transform
Data validation
Data Handling Rules
Message delivery queue
Out bound message queue
Transform
Data validation
Data Handling Rules
In bound message queue
Message receive queue
Transform
Data validation
Data Handling Rules
ioAgent
Send Channel
Receive Channel
message receive queue
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

JMS Internal Services
ioAgent
Third Party Processor
AXIS Client/Server
Implementation
TOMCAT
Servlet Agent AXIS Handler
ioHub
Third Party Processor
TOMCAT
Servlet Agent AXIS Handler
JMS Internal Services
ioAgent
Third Party Processor
AXIS Client/Server
Implementation
TOMCAT
Servlet Agent AXIS Handler
SOAP
SOAP
AXIS Client/Server
Implementation
JMS Internal Services
TOMCAT - Provides HTTP services and a Java Servlet container.
AXIS - provides SOAP message handling and link into HTTP server
JMS - Java Messaging Service - Provides a messaging framework on which to build
reliable messaging services
Third Party Processor - Provides rules and validation of messaging.
Messaging Implementation ioAgent to ioAgent

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioTransform


Ability to deploy java transforms as an integral part
of messaging



Transforms automatically generated using ETL Integrated
Development Environment, Transform Manager


Open Interfaces allow transforms to and from virtually any
format

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Transformation Manager Rapid Development Process


Transformation Manager
supports the whole
development process


Design


Testing


Deployment


The heart of the system is
Transformation Builder
, a
state
-
of
-
the
-
art interactive
transform designer



Close integration with
Transformation Tester
to
quickly verify transforms



Flexible Deployment via mixed
Java and XSLT run time
environments




Deploy
Test
Design
Business
Model
Deploy XML
Transfoms
Interactive
Test
Multiple
Target Models
Specimen
Target Instances
Specimen
Target Instances
Multiple
Source Models
Specimen
Source Instances
Specimen
Source Instances
Transformation
Builder
Transformation
Test Tool
Client Environment
Java or XSLT
components
Deploy Java
or XSLT Code
Static Java or .Net
Runtime
Deployment
Tool
Deploy Java
Components
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Step By Step Approach


Load data models into repository


In
-
built support for RDBMS, XML, and Java


Open Interface to access virtually any kind of data


Create Transforms


Define transforms by drag and drop


Add more complex mapping


and complete business logic


Viewing instance data to provide assistance


Test transforms


Viewing instance data


Rework transforms


Deploy


Simple Java deployment


Update and maintain


Transforms generate HTML documentation


Transforms are easily updated as models change

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Transformation Manager Design Environment

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Open Interfaces


Extends TM’s Ability to Any Scenario


Access to User Specified
Models


Supports Associated
Instance Data


Fits Client Architectures


Supports Push and Pull
modes


Pull Model supports
‘Propagation of Selection’


Example Uses


Express
\
Epicentre Part21


ASCII LIS, LAS,


Binary RP66


Edifact Messages

Design
Transformation
Builder
Deploy
Deploy Components
Java, XSLT, XML
Transformation
Deployment
Options
RDBMS, Java, XML
Models
RDBMS, Java, XML
Specimen Instances
Open Interface
TM Loader
Other Models
e.g. Flat File, Express
TM Generic
Adapter
Other
Specimen Instances
TM
Depolyment
Interface
Initialise, Set Adapters,
Execute Transforms
RDBMS, XML, Java
Specimen Instances
RDBMS, XML, Java
Specimen Instances
TM Generic
Adapter Read
Interface
Other
Specimen Instances
TM Generic
Adapter Write
Interface
Other
Specimen Instances
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Truly Open Interfaces


Access to all forms of models and instance data


Allows user models to be loaded with full constraints, relationship
cardinality, etc.


Easily create user
-
defined read and write adapters for accessing
instance data


Transforms called by client architecture with parameter
passing and callbacks as required


Java interface description published for users to implement


Models and instance data can be accessed and viewed as
any other data in TM


Numerous uses and examples available


Edifact, CSV files, EXPRESS Models and Part 21

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioTransform Interface


ioNode is designed to support a very simple and generic
transformation interface


public ArrayList transform(String
configFile,ArrayList dataIn)throws Exception;



Carry out a transform on a list of source data, dataIn, and return
a list of target data entities.



The configuration of the transform is contained in the
configuration file supplied by the caller.



The configuration file is read using methods contained by the
transform provider so can be of any schema and format


Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioDB Requirements


Storage of student learner records as native XML


Generation of a consortium and LRDB unique identifier


Update student learner records


Roll back log of present number of transactions or ‘messages’


Generation of Asynchronous messages and responses to
business transactions such as new record.


The ability to switch and configure the number of transactions
stored per consortium and LRDB unique identifier.


The ability to switch the facility for extending and building
individual master records using xpath updates and a unique
identifier.


An admin interface for record retrieval and reporting is provided.


A standard API for third party interfaces such as the Learner
record portal is to be made available and documented.

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Learner Record Database

Xindice XML
Database
IoAgent
Rules
Processor
Query
Builder
Xupdate
Builder
ioAgent passes an IMS-LIP record and a process action to
the Rules Processor
The Rules Processor decides between building an Xpath
query to request a result set or to build an Xupdate
command set to perform changes to a record.
The XML database stores records passed to it and returns
a unique identifier for new records. It processes requests
for data and returns sets of records in XML.
Results
Processor
Xindice XML
Database
IoAgent
XMLRecords
Xupdate
confirmation
ioAgent receives back a data set result
The result of the XUpdate command or the XPath query
are sent back through the results processor
The XML database returns confirmation of an XUpdate
command or a result set generated by an XPath query
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioDB

ioAgent
Messages received and
and synchronous reply
Send Channel
Receive Channel
ioDB
Post Messages
In bound message queue
Message delivery queue
Out bound message queue
Transform
Data validation
Data Handling Rules
Messages sent and
and asynchronous replies
ioHub
Message receive queue
Transform
Data validation
Data Handling Rules
Learner
Record
Database
Add, update, delete and
query Student records
Create Asynchronous reply
containing original message
id, status, IMS-LIP content
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioDB Interface

Method

Description

setDatabaseConnection

Connect to the database that hosts the learner records repository

createCollection

Create a new collection in the learner records

setLearnerRecordCollection

Set the name of the collection that holds the learner records

setArchiveLearnerRecordCollection

Set the name of the collection that holds the archived learner records


getCollectionList

Get a list of collection names in the database

removeCollection

Remove a collection from the learner records database

addRecord

Add a new student to the learner record collection

updateRecord

Update an existing student in the learner record collection

getRecord

Get a particular learner record in the database

queryCollection

Query the learner record database using the supplied XPath

deleteRecord

Delete a student in the learner record collection

resetUniqueIDSequence

Reset the unique ioDB sequence number

setNumberofArchivedRecords

Set maximum number of archived entries for each learner record

rollbackRecord

Reinstate archived learner record as the current learner record

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

ioDB Interface Sequence Diagram

ioDB
Interface
ioDB
setDatabaseConnection(uri, driver)
createCollection(collectionName)
setLearnerRecordCollection(collectionName)
setArchiveLearnerRecordCollection (collectionName)
addRecord(urlXML)
updateRecord(urlXML)
Inbound
Message
Outbound
Message
Process Incoming Message
Post Asynchronous Reply
Post Asynchronous Reply
withdrawRecord(urlXML)
Post Asynchronous Reply
Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Current Activities


Example CSV files provided by Phosphorix


Create, Update, and Withdraw student records now
implemented


Transform CSV files to IMS
-
LIP


LSC to HESA Lookup


Secure messaging between ioAgents and ioHub


Stored in ioDB with unique Global ID


Asynchronous replies generated


Transform IMS
-
LIP files to CSV files on ‘round
-
trip’
completion


HESA to LSC Lookup

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Next Steps
-

Some Issues


Web Browser access to all software interfaces


Provide easy to use access to data via existing interfaces


Resolve a few incompatibilities between CSV and IMS
-
LIP
formats


Agree format for asynchronous replies


Implement Publish / Subscribe messaging


Implement rollback an archive logic in ioDB


Implement full query logic in ioDB


Testing of routing logic


Discuss logic for multi
-
hop messages


Complete full JMS messaging (currently used only as a
persistent messaging mechanism)


Obtain full set of messages and transform descriptions


Create and test full set


use TM HTML documentation for review


Maintain as regression test suite


Determine timescales for rollout


Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Q&A

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

?

Commercial in Confidence

© 2003 ETL Solutions Ltd. All rights reserved.

Further Information

John Bigerstaff

+44 (0) 1740 645868

+44 (0) 773 6404079

jab@etlsolutions.com


or

www.etlsolutions.com