DSP System Admin Guide - of /ubuntu

solidseniorServers

Dec 9, 2013 (3 years and 11 months ago)

182 views





DSP

System Administration &
User G
uide

Version 1.0




The

at the

in the


2012

Russ Profant

DSP

4/11/2012



DSP

Street Address 1

Street Address 2

City

Zip/Postal Code

Country

Tel: +xxx
-
x
-
xxx
-
xxxx

Fax: + xxx
-
x
-
xxx
-
xxxx

Last edited:
04 August 2012

Copyright © 2008
DSP
. All rights reserved.

No part of this publication may be r
eproduced, transmitted, transcribed, stored in a retrieval system, or
translated into any language, in any form or by any means, electronic, mechanical, photocopying, recording,
or otherwise, without prior written permission from [Company].

All copyright,

confidential information, patents, design rights and all other intellectual property rights of
whatsoever nature contained herein are and shall remain the sole and exclusive property of [Company]. The
information furnished herein is believed to be accurat
e and reliable.

However, no responsibility is assumed by [Company] for its use, or for any infringements of patents or other
rights of third parties resulting from its use.

The [Company] name and [Company] logo are trademarks or registered trademarks of [
Company].

All other trademarks are the property of their respective owners





Table of Contents
1

CONTENTS

1

Contents

................................
................................
................................
..........................

3

1.1

Revision History

................................
................................
................................
............

8

1.2

Reference Documents

................................
................................
................................
...

8

1.3

Distribution List
................................
................................
................................
..............

8

1

Introduction

................................
................................
................................
....................

9

1.1

Purpose and Scope

................................
................................
................................
.......

9

1.2

Audience and Assumptions

................................
................................
...........................

9

1.3

System Administration Tasks

................................
................................
........................

9

2

System Overview

................................
................................
................................
...........
10

2.1

The Big Picture
................................
................................
................................
.............
10

2.2

System Design Goals

................................
................................
................................
...
10

2.3

System Description

................................
................................
................................
......
11

2.
3.1.

Database Load Types

................................
................................
................................
...............

11

2.4

System Configuration

................................
................................
................................
...
14

2.4.1.

Rules For Setting up Parameters
................................
................................
..............................

15

2.4.2.

P
arameters Grouped By Process Name

................................
................................
..................

16

2.4.3.

Parameters Grouped By Task

................................
................................
................................
..

17

2.4.4.

Task Configuration File

................................
................................
................................
.............

17

2.5

Executable Files

................................
................................
................................
...........
17

2.5.1.

Executable Files In Windows

................................
................................
................................
....

18

2.5.2.

DSP P
rocesses As Windows Services

................................
................................
.....................

19

3

System Installation

................................
................................
................................
........
21

3.1

Linux Installation

................................
................................
................................
..........
21

3.2

Windows Installation

................................
................................
................................
....
22

3.3

Databases

................................
................................
................................
....................
22



3.3.1.

Sybase DSP Installation

................................
................................
................................
...........

22

3.3.2.

Oracle DSP Installation

................................
................................
................................
.............

22

3.3.3.

MySQL DSP Installation

................................
................................
................................
...........

23

3.3.4.

MSSQL DSP Installation

................................
................................
................................
...........

23

3.3.5.

Database Client Libs

................................
................................
................................
.................

25

3.4

Mifid Example Installation

................................
................................
.............................
25

3.5

Email Use In DSP

................................
................................
................................
........
26

3.5.1.

Linux Email

................................
................................
................................
................................

27

3.5.2.

Windows Em
ail
................................
................................
................................
..........................

27

4

System Administration

................................
................................
................................
.
28

4.1

A Sample Approach To Building Data Streams In DSP

................................
................
28

4.2

General Rules For Using DSP

................................
................................
......................
30

4.3

Common Parameters In DSP

................................
................................
.......................
31

4.4

DSP Exit Codes

................................
................................
................................
...........
37

4.5

DSP Set Funcionality

................................
................................
................................
...
38

4.5.1.

File Set Functionality

................................
................................
................................
.................

38

4.5.1.

Data Set Functionality

................................
................................
................................
...............

38

5

DSP Process

................................
................................
................................
..................
39

5.1

DSP System Date

................................
................................
................................
........
39

5.2

DSP_PROCESS Table

................................
................................
................................
40

6

DSP System Variables

................................
................................
................................
..
43

6.1

DSP_SYSTEM_VARIABLE Tabl
e

................................
................................
................
44

7

DSP File Transfer

................................
................................
................................
..........
45

7.1

File Transfer Functionality

................................
................................
............................
45

7.2

Transfer Setup

................................
................................
................................
.............
46

7.3

File Naming Within DSP

................................
................................
...............................
46

7.3.1.

Overriding Source File Name Parameters

................................
................................
................

47

7.4

Transfer Tasks

................................
................................
................................
.............
47

7.5

Transfer Types & Operations

................................
................................
.......................
48

7.6

File Encryption

................................
................................
................................
.............
49

7.7

Parallel Transfer

................................
................................
................................
...........
50



7.7.1.

Transfer Timeout

................................
................................
................................
.......................

50

7.7.2.

How To Setup Parallel Transfer

................................
................................
................................

50

7.8

File Compression & Decompression

................................
................................
............
51

7.9

Encrypted Communication
-

SSH & SSL

................................
................................
......
52

7.10

SSH On
Windows

................................
................................
................................
........
52

7.11

Processing Date & Status Check Functionality

................................
.............................
53

7.12

DSP_FILE_TRANSFER Table

................................
................................
.....................
54

7.13

Transfer Command
Line Options

................................
................................
.................
60

7.14

DSP_PROCESSED_FILE Table

................................
................................
..................
62

7.15

Practical Transfer Tips

................................
................................
................................
.
63

7.16

Transfer Examples

................................
................................
................................
.......
63

7.16.1.

Example 1: Download multiple files requiring two types of internal processing

.......................

63

7.16.2.

Example 2: Download a file from a website and archive it

................................
.......................

65

7.16.3.

Example 3: Download a file from a website but only if the file is changed

...............................

66

8

DSP File Parser

................................
................................
................................
..............
68

8.1

Types of Parsing

................................
................................
................................
..........
68

8.2

Parser Setup

................................
................................
................................
................
68

8.3

Parser Setup Tables

................................
................................
................................
....
71

8.3.1.

D
SP_FILE_PARSER Table

................................
................................
................................
......

71

8.3.2.

DSP_FILE_FIELD Table

................................
................................
................................
...........

74

8.3.3.

DSP_FIELD_MAPPING Table

................................
................................
................................
..

76

8.3.4.

DSP_SPLIT_FIE
LD Table

................................
................................
................................
........

77

8.3.5.

DSP_INTERNAL_FIELD Table

................................
................................
................................
.

78

8.3.6.

DSP_TABLE_FILE Table

................................
................................
................................
..........

79

8.4

Parser Command Line Options

................................
................................
....................
80

8.5

Practical Parsing Tips

................................
................................
................................
...
80

8.6

Parsing Examples

................................
................................
................................
........
81

8.6.1.

Example

1: Process financial data full and delta files

................................
...............................

81

8.6.2.

Example 2: Process MI
F
ID data file

................................
................................
.........................

85

8.6.3.

Example 3: Process Data files from a zip file

................................
................................
...........

88

9

DSP File Loader & Table Updater

................................
................................
.................
89

9.1

Update Types Available

................................
................................
...............................
89

9.2

Loader & Updater Table Definitions

................................
................................
..............
90



9.2.1.

DSP_FILE_LOADER Table

................................
................................
................................
......

90

9.2.2.

DSP_TABLE_UPDATER Table

................................
................................
................................

91

9.3

Loader & Updater Command Line Options

................................
................................
...
95

9.3.1.

Loader Command Line Options

................................
................................
................................

95

9.3.2.

Updater Command Line Options

................................
................................
..............................

95

9.4

Loader & Updater Examples

................................
................................
........................
96

9.4.1.

Example 1: Load parsed financial equity data files
................................
................................
...

96

10

DSP File Generator

................................
................................
................................
........
99

10.1

DSP_FILE_GENERATOR Table

................................
................................
..................
99

10.2

DSP_FILE_OUT_FIELD Table

................................
................................
...................

101

10.3

Generator Example
s

................................
................................
................................
..

102

10.3.1.

Example 1: Generate File From Incoming Files

................................
................................
.....

102

10.3.2.

Example 2: Generate File From Database

................................
................................
.............

104

11

DSP Process Manager

................................
................................
................................

106

11.1

Extended Processing

................................
................................
................................
.

106

11.2

Parallel Processing

................................
................................
................................
....

107

11.2.1.

Transfer

................................
................................
................................
................................
...

107

11.2.2.

Parser

................................
................................
................................
................................
......

107

11.2.3.

Loader

................................
................................
................................
................................
.....

108

11.2.4.

Updater

................................
................................
................................
................................
...

108

11.2.5.

Generator

................................
................................
................................
................................

108

11.2.6.

Process Manager

................................
................................
................................
....................

108

12

Custom Processing

................................
................................
................................
.....

112

12.1

DSP_CUSTOM_CMD Table

................................
................................
......................

112

12.2

Custom Command Example
................................
................................
.......................

112

13

Frequently Asked Ques
tions

................................
................................
......................

114

14

Known Issues

................................
................................
................................
..............

114

14.1

Transfer
................................
................................
................................
......................

114

14.2

Parser

................................
................................
................................
........................

114

14.3

Loader

................................
................................
................................
........................

114

14.4

Updater

................................
................................
................................
......................

114

16.4.1.

Minimum file count not working In Table Updater

................................
................................
...

114

14.5

Generator

................................
................................
................................
...................

115



16.5.1.

Generating File With Split Parsing In Parallel Processes Mode

................................
.............

115

16.5.2.

Date Field Not Properly formated

................................
................................
...........................

115

14.6

Process Manager

................................
................................
................................
.......

115

15

Appendix

................................
................................
................................
......................

115

15.1

References

................................
................................
................................
.................

11
5

15.2

Glossary

................................
................................
................................
.....................

115



Page
8

of
115


Document History

Paper copies are valid only on the day they are printed. Contact the author if you are in

any doubt about the acc
u-
racy of this document.

1.1

REVISION HISTORY

Revision Number

Revision Date

Summary of Changes

Author





















1.2

REFERENCE DOCUMENTS

Please see the following documents for more information:

Document Name

Version

Author
















1.3

DISTRIBUTION LIST

This document has been distributed to:

Name

Position

Company

Action






















Page
9

of
115


1

INTRODUCTION


This System Administrator Guide provides information on how set up, configure, and deploy the
DSP system,

t
he Sy
stems Administration Guide serves the purpose of an Operations Guide
to the DSP system.
.


1.1

PURPOSE AND SCOPE


The purpose of this document is to give detailed description of all DSP components as well as the full scope of
possible usage

with deatiled instr
uctions on how to setup specific types of data streams.


1.2

AUDIENCE AND ASSUMPT
IONS


The intended audience is composed
of managers, developers, support personnel, business analysts and an
y-
body else who might come into contact with the DSP system
.

However the

primary purpose of this guide is to
assist the support/development people in setting up new streams.


1.3

SYSTEM ADMINISTRATIO
N TASKS


There isn’t much to administer in DSP and that is so by design. The primary
direct
human
usage of DSP would
be for data stre
am
setup

and the subsequent support of these streams once in production. Though strictly
speaking once a stream is configured and moved to production no further maintenance
should be

required
other than implementing
stream changes and enhancements as busin
ess requirements change.



















Page
10

of
115



2

SYSTEM OVERVIEW


2.1

THE BIG PICTURE


W
hat

is

DSP
?

DSP
or Data Stream Processor
is a “liquid” platform for distributing data across an enterprise and beyond.
T
he term “liquid”

has several meanings

in this case
. It’s
liquid
in the sense that it can be shaped
in
many

different
way
s

to fit user’s needs
.

I
t’s also “liquid” in the sense used in the financial business in that it can
be converted/made to produce results quickly. If you need to setup a new data stream it migh
t take days
or weeks on a different ETL
-
type system in DSP it can take an hour or two.

And it’s liquid in the most obvious sense in that it allows creation of data

streams

.
DSP was developed
to enable an enterprise to create its own data

waterways


to d
eliver

the RIGHT DATA at the RIGHT TIME
to the RIGHT place in the company.

This is the “vision” behind the
system;

the details are given in the rest of this manual.


2.2

SYSTEM DESIGN G
OALS


Before we get into the details of the system, what it does and how i
t does it, it’s important to describe the
overall philosophy which underpins the system design

and the business goals of the system.

Unlike many similar systems

this is not a design
-
centric
system; this is a hard
-
core back end system. It cu
r-
rently doesn’t
even have a front end, though it will in the future. This is because the front
-
end centric sy
s-
tems often waste too much effort

on

and pay

too much attention to the design (GUI)

functionality at the
expense of the actual processing capabiliti
es. In this bus
iness the design is the easy part; to design a
job
/data stream

can take anywhere from an hour

to a few days but the
stream

will
probably
run in pr
o-
duction for years to come. Therefore the job itself and the support it will require over the years should
be
a lot more important in the overall scheme o
f things within a system like this. That’s why DSP is designed
to fit the needs of the support personnel first and foremost rather than those of the develo
p-
ers/designers.

Secondly,

the GUI often hides the process
ing complexity. This is
fine

when everything works
perfectly;

d
e-
signer/developer just understands enough to

manipula
te the icons on the screen to create a

job/interface

etc
.

Unfortunately things
don’t always

work perfectly and when they go wrong and need t
o be fixed the
hidden complexity makes it difficult to diagnose and
correct

the problem. There is no hidden complexity
in this system.
What you see is what you get. It doesn’t mean the system is simple; it can easily get very
complex but it does
n’t

hide it
s complexity if and when it gets complex.

Beyond the overall aim at creating a system that is easy to run and support there is a number of specific
goals that the design tries achieve.



SEPARATE RAW DATA PROCESSING FROM BUSINESS LOGIC

DSP is all about

raw

data processing;
it’s completely agnostic as to the meaning of the data.

There isn’t even any functionality that
would allow one to give meaning to
any of the data
prcoseed
within DSP.




Page
11

of
115


However DSP does allow for a business logic plugin from the outside i
n the form of a stored pr
o-
cedure to be run after all the raw data has been handled and updated.



NO NEW CODE
: a

new data stream should not require new code writing or
code generation (
at
least most of the time
)
; the full
stream
processing life
-
cycle can be
setup simply through the sy
s-
tem
service

parameters



CUSTOMIZATION
: This is an amendment to the
previous

rule; i
f new
custom
code has to be
cr
e-
ated

to handle a very specific issue the system
must
be
able to be
extended by
the new

code to
handle
such a

situat
ion



SIMPLE MAINTENANCE
:
Simple to install, maintain and use
; an average technical user

like a d
e-
veloper

should be able
to
install maintain and use the system just by looking at the various
scripts and options available with
in the system

without even consul
ting this manual



POWERFUL PROCESSING OPTIONS
:

most processing tasks can be run in parallel mode and in
some situations a specific parallel mode can be selected



EASE OF USE
: Easy to specify parameters, parameters are generally case
-
independent, most p
a-
ramet
ers don’t have to be set because they default to the most logical values for the specific si
t-
uation, wildcards are allowed as part of internal data filters, lists are allowed throughout the sy
s-
tem, regular expressions are allowed in many different places w
ithin the system
; even lists of
regular expressions are allowed in certain parameters



DIFFERENT
WAYS

OF ACCOMPLISHING THE SAME OR SIMILAR RESULT
: DSP gives users choices;
usually there is more than one way of accomplishing the end result; sometimes the opt
ions need
to be tested to determine the best overall solution; sometimes it’s just a matter of preference



2.3

SYSTEM DESCRIPTION


As we said above
DSP
is an infrastructure platform for processing large sets of
disparate
data
in
text

file
format
(other
file f
ormats

will be supported

in the future)

and making specific pieces of data available
within or without the enterprise
.

DSP

enables the enterprise to create sophisticated internal data streams to carry data from one place to
another all in an attempt to en
able the business to have




2.3.1.

DATABASE LOAD TYPES


DSP

is
at its core

just a very

sophisticated database loader, sophisticated being the key word.

From the
operational

perspective t
here are 2 basic ty
pes of data loads; there is an initial full load and then
there are delta updates.
T
he full load
usually
occurs
at specific

longer

intervals

primarily between an e
n-
terprise a
nd its outside vendors, suppliers, customers. Delta updates are then propagated th
roughout the
enterprise. This is just

a

very generic differentiation, obviously there can be and are partial updates b
e-

Page
12

of
115


tween an enterprise and its outside business partners just as there can be full internal loads between va
r-
ious internal systems within an

enterprise.

DSP

supports both

types of data loads:


1.

Full Table Load

Full Table Load is the default for the system. It means that
DSP

will process data files on the assum
p-
tion that each
target
table
acts like a holding container for the full data
set

in th
e file
, meaning each
column
in the table
has a matching column in the
data
file

at least most of the time
. If there is no
matching column

data

in the file the system will create it with
default (empty)

value.


When doing full table load the parser builds t
he load file to look exactly like the target table with the
same columns and in the same order. In this case no column names are necessary at the top of the
load file because the load file maps exactly to the target table.


2.

Partial Table Update

The other t
ype of
DSP

processing is the Partial Table Update. In this case
DSP

updates only the fields
defined and mapped between the table and the file. If a field is not mapped then
DSP

will not supply
the default value for it.

With the partial update the load file

and hence the temp table may not look anything like the target
table, they just contain a few columns from the target table. Therefore in this case load file must have
the names of the columns in the first line so that the updater knows which columns to i
nclude in the
temp table when creating

it
.


The core syste
m
technical
architecture is rather simple in that

DSP

is

comprised of
only
3 types of
comp
o-
nents
:

1.

System configuration files

2.

System database

3.

Executable files

The only necessary
part
is

the
executabl
e
files. The database is
optional but
it is necessary if the user
wishes to make full use of the
DSP

functionality.
The configuration files are also optional but highly re
c-
ommended.

There are six executable files each performin
g a specific
service

after wh
ich the pr
ogram is named (there
is a second part to each name that contains build number and package type info ‘lin’ for Linux, ‘no
-
ora’ for
no Oracle support etc)
:

1.

dsp
FileTransfer

2.

dsp
FileParser

3.

dsp
FileLoader


4.

dsp
TableUpdater


5.

dsp
FileGenerator

6.

dsp
ProcessM
anager

Each program contains its complete functionality within itself and doesn’t need any other program to
work with the exception of
SSH

for encrypted communication (for
dsp
FileTransfer)
.

That means that each

Page
13

of
115




program is completely independent and can be
run on its own. The glue that holds all the programs t
o-
gether is the system database. It makes the sy
stem more functional as a whole
.

Below are several pictorial illustrations of DSP functionality.

The first picture shows
DSP from 2 opp
o
sing but compliment
ary perscpetives: the perspective of a file
consumer and the perspective of a file producer. DSP can act as both within the same data stream.




Download


Decrypt


Unzip




Validate


Enrich


Transfor
m

Parse


Insert


Update


Delete

Load


Select


Validate


Enrich



Data
base

Files

The lifecycle of a fi
le

PRODUCER

Perspective


CONSUMER

Perspective


Page
14

of
115



The illustration below shows how DSP components
can be used independently to perform specific

task in di
f-
ferent
stag
es of

file lifecycle. The lack of space in the picture between dspFileLoader and dspTableUpdater si
g-
nifies the close relationship between these 2 components.





Finally the the last picture shows

how the process manager can be used

to coordinate the complete pr
o-
cessing in DSP.





2.4

SYSTEM CONFIGURATION


Each program
in
DSP

needs a minimum set of parameters to perform its function. There are three places
where the parameters can be specified.


1.

Command line
: these

parameters

take the highest precedent and

override the database and the c
o
n-
fig
uration

file settings

2.

Config
uration

file
s
: these parameters

only override the database and
in turn
can be overridden on
the command line

dspFileTransfer

dspFileParser

dspFileLoader

dspTableUpdater

dspFileGenerator






DSP

full process mapping to a file lifecycle

DSP

single
process

mapping to a file lifecycle


Page
15

of
115


3.

Database
: these param
eters can be overridden in the
c
onfig
uration

file and on the command line


The full
list

of
possible
parameters is available on the command line using the ‘
-
help’ option.
Also

if the pr
o-
gram doesn’t recognize a specific parameter from the command line it will print th
e list of parameter
s and exit
with

failure.
There is a set of common parameters that all the programs in
DSP

share. The complete list is
shown in the Administration section below.

The system looks for the main configuration file in the environment variable $
DSP
_CONFIG by def
ault.
Optio
n-
ally

each program offers ‘configFile’ option where the configuration file path can be specified.

Typically in ‘nix’ systems the variable would be set on the command line or another configuration file with a
command such as (in bash shell):


exp
ort
DSP
_CONFIG=/var/opt/
dsp
/
c
onfig/
dsp
.config

In Windows environment variables are set in the ‘Computer’ section which is usually in the ‘Start’ pane
l

in the
bottom left corner of the screen. By right
-
clicking on the ‘Computer’ and selecting ‘Properties’ i
tem one can
get to the ‘System’ panel and then to the ‘System Properties
-
>Environment Variables’.

There are two main ways of grouping parameters within the configuration file or files. They can be grouped by
the processes or they can by grouped by task typ
es.

2.4.1.

RULES FOR S
ETTING UP PARAMETERS


There are
a
few rules that apply to parameters depending on where the
parameters are
specified.



Some parameters are common to all
DSP

programs but most are specific to each program and
the task
s

it performs



All param
eters setup in the system database

can be specified in the configuration file or on
the command line
, and

thus overriding the parameter if setup in the database



But not all

available parameters can be setup in the database. A few parameters that are e
i-
ther

too obscure or too generic can only be specified in the configuration file or on the co
m-
mand line



Parameters specified on the command line must be preceded by a dash ‘
-
‘ or a double dash






The value on the command line must follow either an empty space
or an equal ‘=’ sign



The value in the configuration file can only follow the equal sign but not the empty space



Parameter

names are not case sensitive
.

This goes for all platforms not just Windows.



Parameter values may or may not be case sensitive. The gen
eral rule is that only

file

names/paths

logins, passwords and such

are case
-
sensitive, other values are not. If the pr
o-
cessing fails check the parameter case and change it if necessary.



List parameters, e.g. parameters that allow more than one value must b
e delimited with e
i-
ther a comma or a semi
-
colon with no empty space in
-
between



Boolean parameters such as ‘verbose’ or ‘force’
must

be specified on the command line wit
h-
out any value, the presence of the name indicates the parameter is set to true, the ab
sence
indicates the parameter is false.


Page
16

of
115




In the configuration file
s

Boolean parameters need to have a value of ‘Y’ or ‘y’ to be true, the
absence of the parameter indicates the parameter is false.



If the system database doesn’t use the default port the por
t needs to be specified at the end
of the server name following
colon character (:)

like this: d
bServer=mydb:port
.
The same goes
for any of the target databases setup in the
DSP
_FILE_LOADER table.

2.4.2.

PARAMETERS GROUP
ED

BY PROCESS NAME



The configuration fil
es offer flexibility in setting up jobs and processes. The simplest use of the
config
u-
ration

file is to use a single global file and have a section for each process that is setup

in it
. If the file
gets too big and unwi
eldy then it can be split into two

or

more files. The correct file can then be spec
i-
fied either in the
DSP
_CONFIG variable or as a command option ‘configFile’

to each program
.

The
DSP

configuration

file

looks like a typical
U
nix
configuration
file
. It has a section header name in the
square b
rackets and below the header are the attribute and value pairs that belong to the section.

For
instance a

configuration
file section may look like this
:


[equity
_options]

dbServer=LOCALHOST

dbUser=
DSP

dbPassword=
DSP

dbType=ORACLE

dbSchema=
DSP

processName=
EQUITY_OPTIONS

ftpType=binary

retryLimit=3

pwdCmd="type c:
\
perlProjects
\
DSP
\
config
\
dsp
"

ssh=plink

timeout=300

verbose=Y


Here
‘equity
_option
s’ is the section header (name). I
t is
also
the ‘configSection’ option in each pr
o-
gram. Below the section are the
op
tions/parameters/attributes

that apply to the process globally.

The attribute in the configuration file

can be any
parameter valid for any
DSP

executable program such
as the transfer, parser, loader etc. When the program runs it selects the parameters appr
opriate for its
run. The parame
ters in the configuration

file must be fully spelled out. This is different from the co
m-
mand line where the parameter needs to be only long enough to be uniquely identified. For instance
the
transfer as well as the parser has

the ‘destinationDirectory’ parameter. This parameter can be spe
c-
ified on the command line for the transfer as ‘
-
destinationD

c:
\
temp
’. This is sufficient to distinguish it
from another parameter ‘destinationFileSuffix’. However

when this parameter is in t
he c
onfig
uration

file it cannot be given as ‘destinationD=c:
\
temp’

because it will be disregarded, it must be fully spelled
out as ‘destinatio
n
Directory=c:
\
temp’ to be processed.


Every section must have the ‘processName’ parameter set because it ties the
section to the specific
process setup in the database unless ‘noDbUse’ option is specified. The same goes for

the parameters
that identify

the system database: dbServer, dbUser, dbPassword, dbType, dbSchema. All these must

Page
17

of
115


be set in the configuration file

so that the programs can connect to the system database. Listing any
other option in the configuration file is optional.

2.4.3.

PARAMETERS GROUPED B
Y TASK


Another way of using the configuration files is by grouping all related parameters for a specific task
suc
h as transfer or parser into a task section. The task section names are hardcoded as follows:




dsp
FileTransfer

=>
DSP
_FILE_TRANSFER



dsp
FileParser

=>
DSP
_FILE_PARSER



dsp
FileLoader

=>
DSP
_FILE_LOADER



dsp
TableUpdater

=>

DSP
_TABLE_UPDATER



dsp
ProcessManager =>
DSP
_PROCESS_MANAGER



dsp
FileGenerator =>
DSP
_FILE_GENERATOR


The
service
names on the right must be used as headings in the configuration fil
e for all tasks of the
same type

such as all transfers within the whole

system.

This allows setting up the global parameters
per task.

For example global parameters for all transfers might look like this when grouped under the
transfer heading:

[
DSP
_FILE_TRANSFER]

verbose=yes

ssh=ssh

maxRetry
=5

2.4.4.

TASK CONFIGURATION F
ILE


Stil
l another option is to define a configuration file specific to the task in the global configuration file
and use that file to list task options. To setup a separate task configuration file
for transfers you need to
have a heading called ‘config’. Below th
e heading you need to specify the task name as described
above and assign it the name of its configuration file

like this:

[config]

DSP
_FILE_TRANSFER=c:
\
perlProjects
\
dsp
\
configdsp
FileTransfer.config

The task configuration file needs to be setup just like t
he global configuration file. You need to specify
headings and
then
list the options below the headings.


2.5

EXECUTABLE FILES



Page
18

of
115


Currently there are
six executables each named after the
service the program

performs. There are no add
i-
tional libraries, DLLs, depe
ndent files

or what have you
.

E
ach

executable file has

all its functionality compiled
into itself.
Below is the full list of executable files.


#

Software

Description

1

dsp
FileTransfer

This program performs a
ll the file transfers required in and out of th
e
network or sub
-
network. It also performs all the related task if needed
such as compression and encryption.

2

dsp
FileParser

The parser
parses files, in other words
it formats the raw data file
s into
file
s that
specific database engine
s

can work with.

3

dsp
FileLoader

The file loader doesn’t actually load
慮XWU楮朻

it’s more like

m慮慧敲⸠
䥴慮慧敳⁴Ue 慤楮朠gf⁴Ue p慲VeT⁦楬eV⁣ e捫楮朠gU慴⁴Ue⁳灥捩f楣i
瑡b汥⁴o⁢e 慤eT⁩ o琠be楮朠gpT慴eT⁡汲l慤X⁢X 慮o瑨e爠
䑓M



捥VV⸠

4

TVp
T慢汥UpT慴er

TUe⁵
pT慴e爠汯lTV⁴Ue⁦楬eV⁩ Wo⁴emp⁴慢汥V⁡nT⁵VeV⁴UeVe 瑯⁰e牦o牭
瑨e⁵pT慴eVn⁴Ue
瑡牧rW

瑡b汥V.

5

TVp
䙩FeGene牡瑯r

TUe⁧敮e牡瑯爠捲c慴eV⁤慴愠f楬eV
f牯m

T慴愠楮⁴Ue⁤慴慢慳aV o爠o瑨e爠
T慴愠f楬iV

6

TVp
偲P捥VV䵡n慧er

TUe慮慧e爠捡c⁢e⁵VeT⁴o⁲un⁴Ue
瑲WnVfe爬⁰慲Ve爠慮T 慤e爠慳ne
捯浰汥Weob.


2.5.1.

EXECUTABLE FILES IN
WINDOWS



Windows

is different from all the other platforms in that it

has a pair of executables for each task

e
x-
cept for the table updater
. One

file

is a regular executable and the ot
her one

is Windows service ex
e-
cutable. Windows service executable file names end with ‘_WinSvc.exe’. They can be registered and
run as Windows services.

Because t
he table updater

is managed by the file loader
it makes no sense to
run it as a service and th
erefore a service exe
cutable

file is not provided.

Normally even in Windows environment, the preferred setup
is

to use a job scheduler with regular
Windows executables. However if a job scheduler is not available for any reason then the way to go is
to use

the service executables
. Windows services are like

UNIX

daemon
s

except they don’t need to be
run from a command line window. It’s possible to run the regular executables as daemons from the
command line using ‘
-
daemon’ op
tion but the com
mand window must
stay open for as long as the
daemon is running
.
The second difference is that when run as service by default Windows uses special
‘local’ account/login whereas if run from a temrinal window the service runs under the user login
.

IN

Page
19

of
115


normal circumstances thi
s is not significant.

Essentially Windows services functionality is there just to
conform to Windows way of doing things.

2.5.2.

DSP

PROCESSES AS WINDOWS

SERVICES


Windows service executables offer 2 special parameters in addition to the regular options/parameter
s.
They have ‘install’ and ‘remove’ parameters to be used to install and/or remove each service.

Services are installed per
each
process
record
as defined in the
DSP
_PROCESS table

and service type

(transfer, parse etc…). This means that the user must insta
ll
not just one universal transfer process for
all file transfers
defined
, but one specific file transfer service for each process defined within the sy
s-
tem.
This allows the user flexibility to define many different processes each with its own
run
param
e-
t
ers
.

To install a transfer service on the com
mand line
you can
type

(Windows is not case
-
sensitive
)

where
‘instal
l’ or ‘remove’ must be the first parameter after the executable:


C:
\
dsp
\
bin>
dsp
FileTransfer_WinSvc
-
install
-
configSection
SECTION_NAME


w
her
e

’SECTION_NAME’

is the section header for the process in the
c
onfig
uration

file. This will install a
Windows service called ‘
SECTION_NAME
-
Transfer’

and
with the
description ‘
SECTION_NAME File Tran
s-
fer Service’
.

When installing a service only a handful of
parameters
are

allowed
. These are installation
parameters,

the full service parameters are available to the actual service when installed but they cannot be spec
i-
fied on the command line, they must be either in the configuration file or configuration table

of the sy
s-
tem database. This is to make the service command line easily readable and not cluttered.

The

installation parameters are these:


Parameter

Description

configSection

The name of the configuration section in the
configuration file
. This is requi
red in order to
compose the service name.

logFile

The full path to the log file

logFileAppend

The flag indicating to append to an existing log
fil
e if available


Any other parameters will be rejected when entered while using ‘install’ parameter to insta
ll the
Windows service.

The service may be installed but may not run as setup.

Once the Windows service is installed it can be started by typing a command similar to this:


Page
20

of
115



C:
\
dsp
\
bin>net start SERVICE_NAME


Where

SERVICE_NAME is derived from the section n
ame plus
one of the strings

-
Transfer’, ‘
-
Parser’,

-
Loader

, ‘
-
Manager’ and ‘
-
G
enerator’

as explained above
.
Once installed services can also be co
n-
trolled
(run, stopped)
from the Windows Services GUI. The GUI is available through different routes
on dif
ferent Windows platforms. In Vista
for instance the
GUI is under ‘Control Panel
-
>Administrative
Tools
-
>Services’
.

To remove a Windows
DSP

service
,

type on the command line a command similar to
the one
below
only change the
e
xecutable file and supply an act
ual ‘SECTION_NAME’ as needed:


C:
\
dsp
\
bin>
dsp
FileTransfer_WinSvc
-
remove
-
configSection SECTION_NAME


The command will first stop the service if it is running and then remove it. Normally you would r
e-
move a service only to change the installation paramete
rs or to stop it permanently.




Page
21

of
115


3

SYSTEM INSTALLATION


First you need to download the appropriate package from the web site. Read the explanation on the web site
about t
he differences between packages but basically the only difference between packages on the

same pla
t-
form is the support for Oracle.

Packages with ‘no
-
ora’ in the name do not support Oracle. The reason for the two packages is that for Oracle
support in DSP you have to have Oracle client installed and in your ‘LD_LIBRARY_PATH’ variable. Without t
hat
DSP will start, c
omplain about missing Oracle cli
e
n
t and die immediately.

If you don’t use Oracle
by using ‘no
-
ora’ DSP package you don’t need to install Oracle client.


3.1

LINUX INSTALLATION




Once you downloaded the preferred package you need to instal
l

it
. The installation consists of nothing more
than unzipping the package you downloaded

to a ‘dsp’ directory which of course you must create first.



cd /opt


mkdir dsp

cd dsp

mv download_directory/ dsp
-
lin.tar.gz .


tar xzvf dsp
-
lin.tar.gz

or tar xzvf
dsp
-
lin
-
no
-
ora.tar.gz


This will
create
directories below
‘dsp’
: bin, config, db and docs.

The ‘bin’ directory co
ntains all the executable
files.

‘docs’ contains the admin guide and ‘db’ contains the database scripts.


config


has a sample config file ca
lled ‘dsp.config’. Also in the config
directory

you
will
find an environment file
called
‘dsprc’. You should use this file to setup all the environment variables needed for a run. This file can

be

moved to your home directory and prepended with a dot ‘.’
as is usual in Linux
/Unix

and it should be sourced
before every
DSP run.

‘dsprc’ sets up the DSP service components as environment variables, this is required for DSP to work

when
running through Process Manager

and using DSP loader
.

The reason for this s
etup is
that
the component

names (the version numbers in the name)

change
frequently

and independently of each other so they have different version numbers in their name
s
. This is an easy way to
get the latest service component setup

for use by DSP.

The ne
xt
step

is to install the database. The db installation steps depend on the type of db you want to use as
your DSP system db which most likely will depend on what your company is currently using as its main db pla
t-
form. In general unless you want to run t
housands of processes on DSP, which you can but probably will not
have the volume for it, it’s not worth installing a new instance for DSP. Just use one of your current instances
and if your usage and file volume grow you can always switch to a new instanc
e.



Page
22

of
115


In either case you will probably need help from one of your db ad
mins if you are in a large shop;

in
smaller
companies you
will have to do all the necessary ground
-
work by
yourself
.



What follows are the specific steps needed to install a specific ty
pe of DSP db.


3.2

WINDOWS INSTALLATION




Download the Windows zip file and unzip it to a directory of your choice such as “c:
\
dsp”.

The critical thing for Windows is to set the environment variable ‘TZ’ before running any of the DSP comp
o-
nents. ‘TZ’ is a tim
ezone environment variable without which DSP will fail with weird error because the system
will not be able to determine the ‘date’ value it needs.

For North America Eastern time zone use the value ‘EST5EDT’. For a list of valid values search the Internet.

If
you find that the system time is off by some hours you need to try a different value.



3.3

DATABASES




3.3.1.

SYBASE
DSP
INSTALLATION


We assume that you already have a Sybase instance running and configured. Now you need to create
a new DSP database in it.

You don’t have to call it DSP but why wouldn’t you?

The script for Sybase installation is in the in the ‘dsp/db/sybase’ directory. The script contains co
m-
mands to create
the
complete db

with all its objects

other than the data devices. The script assumes
2

existing

data devices called ‘dspdata’ and ‘dsplog’

with at least 200 and 100MB free space on them

respectively
.

If you want to use other devices just replace the two device names in the script with
your own names. Alternatively you can create DSP db ma
nually and use the script to create all the
other objects. In that case you will need to comment out the database creation statements in the
script or just leave them in and disregard the errors you get when running the full script.

You should also change

the DSP login password either in the script

on the line beginning with
‘sp_addlogin…’

before you run it or afterwards

in the database
.

The default password is ‘dspLogin’.

To create DSP db and all its objects you can run the following commands as a Sybase
admin (‘sa’ lo
g-
in):




cd
DSP_INSTALLATION_DIRECTORY
/db/sybase



isql

S
YOUR_SERVER

U sa

P password

i

CREATE_DSP_SYB.sql


You can disregard any warnings you may get but you need to check and deal with errors.

3.3.2.

ORACLE DSP INSTALLAT
ION


Page
23

of
115



With Oracle you n
eed to creat
e a DSP login and give it tablespace

first.

Obviously you need all the
necessary permissions including table creation

with your login

as well.

Once you have th
at go to the O
racle directo
ry and run the Oracle script through sqlplus

or any ot
her
cient utility you are using such as Toad, DBArtisan etc.


cd DSP_INSTALLATION_DIRECTORY/db/oracle

sqlplus login/password @CREATE_DSP_ORA.sql


3.3.3.

MYSQL DSP INSTALLATI
ON


With MySQL the full installation is in the setup file
DSP_INSTALLATION_DIRECTORY/db/mysql/
CREATE_DSP_MYSQL.sql. You can run this file through
your favority MySQL client.

Don’t forget to change the password before or after you run the script.

3.3.4.

MSSQL
DSP
INSTALLATION


Obviously MSSQL is not available on Linux. So you cannot have DSP system db run
ning MSSQL in
Linux environment.
You can still use
MSSQL
as your DSP db b
ut it must run on Windows. Y
ou can also
have MSSQL

as your target production data
base

or databases.

With MSSQL you need to use ODBC to access and process MSSQL data from your Linux e
nvironment.
When

you use ODBC you need to specify the ODBC DSN in the DSP_FILE_LOADER.ODBC_DSN. DSP
will use the specified ODBC DSN to connect to MSSQL.


You still need to specify the login anme and password in DSP even though it may already be specified
i
n the ODBC DSN file.

To
setup ODBC connection from Linux to Windows
you have to install and configure 2 packages:

1)

Unix ODBC

2)

FreeTDS

client library


Unix ODBC

Unix ODBC is available for download at
www.unixodbc.o
rg
. You need to follow instruction on the
web site on how to ins
tall and configure the package such as
www.unixodbc.org/doc/FreeTDS.html

There are 2 ini files that need to be created/updated.

Th
e ‘/etc/odbc.ini’ holds the actual dsn info
with the dsn name being the header in square brackets, it can look like this:


[MSSQL_DSP]

Description=My ODBC connection


Page
24

of
115


Driver=FreeTDS

Servername=SERVER

Database=DSP

UID=DSP

PWD=dspLogin

TDS_Version=8.0


The se
cond file is ‘/etc/odbcinst.ini’ and it h
o
lds the driver details:


[FreeTDS]

Description=FreeTDS driver

Driver=/usr/lib/odbc/libtdsodbc.so

Setup=/usr/lib/odbc/libtdsS.so


There are other parameter
s

that can be setup in both files, they are not necessary bu
t may be useful.
See documentation.


FreeTDS

FreeTDS is a free ODBC driver for Sybase and MSSQL. It’s available at
www.freetds.org

website. As
with ODBC you need to consult the website on how to install and configure

it.

There is also a commercial version of MSSQL ODBC driver available from Easysoft. It may be a touch
easier to configure but other than that it doesn’t offer other advantages over FreeTDS.

One issue worth mentioning

in this setup (unixODBC/MSSQL)
is tha
t it seems that if the physical
server where MSSQL resides disconnects from the network
even briefly
FreeTDS driver is not able to
reconnect from Linux to MSSQL again unless the Windows server is rebooted.

You may also need to setup FREETDSCONF variable
pointing to the configuration file so that it can be
found
. You can do this in ‘dsprc’ file.

There is a commented
-
out line for such a variable; you can u
n-
comment and update it if needed.

The ini file is

/
usr/share/
freetds/freetds.conf’ and it can look lik
e this:


[SQL_SERVER
]

Host=IP address

port=1433

tds version=8.0

client character set=ISO
-
88
59
-
1



Page
25

of
115


DSP expects a client tool MSSQL installation on the Windows server where it runs. Specifically it’s
looking for ‘bcp’ utility to quickly load a table file into

a temp table. If DSP runs on a machine where
bcp is not available it will try to use “BULK INSERT” command on the MSSQL server. This command
requires special server
-
side “bulk insert” privilege to run. This has to be granted to the mssql login
used by the

DSP and setup in the DSP_FILE_LOADER table.


Moreover for MSSQL to be able to access the file through the network (as we assume that this is not
a local file to the MSSQL but it must be a local file to the machine where DSP is running
) there must
be `shar
e`permissions setup for the directory where the table file is.

But that is not enough. On the machines we tested (windows 7) the only way to make this setup
work is like this:

1)

Change the login account under which the MSSQL server runs; this is done through

``services``
screen in Windows. Normally MSSQL runs under `SYSTEM`login on the server. This will not work
for accessing network files. The MSSQL login must be changed to a regular user login

instead of
SYSTEM
.

2)

That user login must then be created on the r
emote machine where the data files reside and
that login must be granted `full control`permissions over the directory where the files are.

We found that this was the only way to make the remote bulk load work on MSSQL in Windows.

This only applies to Windo
ws, in Linux DSP uses `freebcp`utility provided by FreeTDS to load the


file.

3.3.5.

DATABASE CLIENT LIBS


DSP uses native libraries for all dbs except
for MSSQL on Linux. MSSQL on Linux and Windows is a
c-
cessed through an ODBC bridge. In Windows the ODBC uses

n
ative MSSQL widnows library, in Linux
ODBC uses FreeTDS libs.

In order to access Oracle db an Oracle client must be installed on the server where DSP is running
(together with the Oracle

version of DSP) on both Linux and Windows.

The same goes for Sybase
on Windows but not Linux; in Linux there is no need for a Sybase client as
such other than the interfaces file. Then again if you need to install interfaces you basically need to
install the client package.

O
ther supported db
s

doesn’t require client packag
es to be installed for DSP to work in general but
each supported db must have its bulk

data

loading tool available through the PATH variable.

DSP uses native data loading tools to move the data

from the file

to temp tables in the
target
dat
a-
base.

Since t
hese tools are

usually

part of the client package it makes sense to install the full client
package rather then just the tool itself which may require other client files/libs anyway.

This is in normal circumstances a moot point since it’s hardly possible
today to have any kind of db
access without a client db package

installed
.
It’sjust mentioned for completeness.


3.4

MIFID EXAMPLE INSTAL
LATION





Page
26

of
115


To install the mifid example process use your favorite
sql client tool and run the
M
I-
FID_
EXAMPLE_
[
DB_TYPE
].sql f
ile through
it from the appropriate directory such as
‘DSP_DIRECTORY/db/oracle’. This file will create a DSP_MIFID_EXAMPLE table and all the necessary DSP re
c-
ords for processing a Mifid data file.

Before you run the file or after the records are inserted
:

1)

Y
ou need to change the db server and login details in the DSP_FILE_LOADER and DSP_TABLE_UPDATER
ta
bles
:

DSP_FILE_LOADER: DB_SERVER, DB_USER, DB_PWD

DSP_TABLE_UPDATER: FULL_TABLE_NAME


this consists of
4 fields from the DS_FILE_LOADER strung t
o-
gether:
DB_S
ERVER.DB_NAME.DB_SCHEMA.DB_TABLE

2)

You may also
need
change the destination directories for the data file in the DSP_FILE_TRANSFER table
:

DESTINATION_DIRECTORY, ARCHIVE_DIRECTORY

3)

You also need to update the config file DSP_DIRECTORY/config/dsp.config and upd
ate
DSP

db
server
name and login
(
dbServer, dbUser, dbPassword
) as needed.

4)

Add the
‘…/dsp/
bin


directory to your path.

To test if your setup is correct run only the download part first:


cd $DS
P
_BIN


dspFileTransfer
-
configFile dsp_dir/config/dsp.config
-
c
onfigSection mifid_example_ora

-
color


If the download is successful then your DSP installation is fine and you
can
start using it.
Y
ou

can

run the

whole
mifid process using the appropriate bash script from the bin directory
:


dsp
-
PM
-
mifid
-
ora
-
lin
.bash

Now

rerun the command again y
ou should get a message that the file has already been processed for the day.
To force the reprocessing again
update the bash script by adding the ‘
-
force’ option
:

dspProcessManager
-
version
-
configSec mifid_example_syb
-
col
or

-
for
ce

Now the whole process should have been rerun and in the recap you should see tha
t

no records were updated

assuming you haven’t changed the setup in the DSP_TABLE_UPDATER table having ‘SMART_UPDATE_FLAG’
set as the update type.


TIP

If you are using MSSQ
L then chances are you will have a problem updating the DSP_MIFID_EXAMPLE table b
e-
cause the source file will have some garbage characters in it.
This only applies
to
MSSQL as all other db
s

can
handle it internally (usually by either omitting the character

or replacing
it
with something else

like ‘x’
).


As of this wr
iting there are three such char
a
c
ters

in the file
. But this can change as the data changes. The sol
u-
tion is to use regex to replace each word with the inva
lid char
a
c
ter with the proper word/name
. You have to
look at the log file from

bcp to see which words contain

the bad character and correct it in a regex which you
can
put into the DSP_FILE_PARSER.
PRE_PARSE_FORMAT_RX
or
DSP_FILE_PARSER.
POST_PARSE_FORMAT_RX
in the form for instance of

‘s/
.B F/AB

F
/
’ where dot signifies the bad character and is replaced by ‘A’
.

This is the
actual example of a bad char
a
cter from th
e current file, this can change in the future. Remember you can i
n-
clude multiple regex expressions in the format fields delimited by a c
oma or semi
-
colon.


3.5

EMAIL
USE IN DSP




Page
27

of
115



DSP uses the option ‘emailList’ to indicate whether an email should be send after a process
ing

task completes.
If there is at least 1 email address in the ‘emailList’ the program will send an email. There may be seve
ral
emails in the list
separated by colon or semicolon with no space between them as is normal in DSP.

The email
is always the recap of the processing run.

If you have hundreds or thousands of processing runs a day
it may make sense to send email only when

a pr
o-
cess fails in which case in addition to ‘emailList’ option you need to set the ‘emailOnFailureOnly’ option.


3.5.1.

LINUX EMAIL


The Linux version of DSP relies on ‘sendmail’ command in the Linux OS to send emails. Som
e-
times ‘sendmail’ may not be configur
ed in your Linux box. In this case rather than trying to co
n-
figure sendmail which may be quite complicated it’s better to install a package called ‘ssmtp’.
Ssmtp

allows sending out mail but not receiving it. It
supports TLS aka encrypted communication
with the mail server. It’s simple to install and use and it allows you to use large public smtp
mail
providers

such as gmail
for sending out mail (assuming you have a gmail account).
.



3.5.2.

WINDOWS EMAIL


In Windows th
e situation is more complicated. There is a mail sending module built into DSP but
it doesn’t support TLS and therefore cannot be used together with gmail which supports only e
n-
crypted connections.

For DSP email to work in Windows you have to have an SMTP
server configured and working
somewhere on your network where it can be reached by DSP. The SMTP server must support
regular non
-
encrypted connections.


To make DSP use this server you need to setup 3 enviroment variables on the machine where
DSP is runni
ng:

SMTP_SERVER=smtp.domain.com

SMTP_USER=user

SMTP_PWD=password







Page
28

of
115



4

SYSTEM ADMINISTRATIO
N


This section discusses

general rules

and pointers applic
able to the whole
DSP

system
.
The system administr
a-
tion consists primarily
in s
etting

up data stream proc
esses. T
his is the main topic in this section.

Later sections starting with the section # 5 offer a detailed description of each
service component

within
DSP
;

every
table or tables involved in a
process

setup
are

listed with all
their

columns and their det
ailed description.
Also any important issues are me
ntioned or discussed if they warrant it
.


4.1

A SAMPLE

APPROACH TO
BUILDING

DATA STREAMS IN DSP


Before we get into the details of how to setup
a
data stream with specific functionality we need to start by

loo
king at how
a complex data stream can be built over time

in DSP. This is esp
e
cially important
when working
with

large data sets

in a complex environment
. The bigger and the more complicated a dataset is the more o
p-
tions you have how to deal with it

in DSP
.

Our goal is always the same: to have the best possible data in our system at any one point in time available to
the business. With

a

single file and/or small data set this is not much of an issue. We use

DSP to load the data
into our databases and we’re d
one.

However things get complicated with bigger datasets.

We’ll look at a specific business example rather than talk in generalities.

The main example we will use
throughout this manual is a
financial data set produced by
one of the

top financial data vend
or
s in the world.

This specific data set

contains over 200 daily files. These files contain the most complete and many users b
e-
lieve the best available data for all equity products from around the world at the end of each business day.
O
ur task is to assem
ble the best possible data in our system ready for the next day based
on
the vendor data.


We will not cover the full 200+ files in our

example,

we will concern ourselves only with a single category of
the data namely the master data for stocks traded on e
xchanges around the world. This data is contained in
two types of files, the full files which contain complete data available to the vendor and the delta files which
contain only the changed data for the day. Since this is not pricing data there isn’t all
that many changes to the
data on a daily basis. Each dataset is spread in 5 regional files. So we have 10 files alltogether

to deal with (5
full files and 5 changed files)
.

In this case we cannot just

sl
ice and dice

the ten

file
s and load

them into our ven
dor database. There is some
complexity involved which requires us to make certain decisions.

The basic decision we have to make is whether to process a full data for
the
day or only the changed data. O
b-
viously the main difference is in the volume. The chan
ged data is hardly ever over 10% of the full data set.

This
seems like a no
-
brainer.

1.

Step 1: Setup a process to

load
delta

daily files into our holding database

on a daily basis

It doesn’t

take

long however to get business people to complain to us that we
missed certain data changes.
We check the data and don’t see what we missed but then we check the full files and sure enough a piece of
data is different in that file.


It seems
that full data is better than changed data because sometimes there are omissio
ns

in changed files
.

And sometimes t
he missed
data
change can be an important piece of data that may cost

our
company money
if we don’t have it.

Hence we have to change our processing and process the full files instead of the delta files.


Page
29

of
115


2.

Step 2
:
Change th
e

process to load full daily files into our holding database

Then one day there
is

another business blow up. We learn that one of the business users of our data lost good
chunk of money because one of the records in our system was invalid.

After some inves
tigating we find out that the reason for
it
is that we are not deleting invalid records from our
database. The

invalid records are in the del
t
a

files and they have status of
-
1.

Hence we have to expand our pr
ocessing
and delete the invalid records.

Here DS
P offers us a number of
choices in accomplishing this. If we want to stick with only the full files we can wait until the end of the day
when all files are processed then collect the primary keys from all the data files load them into our database
and simp
ly delete records with primary keys not in our set.

And those will be invalid records.

3.

Step 3
:
Expand the process or setup a new process to delete invalid records at the end of the day

using
the full primary key collection from the dataset
and which is th
en applied to our database

This seems like a good solution and e
verything is fine again until another

business user comes to us and co
m-
plains that they get data from our database before we delete the invalid records

and therefore their data still

includes
invalid records
. Once again we have to adjust our processing. There is
no
alternative now, we
have to
process the delta files.

W
e have to select the invalid records as the files come in and delete them from our d
a-
tabase

immediately.

4.

Step 4
: Change the exis
ting process to delete invalid records as they come in the
delta

files

during the
day

By this time we are however paranoid and we want to make sure we have full and valid record set in our dat
a-
base at the end of the day. So we not only keep deleting the in
valid records from the delta file as they come in
we also reinstate the end of day full data comparison/delete where we collect the full set of primary keys after
all the files are processed and delete the records in our db that don’t have their key in the

full collection.

By now we are sure we have done everything possible to ensure only valid records are in our system at the
earliest possible time.

5.

Step 5
:
Keep or reinstate the end of the day full primary key set comparison from step 2.

However once again

it turns out we celebrated prematurely. Still another business user comes to us and says
that they extract data from our
db

at t
he end of the day but they cannot wait to find out which records are no
longer valid and they need those during the day as the
files arrive. Again we need to adjust our processing and
create 2 output files when processing invalid records from the delta files. We need to create one file for us to
apply to our database and

one file for our client
and we will
deposit
the file
on thei
r ftp account.

6.

Step 6
:
Add or expand the current processing to

generate

another copy of the file containg invalid re
c-
ords
.

But we are still not done. It turns out there is another data client of ours who found out that we have a new
fantastic system called

DSP for data processing and they decide that they would like to change their pr
o-
cessing. They no longer want to do an extract from our database on nightly bases and load it into their dat
a-
base. They want
us
to

extract only the changed data from our system

and do a direct update in their database
with it

as the files come in, a sort of
near
-
real time updates
.

7.

Step 7
: Update or create a new process to extract changed data from our db and use this data to u
p-
date our data client’s internal database with it.

In

business this is process

never ends or stops; it may pause for years but sooner or later there will be another
business request
.


Page
30

of
115


The power of DSP lies in the fact that all these steps ca be done within the same process. But they don’t have
to, you can sp
lit any of the steps above into its own separate process.

However we have to leave it here as this should be sufficient for illustration purposes. Keep in mind that we
only sketched a business requirement in each step above the implementation itself or at
least part of it will be
shown in the examples further in the guide.


4.2

GENERAL RULES FOR US
ING
DSP


Some of the g
eneral rules that apply
to

the
DSP

system
as a whole

are
listed and explained below
:



The system outputs logging messages to the screen.

The mess
ages are formatted to 8
0
-
character
width. If you want “pretty” output mak
e sure your terminal is set to 8
0 characters wide. Wider term
i-
nal window will work as well but narrower will cause line
-
splitting and the output will no
t

look “pre
t-
ty”.



All columns th
at have ‘LIST’ in their name accept multiple (list of
)
values separated by comma (‘,’) or
semicolon ‘;’.



But there are columns that accept lists even though they don’t have ‘LIST’ in the name. These are
some of the ‘RX’ regular expression columns that acc
ept either 1 regex or a list of literal values.



Parameters on the command line use camel case format while the parameters in the table, the table