
IBM InfoSphere DataStage Skill Builder Part 1: How to build and run a
DataStage
https://www.youtube.com/watch?v=i2wDEnODDbI

DataStage has built-in components called "stages"

This job has 3 stages connected by links
The links are like pipes within which data flows from one stage to another
DB2 Connector stage
0:13
Welcome to this demonstration of IBM InfoSphere DataStage,
0:18
that is part of the IBM InfoSphere Information Server platform.
0:23
In this series of videos, I'm going to show you how to build and run a DataStage
0:28
parallel job.
0:30
Here you see the job we are going to build
0:33
shown
0:33
in the DataStage Designer client.
0:37
It is an example of what is often called an ETL job,
0:40
that is an extraction
0:42
transformation
0:44
load job.
0:46
With DataStage,
0:47
it is easy to build ETL jobs.
0:51
And, DataStage utilizes parallel technologies
0:54
to enable these jobs to process huge amounts of data with amazing speed.
1:01
DataStage jobs have built-in components called "stages"
1:06
for extracting
1:08
data from and loading data to many different types of data resources
1:13
including files,
1:14
database tables, and enterprise application data resources.
1:19
It also has built-in stages for transforming data including stages for
1:24
joining, sorting, aggregating data and for implementing business logic.
1:30
Now let's take a look at the job we're going to build.
1:34
It's a simple job but it will give you a good idea of how to build many
1:38
DataStage jobs.
1:40
Most DataStage jobs are just more complex variations
1:44
containing more stages and links and using other types of stages.
1:50
This job has three stages
1:53
connected by links.
1:55
The links are like pipes through which data flows from one stage to another.
2:02
The first stage here is called the DB2 connector stage.
2:07
It is used to extract data from DB2 tables.
2:11
We will use it in our job to extract data from a DB2 table named EMPLOYEE that sits in the database named SAMPLE.
2:22
The data extracted from the EMPLOYEE table then flows to the transformer stage.
2:28
This stage is used to implement business transformations and a specified data constraint.
2:35
In this example, we will use the transformer stage to transform values going into a column called "employee name".
2:44
We will also implement a constraint that selects just employees who are not managers.
2:51
The transformed data then flows to the sequential file stage, which is used to write the data to a sequential file.
3:01
You'll also see in this job several annotation stages.
3:09
These are used to help document the job.
3:12
The GUI design along with these annotation stages provides a clear picture of the job design and specification.
3:22
After we build the job, we will then execute it using the DataStage high-performance parallel engine.
3:31
So let's get started.
3:32
The first thing we need to do is to open an empty canvas for the job where we will lay out the stages and links.
3:41
I'm going to close down this job and then open a new canvas ... File > New > Parallel Job.
3:53
And I'm going to save this job in a folder called "DS essentials" which is going to store all of our objects.
4:06
I'll call the job "employee info"
4:16
The first thing we need to do is to lay out the job's stages and links.
4:22
We'll drag the stages from the palette in the lower left-hand corner over to the canvas.
4:31
The DB2 Connector stage we'll drag from the Database folder over to the canvas.
4:41
The transformer stage we'll drag from the Processing folder over to the canvas.
4:49
And the sequential file stage we'll drag from the File folder over to the canvas.
4:59
Next, we'll draw links between these stages to represent the flow of data from one to the other.
5:06
We'll start with the DB2 connector stage ... right mouse click and then just
5:12
move the mouse cursor over to the target ... that's how we draw the link.
5:17
You see the arrow here indicating that data flows from the
5:21
DB2 connector to the transformer.
5:24
And we'll do the same here.
5:30
The next thing we'll do is to rename the links and stages from their default names, as you see here, to more meaningful
names.
5:40
The DB2 connector stage is used to extract data from the employee table so we'll call this stage "employee".
5:49
Just click on the stage and then start typing to change the name.
5:57
And I'm gonna copy this.
6:00
And I'm also going to name this link ... because employee data is flowing through this link ... I'll name the link the
same thing, "employee".
6:14
The transformer stage, since it's processing employee information and selecting non-managers, I'll just call "selected
employees". The data is going to the target sequential file stage as employee info, so I'll call this link and the stage "employee info". The
finished job, which I showed you earlier, also had annotation stages; to save time, we're going to leave these out of the job. And
this completes our first video.
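The finished job's end-to-end logic can be sketched in plain Python — a hypothetical stand-in for the three stages, not how DataStage actually executes them. The column names follow DB2's SAMPLE database schema and the sample rows are invented:

```python
# Minimal sketch of the EmployeeInfo job's logic (illustrative only --
# column names FIRSTNME, MIDINIT, LASTNAME, JOB are assumed from DB2's SAMPLE
# database; the DB2 extract is faked with an in-memory list).

def transform(rows):
    """Transformer stage: derive the full name, keep only non-managers."""
    out = []
    for r in rows:
        if r["JOB"].strip() != "MANAGER":          # the constraint
            full = f'{r["LASTNAME"]}, {r["FIRSTNME"]} {r["MIDINIT"]}'
            out.append({"EMPLOYEE_NAME": full, "JOB": r["JOB"]})
    return out

employees = [                                      # stand-in for the DB2 extract
    {"FIRSTNME": "SALLY", "MIDINIT": "A", "LASTNAME": "KWAN", "JOB": "MANAGER"},
    {"FIRSTNME": "JOHN",  "MIDINIT": "B", "LASTNAME": "GEYER", "JOB": "CLERK"},
]

selected = transform(employees)
with open("employee_info.txt", "w") as f:          # sequential file "load" stage
    for r in selected:
        f.write(f'{r["EMPLOYEE_NAME"]}|{r["JOB"]}\n')
```

Each function above corresponds to one stage; the links are just the values passed between them.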

In the next video, we will edit the stages in the job.

We will configure them and set their properties to perform the activities we want them to perform. If you find this series of videos
helpful but you need a deeper understanding of DataStage, the following courses are highly recommended: IBM
InfoSphere DataStage Essentials and IBM InfoSphere Advanced DataStage. IBM offers several types of DataStage training, including
traditional instructor-led classroom and private onsite classes, instructor-led online classes, self-paced virtual classroom courses, web-based
virtual classroom courses, and flexible learning with product simulations. And by the way, these courses are a fantastic
step towards becoming an IBM Certified Developer for DataStage. For additional information about the different formats of
training and what's available...







Rename the stages and links


IBM InfoSphere DataStage Skill Builder Part 2: How to build and run a
DataStage
https://www.youtube.com/watch?v=iAP8XeGBseg


Double click on the DB2 stage to open it














Calculate a derivation for full name based on ..
Insert new column

Length: 200


Right click and select concatenate



Click on constraint button up here













0:13
In this video, we are going to configure the job stages we laid out on the
0:18
canvas in the last video.
0:21
We'll edit the stages from left to right in terms of the data flow.
0:27
So, we'll start with the DB2 Connector stage
0:30
which reads the EMPLOYEE table data.
0:35
Double click on the stage to open it. When you open the stage you'll see a Properties tab, and we need
to specify both connection properties and usage properties. In particular, we need to type in here the name of the
database that contains the table that we're reading from, and we need to enter a userid and password to connect to that
database. So I'm going to enter userid DB2inst1.
1:22
What we're going to do here is we're going to let the stage generate the SQL
1:29
as opposed to
1:32
manually typing in the SQL ourselves or pasting in the SQL from some other
1:38
utility.
1:40
We'll let the stage generate the SQL.
1:44
And, since we are letting the stage generate the SQL, we
1:47
need to enter the name of the table here, an additional property.
1:55
And then we've done enough here
1:58
on the Properties tab.
2:01
The next thing we need to do is to go over to the Columns tab
2:05
where we'll specify the columns of data that we want to select from that table.
2:13
In this example
2:16
we're going to load
2:18
the columns
2:20
and their definitions
2:22
from what's called a table definition. Previously, outside of these videos, I used the DataStage import utility to import
a table definition for the EMPLOYEE table. We'll now select that table definition here ... click okay.
2:50
We get a list of the columns. We could select any number of these columns, but I'm going to go ahead
and, for simplicity, select them all. And here you see all the columns that the stage is going to read. That's all we need to do
in the DB2 Connector stage, so I'll click "ok" to save that configuration. Next, we need to configure the transformer stage.
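The "generate SQL" option seen on the Properties tab essentially builds a SELECT statement from the table name and the column list chosen on the Columns tab. A rough Python illustration of that idea (the column names are assumptions based on DB2's SAMPLE schema, not shown in the video):

```python
def generate_select(table, columns):
    """Mimic the connector's generated SQL: SELECT <cols> FROM <table>."""
    return f"SELECT {', '.join(columns)} FROM {table}"

# Hypothetical subset of the EMPLOYEE table's columns:
sql = generate_select("EMPLOYEE", ["EMPNO", "FIRSTNME", "MIDINIT", "LASTNAME", "JOB"])
```

The real connector also qualifies the table with a schema and quotes identifiers as needed; this sketch only shows the shape of the statement.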

3:26
This is an extremely powerful stage that can be used to implement column mappings,
business derivations, and constraints. Double click on the stage to open it. On the left side, you see the
columns coming into the stage. Now we need to specify what columns are going to go out of the stage and what
their values are. The first thing I'm going to do is to select the columns that I want to drag through the stage,
and I'm going to select all the columns except for these individual employee name columns.
4:27
Next, we will define a derivation
4:30
that calculates the employee's full name
4:33
based on
4:34
the values in the employee's
4:37
individual name columns.
4:41
So
4:41
I will first create
4:44
the column
4:45
to store those values.
4:47
And I do that ... I select the location I want to put the new column,
4:52
right-click,
4:54
and then
4:56
select insert new column.
4:58
So you see the new column
5:01
down here.
5:04
I'll specify the name and type of the new column.
5:09
The new column I'm going to call "employee
5:14
name"
5:16
and I'm going to make it a
5:20
VARCHAR,
5:22
which means it can store a variable
5:24
number of characters up to a
5:27
maximum
5:28
length, which I'll
5:32
provide as two hundred
5:34
characters.
5:40
Now I'll open the
5:43
DataStage expression editor
5:46
to
5:47
specify the derivation.
5:50
I'll double click in the cell here to the left of the employee name target
5:54
column
5:56
to open the expression editor,
5:58
and there I'll construct
5:59
the derivation.
6:01
The first
6:05
value
6:07
is going to be the employee's last name, so I'll select the
6:14
input column that contains the employee's last name.
6:21
Next I'll right-click
6:22
and select
6:24
the operator, which in this case is the concatenation operator.
6:30
I'm going to concatenate the employee's last name
6:34
with a string constant:
6:37
comma space.
6:42
And then I will concatenate
6:44
that
6:46
with
6:46
the employee's
6:48
first name. So I'll select ...
6:50
right-click,
6:52
and I'll select the
6:54
first name column ...
7:00
concatenation operator ...
7:04
I'll put in a space ...
7:09
concatenation operator ...
7:13
and then I will select
7:14
the employee's middle initial.
7:22
That completes
7:23
the
7:24
derivation.
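The derivation built in the expression editor is last name, comma-space, first name, space, middle initial. The same logic sketched in Python (the sample values are assumed, not taken from the video):

```python
def employee_name(lastname, firstnme, midinit):
    """The transformer derivation: LASTNAME, ", ", FIRSTNME, " ", MIDINIT concatenated."""
    return lastname + ", " + firstnme + " " + midinit

# Hypothetical sample row:
name = employee_name("HAAS", "CHRISTINE", "I")
```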
7:28
Next, let's define the constraint.
7:32
In this example, the constraint will select only those employees who are not
7:37
managers.
7:40
Click the constraints button up here
7:44
to open this little window.
7:46
And I'm going to invoke again the DataStage expression editor
7:51
to build the constraint.
7:53
So here's the name
7:54
of the link,
7:57
the output link from the stage, and I'm going to work next to it,
8:02
within the cell.
8:05
We want to
8:08
look at
8:10
the job of the employee to determine
8:14
whether that job is a manager or not.
8:17
So I will select
8:20
the input column
8:22
JOB.
8:28
DataStage contains a number of built-in functions that you can use in
8:32
your expressions.
8:35
One function we'll use here is the Trim function,
8:39
which will remove the leading and trailing blanks
8:43
surrounding the value in this JOB column.
8:48
You can select
8:51
from any of the functions, so I right-click
8:55
and you see Function,
8:58
and then I get a list of all the functions that are available.
9:03
In this case, I'm going to select
9:05
the Trim function. I could also just type in the word Trim, which is what I
9:09
actually usually do,
9:11
but just to show you the list of all the functions.
9:16
And now we're going to use the Trim function
9:20
to modify the value in the JOB column.
9:24
I need to
9:25
close those
9:26
parentheses here.
9:31
And that will remove any blanks. And I want to say that that value
9:36
does not equal
9:38
manager.
9:47
And that completes the constraint.
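The constraint amounts to "the trimmed JOB value does not equal MANAGER". A Python sketch of that test (the literal "MANAGER" is an assumption about how the job title is stored):

```python
def passes_constraint(job):
    """Transformer constraint: keep the row only when the trimmed JOB is not MANAGER."""
    return job.strip() != "MANAGER"
```

Rows for which this returns False never reach the output link, which is exactly what a transformer constraint does.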
9:56
And that completes our work within the transformer stage, so let's go ahead and
10:01
click ok
10:02
to save
10:03
the results.
10:07
Next we need to configure the sequential file stage.
10:12
That's this stage here. Double click to open it up,
10:17
and on the Properties tab you see its properties.
10:23
We need to specify here the name of the file and the path to the file
10:29
that we're going to be writing to. So I'll select the property
10:34
and then I'm going to browse for the directory
10:37
and the file that we're going to be writing to.
10:42
And here you see the path
10:45
and the name of the file, employee_info.txt.
10:50
That's all we need to do here in terms of the properties.
10:54
Let's go over to the Columns tab,
10:57
and you'll notice all the columns of data that are going to be written to the
11:01
file.
11:03
Notice the employee name column
11:06
that we
11:07
put our derived value into, the employee's full name.
11:13
Also notice here that a number of these columns are nullable.
11:19
And the last thing we need to do here is to tell the stage how we
11:24
want to handle null values when they're written to the file.
11:28
And I can do that over here on the Format tab.
11:34
Under field defaults, I'm going to select
11:37
the "Null field value" property,
11:41
and here I specify what value I want written to the file
11:46
when
11:47
a null value is written,
11:50
and I'll put it in here.
11:55
So click ok,
11:57
and that completes the configuration of that stage and of the
12:02
job.
12:04
In the
12:06
next video, we'll compile this job into an executable form
12:11
and then run it.
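A flat file has no native representation of NULL, so writing a nullable column needs a textual stand-in — which is what the "Null field value" property supplies. A sketch of the idea (the marker string and sample values are placeholders; the value chosen in the video isn't audible):

```python
NULL_MARKER = "NULL"   # stands in for the stage's "Null field value" property

def to_field(value):
    """Render one column value for the sequential file, substituting the null marker."""
    return NULL_MARKER if value is None else str(value)

# Hypothetical row with a null middle column:
record = "|".join(to_field(v) for v in ["000010", None, "HAAS"])
```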


IBM InfoSphere DataStage Skill Builder Part 3: How to build and run a
DataStage
Compile and run
https://www.youtube.com/watch?v=2JG4qfLUwQ8










Link changes color from blue to green
Blue: data moving through
Green: no run-time error occurred





https://www.youtube.com/watch?v=HPP9W0b2FrI







0:15
The stage is not always well understood,
0:17
so let's spend some time
0:19
having a look at how it works
0:21
and how you can most effectively use its capabilities.
0:34
The intent of the Lookup stage is
0:36
to perform a lookup, that is:
0:39
here's a key value,
0:40
go get me that row.
0:43
Or maybe:
0:44
here's a key value,
0:45
go get me those rows.
0:52
In versions up to 7.5,
0:54
the only match that we had possible was
0:58
equals:
1:00
here's the key value, go get me
1:02
the row for which
1:05
the lookup key, which may be the primary key,
1:10
matches that value
1:12
exactly.
1:14
In version 8
1:15
we have additionally
1:17
the means to specify a different kind of lookup, what's called a range lookup,
1:22
where the key is between
1:25
a pair of values ...
1:27
a "between join", if you'd like to think of it that way.
1:33
So we'll see how that's done
1:34
later in the presentation.
1:44
The Lookup stage has
1:46
an input called the stream input.
1:49
It's input port number zero
1:51
when you start looking at the generated
1:54
orchestrate shell script.
1:57
On the design diagram it is painted with a solid line.
2:02
It delivers rows containing the keys that we use.
2:10
We have a second input
2:12
called the reference input.
2:15
It's port number one,
2:17
and it's painted with a dashed line, not a solid line,
2:21
indicating
2:23
that the operation used is
2:25
"find by key" rather than simply "get the next row".
2:30
And it will
2:31
have the task then of returning
2:35
the row or
2:36
those rows
2:37
that correspond to the key value,
2:40
according to your specification.
2:43
The Lookup stage supports more than one reference input link,
2:47
in which case
2:48
the port numbers for the input ports are two, three, four,
2:52
and however many you need.
2:55
But keep it sensible:
2:57
if you need many, many lookups,
3:00
to try to keep your jobs performant and maintainable,
3:02
maybe have no more than four reference inputs per Lookup stage.
3:06
Give more to a second stage downstream,
3:09
and try to keep small processes happening.
3:13
We'll come back to that concept a little later.
3:18
The Lookup stage has one,
3:19
or perhaps two, outputs.
3:23
The combined output
3:24
is called output port number zero
3:27
in the generated OSH,
3:28
and its task is to consume the results of the lookup.
3:32
So whenever a lookup succeeds,
3:35
a row
3:36
is delivered
3:37
along the output link.
3:41
In addition,
3:43
there may be a reject output,
3:44
another output port,
3:47
and it will be used depending on the rules that you set
3:50
within the Lookup stage.
3:55
We'll talk about the reject output later,
3:58
when we talk about
3:59
the rules
4:00
that the stage supports.
4:07
In order to go and fetch the
4:11
rows that correspond to a particular key value,
4:14
the metadata ...
4:16
the table column definitions, the record schema,
4:19
on the reference input link
4:21
must have at least one key column
4:24
defined.
4:27
If you don't have that,
4:28
there is no way to say
4:30
what we're actually looking for.
4:36
when the lookup stages in use
4:39
it will take
4:41
corresponding value from the streaming port
4:45
provide that value to the column that is the key
4:48
and being go fish
4:51
the rolling questioned the rose in question
4:53
or maybe
4:56
to give it a key value
4:58
doesn't exist on the reference data set that we're looking up
5:03
we've got a condition called lookup filed
5:06
we will need to set up some rules
5:09
that handle that
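In DataStage the "lookup failed" condition is governed by a per-reference-link rule: Continue, Drop, Fail, or Reject. A toy model of those four choices (the rule names mirror the stage's options; the row shapes and keys are invented):

```python
def handle_lookup(key, reference, rule):
    """Apply one 'lookup failed' rule: Continue, Drop, Fail, or Reject."""
    row = reference.get(key)
    if row is not None:
        return ("output", row)          # lookup succeeded: send the combined row on
    if rule == "Continue":              # pass the row on with NULL reference columns
        return ("output", None)
    if rule == "Drop":                  # silently discard the row
        return (None, None)
    if rule == "Reject":                # send the row down the reject link
        return ("reject", key)
    # rule == "Fail": abort the job on the first failed lookup
    raise RuntimeError(f"lookup failed for key {key!r}")
```

The reject link mentioned above only carries rows when the rule is Reject; with Fail, the job stops instead.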