Automating a Vendor File Load Process with Perl and Shell Scripting

whooploafΛογισμικό & κατασκευή λογ/κού

13 Δεκ 2013 (πριν από 3 χρόνια και 7 μήνες)

448 εμφανίσεις

Automating a Vendor File
Load Process with Perl and
Shell Scripting

Roy Zimmer

Western Michigan University

We needed to get Promptcat approval files, from OCLC’s
ftp site.


We needed to get Promptcat approval files, from OCLC’s
ftp site.


Historically, we’ve done file retrieval and processing via
shell scripting, with some supporting Perl software.

We needed to get Promptcat approval files, from OCLC’s
ftp site.


Historically, we’ve done file retrieval and processing via
shell scripting, with some supporting Perl software.


In this case, we started out mostly manual, with some
programmatic support.


We needed to get Promptcat approval files, from OCLC’s
ftp site.


Historically, we’ve done file retrieval and processing via
shell scripting, with some supporting Perl software.


In this case, we started out mostly manual, with some
programmatic support.


We kind of snuck up on the final method of retrieval and
processing.

The ftp site has
quite a few files,
including a
number of
different types:
LBL, RPT,
APPR, FIRM…


Let’s use a representative sample for this presentation…

Out of this large number of files, only one or a few will be of
interest. For example, take the files for May 7.

Out of this large number of files, only one or a few will be of
interest. In this case, the files for May 7.

How do we pick them out?

This is where Perl comes to the rescue.

With Perl, you can do many things.

Code details, main program, ftp stuff

ftppcatappr.pl

Required when using ftp within Perl

Site password is stored here.

Code details, main program, ftp stuff

ftppcatappr.pl

Required when using ftp within Perl

Site password is stored here.

Code details, main program, ftp stuff

ftppcatappr.pl

Required when using ftp within Perl

-

Site URL


-

Username



-

directory where files are
-

transfer mode

Code details, main program, ftp stuff

ftppcatappr.pl

Self
-
explanatory

Required when using ftp within Perl

Site password is stored here.

-

Site URL


-

Username



-

directory where files are
-

transfer mode

Site password is
stored in a file.

Code details, main program, ftp stuff

ftppcatappr.pl

Site password is
stored in a file.

Setting up for
FTP

Code details, main program, ftp stuff

ftppcatappr.pl

Retrieve ftp site file listing into a variable
as an array of directory entries.

Code details, main program, ftp stuff

ftppcatappr.pl

Set each line up to be split on the space character and then do so.

Code details, main program, ftp stuff

Retrieve ftp site file listing into a variable
as an array of directory entries.

ftppcatappr.pl

Set each line up to be split on the space character and then do so.

The last piece in each line will be the filename. Split this into pieces
based on the period.

Code details, main program, ftp stuff

Retrieve ftp site file listing into a variable
as an array of directory entries.

ftppcatappr.pl

Set each line up to be split on the space character and then do so.

The last piece in each line will be the filename. Split this into pieces
based on the period.

Look for the one(s) that correspond(s) with yesterday’s date and
keep those.

Code details, main program, ftp stuff

Retrieve ftp site file listing into a variable
as an array of directory entries.

ftppcatappr.pl

Want the files to be
processed in order

Code details, main program, ftp stuff

ftppcatappr.pl

Code details, main program, processing each file

ftppcatappr.pl

Get the files

Code details, main program, processing each file

Records will need some editing…

(Thanks to Birong Ho, our systems librarian, for originally supplying this editing code.)

Get the records

ftppcatappr.pl

Records will need some editing…

“grab” the fields of interest

Get the records

Code details, main program, processing each file

ftppcatappr.pl

Records will need some editing…

Some fields are deleted…

Code details, main program, processing each file

ftppcatappr.pl

Records will need some editing…

…and others are edited.

More edits than this are
performed; the basic syntax is
the same for each of them.

Code details, main program, processing each file

ftppcatappr.pl

File will need some splitting…

Split each file up based on the invoice number found in field 980 |f

Code details, main program, processing each file

File will need some splitting…

Split each file up based on the invoice number found in field 980 |f


The next program takes care of this…


I did say we snuck up on this, didn’t I?


Code details, main program, processing each file

File will need some splitting…split each file up based on the
invoice number found in field 980 |f

Rather than using the familiar LF,
the MARC format uses a different
EOL character.

Code details, helper program, processing each file

oclc980.pl

File will need some splitting…split each file up based on the
invoice number found in field 980 |f

This section reads each MARC
record, looking for the 980 field.

Code details, helper program, processing each file

oclc980.pl

File will need some splitting…split each file up based on the
invoice number found in field 980 |f

Get the subfields into an array.

Code details, helper program, processing each file

oclc980.pl

File will need some splitting…split each file up based on the
invoice number found in field 980 |f

Get the subfields into an array.

Code details, helper program, processing each file

Look for subfield f and
read it to get the invoice number.

oclc980.pl

File will need some splitting…split each file up based on the
invoice number found in field 980 |f

Get the subfields into an array.

Code details, helper program, processing each file

Look for subfield f and
read it to get the invoice number.

Determine if it’s a new or “existing”
invoice number. This also lets us
count records for each invoice.

oclc980.pl

File will need some splitting…split each file up based on the
invoice number found in field 980 |f

Get the subfields into an array.

Code details, helper program, processing each file

Look for subfield f and
read it to get the invoice number.

Determine if it’s a new or “existing”
invoice number. This also lets us
count records for each invoice.

Use
append

mode to open, write
a record, and close the file for
each invoice number.

oclc980.pl

There are usually several files after splitting the file being processed.
Each one must be further processed and loaded into Voyager.

This is controlled via a small shell script.

Code details, helper program, processing each invoice file

There are usually several files after splitting the file being processed.
Each one must be further processed and loaded into Voyager.

This is controlled via a small shell script.

It calls another shell script for
preprocessing and bulk loading
of each of the invoice files.

Code details, helper program, processing each invoice file

(Thanks to Keith Kelley, director of systems, for creating this script.)

importall.sh

Code details, helper program, importing each invoice file

$1 is the default first parameter to the script. Let’s
use a more descriptive variable.

prodimport.script

Code details, helper program, importing each invoice file

$1 is the default first parameter to the script. Let’s
use a more descriptive variable.

Let’s also drop the filename extension, so
that we can “reuse” the filename.

prodimport.script

Code details, helper program, importing each invoice file

$1 is the default first parameter to the script. Let’s
use a more descriptive variable.

Let’s also drop the filename extension, so
that we can “reuse” the filename.

Get ready for the prebulk
processing.

prodimport.script

Code details, helper program, importing each invoice file

prodimport.script

1
st

use of file referenced by $1,
so that use is OK

Code details, helper program, importing each invoice file

Start with some final edits…

prodimport.script

Code details, helper program, importing each invoice file

Start with some final edits…

We’ll use marcedit.pl to
replace the contents of field
981 |a, as illustrated:

There are 59 such edits possible.

prodimport.script

Code details, helper program, importing each invoice file

Start with some final edits…

Prep for bulkimport, too

prodimport.script

Code details, helper program, importing each invoice file

Start with some final edits…

Prep for bulkimport, too

Self
-
explanatory

prodimport.script

Code details, helper program, importing each invoice file

Prebulk output is bulkimport input.

Perform the bulkimport

prodimport.script

Code details, helper program, importing each invoice file

Prebulk output is bulkimport input.

Perform the bulkimport

The final step for each file is to do some cleanup
and moving files to the loaded directory.

prodimport.script

Password maintenance

The ftp site requires us to change our password every 90 days.

We wanted all this to run hands
-
off, so that had to be automated, also.

The password gets changed every two months.

Password maintenance,
getpromptcatpw.ksh

Password maintenance,
getpromptcatpw.ksh

Password maintenance,
pwgen.pl

Want an 8
-
character password

Password maintenance,
pwgen.pl

Want an 8
-
character password

Password length defaults to 10

Password maintenance,
pwgen.pl

Want an 8
-
character password

Password length defaults to 10

Password consists of these
characters

Password maintenance,
pwgen.pl

Want an 8
-
character password

Password length defaults to 10

Password consists of these
characters

Seed the random number generator

Password maintenance,
pwgen.pl

Want an 8
-
character password

Password length defaults to 10

Password consists of these
characters

Seed the random number generator

Generate the password

Review

Run ftppcatappr.pl

Login to OCLC ftp site for promptcat

Find desired files and retrieve them

Do process each file


remove unwanted 6xx, 938, 948 fields


edit some 856 fields


run oclc980.pl


do process each record in the current file


look at the 980 |f (contains the invoice number)


if it contains invoice NNN, (create and) put this record in file NNN.marc, etc.


end do


run importall.sh


do process each file created by oclc980.pl


run prodimport.script


use marcedit.pl to process 981 |a replacements (59 possible edits)


prebulk


bulk import


wait 1.5 minutes before continuing


end do


move all interim .marc, .preimp, and .imp files to /loaded

End do

Move all RCD* files to /loaded

Review

Run ftppcatappr.pl

Login to OCLC ftp site for promptcat

Find desired files and retrieve them

Do process each file


remove unwanted 6xx, 938, 948 fields


edit some 856 fields


run oclc980.pl


do process each record in the current file


look at the 980 |f (contains the invoice number)


if it contains invoice NNN, (create and) put this record in file NNN.marc, etc.


end do


run importall.sh


do process each file created by oclc980.pl


run prodimport.script


use marcedit.pl to process 981 |a replacements (59 possible edits)


prebulk


bulk import


wait 1.5 minutes before continuing


end do


move all interim .marc, .preimp, and .imp files to /loaded

End do

Move all RCD* files to /loaded

The files listed below are available at


http://homepages.wmich.edu/~zimmer/files/eugm2008


fileload.ppt

this presentation


ftppcatappr.pl

gets the files and controls the processing


oclc980.pl

splits retrieved files based on invoice number


pwgen.pl

generates a password


importall.sh

ensures that each “split file” for a particular retrieved



file is processed


prodimport.ksh

does the actual processing of each file


getpromptpw.ksh

handles all the details of a password change

Resources

except for


marcedit.pl

enables batch editing of MARC files

which is at
http://homepages.wmich.edu/~zimmer/marc_index.html

CPAN


http://cpan.org

FTP


http://search.cpan.org/~gbarr/libnet
-
1.22/Net/FTP.pm


I’m not sure if the FTP module is supplied on Voyager boxes or not. If you
don’t have it, go to the above URL. It also has good documentation on this
module.

Resources

Thank you for listening.


Roy Zimmer

roy.zimmer@wmich.edu



Picture © 2008 by Roy Zimmer