PowerPoint Presentation - PERL Practical Extraction Recording ...

whooploafSoftware and s/w Development

Dec 13, 2013 (4 years and 6 months ago)








Follow the link below to go to the Perl
Reference guide. This has also been
provided for you in the back of your workbook.
As we go along, we will ask you questions.
Guess where you can find the answers…


Learning the Language!

Download the Software

This software is provided for all of your
Recording needs.

Retrieve Perl from ActiveState


This is

spot for the PC specific software!

Save this on the desktop, and double-click to
begin the installer. Now follow the directions and
let us know if you have



Let’s walk through a script that will generate a random password.

#Lindsay Husted

#July 10, 2003

#this program generates a random password

#Generate a random number

print "Enter a seed number: ";

$seed = <STDIN>;

chomp $seed;

srand($seed ^ time);

#Set up a list of consonants and vowels

@consonants= split(/ */, "bcdfghjklmnpqrstvwxyz");

@vowels= split (/ */, "

#Loop through the generation of a password

for ($i=1; $i<=4; $i +=1) {

print $consonants[int(rand(21))],
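The slide cuts off before the vowel list and the rest of the loop are shown. The following is a minimal complete sketch, assuming the vowels are "aeiou" and that each pass of the loop adds one consonant followed by one vowel. Both are guesses, and the sub name make_password is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a password of alternating consonants and vowels.
# The vowel list and the consonant-vowel pattern are assumptions.
sub make_password {
    my ($seed) = @_;
    srand($seed ^ time);    # mix the user's seed with the clock, as in the slide

    my @consonants = split(//, "bcdfghjklmnpqrstvwxyz");
    my @vowels     = split(//, "aeiou");

    my $password = "";
    for (my $i = 1; $i <= 4; $i += 1) {
        # rand(@list) picks a position from 0 up to the list length
        $password .= $consonants[int(rand(@consonants))];
        $password .= $vowels[int(rand(@vowels))];
    }
    return $password;
}

# Prints an eight-letter password that changes from run to run.
print make_password(42), "\n";
```

Returning the password from a sub instead of printing inside the loop makes it easy to reuse; the original script prints each letter as it goes.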




Running Your Script

Be sure to save your script in your Perl folder.

What working directory are you in?

Once you open your terminal or command
prompt, type the command “dir” (this is the Windows
command; its UNIX counterpart is “ls”).

Change your directory to your Perl folder.

To run your script type “perl password.pl”

This calls on the Perl (ActiveState) program to
open the specified file


Filtering Script Template





#description of program

#Define and open source file, stop the program if there is an error

#Define and open output file, stop the program if there is an error

#Loop through the entire file, line by line

#Read in a line

#Process the line

#Output the newly edited line to a new file

#End of script


The first line must be entered exactly as it is (including the
pound sign) at the top of each Perl program. This line should
always be the very first line in the file (not even comments
should precede it). Unfortunately, an explanation of what
this line does is beyond our present scope. For now, just
recognize that it designates this file as a Perl script, and that
it is essential.



#description of program

The first few lines are comments. They should include the name of
the script's author, the date the script was written (and possibly
the dates it is modified as well), and a very brief description of
what the program does (at most a short paragraph). No code or
commands are necessary for this section of the file, only
comments.
#Define and open source file,
stop the program if there is an error

The next step will involve the first Perl commands. First, we
must define a variable that identifies the location of the data
file. Let's suppose that our file
was located in the main directory of the C drive. The first
command would look a lot like this:

$SourceFile = '<C:

The dollar sign tells the computer that this variable is a scalar,
which here holds a string (a set of characters), and the semicolon
at the end of the line signals the end of this command. If your
finished program doesn't run on the first try, often it is a result
of forgetting to include a semicolon. The sign < is important
because it identifies this as a read only file.


#opening source file cont’d

Now that you have identified the source file, you have to have
Perl try to open it. Use the open command to do this:

open(INFILE, $SourceFile) or die "The file $SourceFile could
not be found.\n";

This command tries to open the file specified by $SourceFile.
Now don't be alarmed, the die command merely reports the
message that follows it if there is an error and stops the program.
So if $SourceFile does not point to a valid file, the program
stops. Note: The \n character simply represents a new line.
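To see what die reports, here is a small sketch. The sub name open_source and the file name no_such_file.txt are hypothetical, and eval is used only so that the demonstration can print the message instead of stopping.

```perl
use strict;
use warnings;

# Hypothetical helper: opens a source file or stops with the slide's message.
# $SourceFile carries the leading '<' read only marker, as in the slides.
sub open_source {
    my ($SourceFile) = @_;
    open(my $in, $SourceFile) or die "The file $SourceFile could not be found.\n";
    return $in;
}

# eval traps the die so this demonstration can keep running;
# the message that die would have reported is left in $@.
my $handle = eval { open_source('<no_such_file.txt') };
print "Program would stop with: $@" unless $handle;
```

Because the message ends in \n, Perl reports it exactly as written; without the \n, Perl appends the script name and line number to the message.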

#Define and open output file,
stop the program if there is an error

Opening an output file is very similar. First you specify a
destination file for the program's output. Let's define our output
file:

$OutputFile= '>C:

Now we have the location of our output file specified in a string
variable called $OutputFile. Note the $ designates a string
variable and the semicolon marks the end of the command. Also,
in contrast to the read only marker < used when opening the
source file, this command uses the write marker > to tell the
computer this file will be for output. Don't worry if the file has
not been created before the program runs; Perl will create the file
for you.


#opening output file cont’d

Like the input file, the output file has to be opened. Try:

open(OUTFILE, $OutputFile) or die "The output file could not
be opened.\n";

This command works exactly the same as it did when opening
the input file.
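The following sketch shows Perl creating an output file that did not exist beforehand. It writes into a scratch directory made with File::Temp so that nothing on your disk is touched; the sub name write_line and the file name output.txt are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: writes one line to the named file, creating it if needed.
# $OutputFile carries the leading '>' write marker, as in the slides.
sub write_line {
    my ($OutputFile, $text) = @_;
    open(my $out, $OutputFile) or die "The output file could not be opened.\n";
    print $out $text;
    close($out);
}

my $dir  = tempdir(CLEANUP => 1);    # scratch directory, removed when the script ends
my $path = "$dir/output.txt";        # hypothetical file name

print "Exists before: ", (-e $path ? "yes" : "no"), "\n";
write_line(">$path", "hello\n");
print "Exists after: ", (-e $path ? "yes" : "no"), "\n";
```

Keep in mind that the > marker also empties a file that already exists, so point it at a new file name rather than at your source data.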

#Loop through the entire file, line by line

With the input and output files open, we are ready to begin
filtering the data. A convenient way to do this is to have Perl
cycle through the data line by line. Try using a while loop like:

while (<INFILE>) {

#add commands here

}


This code executes all commands within the curly brackets until
the entire source file has been read. Note that while is a special
sort of command that does not require a semicolon after it. The
rest of our commands will go inside the curly brackets so that
they are executed while looping through the file.
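As a small illustration of the loop, the sketch below counts how many times the body runs. To stay self-contained it reads from a string instead of a file on disk, which needs the three-argument form of open; that form, the sample text, and the sub name count_lines are assumptions of this example and are not part of the workshop script.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the lines available on an open filehandle.
sub count_lines {
    my ($in) = @_;
    my $count = 0;
    while (<$in>) {     # each pass reads the next line; the loop ends at end of file
        $count += 1;
    }
    return $count;
}

# Sample data: two comma delimited lines with a blank line between them.
my $text = "a,b\n\nc,d\n";
open(my $in, '<', \$text) or die "Could not open the sample text.\n";
print count_lines($in), "\n";    # prints 3
close($in);
```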

#read in a line

To read in the next line of the file, add the command:

$Line = $_;

This line creates a variable named $Line and assigns it the
contents of the variable $_. $_ is a special Perl variable, and a
full explanation of its contents is beyond our scope. For now, it
suffices to know that the command above assigns to the variable
$Line the current line of text from the source file. Do not forget
the semicolon at the end of the line.

#process the line

Now that we have a line of text from the source file, we can edit
its contents before sending it to the output file. However, if the
current line is empty, we don't want to do anything to it, so we
add the command:

if($Line eq "\n") { next }

This command looks at the text stored in $Line. If $Line is
blank (if it only contains a newline character) then Perl
executes the next command. 'next' skips the rest of the
commands within the while loop and returns to the top of the
loop. Note that this command has no effect if $Line contains
something other than "\n". Also observe that no semicolon is
necessary after this line of code.
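Here is a short sketch of the blank-line test on its own. It loops over a list of sample lines with foreach instead of reading a file with while, but next behaves the same way in both loops; the sub name non_blank_lines and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns only the lines that are not blank.
sub non_blank_lines {
    my (@lines) = @_;
    my @kept;
    foreach my $Line (@lines) {
        if ($Line eq "\n") { next }    # blank line: jump straight to the next pass
        push(@kept, $Line);            # reached only for lines that are not blank
    }
    return @kept;
}

# Prints the two data lines and drops the blank one between them.
print non_blank_lines("a,b\n", "\n", "c,d\n");
```

Note that the test matches only a line holding nothing but the newline character; a line made of spaces followed by a newline is not skipped.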

#more processing

If the line in question is not blank, the real fun begins. Let's
change any commas we find into tabs. Don't be intimidated by the
appearance of the next command, it's actually not that
complicated. Try:

$Line =~ s/,/\t/g;

#still processing

$Line =~ s/,/\t/g;

The s identifies a search and substitute action. The character
between the first two slashes (,) is what the command searches
for, in this case a comma. The character between the second and
third slashes (\t) is what will be substituted whenever a match is
found. In this case, \t is the code for a tab character. So a tab
character is substituted for each comma. Finally, the g stands for
global. This ensures that all commas found are replaced. If you
leave out the g then only the first comma on the line will be
changed. To summarize, this command looks at the current line
and replaces each and every comma with a tab character (denoted
here by \t).
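The difference the g makes can be seen by running the substitution both ways on the same line. This is a small sketch; the sub names commas_to_tabs and first_comma_to_tab are hypothetical.

```perl
use strict;
use warnings;

# With g: every comma on the line becomes a tab.
sub commas_to_tabs {
    my ($Line) = @_;
    $Line =~ s/,/\t/g;
    return $Line;
}

# Without g: only the first comma on the line becomes a tab.
sub first_comma_to_tab {
    my ($Line) = @_;
    $Line =~ s/,/\t/;
    return $Line;
}

print commas_to_tabs("a,b,c\n");        # a<TAB>b<TAB>c
print first_comma_to_tab("a,b,c\n");    # a<TAB>b,c
```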

#output the newly edited line to a new file

At this point we are almost done. All we have to do is output the
filtered line to our new data file. This can be easily done with
the command:

print OUTFILE $Line;

As you might guess, this command prints the contents of $Line
to the output file (OUTFILE). Now the program is ready to handle
a new line of output, so if no more commands are added the
program returns to the top of the loop.

#End of Script

The loop will continue until every line of the source file has been
read. After the loop has finished running, the program is done. At
the end of the program your source file will remain unchanged,
but you should also have a newly filtered data file.
Congratulations! You used Perl to filter data! You are now ready
to try the optional filtering components available throughout the

Your final output

#David Hillman


#This program changes a comma delimited data file to a tab delimited one

#Define source file and open it, stop the program if there is an error

$SourceFile = '<C:

open(INFILE, $SourceFile) or die "The database $SourceFile
could not be found.\n";

#Define output file and open it, stop the program if there is an error

$OutputFile = '>C:

open(OUTFILE, $OutputFile) or die "The output file could not be opened.\n";

Final cont’d

#loop through the entire source file

while(<INFILE>) {

#read in the next line of text from the source file

$Line = $_;

#if the line is blank, do nothing

if($Line eq "\n") { next }

#replace all commas with tabs

$Line =~ s/,/\t/g;

#add the newly edited line to the output file

print OUTFILE $Line;

}
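To try the whole filter without touching real data, the sketch below wraps the same steps in a sub and runs it on a small sample file in a scratch directory. The sub name filter_file, the file names data.csv and data.tab, and the sample contents are all hypothetical; File::Temp is used only to provide a place to write.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper wrapping the filter; the arguments carry the
# '<' and '>' markers, just like $SourceFile and $OutputFile in the slides.
sub filter_file {
    my ($SourceFile, $OutputFile) = @_;
    open(INFILE, $SourceFile) or die "The database $SourceFile could not be found.\n";
    open(OUTFILE, $OutputFile) or die "The output file could not be opened.\n";
    while (<INFILE>) {
        my $Line = $_;                 # read in the next line of text
        if ($Line eq "\n") { next }    # if the line is blank, do nothing
        $Line =~ s/,/\t/g;             # replace all commas with tabs
        print OUTFILE $Line;           # add the edited line to the output file
    }
    close(INFILE);
    close(OUTFILE);
}

# Build a small comma delimited file in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
open(my $sample, ">$dir/data.csv") or die "Could not create the sample file.\n";
print $sample "1,2,3\n\n4,5,6\n";
close($sample);

filter_file("<$dir/data.csv", ">$dir/data.tab");

# Show the filtered result: two tab delimited lines, blank line removed.
open(my $result, "<$dir/data.tab") or die "Could not read the result.\n";
print <$result>;
close($result);
```

Closing both files once the loop has finished is an addition of this sketch; Perl also closes them for you when the program ends.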



* Let’s look at the Sea Surface Temperatures Case Study

Page 90

* Find the data! If you look at the website to see an

explanation of the data, you will find out what each



Time to use Perl for Data Filtering!

In your workbook…

Now follow along with the instructions on
Data Filtering

Page 92

Use the Perl Reference guide provided

Work slowly and pay careful attention to
each line of script

Ask us questions if you need help!