Perl

Posted 4 Nov 2013

1

Perl Tutorial

Practical extraction and report language

http://www.comp.leeds.ac.uk/Perl/start.html

2

Why Perl?


Perl is built around regular expressions


REs are good for string processing


Therefore Perl is a good scripting language


Perl is especially popular for CGI scripts


Perl makes full use of the power of UNIX


Short Perl programs can be very short


“Perl is designed to make the easy jobs easy, without making the difficult jobs impossible.”
-- Larry Wall, Programming Perl

3

Why not Perl?


Perl is very UNIX-oriented


Perl is available on other platforms...


...but isn’t always fully implemented there


However, Perl is often the best way to get some
UNIX capabilities on less capable platforms


Perl does not scale well to large programs


Weak subroutines, heavy use of global variables


Perl’s syntax is not particularly appealing


4

What is a scripting language?


Operating systems can do many things


copy, move, create, delete, compare files


execute programs, including compilers


schedule activities, monitor processes, etc.


A command-line interface gives you access to these functions, but only one at a time


A scripting language is a “wrapper” language
that integrates OS functions

5

Major scripting languages


UNIX has sh, Perl


Macintosh has AppleScript, Frontier


Windows has no major scripting languages


probably due to the weaknesses of DOS


Generic scripting languages include:


Perl (most popular)


Tcl (easiest for beginners)


Python (new, Java-like, best for large programs)

6

Perl Example 1

#!/usr/local/bin/perl

#

# Program to do the obvious

#

print 'Hello world.'; # Print a message

7

Comments on “Hello, World”


Comments run from # to the end of the line


But the first line, #!/usr/local/bin/perl, tells the system where to find the Perl interpreter


Perl statements end with semicolons


Perl is case-sensitive


Perl is compiled and run in a single operation

8

Variables


A variable is a name of a place where some information is stored. For
example:



$yearOfBirth = 1976;



$currentYear = 2000;



$age = $currentYear - $yearOfBirth;



print $age;



The same variable can also store a string:



$yearOfBirth = 'None of your business';



The variables in the example program can be identified as such because their
names start with a dollar ($). Perl uses different prefix characters for structure
names in programs. Here is an overview:



$: variable containing scalar values such as a number or a string


@: variable containing a list with numeric keys


%: variable containing a list with strings as keys


&: subroutine


9

Operations on numbers



Perl contains the following arithmetic operators:


+: sum


-: subtraction


*: product


/: division


%: modulo division


**: exponent



Apart from these operators, Perl contains some built-in arithmetic functions. Some of these are mentioned in the following list:



abs($x): absolute value


int($x): integer part


rand(): random number between 0 and 1


sqrt($x): square root
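
The operators and functions listed above can be tried out with a minimal sketch like the following. The variable names and values are illustrative and do not come from the slides.

```perl
use strict;
use warnings;

my $x = -7.5;            # illustrative value

my $abs  = abs($x);      # absolute value: 7.5
my $int  = int($x);      # integer part: -7 (int truncates toward zero)
my $root = sqrt(16);     # square root: 4
my $mod  = 17 % 5;       # modulo division: 2
my $pow  = 2 ** 10;      # exponent: 1024
my $r    = rand();       # random number, 0 <= $r < 1

print "$abs $int $root $mod $pow\n";   # prints "7.5 -7 4 2 1024"
```

Note that int() does not round: it simply drops the fractional part.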



10

Test your understanding



$text =~ s/bug/feature/;



$text =~ s/bug/feature/g;



$text =~ tr/A-Z/a-z/;



$text =~ tr/AEIOUaeiou//d;



$text =~ tr/0-9/x/cs;



$text =~ s/[A-Z]/CAPS/g;


11

Examples


# replace first occurrence of "bug"


$text =~ s/bug/feature/;



# replace all occurrences of "bug"


$text =~ s/bug/feature/g;



# convert to lower case


$text =~ tr/A-Z/a-z/;



# delete vowels


$text =~ tr/AEIOUaeiou//d;



# replace each run of non-digit characters with a single x


$text =~ tr/0-9/x/cs;



# replace each capital character by CAPS


$text =~ s/[A-Z]/CAPS/g;
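
As a rough check of the six operations above, here is a small sketch that applies each one to a copy of a sample string. The sample strings and variable names are assumptions made for illustration. The tr ranges are written without square brackets, because tr takes a plain character list, not a regular expression.

```perl
use strict;
use warnings;

my $report = "bug report: bug 42";

# Copy first, then modify the copy, so the original stays intact.
(my $first = $report) =~ s/bug/feature/;     # first occurrence only
(my $all   = $report) =~ s/bug/feature/g;    # every occurrence

my $greeting = "Hello World";
(my $lower   = $greeting) =~ tr/A-Z/a-z/;        # convert to lower case
(my $novowel = $greeting) =~ tr/AEIOUaeiou//d;   # delete vowels
(my $caps    = $greeting) =~ s/[A-Z]/CAPS/g;     # each capital becomes CAPS

my $mixed = "ab12cd345";
(my $squeezed = $mixed) =~ tr/0-9/x/cs;          # runs of non-digits become one x

print "$first\n";      # feature report: bug 42
print "$all\n";        # feature report: feature 42
print "$lower\n";      # hello world
print "$novowel\n";    # Hll Wrld
print "$caps\n";       # CAPSello CAPSorld
print "$squeezed\n";   # x12x345
```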


12

Regular expressions



\b: word boundaries

\d: digits

\n: newline

\r: carriage return

\s: white space characters

\t: tab

\w: alphanumeric characters


^: beginning of string


$: end of string


.: any character


[bdkp]: characters b, d, k and p


[a-f]: characters a to f


[^a-f]: all characters except a to f


abc|def: string abc or string def




*: zero or more times


+: one or more times


?: zero or one time


{p,q}: at least p times and at most q times


{p,}: at least p times


{p}: exactly p times



Examples:

1. Clean an HTML formatted text



2. Grab URLs from a Web page



3. Transform all lines from a file into lower case



13

Lists and arrays



@a = (); # empty list



@b = (1,2,3); # three numbers



@c = ("Jan","Piet","Marie"); # three strings



@d = ("Dirk",1.92,46,"20-03-1977"); # a mixed list



Variables and sublists are interpolated in a list


@b = ($a,$a+1,$a+2); # variable interpolation


@c = ("Jan",("Piet","Marie")); # list interpolation


@d = ("Dirk",1.92,46,(),"20-03-1977"); # the empty list adds nothing



# you don't get lists containing lists, just one flat list


@e = ( @b, @c ); # same as (1,2,3,"Jan","Piet","Marie")







14

Lists and arrays




Practical construction operators


($x..$y)






@x = (1..6); # same as (1, 2, 3, 4, 5, 6)




@z = (2..5,8,11..13); # same as (2,3,4,5,8,11,12,13)



qw() "quote word" function




qw(Jan Piet Marie) is a shorter notation for ("Jan","Piet","Marie").




15

Split


It takes a regular expression and a string, and splits the string into a list, breaking it into pieces at
places where the regular expression matches.



$string = "Jan Piet\nMarie\tDirk";

@list = split /\s+/, $string; # yields ( "Jan","Piet","Marie","Dirk" )


# remember \s matches a white space character



$string = " Jan Piet\nMarie\tDirk\n"; # white space at begin and end!!!

@list = split /\s+/, $string; # yields ( "", "Jan","Piet","Marie","Dirk" )

# the leading empty string is kept; trailing empty strings are dropped


$string = "Jan:Piet;Marie---Dirk"; # use any regular expression...


@list = split /[:;]|---/, $string; # yields ( "Jan","Piet","Marie","Dirk" )


$string = "Jan Piet"; # use an empty regular expression to split on letters


@letters= split //, $string; # yields ( "J","a","n"," ","P","i","e","t")
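
How split treats empty fields at the ends of a string is easy to get wrong, so here is a minimal sketch comparing three calls on the same input. The array names are illustrative. By default split keeps a leading empty field but drops trailing ones; a negative third argument keeps them, and the special pattern ' ' skips leading white space as well.

```perl
use strict;
use warnings;

my $string = " Jan Piet\nMarie\tDirk\n";

my @default = split /\s+/, $string;       # ("", "Jan", "Piet", "Marie", "Dirk")
my @keep    = split /\s+/, $string, -1;   # ("", "Jan", "Piet", "Marie", "Dirk", "")
my @words   = split ' ', $string;         # ("Jan", "Piet", "Marie", "Dirk")

print scalar(@default), " ", scalar(@keep), " ", scalar(@words), "\n";   # prints "5 6 4"
```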


16

More about arrays


@array = ("an","bert","cindy","dirk");



$length = @array; # $length now has the value 4




print $length; # prints 4



print $#array; # prints 3, last valid subscript



print $array[$#array]; # prints "dirk"



print scalar(@array); # prints 4



17

Working with lists


Array elements and whole arrays can be interpolated into strings

@array = ("an","bert","cindy","dirk");

print "The array contains $array[0] $array[1] $array[2] $array[3]";


# interpolate the whole array; the elements are separated by spaces

print "The array contains @array";


The function join STRING LIST joins the items of LIST into one string, separated by STRING.

$string = join ":", @array;

# $string now has the value "an:bert:cindy:dirk"


Iteration over lists

for( $i=0 ; $i<=$#array; $i++){

    $item = $array[$i];

    $item =~ tr/a-z/A-Z/;

    print "$item ";

}


foreach $item (@array){

    $item =~ tr/a-z/A-Z/; # $item is an alias, so this also changes @array

    print "$item "; # prints an upper-case version of each item

}


18

More about arrays


multiple value assignments


($a, $b) = ("one","two");


($onething, @manythings) = (1,2,3,4,5,6);



# now $onething equals 1



# and @manythings = (2,3,4,5,6)


($array[0],$array[1]) = ($array[1],$array[0]);



# swap the first two



Pay attention to the fact that assignment to a variable first evaluates the right-hand side of the expression, and then makes a copy of the result



@array = ("an","bert","cindy","dirk");


@copyarray = @array; # copies the elements into a new array


$copyarray[2] = "XXXXX"; # @array still contains "cindy"
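
The copy and swap behaviour described above can be checked with a short sketch. It reuses the slide's array names; the array slice used for the swap is one possible spelling, not the only one.

```perl
use strict;
use warnings;

my @array     = ("an", "bert", "cindy", "dirk");
my @copyarray = @array;          # the elements are copied into a new array

$copyarray[2] = "XXXXX";         # changes the copy only

print "@array\n";                # prints "an bert cindy dirk"
print "@copyarray\n";            # prints "an bert XXXXX dirk"

# The right-hand side is evaluated first, so a swap needs no temporary.
@array[0, 1] = @array[1, 0];
print "@array\n";                # prints "bert an cindy dirk"
```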


19

Manipulating lists and their elements PUSH



push ARRAY LIST

appends the list to the end of the array.

If the second argument is a scalar rather than a list, it is appended as the last item of the array.



@array = ("an","bert","cindy","dirk");


@brray = ("eve","frank");



push @array, @brray;


# @array is ("an","bert","cindy","dirk","eve","frank")



push @brray, "gerben";


# @brray is ("eve","frank","gerben")




20

Manipulating lists and their elements POP



pop ARRAY does the opposite of push: it removes the last item of its argument array and returns it.


If the array is empty it returns undef.






@array = ("an","bert","cindy","dirk");



$item = pop @array;



# $item is "dirk" and @array is ( "an","bert","cindy")



shift @array removes the first element and returns it. It works on the left end of the list, but is otherwise the same as pop.



unshift (@array, @newStuff) puts stuff on the left side of the list, just as push does for the right side.
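
The four operations can be seen side by side in a small sketch. The extra names "zoe", "yan" and "eve" are made up for the example.

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

my $first = shift @array;            # "an";   @array is ("bert","cindy","dirk")
unshift @array, "zoe", "yan";        # @array is ("zoe","yan","bert","cindy","dirk")
my $last  = pop @array;              # "dirk"; @array is ("zoe","yan","bert","cindy")
my $count = push @array, "eve";      # push returns the new length, 5

print "$first $last $count\n";       # prints "an dirk 5"
print "@array\n";                    # prints "zoe yan bert cindy eve"
```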



21

Grep


grep CONDITION LIST

returns a list of all items from LIST that satisfy the condition.




For example:




@large = grep $_ > 10, (1,2,4,8,16,25); # returns (16,25)






@i_names = grep /i/, @array; # returns ("cindy","dirk")




22

map



map OPERATION LIST

is an extension of grep, and performs an arbitrary operation on each element of a list.




For example:



@array = ("an","bert","cindy","dirk");





@more = map $_ + 3, (1,2,4,8,16,25);


# returns (4,5,7,11,19,28)



@initials = map substr($_,0,1), @array;


# returns ("a","b","c","d")
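
grep and map are often chained. The following sketch uses the block form of both functions, which is an alternative to the expression form shown above; the result names are illustrative.

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

my @i_names = grep { /i/ } @array;            # ("cindy", "dirk")
my @shouted = map  { uc($_) } @i_names;       # ("CINDY", "DIRK")
my @lengths = map  { length($_) } @array;     # (2, 4, 5, 4)

print "@shouted\n";   # prints "CINDY DIRK"
print "@lengths\n";   # prints "2 4 5 4"
```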



23

Hashes (Associative Arrays)

- associate keys with values

- named with %

- allow almost instantaneous lookup of the value that is associated with some particular key





Examples

If %wordfrequency is the hash table,

$wordfrequency{"the"} = 12731; # creates key "the", value 12731

$phonenumber{"An De Wilde"} = "+31-20-6777871";

$index{$word} = $nwords;

$occurrences{$a}++; # if this is the first reference,

# the value associated with $a starts out undefined,

# is treated as 0, and is increased to 1
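
The increment idiom is the usual way to count things with a hash. Here is a minimal word-frequency sketch, assuming a made-up sample sentence:

```perl
use strict;
use warnings;

my $text = "the cat and the dog and the bird";   # illustrative input

my %occurrences;
$occurrences{$_}++ for split ' ', $text;         # a missing key starts counting from 0

for my $word (sort keys %occurrences) {
    print "$word: $occurrences{$word}\n";
}
# prints:
# and: 2
# bird: 1
# cat: 1
# dog: 1
# the: 3
```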

24

Hash Operations

- %birthdays = ("An","25-02-1975","Bert","12-10-1953","Cindy","23-05-1969","Dirk","01-04-1961");

  # fill the hash

- %birthdays = (An => "25-02-1975", Bert => "12-10-1953", Cindy => "23-05-1969", Dirk => "01-04-1961");

  # fill the hash; the same as above, but more explicit

- @list = %birthdays; # make a list of the key/value pairs

- %copy_of_bdays = %birthdays; # copy a hash


25

Hashes (What if not there?)


- Existing, defined and true.

- If a key does not exist in the hash, accessing it returns the undef value.

- The special test function exists(HASHENTRY) returns true if the hash key exists in the hash, whatever its value.

- if($hash{$key}){...} is false if the key $key has no associated value, but also if the value is undef, 0, "0" or the empty string; if(defined($hash{$key})){...} is false if there is no value or the value is undef.

- print "Exists\n" if exists $hash{$key};
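
The difference between existing, defined and true shows up most clearly with values such as 0, the empty string and undef. The sketch below uses a made-up hash whose keys describe their values, and records the three tests for each key.

```perl
use strict;
use warnings;

# Hypothetical hash: each key names the kind of value it holds.
my %hash = (zero => 0, empty => "", nothing => undef, one => 1);

my %status;
for my $key (qw(zero empty nothing one missing)) {
    my $exists  = exists($hash{$key})  ? "yes" : "no";
    my $defined = defined($hash{$key}) ? "yes" : "no";
    my $true    = $hash{$key}          ? "yes" : "no";
    $status{$key} = "exists=$exists defined=$defined true=$true";
    print "$key: $status{$key}\n";
}
# prints:
# zero: exists=yes defined=yes true=no
# empty: exists=yes defined=yes true=no
# nothing: exists=yes defined=no true=no
# one: exists=yes defined=yes true=yes
# missing: exists=no defined=no true=no
```

Reading $hash{missing} in these tests does not create the key.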


26

Perl Example 2

#!/usr/bin/perl

# Remove blank lines from a file

# Usage: singlespace < oldfile > newfile


while ($line = <STDIN>) {


if ($line eq "\n") { next; }


print "$line";

}

27

More Perl notes


On the UNIX command line:

< filename means to get input from this file

> filename means to send output to this file

In Perl, STDIN is the standard input filehandle and STDOUT is the standard output filehandle; <STDIN> reads one line of input

Scalar variables start with $

Scalar variables hold strings or numbers, and they are interchangeable

Examples:

$priority = 9;

$priority = '9';

Array variables start with @

28

Perl Example 3

#!/usr/local/bin/perl

# Usage: fixm <filenames>

# Replace \r with \n -- replaces input files


foreach $file (@ARGV) {

    print "Processing $file\n";

    if (-e "fixm_temp") { die "*** File fixm_temp already exists!\n"; }

    if (! -e $file) { die "*** No such file: $file!\n"; }

    open DOIT, "| tr \'\\015' \'\\012' < $file > fixm_temp"
        or die "*** Can't: tr '\\015' '\\012' < $file > fixm_temp\n";

    close DOIT;

    open DOIT, "| mv -f fixm_temp $file"
        or die "*** Can't: mv -f fixm_temp $file\n";

    close DOIT;

}

29

Comments on example 3


In # Usage: fixm <filenames>, the angle brackets just mean to supply a list of file names here

In UNIX text editors, the \r (carriage return) character usually shows up as ^M (hence the name fixm_temp)

The UNIX command tr '\015' '\012' replaces all \015 characters (\r) with \012 (\n) characters

The format of the open and close commands is:

open fileHandle, fileName

close fileHandle

"| tr \'\\015' \'\\012' < $file > fixm_temp"

says: Take input from $file, pipe it to the tr command, put the output on fixm_temp


30

Arithmetic in Perl

$a = 1 + 2; # Add 1 and 2 and store in $a

$a = 3 - 4; # Subtract 4 from 3 and store in $a

$a = 5 * 6; # Multiply 5 and 6

$a = 7 / 8; # Divide 7 by 8 to give 0.875

$a = 9 ** 10; # Nine to the power of 10, that is, 9^10

$a = 5 % 2; # Remainder of 5 divided by 2

++$a; # Increment $a and then return it

$a++; # Return $a and then increment it

--$a; # Decrement $a and then return it

$a--; # Return $a and then decrement it

31

String and assignment operators

$a = $b . $c; # Concatenate $b and $c

$a = $b x $c; # $b repeated $c times


$a = $b; # Assign $b to $a

$a += $b; # Add $b to $a

$a -= $b; # Subtract $b from $a

$a .= $b; # Append $b onto $a

32

Single and double quotes


$a = 'apples';


$b = 'bananas';


print $a . ' and ' . $b;


prints: apples and bananas


print '$a and $b';


prints: $a and $b


print "$a and $b";


prints: apples and bananas


33

Arrays


@food = ("apples", "bananas", "cherries");


But a single element is a scalar, so it is written with $:

print $food[1];

prints "bananas" (subscripts start at 0)

@morefood = ("meat", @food);

# @morefood is now ("meat", "apples", "bananas", "cherries")


($a, $b, $c) = (5, 10, 20);

34

push and pop

push adds one or more things to the end of a list

push (@food, "eggs", "bread");

push returns the new length of the list

pop removes and returns the last element

$sandwich = pop(@food);

$len = @food; # $len gets length of @food

$#food # returns index of last element

35

foreach

# Visit each item in turn and call it $morsel


foreach $morsel (@food)

{


print "$morsel\n";


print "Yum yum\n";

}

36

Tests


“Zero” is false. This includes:

0, '0', "0", '', ""

The undefined value (undef) is also false

Anything not false is true

Use == and != for numbers, eq and ne for strings

&&, ||, and ! are and, or, and not, respectively.
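
A few borderline values make these rules concrete. The list of test values in this sketch is an illustrative choice: note that strings such as '0.0' and '00' are true, because only the exact string '0' and the empty string count as false.

```perl
use strict;
use warnings;

my @values = (0, '0', '', '0.0', '00', ' ', 'a', 1);

my @truth;
for my $value (@values) {
    push @truth, ($value ? 'true' : 'false');
    print "'$value' is $truth[-1]\n";
}
# the first three values are false, the remaining five are true

# == compares as numbers, eq compares as strings
my $num_equal = ('1.0' == 1)   ? 'true' : 'false';   # true: both are the number 1
my $str_equal = ('1.0' eq '1') ? 'true' : 'false';   # false: the strings differ
print "$num_equal $str_equal\n";                     # prints "true false"
```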

37

for loops

for loops are just as in C or Java

for ($i = 0; $i < 10; ++$i)

{

    print "$i\n";

}

38

while loops

#!/usr/local/bin/perl

print "Password? ";

$a = <STDIN>;

chomp $a; # Remove the newline at end (chomp removes only a newline)

while ($a ne "fred")

{

    print "sorry. Again? ";

    $a = <STDIN>;

    chomp $a;

}

39

do..while and do..until loops

#!/usr/local/bin/perl

do

{

    print "Password? ";

    $a = <STDIN>;

    chomp $a;

}

while ($a ne "fred");

40

if statements

if ($a)

{

    print "The string is not empty\n";

}

else

{

    print "The string is empty\n";

}

41

if-elsif statements

if (!$a)

    { print "The string is empty\n"; }

elsif (length($a) == 1)

    { print "The string has one character\n"; }

elsif (length($a) == 2)

    { print "The string has two characters\n"; }

else

    { print "The string has many characters\n"; }

42

Why Perl?


Two factors make Perl important:


Pattern matching/string manipulation


Based on regular expressions (REs)


REs are similar in power to those in Formal Languages…


…but have many convenience features


Ability to execute UNIX commands


Less useful outside a UNIX environment


43

Basic pattern matching


$sentence =~ /the/

True if $sentence contains "the"

$sentence = "The dog bites.";

if ($sentence =~ /the/) # is false

…because pattern matching is case-sensitive

!~ is "does not contain"

44

RE special characters

. # Any single character except a newline


^ # The beginning of the line or string


$ # The end of the line or string


* # Zero or more of the last character


+ # One or more of the last character


? # Zero or one of the last character

45

RE examples

^.*$ # matches the entire string


hi.*bye # matches from "hi" to "bye" inclusive


x +y # matches x, one or more blanks, and y


^Dear # matches "Dear" only at beginning


bags? # matches "bag" or "bags"


hiss+ # matches "hiss", "hisss", "hissss", etc.

46

Square brackets

[qjk] # Either q or j or k


[^qjk] # Neither q nor j nor k

[a-z] # Anything from a to z inclusive

[^a-z] # No lower case letters

[a-zA-Z] # Any letter

[a-z]+ # Any non-empty sequence of lower case letters

47

More examples

[aeiou]+ # matches one or more vowels


[^aeiou]+ # matches one or more nonvowels

[0-9]+ # matches an unsigned integer

[0-9A-F] # matches a single hex digit

[a-zA-Z] # matches any letter

[a-zA-Z0-9_]+ # matches identifiers

48

More special characters

\n # A newline

\t # A tab

\w # Any word character (alphanumeric or underscore); same as [a-zA-Z0-9_]

\W # Any non-word char; same as [^a-zA-Z0-9_]

\d # Any digit. The same as [0-9]

\D # Any non-digit. The same as [^0-9]

\s # Any whitespace character

\S # Any non-whitespace character

\b # A word boundary, outside [] only

\B # No word boundary

49

Quoting special characters

\| # Vertical bar

\[ # An open square bracket

\) # A closing parenthesis

\* # An asterisk

\^ # A caret symbol

\/ # A slash

\\ # A backslash

50

Alternatives and parentheses

jelly|cream # Either jelly or cream


(eg|le)gs # Either eggs or legs


(da)+ # Either da or dada or dadada or...

51

The $_ variable

Often we want to process one string repeatedly

The $_ variable holds the current string

If a subject is omitted, $_ is assumed

Hence, the following are equivalent:

if ($sentence =~ /under/) …

$_ = $sentence; if (/under/) ...

52

Case-insensitive substitutions

s/london/London/i

case-insensitive substitution; will replace london, LONDON, London, LoNDoN, etc.

You can combine global substitution with case-insensitive substitution

s/london/London/gi

53

Remembering patterns


Any part of the pattern enclosed in parentheses is assigned to the special variables $1, $2, $3, …, $9

Numbers are assigned according to the left (opening) parentheses

"The moon is high" =~ /The (.*) is (.*)/

Afterwards, $1 = "moon" and $2 = "high"
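
In practice the capture variables should only be read after a successful match, since a failed match leaves the values from an earlier one in place. A minimal sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $sentence = "The moon is high";

my ($subject, $state);
if ($sentence =~ /The (.*) is (.*)/) {
    ($subject, $state) = ($1, $2);     # copy the captures right after the match
    print "subject=$subject state=$state\n";   # prints "subject=moon state=high"
}
```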

54

Dynamic matching


During the match, an early part of the match that is tentatively assigned to $1, $2, etc. can be referred to by \1, \2, etc.

Example:

\b\w+\b matches a single word

/\b(\w+)\s+\1\b/ matches repeated words

"Now is the the time" =~ /\b(\w+)\s+\1\b/

Afterwards, $1 = "the"
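
Here is a small sketch of backreferences at work on the sentence above, comparing two patterns. With \w+ the capture is exactly one word. With .+ the pattern still finds the repetition, but the capture includes the preceding space, because . also matches white space. The variable names $strict and $loose are illustrative.

```perl
use strict;
use warnings;

my $sentence = "Now is the the time";

my ($strict, $loose);
$strict = $1 if $sentence =~ /\b(\w+)\s+\1\b/;   # one word, white space, same word
$loose  = $1 if $sentence =~ /(\b.+\b)\1/;       # any text followed by itself

print "strict: '$strict'\n";   # prints "strict: 'the'"
print "loose:  '$loose'\n";    # prints "loose:  ' the'"
```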