PERL_language_notes - Tufts


Dec 13, 2013 (3 years and 6 months ago)


Perl language notes



Notes from Mastering Perl for Bioinformatics by James Tisdall, O’Reilly & Associates, 2003.

Web page for book: www.oreilly.com/catalog/mperlbio


Basics:

Run Perl programs using: perl prog_name

Edit in MS Word, saving as text only with line breaks. The default file extension is .pl. Modules or classes must use .pm.

Edit data files the same way, but save as "text only" (without added line breaks) if inputs are longer than one line, like sequence files.


Begin files with the path to the perl program (no trailing semicolon): #!/usr/bin/perl

The standard Perl folders (& the current folder, in Perl versions before 5.26) are checked when a module file is requested.

use lib '/usr/etc/etc/folder_name'; allows another local folder to be searched, or on the command line by:

perl -I/dir/dir/lastdir program.pl


use warnings; gives debugging info & use strict; forces variables to be declared (e.g. with my).

# is used for notes (the rest of the line after # is ignored by the program).
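A minimal sketch of a script laid out as described above. The variable names and values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;      # variables must be declared, e.g. with my
use warnings;    # report likely mistakes at run time

# Everything after a # on a line is a note and is ignored by perl.
my $greeting = "Hello";
my $name     = "world";               # hypothetical value

my $message = "$greeting, $name!";    # variables are filled in inside ""
print "$message\n";
```

Saved as, say, hello.pl, this would be run with perl hello.pl.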


Variables & functions to manipulate them:

$var is a scalar (automatically treated as a number or as text depending on how it is used; quote it, e.g. '123', to store a number as text).

Can assign multiple things at once w/ ()'s: ($a,$b,$c) = (1,2,3).

Arrays:

@arr = (1, "2", 'cow', "frog", $var1) makes a 1D array. @a = qw(a b c) omits the need for "" & commas.

Access elements by $elem = $arr[0], where 0 is the 1st element.

Using an array where Perl expects a scalar value returns the # of elements in the array, e.g. if (@arr < 5).

Can provide a list or range of positions to get multiple elements of an array: @arr[1,2], @arr[0..4], @arr[3..$#arr]. (The form 1../search_pattern/ is not an array slice; it is the range operator applied to lines of input, see File handling.)

$count = @array gives the # of elements in the array; $#array gives the position of the last element.

delete @array[2..4] clears array elements 2-4 (they become undefined but the array keeps its length; use splice to actually remove them). Works w/ hash keys too, e.g. delete $hash{key}.

exists $array[x] is true if the array element (or $hash{key}) exists.

join expr, list: joins elements of list/array separated by expr

pop @array: remove & return the last element of the array

push @array, list: put single or multiple elements of list onto the end of the array

reverse @array: reverses the order of the array (w/o sorting); in scalar context reverses a string

shift @array: remove & return the first element

sort(@arr): returns the array sorted (in string order by default; use sort { $a <=> $b } @arr for numeric order). reverse(@arr) simply reverses the order without sorting.

splice @array, offset, ln, list: remove ln elements of the array starting at offset & replace w/ list

unshift @array, list: add list to the beginning of the array
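The array functions listed above can be tried out with a short sketch like the following. The array contents are arbitrary illustration values, and the comments trace what the array should hold after each step.

```perl
use strict;
use warnings;

my @arr = qw(c a b);                     # same as ('c', 'a', 'b')
push @arr, 'd', 'e';                     # c a b d e
my $last  = pop @arr;                    # removes and returns 'e'
my $first = shift @arr;                  # removes and returns 'c'
unshift @arr, 'z';                       # z a b d

my $count      = @arr;                   # scalar context: number of elements
my $last_index = $#arr;                  # position of the last element
my @slice      = @arr[1..2];             # elements at positions 1 and 2

my @sorted   = sort @arr;                # string order, @arr itself unchanged
my @reversed = reverse @arr;             # reversed order, no sorting
my @removed  = splice(@arr, 1, 2, 'X');  # removes a and b, inserts X
my $joined   = join('-', @arr);          # z-X-d

print "$joined\n";
```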

Hashes:
%h = ('key1', 'val1', key2 => 'val2') makes a hash, w/ => equivalent to "," but allowing you to omit the quotes around key2.

Access elements by $value = $h{'key_name'}

Get all keys with @keys = keys %h or values with @vals = values %h
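As a small sketch of the hash operations above (the keys and values are placeholders). Note that keys and values come back in no particular order, so they are sorted here.

```perl
use strict;
use warnings;

my %h = ('key1', 'val1', key2 => 'val2');   # => lets key2 go unquoted

my $value = $h{'key1'};                     # look up one value
my @keys  = sort keys %h;                   # all keys, sorted for a stable order
my @vals  = sort values %h;                 # all values

$h{key3} = 'val3';                          # add a new key/value pair
delete $h{key1};                            # remove a pair
my $gone    = exists $h{key1} ? 0 : 1;      # 1 once key1 has been deleted
my $n_pairs = keys %h;                      # scalar context: number of keys

print "$value @keys @vals $n_pairs\n";
```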

my $var defines $var locally within the block, subroutine or file where my is called, outside of which it is removed from memory UNLESS it is still referred to by a subroutine (a "closure"). If that subroutine(s) is enclosed in the same block, it is the only way to access that variable, e.g.

{ my $var = 0; sub up_var { $var++ } sub get_var { print "$var" } } (often used in OO modules)
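A runnable sketch of the closure idea described above. The names up_count and get_count are invented for this example. The counter lives on after the block ends because the two subs still refer to it, and nothing outside the block can touch it directly.

```perl
use strict;
use warnings;

{
    my $count = 0;                        # visible only inside this block

    sub up_count  { $count++; return }    # both subs "close over" $count
    sub get_count { return $count }
}

up_count() for 1 .. 3;                    # the only way to change $count
my $total = get_count();                  # the only way to read it

print "$total\n";
```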

Default variables (e.g. things passed to subroutines): $_ (for 1 item, e.g. the current element in a loop or the current line read) or @_ (the list of arguments passed to a subroutine).

$! (error message), $& (the string matched by the last =~ pattern match).

@ARGV is the array of command line arguments to the script.

Note: for a $obj = OO_Class->new call, the 1st element in @_ is 'OO_Class', & in any subsequent $obj->method(args) call, $obj (the reference) is the 1st argument passed.

Variables should be initialized (assigned w/ =) before they are used in calcs or print statements etc.; otherwise use warnings reports an uninitialized value. For scalars = '' or 0 works, for arrays & hashes @a = () or %a = () is OK. Or w/ refs $a = [] or {}.



Closures

variables


References:

$ref = \$var gives a ref to the memory location of $var

$value_of_var = $$ref (dereferenced by $, or by @ or % for array or hash refs)

$array_ref = [0,1,2,3] makes a ref to an anonymous array

access the whole array with @arr = @$array_ref

access an element w/ $val = $$array_ref[0] (returns 0) OR $val = $array_ref->[0]

$hash_ref = {key => 'val', key2 => 'val2'} makes a ref to an anonymous hash

access the whole hash with %$hash_ref, or data w/ $$hash_ref{key} OR $hash_ref->{key}

Can, if desired, make this clearer with {}'s, e.g. $$ref equals ${$ref}

ref EXPR: if EXPR is a reference, returns the type of thing it points to, e.g. SCALAR, ARRAY, REF, HASH, or OO_Class1 if it has been blessed by an OO module; else returns false.
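The reference syntax above can be checked with a short sketch such as this one (the values are arbitrary). It shows that the two ways of reaching an element give the same result, and what ref reports for each kind of reference.

```perl
use strict;
use warnings;

my $var = 5;
my $ref = \$var;                  # reference to a scalar
$$ref++;                          # changes $var itself, which is now 6

my $array_ref = [0, 1, 2, 3];     # reference to an anonymous array
my $hash_ref  = { key => 'val', key2 => 'val2' };

my $second  = $array_ref->[1];    # arrow form
my $same    = $$array_ref[1];     # equivalent $$ form
my @whole   = @$array_ref;        # copy of the whole array
my $val     = $hash_ref->{key};
my $not_ref = ref($var) ? 1 : 0;  # ref returns false for a plain scalar

my @types = (ref($ref), ref($array_ref), ref($hash_ref));
print "$var $second $val @types\n";
```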


Complex data structures:

Matrices: Can specify 1) directly with $arr[x][y] = value, or…

2) by filling an array with refs to other arrays. Simplest: @arr = ([1,2],[3,4]), which puts anonymous arrays into the array. Better: define refs $a=[1,2]; $b=[3,4] & put them in the array, @arr=($a,$b). Either way, elements can be pulled back normally as $arr[x][y].

3) Most flexibly, by making everything a reference, so $arr=[[1,2],[3,4]] or $arr=[$a,$b]. If so, need to dereference to access elements, as $$arr[x][y] or $arr->[x][y], or @{$arr->[x]} to get the array at position x.

If everything is always a reference, it's most flexible, allowing complex mixed data structures, such as
$mess = [1,{k1=>'hi',k2=>["what","the","hell?"]},[1,2,[9,8.7]],"end"], which can be accessed like so: $mess->[0] gives 1, ${$mess->[1]}{k1} gives 'hi', ${${$mess->[1]}{k2}}[2] gives "hell?" & ${${$mess->[2]}[2]}[0] is 9.
Most sensibly these could also be written with many arrows: $mess->[1]->{k2}->[2], or @{$mess->[1]->{k2}} to get the whole inner array.

Order is from inside out in the ${…} form, or left to right with arrows. So the pointer to the top-level array position comes 1st, etc.

Still darn confusing re when dereferencing is needed, etc.
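The mixed structure from the notes can be walked with the sketch below. It uses the same $mess data and shows the nested ${…} form next to the arrow form. Arrows between adjacent subscripts are optional, which the last few lines rely on.

```perl
use strict;
use warnings;

my $mess = [ 1,
             { k1 => 'hi', k2 => [ "what", "the", "hell?" ] },
             [ 1, 2, [ 9, 8.7 ] ],
             "end" ];

my $first  = $mess->[0];                    # top-level element
my $k1     = ${ $mess->[1] }{k1};           # nested ${...} form
my $word   = $mess->[1]->{k2}->[2];         # arrow form, read left to right
my $same   = $mess->[1]{k2}[2];             # arrows between subscripts omitted
my @inner  = @{ $mess->[1]{k2} };           # whole inner array
my $nine   = $mess->[2][2][0];
my $n_top  = @$mess;                        # number of top-level elements

print "$first $k1 $word @inner $nine $n_top\n";
```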


Operators:

Logical: not (or !) (returns true if something is false), see If statement

and (&&), or (||) meaning either, xor (meaning only one, not both)

Note: statement1 or statement2 only executes statement2 if statement1 is false.

Comparison: == (for #'s) or eq (for strings), != (or ne for strings), also < (lt), <= (le), >= (ge), > (gt)

Assignment: $a = $b assigns the value of $b to $a, $a++ or $a-- (increment or decrement)

$a += $b ($a = $a + $b), also -=, *=, /= ($a/$b), **= ($a raised to $b) & %= (remainder of $a/$b); for strings $a .= $b appends $b to $a, $a x= $b (repeat $a $b times)
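A brief sketch of why the number and string comparison operators are kept separate, plus a few of the assignment shortcuts listed above. All values are arbitrary examples.

```perl
use strict;
use warnings;

# The same two values compare equal as numbers but differ as strings.
my $num_equal = ('10' == 10.0)   ? 1 : 0;   # 1
my $str_equal = ('10' eq '10.0') ? 1 : 0;   # 0

my $s = 'ab';
$s .= 'cd';            # append: abcd
$s x= 2;               # repeat: abcdabcd

my $n = 7;
$n %= 4;               # remainder of 7/4 is 3
$n **= 2;              # 3 squared is 9

# "or" only runs its right-hand side when the left-hand side is false.
my $ok  = 1;
my $ran = 0;
$ok or $ran = 1;       # right side is skipped, so $ran stays 0

print "$num_equal $str_equal $s $n $ran\n";
```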


Common programming functions:

die args: end the program, printing args

for (initial condition; continue so long as this is true; do each iteration) {statements;}

e.g. for ($i=1; $i<10; $i++) {}  **Warning!! Must use semicolons!

also foreach var (list or array) {block}: each element in the list/array is passed to var in turn

also while (condition) {}, until (condition) {} & do {block} while/until (condition)

next; skips to the next iteration early

if (logical test) {statement;} elsif (logical test 2) {statement;} else {statement;}
The logical test fails if the value is the number 0, the string "" or "0", undefined, or an empty array or hash; anything else counts as true.

unless (test) {}: same as if (not test) {}

localtime gives the local time, useful for timestamps; also gmtime for Greenwich Mean Time

print OPTIONAL_FILEHANDLE "text ", "next text ", "text$variable\n", @arr, "@arr"

Note: in ""'s (rather than ' ') print processes \n (newline), \t (tab) & $variable contents.



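@arr w/o "" prints the elements with no spaces between them; "@arr" prints them separated by spaces.
printf allows formatting, e.g. printf("%5.2f\n", $x) prints $x with 2 decimal places.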
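The loop and branching statements above fit together as in this small sketch. The numbers and labels are invented for illustration. It also shows the difference between an array inside "" and one printed bare.

```perl
use strict;
use warnings;

my @arr = (1, 2, 3, 4, 5, 6);
my @labels;

foreach my $n (@arr) {
    next if $n == 3;                          # skip to the next iteration early
    if    ($n % 2 == 0) { push @labels, 'even'; }
    elsif ($n > 4)      { push @labels, 'big';  }
    else                { push @labels, 'odd';  }
}

my $spaced = "@labels";            # inside "" elements are separated by spaces
my $packed = join('', @labels);    # what print @labels would show: no spaces

print "$spaced\n";
print @labels, "\n";
```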


package Name; establishes a package, outside of which the same $var can be given different values. You leave the package when a new package declaration is made, or at the end of the {} or module where the declaration was made. Values from each package can be returned by $Name::var or $Name2::var, etc. (this applies to package variables, e.g. declared with our, not to my variables).

sub subroutine_name {} can be called with subroutine_name(args), feeding args to @_. Note the old syntax (still allowed) prepends & (e.g. &subroutine_name(args)); the & is optional.

Returns the value of the last expression evaluated before the final }, or earlier if you use return (args);

Note subroutines are accessible even if hidden in a block of code that never executes, & a global variable referred to in a sub (not declared with my within the subroutine) is never destroyed for going out of scope.
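A sketch of the two ideas above: a subroutine that reads its arguments from @_ and returns its last expression, and two packages holding different values for a variable of the same name. The names mean, Alpha and Beta are made up for this example.

```perl
use strict;
use warnings;

sub mean {
    my @nums = @_;               # arguments arrive in @_
    return 0 unless @nums;       # early return for an empty list
    my $sum = 0;
    $sum += $_ for @nums;
    $sum / @nums;                # last expression evaluated is the return value
}

{ package Alpha; our $var = 'a'; }   # $Alpha::var
{ package Beta;  our $var = 'b'; }   # $Beta::var, a separate variable

my $avg  = mean(2, 4, 6);
my $none = mean();
my $both = "$Alpha::var$Beta::var";

print "$avg $none $both\n";
```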


Math & other simple functions:

abs number returns the absolute value; atan2 Y,X arctan of Y/X; cos $in_radians; exp EXPR e to the EXPR; hex EXPR returns the decimal value from hex; int EXPR integer part; log EXPR natural log; rand EXPR pseudorandom value from 0 up to (but not including) EXPR, or 0-1 (if no EXPR); sin EXPR; sqrt EXPR.


File handling:

Files can be passed to the program w/ perl program file1 file2 (perl can be omitted if the script is executable & begins with the #! line) & their lines read with <>, e.g. @arr = <>; keyboard or piped input is read with @arr = <STDIN>

Or open(FILEHANDLE, "file_name");

Can do open(FH, "<", "file_name") to indicate input, ">" output or ">>" output appended to an existing file

Lines of the file are read through <FILEHANDLE> & all accessed at once by @arr = <FILEHANDLE>;

foreach $val (<FILEHANDLE>) {} steps thru the file assigning each line to $val

When finished close(FILEHANDLE);

read(FH, scalar, length, offset) reads data of length from the current file position into scalar, placing it at the optional offset within scalar

rename oldname, newname to rename a file

seek FILEHANDLE, OFFSET, WHENCE positions the file pointer: if WHENCE is 0, at OFFSET bytes from the start; if 1, OFFSET is added to the current position; if 2, OFFSET (usually negative) is added to the end position. seek(FH,0,0) resets the pointer to the start.

tell FH gives the current position in bytes

Position in the file can be reffed by range, e.g. while (<FH>) { if (1 .. /search pattern/) { next; } }, where 1 is the 1st line
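A sketch of the open / read / seek / close cycle described above. To stay self-contained it first writes a small scratch file using the core File::Temp module, which is an assumption of this example and not something the notes mention. The file contents are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);    # core module, used only to get a scratch file

# Write a small file to read back.
my ($out, $file_name) = tempfile();
print $out "header\n>seq1\nACGT\n";
close($out);

open(my $in, '<', $file_name) or die "Cannot open $file_name: $!";

my @lines;
while (<$in>) {
    chomp;
    next if 1 .. /^>/;          # skip from line 1 through the first ">" line
    push @lines, $_;
}

seek($in, 0, 0);                # back to the start of the file
my $first_line = <$in>;         # scalar context: read one line
chomp $first_line;

close($in);
unlink $file_name;              # remove the scratch file

print "@lines / $first_line\n";
```

The three-argument open with a lexical handle (my $in) is used here in place of the bareword FILEHANDLE form; both work, but the lexical form closes automatically when it goes out of scope.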


Text handling & modification

Binding operators:

Search: $a =~ /pattern/ is true if the pattern is found in $a & puts the matched text in the $& special var (it doesn't change $a); same as m/pattern/. pos $a gives the position in the string where the last m//g search left off.

Substitute: $a =~ s/pattern1/pattern2/ replaces the 1st match of pattern1 w/ pattern2

Transliterate: $a =~ tr/123/567/ converts all 1's to 5's etc. in $a (can use for DNA complement)

Modifiers: /…/g (match all instances), //s (let . match newline), //i (ignore upper/lower case); d is used with tr///d (delete found characters that have no replacement)

Special chars & ranges:

. (any char except newline), \s (whitespace), \S (non-whitespace), \d (digit 0-9)

[1234] (any of this set), [^1234] (any not in set), (wd1|wd2|wd3) (any of these 3 words)

^ line start, $ line end.

Groups indicated by multiple ()'s will output to $1, $2 etc., w/ the 1st "(" encountered -> $1.

Prepending \ allows a search for the metacharacters \ | ( ) [ { ^ $ * + ? or . (e.g. /\\/ finds "\")

Quantifiers:

* (0 or more of a thing, e.g. x*), + (1 or more), ? (0 or 1), {3} exactly 3, {3,6} 3 to 6, {3,} 3 or more.
Generally these return the max, so 'ABCCCCD' =~ /A.*C/ gives ABCCCC. To get the shortest string append ? to the quantifier, so =~ /A.*?C/ gives ABC.
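The matching, substitution and transliteration operators above can be exercised with a sketch like this. The strings are sample data; the reverse-complement line uses tr on a reversed copy so that the original sequence is left unchanged.

```perl
use strict;
use warnings;

my $str = 'ABCCCCD';
my ($greedy) = $str =~ /(A.*C)/;      # longest match:  ABCCCC
my ($lazy)   = $str =~ /(A.*?C)/;     # shortest match: ABC

# Groups in ()'s are returned in order (they are also in $1, $2).
my ($id, $len) = 'seq1 length=42' =~ /^(\w+)\s+length=(\d+)/;

# Substitute every instance (g) on a copy of the string.
(my $plus = 'a-b-c') =~ s/-/+/g;

# DNA reverse complement: reverse a copy, then swap bases with tr.
my $dna = 'ATGCCG';
(my $revcomp = reverse $dna) =~ tr/ACGT/TGCA/;

my $found = ('atgc' =~ /TG/i) ? 1 : 0;   # i ignores upper/lower case

print "$greedy $lazy $id $len $plus $revcomp $found\n";
```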

chomp $str or list/array: removes terminal newlines from $str or array; chop removes the last char

index string, substring: returns the position of the 1st instance; rindex gives the last instance

lc EXPR: returns lower case

length EXPR: gives length in characters

reverse $string: reverses the string (in scalar context)

split /pattern/, $str: returns an array of $str split at every /pattern/. If the pattern is omitted, splits on white space.


substr($string, offset, length, replacement): offset = start position - 1 (e.g. substr($s,0,1) gives the 1st char). Negative offset = distance from the right end. Length omitted -> to the end of $string. Length negative: leave that # of chars off the end. So… substr("ABCDEF", 2, -2) gives 'CD'. If replacement is specified, replace the substring w/ it.

uc($str): returns upper case of $str
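A short sketch of the string functions above, using arbitrary sample strings. The substr calls cover the positive, negative and replacement forms.

```perl
use strict;
use warnings;

my $s = 'ABCDEF';
my $mid   = substr($s, 2, -2);        # start at position 2, leave 2 off the end
my $first = substr($s, 0, 1);         # 1st character
my $tail  = substr($s, -2);           # last 2 characters

my $where  = index($s, 'CD');         # position of the 1st instance
my $last_c = rindex('ABCABC', 'C');   # position of the last instance

my @fields = split /,/, 'a,b,c';      # split at every comma

my $line = "acgt\n";
chomp $line;                          # drop the trailing newline
my $upper = uc($line);
my $chars = length $line;

substr($s, 0, 2, 'xy');               # 4-argument form replaces in place

print "$mid $first $tail $where $last_c @fields $upper $chars $s\n";
```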


Modules

Must end with 1; as the last statement, and the file name must end with .pm (e.g. module1.pm)

Made accessible by use module1; can specify a subdirectory like so: use dir::subdir::module1;

Subroutines in the module are called by the program using module1::subrout(args)


Built-in modules

AUTOLOAD: declare use vars '$AUTOLOAD'; or our $AUTOLOAD; (in Perl 5.6 or greater); then any call to an undefined subroutine calls AUTOLOAD, passing the subroutine name, typically in "operation_attribute" form, in $AUTOLOAD (+ any args in @_).

sub AUTOLOAD { my ($self, @args) = @_; my ($operation, $attribute) = ($AUTOLOAD =~ /(get|set)(_\w+)$/); if ($operation eq 'get' and exists $self->{$attribute}) {

no strict 'refs'; *{$AUTOLOAD} = sub { shift->{$attribute} }; use strict 'refs';
(this turns off strict refs briefly, uses * to put the $AUTOLOAD value as a subroutine in the symbol table, uses shift (on the default @_) to pull $obj & access the $attribute key in $obj, then turns strict refs back on)

return $self->{$attribute}; } }
(performs the proper subroutine function the 1st time called; in later calls the subroutine will have been defined by AUTOLOAD).
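A runnable sketch of the AUTOLOAD technique for "get" methods only. The class name Gene and its attributes are hypothetical. An empty DESTROY is defined so that object cleanup does not fall through to AUTOLOAD, a detail the notes do not mention.

```perl
use strict;
use warnings;

package Gene;                       # hypothetical class name

our $AUTOLOAD;

sub new {
    my ($class, %arg) = @_;
    return bless { _name => $arg{name}, _dat => $arg{dat} }, $class;
}

sub AUTOLOAD {
    my ($self) = @_;
    # $AUTOLOAD holds the full name of the missing sub, e.g. Gene::get_name
    my ($operation, $attribute) = ($AUTOLOAD =~ /(get)(_\w+)$/);
    die "No such method: $AUTOLOAD"
        unless $operation and exists $self->{$attribute};
    {
        no strict 'refs';           # allow a string to be used as a sub name
        *{$AUTOLOAD} = sub { $_[0]->{$attribute} };   # install the real sub
    }
    return $self->{$attribute};     # answer this first call directly
}

sub DESTROY { }                     # keep cleanup away from AUTOLOAD

package main;

my $gene      = Gene->new(name => 'Thing1', dat => 1);
my $first     = $gene->get_name;                  # handled by AUTOLOAD
my $installed = Gene->can('get_name') ? 1 : 0;    # the sub now exists
my $second    = $gene->get_name;                  # calls the installed sub
my $dat       = $gene->get_dat;

print "$first $installed $second $dat\n";
```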


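Carp: use Carp; gives carp(statement), which prints a warning reported from the point of view of the calling code (more useful than a plain warning inside a module), & croak(statement), which does this & dies.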


DESTROY need not be brought in by a use statement; Perl automatically removes any variable that is out of scope (e.g. locally defined by my in {}'s). Can define sub DESTROY {thing to do;} in a class to have other things, such as decreasing a running count of data objects, done when an object of that class is destroyed.


DB_File: use DB_File; allows a hash to be stored in a database file on disk so that it persists after exit from the program; see perldoc DB_File for details. Uses tie(%hash, 'DB_File', $file_name, flags, mode, $DB_HASH), where 'DB_File' & $DB_HASH must be verbatim, flags can be of type O_RDWR | O_CREAT & mode is the file permissions in octal (0444 is read-only; 0644 lets the owner write). DB files are stored in Berkeley DB format rather than as plain text.


CPAN modules

For more info use perldoc CPAN; for info about an installed module, perldoc mod_dir::mod_name

Find modules by browsing www.cpan.org

Install using perl -MCPAN -e 'install module_dir::module_name'



Objects, Classes, Methods & Object Oriented Programming


Object is a data structure blessed by an OO module (& thus part of a Class), typically by calling new

Methods are defined in the class & are the only "legit" way of accessing data in objects

Classes of objects are defined & managed in special module .pm files

Begin with a package Class1; statement, where Class1 is the filename without .pm

After other use calls then…

{ my %_attribute_table = (_name => ['default_value', 'permissions.e.g.read.write'], _dat => etc.);

sub _all_attributes {keys %_attribute_table;} }
(this sub, defined w/in the same {}, sets a closure on %_attribute_table, etc.; note a prepended underscore indicates something accessible only in the module & not intended to be accessed by the calling program)

sub new {

my ($class, %arg) = @_;
(1st arg when an OO module sub is called is always the class name)

… (tests to confirm keys in %arg are defined in %_attribute_table etc.)



return bless( { _name => $arg{name} || die, _dat => $arg{dat} }, $class || "?");

}

(note, here || is used to give a default in case of failure)

sub get_name { $_[0]->{_name} }
(a method: the default is to return the resulting value, i.e. the value assoc'd with the key)

sub get_dat { $_[0]->{_dat} }
(note, a $obj->get_dat call receives the $obj ref as 1st arg, then any others)

sub set_name {

my ($input_obj, $name) = @_;

if ($name) { $input_obj->{_name} = $name; }

}

(new changes a 'keyname' that might be specified in the calling program to "_keyname", which discourages direct access except through the subs in Class1. Bless associates the object with Class1 so that $obj->subroutine() calls use Methods in Class1.)

1;

=head1 Documentation 1

Descriptors

=head1 more documentation

Examples etc.

=cut
(this is "POD" / "plain old documentation"; everything between =head1 & =cut is ignored by the program, but is called up with perldoc module_name)



Program that makes Class 1 objects & manipulates them

use Class1;

my $obj1 = Class1->new(name=>'Thing1', dat=>1);
(a Class1->sub call passes "Class1" as the 1st arg. $obj1 is a blessed reference, so that if I call $obj1->method, it will pass $obj1 to the method in the Class1 module)

print $obj1->get_name, "\n";
(this calls the get_name function in Class1, passing it an array containing the $obj1 reference and then any other input values… prints Thing1)

$obj1->set_name("Thing2");

print $obj1->get_name, "\n"; prints Thing2

print ref($obj1); prints Class1 (since it's been blessed by class name)
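The class and the calling program above can be combined into one file for experimenting, as in this sketch. It is a simplified stand-in, not the book's code: the attribute table holds only default values (the defaults 'unknown' and 0 and the helper _default_for are invented here), and the package lives in the same file instead of Class1.pm, so no use Class1; line is needed.

```perl
use strict;
use warnings;

package Class1;

{
    # Closure: only the subs in this block can see the table.
    my %_attribute_table = (_name => 'unknown', _dat => 0);   # assumed defaults
    sub _all_attributes { return keys %_attribute_table }
    sub _default_for    { my ($attr) = @_; return $_attribute_table{$attr} }
}

sub new {
    my ($class, %arg) = @_;             # 1st arg is the class name
    my $self = {};
    foreach my $attr (_all_attributes()) {
        (my $key = $attr) =~ s/^_//;    # caller says name, object stores _name
        $self->{$attr} = exists $arg{$key} ? $arg{$key} : _default_for($attr);
    }
    return bless $self, $class;
}

sub get_name { $_[0]->{_name} }         # $_[0] is the object itself
sub get_dat  { $_[0]->{_dat} }

sub set_name {
    my ($self, $name) = @_;
    $self->{_name} = $name if $name;
    return;
}

package main;

my $obj1   = Class1->new(name => 'Thing1', dat => 1);
my $before = $obj1->get_name;
$obj1->set_name('Thing2');
my $after  = $obj1->get_name;
my $class  = ref($obj1);
my $blank  = Class1->new()->get_name;   # falls back to the default

print "$before $after $class $blank\n";
```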


Inheritance:

use base ("Module_name"); uses Module_name as a base module. Any subs redefined in the current module are used; otherwise the Module_name subroutines are used as the default.
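A sketch of inheritance with use base. The class names Base_Seq and Dna_Seq are hypothetical, and both packages are kept in one file for convenience, which assumes the base package is defined above the use base line. The derived class overrides describe and inherits everything else.

```perl
use strict;
use warnings;

package Base_Seq;                     # hypothetical base class

sub new {
    my ($class, %arg) = @_;
    return bless { _name => $arg{name} }, $class;
}
sub get_name { $_[0]->{_name} }
sub describe { return 'sequence ' . $_[0]->get_name }

package Dna_Seq;                      # hypothetical derived class
use base ("Base_Seq");

# Redefined here, so this version is used for Dna_Seq objects.
sub describe { return 'DNA ' . $_[0]->get_name }

package main;

my $dna     = Dna_Seq->new(name => 'gene1');    # new is inherited
my $name    = $dna->get_name;                   # get_name is inherited
my $text    = $dna->describe;                   # overridden version
my $is_base = $dna->isa('Base_Seq') ? 1 : 0;

print "$name / $text / $is_base\n";
```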



Relational Databases & SQL