Bioinformatics


4 Oct 2013


Introduction to Perl

Bioinformatics

What is Perl?


Practical Extraction and Report Language


A scripting language


Components


an interpreter


scripts: text files created by the user describing a sequence of steps to be performed by the interpreter

Installation


Create a Perl directory under C:\

Either

Download AP.msi from the course website (http://curry.ateneo.net/~jpv/BioInf07/) and execute it (installs into the C:\Perl directory)

Or download and unzip AP.zip into C:\Perl

Set the path variable first (or edit C:\autoexec.bat) so that you can execute scripts from the MS-DOS prompt

C> path=%path%;c:\Perl\bin

Writing and Running Perl Scripts


Create/edit script (extension: .pl)


C> edit first.pl





Execute script


C> perl first.pl


* Tip: place your scripts in a separate work directory

# my first script

print "Hello World\n";

print "this is my first script\n";

Perl Features


Statements


Strings


Numbers and Computation


Variables and Interpolation


Input and Output


Files


Conditions and Loops


Pattern Matching


Arrays and Lists

Statements


A Perl script is a sequence of statements

Examples of statements

print "Type in a value";

$value = <>;

$square = $value * $value;

print "The square is ", $square, "\n";

Comments


Lines that start with # are ignored by the Perl interpreter

# this is a comment line

In a line, characters that follow # are also ignored

$count = $count + 1; # increment $count


Strings


String


Sequence of characters


Text


In Perl, strings are surrounded by quotes

'I am a string'

"I am a string"

Special characters are specified through escape sequences (preceded by a \)

"a newline \n and a tab \t"

Numbers


Integers specified as a sequence of digits


6


453


Decimal numbers:


33.2


6.04E24 (scientific notation)


Variables


Variable: named storage for values (such as strings and numbers)

Names preceded by a $

Sample use:

$count = 5; # assignment statement

$message = "Hello"; # another assignment

print $count; # print the value of a variable

Computation


Fundamental arithmetic operations:


+ - * /


Others


** exponentiation


() grouping


Example (try this out as a Perl script)

$x = 4;

$y = 2;

$z = (3 + $x) ** $y;

print $z, "\n";

Interpolation


Given the following script:

$x = "Smith";

print "Good morning, Mr. $x";

print 'Good morning, Mr. $x';

Strings quoted with "" perform expansions on variables: the first print shows Good morning, Mr. Smith, while the second shows the literal text Good morning, Mr. $x

Escape characters like \n are also interpreted when strings are quoted with "" but not when they are quoted with ''
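To make the difference concrete, here is a minimal sketch that stores the same text with both quoting styles. The variable names $double and $single are illustrative only, and the use of "use strict", "use warnings" and "my" is an addition that the slides do not cover.

```perl
use strict;
use warnings;   # assumption: not in the slides, added as common practice

my $x = "Smith";

# Double quotes: $x is replaced by its value and \n becomes a newline.
my $double = "Good morning, Mr. $x\n";

# Single quotes: the text is kept literally, including $x and \n.
my $single = 'Good morning, Mr. $x\n';

print $double;
print $single, "\n";
```

The second print needs a separately double-quoted "\n" to end the line, because the \n inside the single-quoted string is just a backslash followed by the letter n.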

Input and Output


Output


print function


Escape characters


Interpolation


Input


Angle-bracket operator (e.g., $line = <>; ) reads one line of input

Not typed (the same operator takes in strings or numbers)

Input Files


Opening a file


open INFILE, 'data.txt';


Input


$line = <INFILE>;


Closing a file


close INFILE;

Output Files


Opening


open OUTFILE, '>result.txt';

Or, open OUTFILE, '>>result.txt'; #append

Output

print OUTFILE "Hello";


Closing files


close OUTFILE;
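Putting the input and output slides together, the following sketch writes a small file and then reads it back line by line. The file contents and the temporary directory are made up for illustration. It also departs from the slides in two ways that should be read as assumptions rather than as the course's method: it uses the three-argument form of open with a lexical filehandle, and it checks each open with "or die".

```perl
use strict;
use warnings;   # assumption: not in the slides, added as common practice
use File::Temp qw(tempdir);

# Work in a throwaway directory so no real files are touched.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/result.txt";

# '>' creates or overwrites the file; '>>' would append instead.
open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out "ATGC\n";
print $out "GGTA\n";
close $out;

# Read the file back one line at a time.
open(my $in, '<', $file) or die "Cannot read $file: $!";
my @lines;
while (defined(my $line = <$in>)) {
    chomp $line;            # drop the trailing newline
    push @lines, $line;
}
close $in;

print "Read ", scalar(@lines), " lines: @lines\n";
```

The loop stops when the read returns an undefined value, which is how Perl signals the end of the file.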

Conditions


Can execute statements conditionally


Syntax:

if ( condition )
{
    statement
    statement
}

Example:

if ( $num > 1000 )
{
    print "Large";
}

If-Else

$num = <>;

if ( $num > 1000 )
{
    print "Large number\n";
}
else
{
    print "Small number\n";
}

print "Thanks\n";

Loops


Repetitive execution


Syntax:

while ( condition )
{
    statement
    statement
}

Example:

$count = 0;
while ( $count < 10 )
{
    print "counting-", $count;
    $count = $count + 1;
}

Conditions


( expr symbol expr )

Numbers

==    equal
!=    not equal
<     less than
>     greater than
<=    less than or equal
>=    greater than or equal

Strings

eq ne lt gt le ge
=~    pattern match
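The choice between the numeric and the string operators matters, because the same two values can compare differently. The sketch below illustrates this with made-up values. The ?: operator is not covered in the slides and is used here only to turn each result into printable text.

```perl
use strict;
use warnings;   # assumption: not in the slides, added as common practice

my $x = "10";
my $y = "10.0";

# == compares as numbers: 10 and 10.0 are the same value.
my $numeric_equal = ($x == $y) ? "yes" : "no";

# eq compares as text: "10" and "10.0" are different strings.
my $string_equal = ($x eq $y) ? "yes" : "no";

# < compares as numbers, lt compares character by character.
my $nine_below_ten_num  = (9 < 10)      ? "yes" : "no";
my $nine_below_ten_text = ("9" lt "10") ? "yes" : "no";

print "10 == 10.0 : $numeric_equal\n";
print "10 eq 10.0 : $string_equal\n";
print "9 <  10    : $nine_below_ten_num\n";
print "9 lt 10    : $nine_below_ten_text\n";
```

As text, "9" comes after "10" because the comparison starts with the first characters, and "9" sorts after "1".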

Functions


length $str     returns the number of characters in $str

defined $str    tests whether $str holds a defined value (useful for testing if $line=<>; succeeded)

chomp $str      removes the trailing newline from $str, if there is one (useful because $line=<>; includes the newline character)

print $var      displays $var on the output device
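A short sketch of these functions working together follows. The string stands in for a line that would normally come from $line = <>; and all variable names are illustrative.

```perl
use strict;
use warnings;   # assumption: not in the slides, added as common practice

my $line = "ACGTACGT\n";     # stands in for a line read with <>

my $before = length $line;   # counts the eight letters plus the newline
chomp $line;                 # removes the trailing newline
my $after  = length $line;   # counts the eight letters only

my $missing;                 # never assigned, so it holds no defined value
my $status = defined($missing) ? "defined" : "undefined";

print "Length before chomp: $before, after: $after\n";
print "\$missing is $status\n";
```

Calling chomp a second time on $line would leave it unchanged, since chomp only removes a newline that is actually there.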

Pattern Matching


<string> =~ <pattern>

is a condition that checks whether a string matches a pattern

Simplest case: <pattern> specifies a search substring

Example: if ($s =~ /bio/)

holds TRUE if $s is "molecular biology", "bioinformatics", "the bionic man";

FALSE if $s is "chemistry", "bicycle", "a BiOpsy" (matching is case-sensitive)
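The substring example can be checked with a short script. This is a sketch that loops over the six sample strings from the slide. The array names and the foreach loop are illustrative additions.

```perl
use strict;
use warnings;   # assumption: not in the slides, added as common practice

my @samples = ("molecular biology", "bioinformatics", "the bionic man",
               "chemistry", "bicycle", "a BiOpsy");

my @matched;
foreach my $s (@samples) {
    if ($s =~ /bio/) {
        push @matched, $s;
        print "match:    $s\n";
    }
    else {
        print "no match: $s\n";
    }
}
```

Writing the pattern as /bio/i makes the match case-insensitive, in which case "a BiOpsy" would match as well.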

Special pattern matching characters

\w    word character (letter, digit, or underscore)

\d    digit

\s    whitespace character (space, tab, \n)

if ( $s =~ /\w\w\s\d\d\d/ ) …

holds TRUE for "CS 123 course",

"Take Ma 101 today"

FALSE for "Only 1 number here"
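The sketch below runs the same pattern over the slide's three strings plus one extra, made-up string, "x_ 007", which is included to show that \w also accepts an underscore.

```perl
use strict;
use warnings;   # assumption: not in the slides, added as common practice

my @samples = ("CS 123 course", "Take Ma 101 today",
               "Only 1 number here", "x_ 007");

my @hits;
foreach my $s (@samples) {
    # two word characters, one whitespace character, three digits
    if ($s =~ /\w\w\s\d\d\d/) {
        push @hits, $s;
    }
}

foreach my $s (@hits) {
    print "Matched: $s\n";
}
```

The pattern may match anywhere inside the string, which is why "Take Ma 101 today" qualifies even though the match starts at "Ma".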

Special pattern matching characters

.    any character (except a newline)

^    beginning of string/line

$    end of string or line

if ( $s =~ /^\d\d\d\ss..r/ ) …

holds TRUE for "300 spartans"

FALSE for "all 100 stars"
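To see what the ^ anchor contributes, the following sketch tests the two sample strings with and without it. The array names are illustrative.

```perl
use strict;
use warnings;   # assumption: not in the slides, added as common practice

my @samples = ("300 spartans", "all 100 stars");

my (@anchored, @unanchored);
foreach my $s (@samples) {
    # With ^ the three digits must be the first characters of the string.
    if ($s =~ /^\d\d\d\ss..r/) {
        push @anchored, $s;
    }
    # Without ^ the same sequence may start anywhere in the string.
    if ($s =~ /\d\d\d\ss..r/) {
        push @unanchored, $s;
    }
}

print "With ^:    @anchored\n";
print "Without ^: @unanchored\n";
```

Without the anchor, "all 100 stars" matches too, because "100 star" fits the rest of the pattern.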

Groups and Quantifiers


[xyz] character set


| alternatives


* zero or more


+ 1 or more


? 0 or 1


{M} exactly M


{M,N} between M and N repetitions
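A few of these constructs are tried out below on short, made-up strings. This is a sketch only. The ?: operator is not covered in the slides and simply converts each match result into "yes" or "no".

```perl
use strict;
use warnings;   # assumption: not in the slides, added as common practice

# [ACGT]+ between ^ and $ : the whole string consists of A, C, G, T only
my $dna_ok  = ("GATTACA" =~ /^[ACGT]+$/) ? "yes" : "no";
my $dna_bad = ("GATTXCA" =~ /^[ACGT]+$/) ? "yes" : "no";   # X is not in the set

# (ATG|GTG) : either alternative is accepted at the start
my $start   = ("GTGCCC" =~ /^(ATG|GTG)/) ? "yes" : "no";

# A{3,5} : a run of three to five A's
my $poly_a  = ("CCAAAAGG" =~ /A{3,5}/) ? "yes" : "no";

# A{3} : exactly three A's in a row are needed, but only two are present
my $short_a = ("CCAAGG" =~ /A{3}/) ? "yes" : "no";

# u? : the u may be present or absent
my $color   = ("color" =~ /colou?r/) ? "yes" : "no";

print "GATTACA is DNA:   $dna_ok\n";
print "GATTXCA is DNA:   $dna_bad\n";
print "GTGCCC has start: $start\n";
print "CCAAAAGG has AAA: $poly_a\n";
print "CCAAGG has AAA:   $short_a\n";
print "color matches:    $color\n";
```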

NCBI file Example

/VERSION\s+(\S+)\s+GI:(\S+)/

Matches a version line (\S stands for any non-whitespace character)

Parentheses group characters for later retrieval

$1 holds the version identifier that follows VERSION,

$2 gets the number after "GI:"
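As a sketch of how the pattern might be used in a script, the code below applies it to a single line. The sample line, including its identifier and GI number, is made up for illustration and does not come from a real record.

```perl
use strict;
use warnings;   # assumption: not in the slides, added as common practice

# Made-up sample line in the layout the pattern expects.
my $line = "VERSION     AB000001.1  GI:1234567\n";

my ($version, $gi);
if ($line =~ /VERSION\s+(\S+)\s+GI:(\S+)/) {
    $version = $1;   # text captured by the first pair of parentheses
    $gi      = $2;   # text captured by the second pair
    print "Version: $version\n";
    print "GI:      $gi\n";
}
else {
    print "Not a version line\n";
}
```

$1 and $2 are only meaningful after a successful match, which is why they are copied into named variables inside the if block.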