Programming in Perl

hollowtexico, Software & software engineering

13 Dec 2013

Programming Languages and Uses in Bioinformatics

Perl, Python
  Pros:
    - reformatting data files
    - reading, writing and parsing files
    - building web pages and database access
    - building work flow pipelines
    - can write scripts quickly; good for one-time use scripts
  Cons:
    - higher memory usage
    - slower calculation performance

Java
  Pros:
    - building applications with graphical user interfaces (GUI)
    - very fast at calculations such as sequence alignments
    - good for developing long term use applications like Agilent Genomic Workbench
    - low memory usage
  Cons:
    - requires more time to write

C, C++
  Pros:
    - very fast at calculations such as sequence alignments
    - low memory usage
  Cons:
    - requires more time to write

If I can fail six times a minute, I’ll acquire knowledge six times faster
than a programmer who fails only once every minute.
Greg Voss, Polymorphic C
Dr. Dobb’s Journal, August 1994
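
Programming Development Approach
[Flowchart] write a small chunk of code -> any errors?
  yes: fix errors, then check again
  no: run it, then write the next chunk
The two cycles in the flowchart are labeled "knowledge acquired loop" and "knowledge applied loop".
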
Program Execution Basics
A script starts executing from line 1.
    cd                          # go to your home directory
    mkdir genomics_lab/ws2      # create a directory called ws2
    cd genomics_lab/ws2         # go to your ws2 directory
    gedit hello_world.pl &      # create a file called hello_world.pl
                                # & sends the process to the background

Programming in Perl
    #!/usr/bin/perl
    print "Hello World!\n";

    # the first line lets UNIX know that this is a Perl script
    # print is a Perl function
    # \n is the newline character
Save your file and exit gedit.

    whoami      # show your user name
    groups      # show all groups you belong to
    ls -l       # show the permissions for all files

    -rw-rw-r--   1 genomics genomics     6 Jan 29 22:07 hello_world.pl

Columns: permissions, number of links, user, group, size, date modified, file or directory name.
Directories show up in a blue font; non-executable files show up in a white font.

Reading a permission string such as drwxr-xr-x:
    d      rwx     r-x     r-x
    type   user    group   other

    r = read
    w = write
    x = execute
    d = directory (first character only)
    - = file (first character); in any other position, that permission is not granted

Change the permissions of your hello_world.pl script to executable
    chmod +x hello_world.pl     # change mode to executable

    -rwxr-xr-x   1 genomics genomics     6 Jan 29 22:07 hello_world.pl

Directories show up in a blue font; executable files show up in a green font.
Now run your script from the UNIX command line (Terminal) using either of these two commands
    perl hello_world.pl
    ./hello_world.pl

Variables in Perl
Variables are used to hold data. Different kinds of data are stored in different types of variables.
There are three data structures in Perl: scalar, array and hash.
Use a variable name that describes its contents.
Use lowercase characters and separate words with the underscore character.

scalar variables
A scalar variable name starts with $ and holds a single unit of data, such as text or a number.
    $percent_gc = 85.3;
    $day = "Monday";
    $book_title = "The cat in the hat";
    $today = $day;
    $percent_gc = 85.3 + .7;
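
The slides only show scalars in use. As a rough sketch of how the three data structures differ, the example below declares one of each. The variable names and values are made up for illustration, and `use strict`, `use warnings` and `my` are additions that the slides do not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# scalar: starts with $ and holds a single value
my $percent_gc = 85.3;

# array: starts with @ and holds an ordered list of values, indexed from 0
my @bases = ("A", "C", "G", "T");

# hash: starts with % and holds key => value pairs
my %complement = (A => "T", C => "G", G => "C", T => "A");

print "percent GC: $percent_gc\n";
print "first base: $bases[0]\n";
print "number of bases: ", scalar(@bases), "\n";
print "complement of G: $complement{G}\n";
```

Note that a single element of an array or hash is itself a scalar, which is why it is written with $ ($bases[0], $complement{G}).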

    gedit hello_world.pl &      # edit your hello_world.pl script

Using scalar variables in Perl
    #!/usr/bin/perl -w
    $my_name = "Michael";
    print "Hello $my_name!\n";

    # $my_name is a variable that is assigned the value: Michael
    # use -w to turn on warnings
Now run your script from the UNIX command line (Terminal)
    ./hello_world.pl

    gedit hello_world.pl &      # edit your hello_world.pl script

Using arguments in Perl
    #!/usr/bin/perl -w
    $my_name = shift;
    print "Hello $my_name!\n";

    # $my_name is a variable that is assigned the value of the first command line argument
    # use -w to turn on warnings
Now run your script from the UNIX command line and use your name as an argument
    ./hello_world.pl Michael
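
Outside of a subroutine, shift removes and returns the first element of @ARGV, the array that holds the command line arguments. The sketch below makes that explicit and adds a fallback for when no argument is given. The helper name greeting and the fallback value "World" are assumptions for illustration, and `use strict` and `my` are not used in the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the greeting for a given name (hypothetical helper, not in the slides).
sub greeting {
    my ($name) = @_;
    return "Hello $name!\n";
}

# At the top level of a script, shift takes the first command line argument
# from @ARGV. It returns undef if no argument was given.
my $my_name = shift;

# Assumed fallback so the script still prints something sensible without an argument.
if (!defined $my_name) {
    $my_name = "World";
}

print greeting($my_name);
```

Run as ./hello_world.pl Michael, this would greet Michael; run without an argument, it falls back to the assumed default.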

    gedit hello_world_loops.pl &    # create a new script

for Loops
for loops are used to loop through designated blocks of code, i.e. to execute a block of code a specified number of times.
    #!/usr/bin/perl -w
    $i = shift || 1;
    for (1 .. $i) {
        $loop = $_;
        print "$loop\tHello World!\n";
    }

    # $i defaults to 1 if no argument is given
    # || is the syntax for an OR condition
    # $i is often used for iteration
    # $_ is a Perl special variable which holds the current value of the loop,
    #    here the number of the iteration the loop is on
    # \t inserts a tab in the output

    chmod +x hello_world_loops.pl   # make your script executable
Now run your script from the UNIX command line and use a number as an argument
    ./hello_world_loops.pl 9
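
The loop above copies $_ into $loop on every pass. As an alternative sketch, a for loop can name its own loop variable directly. The helper numbered_lines and the use of push to collect the lines are assumptions for illustration and do not appear in the slides, and neither do `use strict` and `my`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one numbered line per iteration (hypothetical helper for illustration).
sub numbered_lines {
    my ($count) = @_;
    my @lines;
    for my $loop (1 .. $count) {        # $loop takes the values 1, 2, ..., $count
        push @lines, "$loop\tHello World!\n";
    }
    return @lines;
}

my $i = shift || 1;                     # defaults to 1 if no argument is given
print numbered_lines($i);
```

Because || tests for a true value, an argument of 0 is also replaced by the default of 1.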

Conditional statements
    if (condition) {
        code;
    }

    if (condition) {
        code;
    }
    elsif (different condition) {
        different code;
    }
    else {
        other code;
    }

    if (condition) {
        code;
    }
    elsif (different condition) {
        different code;
    }

    if (condition) {
        code;
    }
    else {
        other code;
    }

    # (condition) evaluates to either TRUE or FALSE
    # TRUE  = 1
    # FALSE = 0
    # more precisely, Perl treats 0, "0", the empty string "" and undef as FALSE,
    # and any other value as TRUE
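
To see which values count as TRUE or FALSE in a condition, a small sketch like the following can help. The helper name truth_label is hypothetical, and `use strict` and `my` are additions not used in the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report whether Perl treats a value as TRUE or FALSE in a condition.
# truth_label is a hypothetical helper name used only for this sketch.
sub truth_label {
    my ($value) = @_;
    if ($value) {
        return "TRUE";
    }
    else {
        return "FALSE";
    }
}

print "1     is ", truth_label(1), "\n";
print "0     is ", truth_label(0), "\n";
print "\"\"    is ", truth_label(""), "\n";
print "\"ACG\" is ", truth_label("ACG"), "\n";
print "5 > 3 is ", truth_label(5 > 3), "\n";
print "5 < 3 is ", truth_label(5 < 3), "\n";
```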

Assignment vs Conditional statements
    #!/usr/bin/perl -w
    $number = 1;
    if ($number == 1) {
        print "Number equals 1\n";
    }
    else {
        print "Number does not equal 1\n";
    }

    $number = 1;              # use one = sign: you are assigning a value of 1 to $number
    if ($number == 1) { }     # use two == signs: you are comparing $number to the value of 1

Comparison Operators
    ==    compares whether numeric values are equal
    eq    compares whether text strings are the same
    !=    compares whether numeric values are not equal
    ne    compares whether text strings are not the same
Other numeric comparisons
    >    >=    <    <=
What is the symbol for "not greater than"?
    <=    (less than or equal to; Perl has no !> operator)
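
The difference between the numeric and the text operators shows up when two values are the same number but different text. The sketch below compares "1.0" and "1" both ways. The helper names same_number and same_text are hypothetical, and `use strict` and `my` are additions not used in the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare two values as numbers (hypothetical helper for illustration).
sub same_number {
    my ($left, $right) = @_;
    if ($left == $right) {
        return 1;
    }
    return 0;
}

# Compare two values as text strings (hypothetical helper for illustration).
sub same_text {
    my ($left, $right) = @_;
    if ($left eq $right) {
        return 1;
    }
    return 0;
}

# "1.0" and "1" are the same number but not the same text
print "numeric ==: ", same_number("1.0", "1"), "\n";
print "text    eq: ", same_text("1.0", "1"), "\n";
```

Use == and != for numbers such as positions or counts, and eq and ne for text such as nucleotides or gene names.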

Write a script that uses a for loop from 1 to 10 and conditional statements to output the following:
    1    2    3    4    five    6    7    8    9    ten

    #!/usr/bin/perl -w
    for (1 .. 10) {
        if ($_ == 5) {
            print "five\t";
        }
        elsif ($_ == 10) {
            print "ten";
        }
        else {
            print "$_\t";
        }
    }
    print "\n";

length( ) function in Perl
    #!/usr/bin/perl -w
    # this script will evaluate the length of a DNA sequence
    $dna_sequence = "GAGTCGAATCGT";
    $dna_length = length($dna_sequence);    # get DNA length
    if ($dna_length > 10) {
        print "DNA sequence is longer than 10 bases.\n";
    }
    elsif ($dna_length < 10) {
        print "DNA sequence is shorter than 10 bases.\n";
    }
    else {
        print "DNA sequence is 10 bases long.\n";
    }
Comments about your script begin with # and can be placed on a line by themselves or at the end of a line of code.

substr( ) function in Perl
    #!/usr/bin/perl -w
    $dna_sequence = "GCATCGAATCGT";
    # start at $dna_sequence character 0 and get 1 character(s)
    $first_nucleotide = substr($dna_sequence, 0, 1);
    if ($first_nucleotide eq "G") {
        print "G is the first base in the DNA sequence.\n";
    }
    else {
        print "G is not the first base in the DNA sequence.\n";
    }

substr position counting starts at 0
    sequence    GCATCGAATCGT
    position    0123 ...  11

    substr($dna_sequence, 1, 3);    # will give you CAT
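
Because substr counts from 0 while sequence positions are usually quoted starting from 1, it is easy to be off by one. The sketch below wraps the conversion in a helper. The name base_at and its 1-based convention are assumptions for illustration, and `use strict` and `my` are additions not used in the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the base at a 1-based position in a sequence.
# base_at is a hypothetical helper; substr itself counts from 0.
sub base_at {
    my ($sequence, $position) = @_;
    return substr($sequence, $position - 1, 1);
}

my $dna_sequence = "GCATCGAATCGT";

print "base 1:  ", base_at($dna_sequence, 1), "\n";            # first base, offset 0
print "base 12: ", base_at($dna_sequence, 12), "\n";           # last base, offset 11
print "3 bases from offset 1: ", substr($dna_sequence, 1, 3), "\n";
```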

Logical Expressions
    if ($nucleotide eq "G") {
        $gc_count = $gc_count + 1;
    }
    elsif ($nucleotide eq "C") {
        $gc_count = $gc_count + 1;
    }

    ||  is the syntax for OR
    &&  is the syntax for AND

    # we can combine two conditions to make our code more concise
    if ($nucleotide eq "G" || $nucleotide eq "C") {
        $gc_count = $gc_count + 1;      # now this line is needed only once
    }

Logical OR syntax: two pipes with no space between them (||).

Logical Expressions
    #!/usr/bin/perl -w
    $chromosome = 1;
    $gene_start = 160000000;
    if ($chromosome == 1 && $gene_start >= 125000001) {
        print "Your gene is on the q arm of chromosome 1\n";
    }

    ||  is the syntax for OR
    &&  is the syntax for AND

Logical AND syntax: two & characters with no space between them (&&).

Script to get the RNA transcript of your DNA
    #!/usr/bin/perl -w
    # This script will transcribe DNA to RNA
    $dna_sequence = "GCATCGAATCGT";
    $dna_length = length($dna_sequence);    # get DNA length
    for (0 .. ($dna_length - 1)) {
        $nucleotide = substr($dna_sequence, $_, 1);
        if ($nucleotide eq "T") {
            print "U";
        }
        else {
            print "$nucleotide";
        }
    }
    print "\n";
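
The script above prints the RNA one character at a time. A variation, sketched below, builds the whole RNA string first so it can be reused later in the script. The subroutine name transcribe, the .= (append) operator and the default sequence are assumptions that go beyond the slides, as are `use strict` and `my`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the RNA version of a DNA sequence by replacing every T with U.
# transcribe is a hypothetical helper name used only for this sketch.
sub transcribe {
    my ($dna_sequence) = @_;
    my $rna_sequence = "";
    for (0 .. (length($dna_sequence) - 1)) {
        my $nucleotide = substr($dna_sequence, $_, 1);
        if ($nucleotide eq "T") {
            $rna_sequence .= "U";            # .= appends to the end of a string
        }
        else {
            $rna_sequence .= $nucleotide;
        }
    }
    return $rna_sequence;
}

# Use the first command line argument, or the sequence from the slide if none is given
my $dna_sequence = shift || "GCATCGAATCGT";
print transcribe($dna_sequence), "\n";
```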

At the UNIX command line:
    exit    # exit the terminal session (or press Ctrl+d)

Write a script to get %GC of a DNA sequence
    - get the DNA sequence as a command line argument
    - print the percent GC
hints:
    - increment a value each time a nucleotide matches
    - all of these lines of code do the same calculation:
        $gc_count = $gc_count + 1;
        $gc_count += 1;
        $gc_count++;
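
As a quick check that the three increment forms really are equivalent, the sketch below applies each one to its own counter inside the same loop. The counter names are made up for illustration, and `use strict` and `my` are additions not used in the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One counter per increment style, each starting at 0
my $count_long  = 0;
my $count_short = 0;
my $count_plus  = 0;

for (1 .. 3) {
    $count_long = $count_long + 1;   # long form
    $count_short += 1;               # add-and-assign
    $count_plus++;                   # increment operator
}

# All three counters end up with the same value
print "$count_long $count_short $count_plus\n";
```

Starting each counter at 0 before the loop also avoids the "uninitialized value" warning that -w would otherwise report for the long form.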

Make sure you understand the following Perl functions, operators and variables:
    for (1 .. 10)    if    elsif    else
    ==    eq    &&    ||
    $_    \n    \t
    length( )    substr( )
To be completed before next class