BINF 634 Bioinformatics Programming


Oct 4, 2013


BINF 634 Fall 2012 - LECTURE06

Outline

Lab 1 (Quiz 3) Solution
Program 2
Scoping
Algorithm efficiency
Sorting
Hashes
Review for midterm
Quiz 4


Lab 1 Solution

1. What is a pattern that matches the substring “world” occurring anywhere in the input string, e.g.

    hello cold cruel world
    hello world news tonight
    helloworld.pl is a script

Solution: /world/

2. What is a pattern that matches the word “world” occurring anywhere in the input string, e.g.

    hello cold cruel world
    hello world news tonight

but not

    helloworld.pl is a script

Solution: /\bworld\b/
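
As a quick check of the two patterns above, the following sketch runs both against the sample strings. The helper name matches and the @samples array are illustrative and not part of the lab.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample inputs taken from questions 1 and 2.
my @samples = (
    'hello cold cruel world',
    'hello world news tonight',
    'helloworld.pl is a script',
);

# Hypothetical helper: returns 1 if $string matches $pattern, else 0.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? 1 : 0;
}

# Print, for each sample, whether the substring and word patterns match.
for my $s (@samples) {
    printf "%-28s substring:%d word:%d\n",
        $s, matches(qr/world/, $s), matches(qr/\bworld\b/, $s);
}
```

Only the last sample should differ between the two columns, because no word boundary falls between "hello" and "world" in helloworld.pl.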


Lab 1 Solution (cont.)

3. What is a pattern that matches the word “world” only if it occurs at the end of the string, i.e.

    hello cold cruel world

but not

    next is world news tonight
    hello cold cruelworld

Solution: /\bworld\b$/

4. What is a pattern that matches a string that starts with the word “hello” OR ends in the word “world”, e.g.

    hello and good night
    that’s all for tonight world

Solution: /^\bhello\b|\bworld\b$/
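
The solution to question 4 relies on alternation having the lowest precedence in a regular expression: ^ applies only to the "hello" branch and $ only to the "world" branch. A minimal sketch that checks both patterns (the variable names and the @tests list are illustrative):

```perl
use strict;
use warnings;

my $end_word   = qr/\bworld\b$/;               # question 3
my $either_end = qr/^\bhello\b|\bworld\b$/;    # question 4

my @tests = (
    'hello cold cruel world',
    'next is world news tonight',
    'hello cold cruelworld',
    'hello and good night',
    "that's all for tonight world",
);

# Report which of the two patterns accepts each test string.
for my $s (@tests) {
    my $q3 = $s =~ $end_word   ? 'yes' : 'no';
    my $q4 = $s =~ $either_end ? 'yes' : 'no';
    print "$s => Q3: $q3, Q4: $q4\n";
}
```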


Lab 1 Solution (cont.)

5. What is a pattern that matches a string that starts with the word “hello” OR “bye”, AND ends with the word “world”, e.g.

    bye cold cruel world
    hello cold cruel world

but not

    hello cold cruel world?
    hello cold cruelworld

Solution: /^\b(hello|bye)\b.+\bworld\b$/

6. What is a pattern that matches a substring “world” occurring 1 or more times at the end of the line, e.g.

    This string ends in world
    This string ends in worldworld
    This string ends in worldworldworld

Solution: /(world)+$/
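
The parentheses in these two solutions do double duty: they group, and they capture. A short sketch, with illustrative variable names, that tries both patterns and prints what question 5 captured in $1:

```perl
use strict;
use warnings;

my $q5 = qr/^\b(hello|bye)\b.+\bworld\b$/;
my $q6 = qr/(world)+$/;

for my $s ('bye cold cruel world',
           'hello cold cruel world?',
           'This string ends in worldworld') {
    if ($s =~ $q5) {
        # $1 holds whichever alternative matched: hello or bye
        print "Q5 matched '$s' (greeting captured in \$1: $1)\n";
    }
    elsif ($s =~ $q6) {
        print "Q6 matched '$s'\n";
    }
    else {
        print "no match for '$s'\n";
    }
}
```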


Lab 1 Solution (cont.)

7. What is a pattern that matches one or more backslashes immediately followed by one or more asterisks, e.g.

    \\\\*****

but not

    \\\\*****\

Solution: /\\+\*+$/
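
Both the backslash and the asterisk are metacharacters, so each must be escaped inside the pattern. The sketch below tests the solution; the variables $good and $bad are illustrative, and they use single-quoted strings, in which a doubled backslash stands for one literal backslash.

```perl
use strict;
use warnings;

my $pattern = qr/\\+\*+$/;

# Single-quoted strings turn each doubled backslash into one literal
# backslash, so each of these starts with two backslash characters.
my $good = '\\\\***';      # two backslashes, then three asterisks
my $bad  = '\\\\***\\';    # same, but with a trailing backslash

print "good matches\n"       if     $good =~ $pattern;
print "bad does not match\n" unless $bad  =~ $pattern;
```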



Lab 1 Solution (cont.)

8. What is a pattern that matches any line of input that has the same word repeated two or more times in a row? In this problem, words can be considered to be sequences of letters a to z, A to Z, digits, and underscores. Whitespace between words may differ, e.g.

    Paris in the the spring
    I thought that that was the problem

For this example you will need to use backreferences. A backreference is a reference to a string captured with parentheses. (Recall that in Perl, captured strings are referred to as $1,…,$9.) In a regular expression, you can refer to captured strings, while the pattern is being matched, as \1,…,\9. For example, /(AT)G(\1)/ matches the 5-character string ATGAT.

Note: Strictly speaking, the inclusion of backreferences makes the Perl pattern recognition language nonregular. Nevertheless, we still refer to it as the Perl regular expression language.

Solution: /\b(\S+)\b(\s+\1\b)+/

Understanding this:

    \b       # start at a word boundary (begin letters)
    (\S+)    # find chunk of non-whitespace
    \b       # until another word boundary (end letters)
    (\s+     # separated by some whitespace
    \1       # and that very same chunk again
    \b)      # until another word boundary
    +        # one or more sets of these
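
The annotated breakdown above can be written directly in Perl with the /x modifier, which ignores whitespace and comments inside the pattern. The sketch below does that and also shows an assumed variant, $repeat_w, that uses \w+ to follow the problem's definition of a word more literally. The variant is not part of the lab solution.

```perl
use strict;
use warnings;

# The lab solution, written with /x so whitespace and comments are ignored.
my $repeat = qr/
    \b (\S+) \b       # a chunk of non-whitespace between word boundaries
    ( \s+ \1 \b )+    # whitespace, then the same chunk again, one or more times
/x;

# Assumed variant: \w+ matches only letters, digits, and underscores.
my $repeat_w = qr/\b(\w+)\b(\s+\1\b)+/;

for my $line ('Paris in the the spring',
              'I thought that that was the problem',
              'no repeats here') {
    if ($line =~ $repeat) {
        print "repeated word '$1' in: $line\n";
    }
    else {
        print "no repeated word in: $line\n";
    }
}
```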





Program 2 Solution


Be Careful With Scope

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $x = 23;

    print "value in main body is $x\n";
    mysub($x);
    print "value in main body is $x\n";
    exit;

    sub mysub {
        print "value in subroutine is $x\n";
        $x = 33;
    }

Output:

    value in main body is 23
    value in subroutine is 23
    value in main body is 33

Here mysub ignores its argument and assigns directly to the file-scoped $x, so the change is visible in the main body.

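Enclosing the main body in a block limits the scope of $x:

    #!/usr/bin/perl
    use strict;
    use warnings;
    {
        my $x = 23;

        print "value in main body is $x\n";
        mysub($x);
        print "value in main body is $x\n";
        exit;
    }

    sub mysub {
        print "value in subroutine is $x\n";
        $x = 33;
    }

This will not compile: $x is declared with my inside the block, so it is not in scope in mysub, and use strict rejects the undeclared variable.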


Be Careful With Scope (cont.)

    #!/usr/bin/perl
    use strict;
    use warnings;
    {
        my $x = 23;

        print "value in main body is $x\n";
        mysub($x);
        print "value in main body is $x\n";
        exit;
    }

    sub mysub {
        my($x) = @_;
        $x = 33;
        print "value in subroutine is $x\n";
    }

Output:

    value in main body is 23
    value in subroutine is 33
    value in main body is 23
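
The caller's value is unchanged in the last example because my($x) = @_; copies the argument into a new variable. If a subroutine is meant to update the caller's variable, one option is to pass a reference and assign through it. This is a sketch, not taken from the slides, and the names set_copy and set_via_ref are hypothetical.

```perl
use strict;
use warnings;

# Works on a private copy; the caller's variable is untouched.
sub set_copy {
    my ($x) = @_;
    $x = 33;
    return $x;
}

# Receives a reference and assigns through it, so the caller sees the change.
sub set_via_ref {
    my ($x_ref) = @_;
    $$x_ref = 33;
    return;
}

my $value = 23;
set_copy($value);
print "after set_copy: $value\n";       # still 23

set_via_ref(\$value);
print "after set_via_ref: $value\n";    # now 33
```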


Data Structures and Algorithm Efficiency

    # An inefficient way to compute intersections

    my @a = qw/ A B C D E F G H I J K X Y Z /;
    my @b = qw/ Q R S A C D T U G H V I J K X Z /;
    my @intersection = ();

    for my $i (@a) {
        for my $j (@b) {
            if ($i eq $j) {
                push @intersection, $i;
                last;
            }
        }
    }
    print "@intersection\n";
    exit;

Output:

    A C D G H I J K X Z

Algorithm is O(N^2), where N = size of the lists


Data Structures and Algorithm Efficiency (cont.)

    # A better way to compute intersections

    my @a = qw/ A B C D E F G H I J K X Y Z /;
    my @b = qw/ Q R S A C D T U G H V I J K X Z /;
    my @intersection = ();

    # "mark" each item in @a
    my %mark = ();
    for my $i (@a) { $mark{$i} = 1 }

    # intersection = any "marked" item in @b
    for my $j (@b) {
        if (exists $mark{$j}) {
            push @intersection, $j;
        }
    }
    print "@intersection\n";
    exit;

Output:

    A C D G H I J K X Z

This hash-based approach (version 2) is O(N), where N = size of the lists. The nested-loop approach (version 1) is O(N^2).


Demonstration

Unix commands: /usr/bin/time, head, diff, cmp

    % wc -l list1 list2
    24762 list1
    12381 list2
    37143 total

    % /usr/bin/time intersect1.pl list1 list2 > out1
    22.91 real 22.88 user 0.02 sys

    % /usr/bin/time intersect2.pl list1 list2 > out2
    0.06 real 0.05 user 0.00 sys

Speedup in user time: 22.88 / 0.05 ≈ 458
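
A similar comparison can be made from inside Perl. The sketch below wraps both versions in subroutines and times them with the core Time::HiRes module. The subroutine names and the synthetic lists are assumptions for illustration (the lecture used the files list1 and list2), and the timings will vary by machine.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Version 1: nested loops, O(N^2) comparisons.
sub intersect_nested {
    my ($a_ref, $b_ref) = @_;
    my @out;
    for my $i (@$a_ref) {
        for my $j (@$b_ref) {
            if ($i eq $j) { push @out, $i; last }
        }
    }
    return @out;
}

# Version 2: mark items of the first list in a hash, then look up each
# item of the second list.
sub intersect_hash {
    my ($a_ref, $b_ref) = @_;
    my %mark = map { $_ => 1 } @$a_ref;
    return grep { exists $mark{$_} } @$b_ref;
}

# Synthetic test data: 2000 ids, and the even-numbered half of them.
my @list1 = map { "id$_" } 1 .. 2000;
my @list2 = map { "id$_" } grep { $_ % 2 == 0 } 1 .. 2000;

for my $version (\&intersect_nested, \&intersect_hash) {
    my $start   = Time::HiRes::time();
    my @common  = $version->(\@list1, \@list2);
    my $elapsed = Time::HiRes::time() - $start;
    printf "%d common items in %.4f seconds\n", scalar @common, $elapsed;
}
```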



Hashes and Efficiency

Hashes provide a very fast way to look up information associated with a set of scalar values (keys)

Examples:

    Count how many times each word appears in a file
    Also: whether or not a certain word appeared in a file
    Count how many times each codon appears in a DNA sequence
    Whether a given codon appears in a sequence
    How many times an item appears in a given list
    Intersections
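
As one illustration of the counting and membership examples listed above, here is a minimal sketch that tallies codons in a hash. The subroutine name count_codons and the sample sequence are hypothetical. The sketch reads codons from the first base onward and ignores a trailing partial codon.

```perl
use strict;
use warnings;

# Count how many times each codon appears in a DNA sequence.
# Reads non-overlapping triplets starting at position 0.
sub count_codons {
    my ($dna) = @_;
    my %count;
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        $count{ substr($dna, $i, 3) }++;
    }
    return %count;
}

my %codons = count_codons('ATGGCCATGTAAGC');

# Report the counts in sorted key order.
for my $codon (sort keys %codons) {
    print "$codon: $codons{$codon}\n";
}

# Membership test: did a given codon appear at all?
print "TAA appears\n" if exists $codons{TAA};
```

The same hash answers both kinds of question: the value gives the count, and exists gives a yes/no answer without scanning the sequence again.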


Examples

1. Write a subroutine get_intersection(\@a, \@b) that returns the intersection of two lists.

2. Write a subroutine first_list_only(\@a, \@b) that returns the items that are in list @a but not in @b.

3. Write a subroutine unique(@a) that returns the unique items in list @a (that is, removes the duplicates).

4. Write a subroutine dups($n, @a) that returns a list of items that appear in @a at least $n times.
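
One possible set of hash-based answers to these four exercises is sketched below. These are not the official solutions. The choices about ordering (results follow the order of the list being filtered) and about reporting each duplicate once are assumptions.

```perl
use strict;
use warnings;

# Items of @$b_ref that also occur in @$a_ref.
sub get_intersection {
    my ($a_ref, $b_ref) = @_;
    my %in_a = map { $_ => 1 } @$a_ref;
    return grep { $in_a{$_} } @$b_ref;
}

# Items of @$a_ref that do not occur in @$b_ref.
sub first_list_only {
    my ($a_ref, $b_ref) = @_;
    my %in_b = map { $_ => 1 } @$b_ref;
    return grep { !$in_b{$_} } @$a_ref;
}

# Each distinct item once, in order of first appearance.
sub unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Items that appear at least $n times, each reported once.
sub dups {
    my ($n, @items) = @_;
    my %count;
    $count{$_}++ for @items;
    return grep { $count{$_} >= $n } unique(@items);
}

my @a = qw/ A B C D E /;
my @b = qw/ Q A C Z /;
print join(' ', get_intersection(\@a, \@b)), "\n";
print join(' ', first_list_only(\@a, \@b)), "\n";
print join(' ', unique(qw/ G A T T A C A /)), "\n";
print join(' ', dups(2, qw/ G A T T A C A /)), "\n";
```

Each subroutine makes a single pass to build a hash and a single pass to filter, so none of them needs nested loops.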



Sorting

sort LIST -- returns list sorted in string order
sort BLOCK LIST -- compares according to BLOCK
sort USERSUB LIST -- compares according to subroutine USERSUB
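
The three forms can be seen side by side in the following sketch. The gene names and the subroutine name ignoring_case are illustrative choices, picked so that each form produces a different order.

```perl
use strict;
use warnings;

my @genes = qw/ myc TP53 brca1 CDKN2A /;   # illustrative data

# sort LIST: default string order, so uppercase sorts before lowercase.
my @by_string = sort @genes;

# sort BLOCK LIST: the block compares $a and $b.
my @by_length = sort { length($a) <=> length($b) } @genes;

# sort USERSUB LIST: a named subroutine does the comparison.
sub ignoring_case { lc($a) cmp lc($b) }
my @by_name = sort ignoring_case @genes;

print "@by_string\n";
print "@by_length\n";
print "@by_name\n";
```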



Sorting Our First Attempt

    #!/usr/bin/perl
    use strict;
    use warnings;
    {
        my(@unsorted) = (17, 8, 2, 111);
        my(@sorted) = sort @unsorted;

        print "@unsorted\n";
        print "@sorted\n";

        exit;
    }

Output:

    17 8 2 111
    111 17 2 8


The Comparison Operator

1. $a <=> $b returns 0 if equal, 1 if $a > $b, -1 if $a < $b

2. The "cmp" operator gives similar results for strings

3. $a and $b are special global variables: do NOT declare with "my" and do NOT modify.
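
The return values can be checked directly, as in the sketch below. It also shows one common use of the 0 result, offered as an aside rather than something from the slides: chaining comparisons with ||, so that a tie on the first comparison is settled by the second. The @dna data is illustrative.

```perl
use strict;
use warnings;

# <=> compares numerically; cmp compares as strings.
print 2 <=> 10, "\n";       # -1: 2 is less than 10 as a number
print '2' cmp '10', "\n";   #  1: "2" sorts after "10" as a string
print 7 <=> 7, "\n";        #  0: equal

# When the first comparison returns 0 (a tie), || evaluates the second.
# Here: sort by length, then alphabetically within the same length.
my @dna    = qw/ TT GA CCC AC A /;
my @sorted = sort { length($a) <=> length($b) || $a cmp $b } @dna;
print "@sorted\n";
```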


Sorting Numerically

    #!/usr/bin/perl
    use strict;
    use warnings;
    {
        my(@unsorted) = (17, 8, 2, 111);
        my(@sorted) = sort { $a <=> $b } @unsorted;

        print "@unsorted\n";
        print "@sorted\n";

        exit;
    }

Output:

    17 8 2 111
    2 8 17 111


Sorting Using a Subroutine

    #!/usr/bin/perl
    use strict;
    use warnings;
    {
        my(@unsorted) = (17, 8, 2, 111);
        my(@sorted) = sort numerically @unsorted;

        print "@unsorted\n";
        print "@sorted\n";

        exit;
    }

    sub numerically { $a <=> $b }

Output:

    17 8 2 111
    2 8 17 111


Sorting Descending

    #!/usr/bin/perl
    use strict;
    use warnings;
    {
        my(@unsorted) = (17, 8, 2, 111);
        my(@reversesorted) = reverse sort numerically @unsorted;

        print "@unsorted\n";
        print "@reversesorted\n";

        exit;
    }

    sub numerically { $a <=> $b }

Output:

    17 8 2 111
    111 17 8 2


Sorting DNA by Length

    #!/usr/bin/perl
    use strict;
    use warnings;
    {
        # Sorting strings:
        my @dna = qw/ TATAATG TTTT GT CTCAT /;

        ## Sort @dna by length:
        @dna = sort { length($a) <=> length($b) } @dna;
        print "@dna\n"; # Output: GT TTTT CTCAT TATAATG

        exit;
    }

Output:

    GT TTTT CTCAT TATAATG


Sorting DNA by Number of T’s (Largest First)

    #!/usr/bin/perl
    use strict;
    use warnings;
    {
        # Sorting strings:
        my @dna = qw/ TATAATG TTTT GT CTCAT /;

        @dna = sort { ($b =~ tr/Tt//) <=> ($a =~ tr/Tt//) } @dna;
        print "@dna\n"; # Output: TTTT TATAATG CTCAT GT

        exit;
    }

Output:

    TTTT TATAATG CTCAT GT


Sorting DNA by Number of T’s (Largest First) (Take 2)

    #!/usr/bin/perl
    use strict;
    use warnings;
    {
        # Sorting strings:
        my @dna = qw/ TATAATG TTTT GT CTCAT /;

        @dna = reverse sort { ($a =~ tr/Tt//) <=> ($b =~ tr/Tt//) } @dna;

        print "@dna\n"; # Output: TTTT TATAATG CTCAT GT

        exit;
    }

Output:

    TTTT TATAATG CTCAT GT



Sorting Strings Without Regard to Case

    #!/usr/bin/perl
    use strict;
    use warnings;
    {
        # Sort strings without regard to case:
        my(@unsorted) = qw/ mouse Rat HUMAN eColi /;
        my(@sorted) = sort { lc($a) cmp lc($b) } @unsorted;

        print "@unsorted\n";
        print "@sorted\n";
        exit;
    }

Output:

    mouse Rat HUMAN eColi
    eColi HUMAN mouse Rat


Sorting Hashes by Value

    #!/usr/bin/perl
    use strict;
    use warnings;
    {
        my(%sales_amount) = ( auto=>100, kitchen=>2000, hardware=>200 );

        sub bysales { $sales_amount{$b} <=> $sales_amount{$a} }

        for my $dept (sort bysales keys %sales_amount) {
            printf "%s:\t%4d\n", $dept, $sales_amount{$dept};
        }

        exit;
    }

Output (tabs shown expanded to 8-column stops):

    kitchen:        2000
    hardware:        200
    auto:    100



Review for Midterm BINF634

Material
    Tisdall Chapters 1-9
    Wall Chapter 5
    Lecture notes

The exam will be open book and notes
You cannot work together on it
You cannot use outside material
You will have the full period to take the midterm
You will be asked to program




Some Example Questions


Given two DNA fragments contained in $DNA1 and $DNA2, how can we concatenate these to make a third string $DNA3?






What does this line of code do?

    $RNA =~ s/T/U/ig;



What does this statement do?


$revcom =~ tr/ACGT/TGCA/;



What do these four lines do?

    @bases = ('A', 'C', 'G', 'T');
    $base1 = pop @bases;
    unshift (@bases, $base1);
    print "@bases\n\n";





What does this code snippet do if COND is true?

    unless (COND) {
        # do something
    }




What does this code fragment do?

    $protein = join('', @protein);



What does this code fragment do?

    $myfile = "myfile";
    open(MYFILE, ">$myfile");



What does this code fragment do?

while($DNA =~ /a/ig){$a++}





What is the effect of using the command

use strict;


at the beginning of your program?





What is contained in the reserved variable $0 and in the array @ARGV?



What is the difference between “pass by value” and “pass by reference”?



What is a pointer and what does it mean to dereference a pointer?



How do you invoke perl with the debugger?



Given an array @verbs, what is going on here?


$verbs[rand @verbs]



For the Curious Regarding Data Structures and Their Implications

Niklaus Wirth, Algorithms + Data Structures = Programs, Prentice Hall, 1976.

Dated in terms of language (Pascal), but very well written and understandable.

