# BINF 634 Bioinformatics Programming

Biotechnology

4 Oct. 2013

BINF 634 Fall 2012 - LECTURE06

Outline

Lab 1 (Quiz 3) Solution

Program 2

Scoping

Algorithm efficiency

Sorting

Hashes

Review for midterm

Quiz 4


Lab 1 Solution

1. What is a pattern that matches the substring “world” occurring anywhere in the input string, e.g.

hello cold cruel world

hello world news tonight

helloworld.pl is a script

Solution:

`/world/`

2. What is a pattern that matches the word “world” occurring anywhere in the input string, e.g.

hello cold cruel world

hello world news tonight

but not

helloworld.pl is a script

Solution:

`/\bworld\b/`
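To see the difference between questions 1 and 2, a small sketch can run both patterns over the example lines. The loop and the output format are illustrative only, not part of the lab.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Example lines from questions 1 and 2
my @lines = (
    'hello cold cruel world',
    'hello world news tonight',
    'helloworld.pl is a script',
);

for my $line (@lines) {
    # /world/ finds the substring anywhere, even inside "helloworld"
    my $substring = ($line =~ /world/) ? 'yes' : 'no';

    # /\bworld\b/ also requires a word boundary on each side
    my $word = ($line =~ /\bworld\b/) ? 'yes' : 'no';

    printf "%-28s substring: %-3s word: %s\n", $line, $substring, $word;
}
```

Only the last line differs: it contains the substring, but there is no word boundary between the "o" and the "w" of "helloworld".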


3. What is a pattern that matches the word “world” only if it occurs at the end of the string, e.g.

hello cold cruel world

but not

next is world news tonight

hello cold cruelworld

Solution:

`/\bworld\b$/`

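4. What is a pattern that matches a string that starts with the word “hello” OR ends in the word “world”, e.g.

hello and good night

that’s all for tonight world

Solution:

`/^\bhello\b|\bworld\b$/`

Because alternation has the lowest precedence, `^` applies only to the first alternative and `$` only to the second, which is exactly the OR the question asks for.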


5. What is a pattern that matches a string that starts with the word “hello” OR “bye”, AND ends with the word “world”, e.g.

bye cold cruel world

hello cold cruel world

but not

hello cold cruel world?

hello cold cruelworld

Solution:

`/^\b(hello|bye)\b.+\bworld\b$/`
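The anchored solutions to questions 3 through 5 can be checked against their own examples. This sketch stores each solution as a compiled pattern with `qr//` and reports any example that does not behave as the question says; the table layout is an assumption made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Solutions to questions 3-5, compiled with qr//
my %pattern = (
    q3 => qr/\bworld\b$/,
    q4 => qr/^\bhello\b|\bworld\b$/,
    q5 => qr/^\b(hello|bye)\b.+\bworld\b$/,
);

# Each example paired with whether it should match (1) or not (0)
my @tests = (
    [ q3 => 'hello cold cruel world',       1 ],
    [ q3 => 'next is world news tonight',   0 ],
    [ q3 => 'hello cold cruelworld',        0 ],
    [ q4 => 'hello and good night',         1 ],
    [ q4 => "that's all for tonight world", 1 ],
    [ q5 => 'bye cold cruel world',         1 ],
    [ q5 => 'hello cold cruel world',       1 ],
    [ q5 => 'hello cold cruel world?',      0 ],
    [ q5 => 'hello cold cruelworld',        0 ],
);

my $failures = 0;
for my $t (@tests) {
    my ($q, $string, $expected) = @$t;
    my $got = ($string =~ $pattern{$q}) ? 1 : 0;
    printf "%s %-30s %s\n", $q, $string, ($got == $expected ? 'ok' : 'WRONG');
    $failures++ if $got != $expected;
}
print "$failures unexpected result(s)\n";
```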

6. What is a pattern that matches the substring “world” occurring one or more times at the end of the line, e.g.

This string ends in world

This string ends in worldworld

This string ends in worldworldworld

Solution:

`/(world)+$/`


7. What is a pattern that matches one or more backslashes immediately followed by one or more asterisks at the end of the string, e.g.

`\\\\*****`

but not

`\\\\*****\`

Solution:

`/\\+\*+$/`


8. What is a pattern that matches any line of input that has the same word repeated two or more times in a row? In this problem, words can be considered to be sequences of letters a to z, A to Z, digits, and underscores. Whitespace between words may differ, e.g.

Paris in the the spring

I thought that that was the problem

For this example you will need to use backreferences. A backreference is a reference to a string captured with parentheses. (Recall that in Perl, captured strings are referred to as `$1`, …, `$9`.) In a regular expression, you can refer to captured strings while the pattern is being matched as `\1`, …, `\9`. For example, `/(AT)G(\1)/` matches the 5-character string ATGAT.

Note: Strictly speaking, the inclusion of backreferences makes the Perl pattern recognition language nonregular. Nevertheless, we still refer to it as the Perl regular expression language.

Solution:

`/\b(\S+)\b(\s+\1\b)+/`

Understanding this:

```
\b       # start at a word boundary (beginning of the letters)
(\S+)    # find a chunk of non-whitespace
\b       # until another word boundary (end of the letters)
(
  \s+    # separated by some whitespace
  \1     # and that very same chunk again
  \b     # until another word boundary
)
+        # one or more sets of these
```
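The annotated breakdown can live inside the pattern itself: with the `/x` modifier, Perl ignores whitespace in the pattern and treats `#` as the start of a comment. A sketch using the question 8 solution on its two examples plus one hypothetical line with no repeat:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question 8 solution, written with /x so it can carry comments
my $repeated = qr/
    \b (\S+) \b       # a chunk of non-whitespace between word boundaries
    ( \s+ \1 \b )+    # whitespace, then the same chunk again, one or more times
/x;

for my $line ('Paris in the the spring',
              'I thought that that was the problem',
              'no repeats in this line') {
    if ($line =~ $repeated) {
        print "repeated word '$1' in: $line\n";
    }
    else {
        print "no repeated word in: $line\n";
    }
}
```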


Program 2 Solution


Be Careful With Scope

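```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = 23;
print "value in main body is $x\n";
mysub($x);
print "value in main body is $x\n";
exit;

sub mysub {
    print "value in subroutine is $x\n";
    $x = 33;
}
```

Output:

```
value in main body is 23
value in subroutine is 23
value in main body is 33
```

Because `$x` is declared with `my` at file scope, `mysub` sees the same `$x` as the main program (the argument it is passed is never used), so its assignment changes the main program's value.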

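```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    my $x = 23;
    print "value in main body is $x\n";
    mysub($x);
    print "value in main body is $x\n";
    exit;
}

sub mysub {
    print "value in subroutine is $x\n";
    $x = 33;
}
```

This will not compile: `$x` is declared with `my` inside the block, so it is not visible in `mysub`, and `use strict` rejects the undeclared `$x` there.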


Be Careful With Scope (cont.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    my $x = 23;
    print "value in main body is $x\n";
    mysub($x);
    print "value in main body is $x\n";
    exit;
}

sub mysub {
    my ($x) = @_;    # a new lexical $x holding a copy of the argument
    $x = 33;
    print "value in subroutine is $x\n";
}
```

Output:

```
value in main body is 23
value in subroutine is 33
value in main body is 23
```
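The last version leaves the caller's `$x` alone because `my ($x) = @_;` copies the argument into a new variable. When a subroutine really should change a caller's variable, a common alternative is to pass a reference and dereference it. A minimal sketch; `set_to_33` is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    my $x = 23;
    print "value in main body is $x\n";
    set_to_33(\$x);    # pass a reference to $x rather than a copy of its value
    print "value in main body is $x\n";
}

# Hypothetical helper: receives a scalar reference and writes through it
sub set_to_33 {
    my ($x_ref) = @_;
    $$x_ref = 33;      # dereference to change the caller's variable
    print "value in subroutine is $$x_ref\n";
}
```

This prints 23, then 33 twice. Perl's `@_` also aliases the arguments themselves, so assigning to `$_[0]` would modify the caller's variable too; copying into `my` variables first avoids changing it by accident.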


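Data Structures and Algorithm Efficiency

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Version 1: an inefficient way to compute intersections
my @a = qw/ A B C D E F G H I J K X Y Z /;
my @b = qw/ Q R S A C D T U G H V I J K X Z /;
my @intersection = ();

# compare each item of @a with the items of @b until a match is found
for my $i (@a) {
    for my $j (@b) {
        if ($i eq $j) {
            push @intersection, $i;
            last;    # stop scanning @b once $i is found
        }
    }
}

print "@intersection\n";
exit;
```

Output:

A C D G H I J K X Z

Algorithm is $O(N^2)$, where N is the size of the lists: each item of @a may be compared with every item of @b.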


Data Structures and Algorithm Efficiency

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Version 2: a better way to compute intersections
my @a = qw/ A B C D E F G H I J K X Y Z /;
my @b = qw/ Q R S A C D T U G H V I J K X Z /;
my @intersection = ();

# "mark" each item in @a
my %mark = ();
for my $i (@a) { $mark{$i} = 1 }

# intersection = any "marked" item in @b
for my $j (@b) {
    if (exists $mark{$j}) {
        push @intersection, $j;
    }
}

print "@intersection\n";
exit;
```

Output:

A C D G H I J K X Z

Algorithm is $O(N)$ on average, where N is the size of the lists: one pass marks the items of @a, one pass checks the items of @b, and each hash lookup takes constant time on average.


Demonstration

Unix commands: `/usr/bin/time`, `diff`, `cmp`

```
% wc -l list1 list2
   24762 list1
   12381 list2
   37143 total
% /usr/bin/time intersect1.pl list1 list2 > out1
   22.91 real   22.88 user   0.02 sys
% /usr/bin/time intersect2.pl list1 list2 > out2
    0.06 real    0.05 user   0.00 sys
```

Speedup (user time): 22.88 / 0.05 ≈ 458
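The same comparison can be made from inside Perl. This sketch times both versions on hypothetical generated data (not the list1/list2 files above) using the core `Time::HiRes` module instead of `/usr/bin/time`; the subroutine names are illustrative, and the timings will depend on the machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # core module: time() with sub-second resolution

# Version 1: compare every item of one list with the items of the other
sub intersect_nested {
    my ($first, $second) = @_;
    my @intersection;
    for my $i (@$first) {
        for my $j (@$second) {
            if ($i eq $j) { push @intersection, $i; last }
        }
    }
    return @intersection;
}

# Version 2: mark the first list in a hash, then one pass over the second
sub intersect_hash {
    my ($first, $second) = @_;
    my %mark = map { $_ => 1 } @$first;
    return grep { exists $mark{$_} } @$second;
}

# Hypothetical data: id1..id2000 and every third id up to id3000
my @list1 = map { 'id' . $_ } 1 .. 2000;
my @list2 = map { 'id' . ($_ * 3) } 1 .. 1000;

for my $version ([ nested => \&intersect_nested ], [ hash => \&intersect_hash ]) {
    my ($name, $code) = @$version;
    my $start  = time();
    my @common = $code->(\@list1, \@list2);
    printf "%-6s found %d common items in %.4f s\n", $name, scalar(@common), time() - $start;
}
```

Both versions should report the same 666 common items; only the time differs.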


Hashes and Efficiency

Hashes provide a very fast way to look up information associated with a set of scalar values (keys).

Examples:

Count how many times each word appears in a file

Also: whether or not a certain word appeared in a file

Count how many times each codon appears in a DNA sequence

Whether a given codon appears in a sequence

How many times an item appears in a given list

Intersections
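The codon examples in this list can be sketched with a single hash. This assumes a hypothetical in-frame sequence and counts only reading frame 1 (non-overlapping triplets starting at position 0).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-frame sequence (13 codons)
my $dna = 'ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG';

# Count each codon in reading frame 1
my %codon_count;
for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
    $codon_count{ substr($dna, $i, 3) }++;
}

# Whether a given codon appears is a single hash lookup
print "TAG present\n" if exists $codon_count{TAG};

for my $codon (sort keys %codon_count) {
    print "$codon\t$codon_count{$codon}\n";
}
```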


Examples

1. Write a subroutine `get_intersection(\@a, \@b)` that returns the intersection of two lists.

2. Write a subroutine `first_list_only(\@a, \@b)` that returns the items that are in list @a but not in @b.

3. Write a subroutine `unique(@a)` that returns the unique items in list @a (that is, removes the duplicates).

4. Write a subroutine `dups($n, @a)` that returns a list of items that appear in @a at least `$n` times.
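One possible set of solutions, each built on a hash as in the intersection example. This is a sketch, not the official answer key: it keeps the original list order, and `get_intersection` will repeat an item that occurs more than once in @a. The list of bases is hypothetical test data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 1. Items in both lists (order follows the first list)
sub get_intersection {
    my ($a_ref, $b_ref) = @_;
    my %in_b = map { $_ => 1 } @$b_ref;
    return grep { $in_b{$_} } @$a_ref;
}

# 2. Items in the first list but not in the second
sub first_list_only {
    my ($a_ref, $b_ref) = @_;
    my %in_b = map { $_ => 1 } @$b_ref;
    return grep { !$in_b{$_} } @$a_ref;
}

# 3. Remove duplicates, keeping the first occurrence of each item
sub unique {
    my @items = @_;
    my %seen;
    return grep { !$seen{$_}++ } @items;
}

# 4. Items occurring at least $n times, each reported once in first-seen order
sub dups {
    my ($n, @items) = @_;
    my %count;
    $count{$_}++ for @items;
    return grep { $count{$_} >= $n } unique(@items);
}

my @a = qw/ A B C D E F G H I J K X Y Z /;
my @b = qw/ Q R S A C D T U G H V I J K X Z /;
my @bases = qw/ A C A G T T A /;    # hypothetical list with duplicates

print 'intersection:    ', join(' ', get_intersection(\@a, \@b)), "\n";  # A C D G H I J K X Z
print 'first list only: ', join(' ', first_list_only(\@a, \@b)), "\n";   # B E F Y
print 'unique:          ', join(' ', unique(@bases)), "\n";              # A C G T
print 'dups (n = 2):    ', join(' ', dups(2, @bases)), "\n";             # A T
```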


Sorting

`sort LIST` -- returns the list sorted in string order

`sort BLOCK LIST` -- compares elements according to BLOCK

`sort SUBNAME LIST` -- compares elements according to the subroutine SUBNAME


Sorting Our First Attempt

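```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    my(@unsorted) = (17, 8, 2, 111);
    my(@sorted) = sort @unsorted;
    print "@unsorted\n";
    print "@sorted\n";
    exit;
}
```

Output:

```
17 8 2 111
111 17 2 8
```

Without a comparison block, sort compares the elements as strings, so "111" comes before "17", "2", and "8".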


The Comparison Operator

1. `$a <=> $b` returns 0 if the values are equal, 1 if `$a > $b`, and -1 if `$a < $b`, comparing them as numbers.

2. The `cmp` operator gives similar results for strings, comparing them in string order.

3. `$a` and `$b` are special global variables: do NOT declare them with `my` and do NOT modify them.
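The difference between the two operators explains the first attempt above: by default, sort behaves like `sort { $a cmp $b }`. A short sketch comparing both on the same numbers:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @nums = (17, 8, 2, 111);

# cmp compares strings character by character: "111" lt "17" lt "2" lt "8"
my @as_strings = sort { $a cmp $b } @nums;

# <=> compares numeric values
my @as_numbers = sort { $a <=> $b } @nums;

print "@as_strings\n";                      # 111 17 2 8
print "@as_numbers\n";                      # 2 8 17 111
print 2 <=> 10, ' ', '2' cmp '10', "\n";    # -1 1
```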


Sorting Numerically

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    my(@unsorted) = (17, 8, 2, 111);
    my(@sorted) = sort { $a <=> $b } @unsorted;
    print "@unsorted\n";
    print "@sorted\n";
    exit;
}
```

Output:

```
17 8 2 111
2 8 17 111
```


Sorting Using a Subroutine

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    my(@unsorted) = (17, 8, 2, 111);
    my(@sorted) = sort numerically @unsorted;
    print "@unsorted\n";
    print "@sorted\n";
    exit;
}

sub numerically { $a <=> $b }
```

Output:

```
17 8 2 111
2 8 17 111
```


Sorting Descending

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    my(@unsorted) = (17, 8, 2, 111);
    my(@reversesorted) = reverse sort numerically @unsorted;
    print "@unsorted\n";
    print "@reversesorted\n";
    exit;
}

sub numerically { $a <=> $b }
```

Output:

```
17 8 2 111
111 17 8 2
```


Sorting DNA by Length

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Sorting strings:
    my @dna = qw/ TATAATG TTTT GT CTCAT /;

    ## Sort @dna by length:
    @dna = sort { length($a) <=> length($b) } @dna;

    print "@dna\n";    # Output: GT TTTT CTCAT TATAATG
    exit;
}
```

Output:

```
GT TTTT CTCAT TATAATG
```


Sorting DNA by Number of T’s (Largest First)

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Sorting strings:
    my @dna = qw/ TATAATG TTTT GT CTCAT /;

    # tr/Tt// counts the T's without changing the string; $b before $a puts the largest first
    @dna = sort { ($b =~ tr/Tt//) <=> ($a =~ tr/Tt//) } @dna;

    print "@dna\n";    # Output: TTTT TATAATG CTCAT GT
    exit;
}
```

Output:

```
TTTT TATAATG CTCAT GT
```


Sorting DNA by Number of T’s (Largest First) (Take 2)

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Sorting strings:
    my @dna = qw/ TATAATG TTTT GT CTCAT /;

    # sort ascending by T count, then reverse the result
    @dna = reverse sort {
        ($a =~ tr/Tt//) <=> ($b =~ tr/Tt//)
    } @dna;

    print "@dna\n";    # Output: TTTT TATAATG CTCAT GT
    exit;
}
```

Output:

```
TTTT TATAATG CTCAT GT
```
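Both versions count the T's inside the comparison, so each sequence is recounted every time it takes part in a comparison. A common Perl idiom, the Schwartzian transform, computes each count once, sorts on it, and then discards it. A sketch on the same data; for four short sequences the gain is negligible, and it only matters for large lists:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @dna = qw/ TATAATG TTTT GT CTCAT /;

# Read from the bottom up: pair, sort, then unpack
my @sorted =
    map  { $_->[1] }                   # 3. keep only the sequence
    sort { $b->[0] <=> $a->[0] }       # 2. largest T count first
    map  { [ tr/Tt//, $_ ] }           # 1. pair each sequence with its T count
    @dna;

print "@sorted\n";    # TTTT TATAATG CTCAT GT
```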


Sorting Strings Without Regard to Case

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Sort strings without regard to case:
    my(@unsorted) = qw/ mouse Rat HUMAN eColi /;
    my(@sorted) = sort { lc($a) cmp lc($b) } @unsorted;
    print "@unsorted\n";
    print "@sorted\n";
    exit;
}
```

Output:

```
mouse Rat HUMAN eColi
eColi HUMAN mouse Rat
```


Sorting Hashes by Value

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    my(%sales_amount) = ( auto=>100, kitchen=>2000, hardware=>200 );

    # compare the values of two keys; $b before $a puts the largest first
    sub bysales { $sales_amount{$b} <=> $sales_amount{$a} }

    for my $dept (sort bysales keys %sales_amount) {
        printf "%s:\t%4d\n", $dept, $sales_amount{$dept};
    }
    exit;
}
```

Output:

```
kitchen:	2000
hardware:	 200
auto:	 100
```
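When two keys have the same value, the order above is left to sort. A common refinement is to add a second comparison with `or`, which is used only when the first one returns 0. A sketch with a hypothetical extra department, `garden`, that ties with `hardware`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical totals; garden and hardware tie at 200
my %sales_amount = ( auto => 100, kitchen => 2000, hardware => 200, garden => 200 );

# Sort by value, largest first; break ties alphabetically by key
my @depts = sort {
    $sales_amount{$b} <=> $sales_amount{$a}
        or
    $a cmp $b
} keys %sales_amount;

printf "%-9s %4d\n", $_, $sales_amount{$_} for @depts;
```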


Review for Midterm BINF634

Material

Tisdall Chapters 1-9

Wall Chapter 5

Lecture notes

The exam will be open book and notes

You cannot work together on it

You cannot use outside material

You will have the full period to take the midterm

You will be asked to program


Some Example Questions

Given two DNA fragments contained in \$DNA1 and \$DNA2 how can we
concatenate these to make a third string \$DNA3?


What does this line of code do?

\$RNA =~ s/T/U/ig;


What does this statement do?

\$revcom =~ tr/ACGT/TGCA/;
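That statement complements each base; a full reverse complement also needs the string reversed. A minimal sketch; the subroutine name `revcom` and the handling of lowercase bases are assumptions, not part of the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reverse complement of a DNA string
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;          # scalar context: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # complement each base, either case
    return $rc;
}

print revcom('AACGTTTG'), "\n";    # CAAACGTT
```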


What do these four lines do?

```perl
@bases = ('A', 'C', 'G', 'T');
$base1 = pop @bases;
unshift (@bases, $base1);
print "@bases\n\n";
```


What does this code snippet do if COND is true?

```perl
unless (COND) {
    # do something
}
```


What does this code fragment do?

\$protein = join('', @protein);


What does this code fragment do?

```perl
$myfile = "myfile";
open(MYFILE, ">$myfile");
```


What does this code fragment do?

while(\$DNA =~ /a/ig){\$a++}


What is the effect of using the command

use strict;

at the beginning of your program?


What is contained in the reserved variable \$0 and in the array @ARGV?
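A quick way to inspect both is a script that prints them. A minimal sketch; run it with a few arguments, e.g. `perl show_args.pl list1 list2`, where `show_args.pl` is a hypothetical file name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $0 holds the name of the running script, as it was invoked
print "script name: $0\n";

# @ARGV holds the command-line arguments; $ARGV[0] is the first one
print 'argument count: ', scalar(@ARGV), "\n";
for my $i (0 .. $#ARGV) {
    print "argument $i: $ARGV[$i]\n";
}
```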


What is the difference between “pass by value” and “pass by reference”?


What is a pointer and what does it mean to dereference a pointer?


How do you invoke perl with the debugger?


Given an array @verbs what is going on here?

\$verbs[rand @verbs]


For the Curious Regarding Data Structures and Their Implications

Niklaus Wirth, Algorithms + Data Structures = Programs, Prentice Hall, 1976.

Dated in terms of language (Pascal), but very well written and understandable.
