# Hashes and Sorting

Dec 13, 2013

BINF 634 Fall 2013 - LECTURE06

Outline

Lab 1 Solution

Program 2

Scoping

Algorithm efficiency

Sorting

Hashes

Review for midterm

Quiz 3

Lab 1 Solution

BINF634 Fall 2013 Regular Expression Lab (Key)

All problems except number 9 are worth 11 points. Number 9 is worth 12 points.

1) Write a Perl regular expression that would match only the strings “bat”, “at”, and “t”.

    /^(b?a)?t$/

(The shorter pattern /^b?a?t$/ is not quite right: it also matches “bt”.)

2) Write a Perl regular expression to recognize any string that contains the substring “jeff”.

    /jeff/

3) Write a Perl regular expression that would match the strings “bat”, “baat”, “baaat”, “baa…aat”, etc. (strings that start with b, followed by one or more a’s, ending with a t).

    /^ba+t$/

4) Write a Perl regular expression that matches the strings “hog”, “Hog”, “hOg”, “HOG”, “hOG”, etc. (That is, “hog” written in any combination of uppercase or lowercase letters.)

    /^[hH][oO][Gg]$/

5) Write a Perl regular expression that matches any positive number (with or without a decimal point). Hint #1: if there is a decimal point, there must be at least one digit following the decimal point. Hint #2: since the dot “.” matches any character, you must use \. to match a decimal point.

    /^\d+(\.\d+)?$/

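6) Write a Perl regular expression to match any integer that doesn’t end in 8.

    /^\d*[0-79]$/

(The class [^8] would accept any final character other than 8, including letters, so the last position is restricted to the digits 0-7 and 9.)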

7) Write a Perl regular expression to match any line with exactly two words (or numbers) separated by any amount of whitespace (spaces or tabs). There may or may not be whitespace at the beginning or end of the line.

    /^\s*\w+\s+\w+\s*$/
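One way to check answers like these is to run each pattern against strings that should and should not match. The sketch below does that; the test strings are made up for illustration, and two patterns are written in a stricter form than the simplest possible answer: /^(b?a)?t$/ for problem 1 (because /^b?a?t$/ also accepts "bt") and /^\d*[0-79]$/ for problem 6 (so that only digits are accepted).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry: pattern, strings that should match, strings that should not.
# All test strings are illustrative assumptions, not part of the lab.
my @tests = (
    [ qr/^(b?a)?t$/,         [qw(bat at t)],          [qw(bt baat cat)] ],
    [ qr/jeff/,              [qw(jeff jeffrey)],      [qw(jef Jeff)]    ],
    [ qr/^ba+t$/,            [qw(bat baaat)],         [qw(bt bats)]     ],
    [ qr/^[hH][oO][Gg]$/,    [qw(hog HOG hOg)],       [qw(hogs dog)]    ],
    [ qr/^\d+(\.\d+)?$/,     [qw(7 3.14)],            [qw(3. .5 abc)]   ],
    [ qr/^\d*[0-79]$/,       [qw(7 180 19)],          [qw(8 18 1a)]     ],
    [ qr/^\s*\w+\s+\w+\s*$/, ['two words', ' a  1 '], ['one', 'a b c']  ],
);

# Returns a list of messages, one per string that behaved unexpectedly.
sub check_patterns {
    my @failures;
    for my $t (@tests) {
        my ($re, $good, $bad) = @$t;
        push @failures, "$re should match '$_'"     for grep { $_ !~ $re } @$good;
        push @failures, "$re should not match '$_'" for grep { $_ =~ $re } @$bad;
    }
    return @failures;
}

my @failures = check_patterns();
print @failures
    ? join("\n", @failures) . "\n"
    : "all patterns behave as expected\n";
```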

Program 2 Discussions

Questions on Program 2?

Discussions on the permute function

Be Careful With Scope

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $x = 23;
    print "value in main body is $x\n";
    mysub($x);
    print "value in main body is $x\n";
    exit;

    sub mysub {
        print "value in subroutine is $x\n";
        $x = 33;
    }

Output:

    value in main body is 23
    value in subroutine is 23
    value in main body is 33

Because $x is declared at file scope, mysub sees and changes the same variable as the main body.

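    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        my $x = 23;
        print "value in main body is $x\n";
        mysub($x);
        print "value in main body is $x\n";
        exit;
    }

    sub mysub {
        print "value in subroutine is $x\n";
        $x = 33;
    }

This will not compile: $x is declared with my inside the bare block, so it is out of scope in mysub, and under use strict the undeclared $x there is a compile-time error.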

Be Careful With Scope (cont.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        my $x = 23;
        print "value in main body is $x\n";
        mysub($x);
        print "value in main body is $x\n";
        exit;
    }

    sub mysub {
        my ($x) = @_;
        $x = 33;
        print "value in subroutine is $x\n";
    }

Output:

    value in main body is 23
    value in subroutine is 33
    value in main body is 23

Here mysub copies its argument into its own my $x, so the caller's $x is unchanged.
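If a subroutine really is meant to change the caller's variable, one explicit option is to pass a reference rather than rely on a shared variable. The following is a minimal sketch of that idea; the subroutine name set_to_33 is hypothetical and not part of the lecture code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: receives a reference to a scalar and changes the
# caller's variable through that reference.
sub set_to_33 {
    my ($x_ref) = @_;
    $$x_ref = 33;       # dereference and assign
    return;
}

my $x = 23;
print "value in main body is $x\n";   # 23
set_to_33(\$x);                       # pass a reference to $x
print "value in main body is $x\n";   # 33
```

The call site now shows, through the backslash, that the subroutine is allowed to modify $x.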

Data Structures and Algorithm Efficiency

    # An inefficient way to compute intersections
    my @a = qw/ A B C D E F G H I J K X Y Z /;
    my @b = qw/ Q R S A C D T U G H V I J K X Z /;
    my @intersection = ();

    for my $i (@a) {
        for my $j (@b) {
            if ($i eq $j) {
                push @intersection, $i;
                last;
            }
        }
    }

    print "@intersection\n";
    exit;

Output:

    A C D G H I J K X Z

Algorithm is O(N^2), where N = size of the lists.

Data Structures and Algorithm Efficiency

    # A better way to compute intersections
    my @a = qw/ A B C D E F G H I J K X Y Z /;
    my @b = qw/ Q R S A C D T U G H V I J K X Z /;
    my @intersection = ();

    # "mark" each item in @a
    my %mark = ();
    for my $i (@a) { $mark{$i} = 1 }

    # intersection = any "marked" item in @b
    for my $j (@b) {
        if (exists $mark{$j}) {
            push @intersection, $j;
        }
    }

    print "@intersection\n";
    exit;

Output:

    A C D G H I J K X Z

Algorithm is O(N), where N = size of the lists.

Version 1 is the nested-loop code; version 2 is this hash-based code.

Demonstration

Unix commands:

    /usr/bin/time
    diff
    cmp

    % wc -l list1 list2
    24762 list1
    12381 list2
    37143 total

    % /usr/bin/time intersect1.pl list1 list2 > out1
    22.91 real 22.88 user 0.02 sys

    % /usr/bin/time intersect2.pl list1 list2 > out2
    0.06 real 0.05 user 0.00 sys

Speedup in user time: 22.88 / 0.05 ≈ 458
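The same comparison can be tried inside a single script. The sketch below times both approaches with the core Time::HiRes module; the subroutine names and the generated test lists are assumptions made for illustration, and the timings will differ from the figures above because the data is much smaller.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Version 1: nested loops, O(N^2) string comparisons.
sub intersect_nested {
    my ($a_ref, $b_ref) = @_;
    my @intersection;
    for my $i (@$a_ref) {
        for my $j (@$b_ref) {
            if ($i eq $j) { push @intersection, $i; last; }
        }
    }
    return @intersection;
}

# Version 2: mark the items of the first list in a hash, then scan the second.
sub intersect_hash {
    my ($a_ref, $b_ref) = @_;
    my %mark = map { $_ => 1 } @$a_ref;
    return grep { exists $mark{$_} } @$b_ref;
}

# Made-up test data: the lists share exactly the multiples of 6.
my @list1 = map { 2 * $_ } 1 .. 1000;   # even numbers up to 2000
my @list2 = map { 3 * $_ } 1 .. 1000;   # multiples of 3 up to 3000

for my $version ([nested => \&intersect_nested], [hash => \&intersect_hash]) {
    my ($name, $code) = @$version;
    my $start  = time;
    my @common = $code->(\@list1, \@list2);
    printf "%-6s found %d items in %.4f s\n",
        $name, scalar @common, time - $start;
}
```

Increasing the list sizes should make the gap between the two versions grow quickly, since the nested-loop work grows with the product of the list lengths.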

Hashes and Efficiency

Hashes provide a very fast way to look up information associated with a set of scalar values (keys).

Examples:

Count how many times each word appears in a file

Also: whether or not a certain word appeared in a file

Count how many times each codon appears in a DNA sequence

Whether a given codon appears in a sequence

How many times an item appears in a given list

Intersections
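As an illustration of the codon examples in this list, here is a minimal sketch that counts codons with a hash. The subroutine name count_codons and the sample sequence are assumptions; the sequence is read in non-overlapping triplets from the first base, and any one or two leftover bases at the end are ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many times each codon appears in a DNA string.
sub count_codons {
    my ($dna) = @_;
    my %count;
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        $count{ uc(substr($dna, $i, 3)) }++;
    }
    return %count;
}

my $dna   = 'ATGGCCATGTTTGCCTAA';   # made-up example sequence
my %count = count_codons($dna);

# How many times does each codon appear?
for my $codon (sort keys %count) {
    print "$codon: $count{$codon}\n";
}

# Does a given codon appear at all?
print "GCC appears\n"         if exists $count{GCC};
print "CCC does not appear\n" unless exists $count{CCC};
```

Both questions (how many times, and whether at all) are answered by a single hash lookup once the counts have been built.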

Examples

1. Write a subroutine get_intersection(\@a, \@b) that returns the intersection of two lists.

2. Write a subroutine first_list_only(\@a, \@b) that returns the items that are in list @a but not in @b.

3. Write a subroutine unique(@a) that returns the unique items in list @a (that is, removes the duplicates).

4. Write a subroutine dups($n, @a) that returns a list of items that appear in @a at least $n times.
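One possible set of solutions is sketched below, using a hash in each case. These are not the official answers; details such as the order of the returned items (order of first appearance) are assumptions, since the exercises do not specify them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Items of the second list that were "marked" while scanning the first.
sub get_intersection {
    my ($a_ref, $b_ref) = @_;
    my %mark = map { $_ => 1 } @$a_ref;
    return grep { exists $mark{$_} } @$b_ref;
}

# Items of the first list that do not occur anywhere in the second.
sub first_list_only {
    my ($a_ref, $b_ref) = @_;
    my %in_b = map { $_ => 1 } @$b_ref;
    return grep { !exists $in_b{$_} } @$a_ref;
}

# Each distinct item once, in order of first appearance.
sub unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Distinct items that occur at least $n times, in order of first appearance.
sub dups {
    my ($n, @items) = @_;
    my %count;
    $count{$_}++ for @items;
    return grep { $count{$_} >= $n } unique(@items);
}

my @a = qw/ A B C D E F G H I J K X Y Z /;
my @b = qw/ Q R S A C D T U G H V I J K X Z /;

print join(' ', get_intersection(\@a, \@b)), "\n";   # A C D G H I J K X Z
print join(' ', first_list_only(\@a, \@b)), "\n";    # B E F Y
print join(' ', unique(qw/ A C A G C A /)), "\n";    # A C G
print join(' ', dups(2, qw/ A C A G C A /)), "\n";   # A C
```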

Sorting

sort LIST -- returns list sorted in string order

sort BLOCK LIST -- compares according to BLOCK

sort USERSUB LIST -- compares according to subroutine USERSUB

Sorting Our First Attempt

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        my (@unsorted) = (17, 8, 2, 111);
        my (@sorted) = sort @unsorted;
        print "@unsorted\n";
        print "@sorted\n";
        exit;
    }

Output:

    17 8 2 111
    111 17 2 8

The Comparison Operator

1. $a <=> $b returns 0 if equal, 1 if $a > $b, -1 if $a < $b

2. The "cmp" operator gives similar results for strings

3. $a and $b are special global variables: do NOT declare with "my" and do NOT modify.
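A few direct comparisons show what the two operators return, and why a default (string) sort puts 111 before 17 and 2 before 8. The values are chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric comparison with <=>
print 2  <=> 10, "\n";   # -1 (2 is less than 10)
print 10 <=> 10, "\n";   #  0
print 10 <=> 2,  "\n";   #  1

# String comparison with cmp: compared character by character
print '2'   cmp '10',  "\n";   #  1 ("2" sorts after "1")
print '111' cmp '17',  "\n";   # -1 (second character "1" sorts before "7")
print 'abc' cmp 'abc', "\n";   #  0
```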

Sorting Numerically

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        my (@unsorted) = (17, 8, 2, 111);
        my (@sorted) = sort { $a <=> $b } @unsorted;
        print "@unsorted\n";
        print "@sorted\n";
        exit;
    }

Output:

    17 8 2 111
    2 8 17 111

Sorting Using a Subroutine

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        my (@unsorted) = (17, 8, 2, 111);
        my (@sorted) = sort numerically @unsorted;
        print "@unsorted\n";
        print "@sorted\n";
        exit;
    }

    sub numerically { $a <=> $b }

Output:

    17 8 2 111
    2 8 17 111

Sorting Descending

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        my (@unsorted) = (17, 8, 2, 111);
        my (@reversesorted) = reverse sort numerically @unsorted;
        print "@unsorted\n";
        print "@reversesorted\n";
        exit;
    }

    sub numerically { $a <=> $b }

Output:

    17 8 2 111
    111 17 8 2

Sorting DNA by Length

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        # Sorting strings:
        my @dna = qw/ TATAATG TTTT GT CTCAT /;

        ## Sort @dna by length:
        @dna = sort { length($a) <=> length($b) } @dna;
        print "@dna\n";   # Output: GT TTTT CTCAT TATAATG
        exit;
    }

Output:

    GT TTTT CTCAT TATAATG

Sorting DNA by Number of T’s (Largest First)

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        # Sorting strings:
        my @dna = qw/ TATAATG TTTT GT CTCAT /;

        @dna = sort { ($b =~ tr/Tt//) <=> ($a =~ tr/Tt//) } @dna;
        print "@dna\n";   # Output: TTTT TATAATG CTCAT GT
        exit;
    }

Output:

    TTTT TATAATG CTCAT GT

Sorting DNA by Number of T’s (Largest First) (Take 2)

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        # Sorting strings:
        my @dna = qw/ TATAATG TTTT GT CTCAT /;

        @dna = reverse sort { ($a =~ tr/Tt//) <=> ($b =~ tr/Tt//) } @dna;
        print "@dna\n";   # Output: TTTT TATAATG CTCAT GT
        exit;
    }

Output:

    TTTT TATAATG CTCAT GT
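In both versions the comparison block recounts the T's every time two sequences are compared. For long lists it may be worth counting once per sequence and storing the result in a hash, which ties this back to the earlier point about hash lookups. This is a sketch of that variation, not part of the original slides; the name %t_count is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @dna = qw/ TATAATG TTTT GT CTCAT /;

# Count the T's in each sequence once and remember the result.
my %t_count;
for my $seq (@dna) {
    $t_count{$seq} = ($seq =~ tr/Tt//);   # tr with an empty replacement only counts
}

# Sort by the cached counts, largest first.
my @sorted = sort { $t_count{$b} <=> $t_count{$a} } @dna;
print "@sorted\n";   # TTTT TATAATG CTCAT GT
```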

Sorting Strings Without Regard to Case

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        # Sort strings without regard to case:
        my (@unsorted) = qw/ mouse Rat HUMAN eColi /;
        my (@sorted) = sort { lc($a) cmp lc($b) } @unsorted;
        print "@unsorted\n";
        print "@sorted\n";
        exit;
    }

Output:

    mouse Rat HUMAN eColi
    eColi HUMAN mouse Rat

Sorting Hashes by Value

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        my (%sales_amount) = ( auto => 100, kitchen => 2000, hardware => 200 );

        sub bysales { $sales_amount{$b} <=> $sales_amount{$a} }

        for my $dept (sort bysales keys %sales_amount) {
            printf "%s:\t%4d\n", $dept, $sales_amount{$dept};
        }
        exit;
    }

Output (tabs shown expanded to 8-column stops):

    kitchen:        2000
    hardware:        200
    auto:    100
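When two hash values are equal, the order of those keys in the sorted result is not determined by the comparison above. A common way to make it predictable is to add a second comparison with "or". The sketch below assumes an extra, made-up department (garden) so that there is a tie to break.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up data with a tie between hardware and garden.
my %sales_amount = ( auto => 100, kitchen => 2000, hardware => 200, garden => 200 );

# Largest sales first; departments with equal sales in alphabetical order.
my @by_sales = sort {
    $sales_amount{$b} <=> $sales_amount{$a}
        or
    $a cmp $b
} keys %sales_amount;

for my $dept (@by_sales) {
    printf "%-10s %4d\n", $dept, $sales_amount{$dept};
}
```

Because <=> returns 0 for equal values, the "or" falls through to the string comparison only for ties, so the departments come out as kitchen, garden, hardware, auto.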

Review for Midterm BINF634

Material

Tisdall Chapters 1-9

Wall Chapter 5

Lecture notes

The exam will be open book and notes

You cannot work together on it

You cannot use outside material

You will have the full period to take the midterm

You will be asked to program

Some Example Questions

Given two DNA fragments contained in $DNA1 and $DNA2, how can we concatenate these to make a third string $DNA3?

What does this line of code do?

    $RNA =~ s/T/U/ig;

What does this statement do?

    $revcom =~ tr/ACGT/TGCA/;
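For context, tr/ACGT/TGCA/ on its own replaces each uppercase base with its complement in place; it does not change the order of the bases. A reverse complement therefore usually combines it with reverse, as in this sketch. The subroutine name revcom and the extra lowercase handling are assumptions added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reverse the sequence, then complement each base.
sub revcom {
    my ($dna) = @_;
    my $revcom = reverse $dna;          # scalar context: reverses the string
    $revcom =~ tr/ACGTacgt/TGCAtgca/;   # complement, keeping the case
    return $revcom;
}

print revcom('AAGC'), "\n";   # GCTT
```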

What do these four lines do?

    @bases = ('A', 'C', 'G', 'T');
    $base1 = pop @bases;
    unshift (@bases, $base1);
    print "@bases\n\n";

What does this code snippet do if COND is true?

    unless (COND) {
        # do something
    }

What does this code fragment do?

    $protein = join('', @protein);

What does this code fragment do?

    $myfile = "myfile";
    open(MYFILE, ">$myfile");

What does this code fragment do?

    while ($DNA =~ /a/ig) { $a++ }

What is the effect of using the command

use strict;

at the beginning of your program?

What is contained in the reserved variable $0 and in the array @ARGV?

What is the difference between “pass by value” and “pass by reference”?

What is a pointer and what does it mean to dereference a pointer?

How do you invoke perl with the debugger?

Given an array @verbs, what is going on here?

    $verbs[rand @verbs]
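To experiment with this expression, it can be placed in a small script like the following sketch. The word list is made up, and the comments describe the standard behavior of rand and of array subscripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @verbs = qw/ runs jumps sleeps /;   # made-up word list

# rand @verbs evaluates @verbs in scalar context (its length, 3) and
# returns a fractional number from 0 up to, but not including, 3.
# The array subscript truncates that to the integer 0, 1, or 2.
my $verb = $verbs[rand @verbs];
print "$verb\n";
```

Each run prints one of the three words, chosen at random.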

For the Curious Regarding Data Structures and Their Implications

Niklaus Wirth, Algorithms + Data Structures = Programs, Prentice Hall, 1976.

Dated in terms of language (Pascal), but very well written and understandable.
