Hashes and Sorting


Dec 13, 2013


BINF 634 Fall 2013 - LECTURE06

Outline

Lab 1 Solution
Program 2
Scoping
Algorithm efficiency
Sorting
Hashes
Review for midterm
Quiz 3

Lab 1 Solution

BINF634 Fall 2013 Regular Expression Lab (Key)

All problems except number 9 are worth 11 points. Number 9 is worth 12 points.



1) Write a PERL regular expression that would match only the strings: “bat”, “at”, and “t”.

/^(b?a)?t$/

(The simpler /^b?a?t$/ would also match “bt”, so the optional b is grouped with the a.)

2) Write a PERL regular expression to recognize any string that contains the substring “jeff”.

/jeff/


3) Write a PERL regular expression that would match the strings: “bat”, “baat”, “baaat”, “baa…aat”, etc. (strings that start with b, followed by one or more a’s, ending with a t).

/^ba+t$/


4) Write a PERL regular expression that matches the strings: “hog”, “Hog”, “hOg”, “HOG”, “hOG”, etc. (That is, “hog” written in any combination of uppercase or lowercase letters.)

/^[hH][oO][Gg]$/


5) Write a PERL regular expression that matches any positive number (with or without a decimal point). Hint #1: if there is a decimal point, there must be at least one digit following the decimal point. Hint #2: Since the dot “.” matches any character, you must use \. to match a decimal point.

/^\d+(\.\d+)?$/




6) Write a PERL regular expression to match any integer that doesn’t end in 8.

/^\d*[0-79]$/

(Using [^8] for the last character would also accept non-digits, as in “12a”.)



7) Write a PERL regular expression to match any line with exactly two words (or numbers) separated by any amount of whitespace (spaces or tabs). There may or may not be whitespace at the beginning or end of the line.

/^\s*\w+\s+\w+\s*$/
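
One way to check answers like these is to run each pattern against strings that should and should not match. The sketch below does this for the patterns from problems 3, 5, and 7. The helper name matching and the test strings are made up for illustration; they are not part of the lab.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the candidates that match the given pattern.
sub matching {
    my ($regex, @candidates) = @_;
    return grep { $_ =~ $regex } @candidates;
}

# Each list mixes strings that should match with strings that should not.
my @p3 = matching(qr/^ba+t$/,            qw(bat baaat bt bath));
my @p5 = matching(qr/^\d+(\.\d+)?$/,     qw(42 3.14 3. .5 abc));
my @p7 = matching(qr/^\s*\w+\s+\w+\s*$/, "hello world", "  a   12  ", "one", "a b c");

print "problem 3: @p3\n";                   # bat baaat
print "problem 5: @p5\n";                   # 42 3.14
print "problem 7: ", join("|", @p7), "\n";  # hello world|  a   12
```

Adding a string that you expect to fail is as useful as adding one you expect to pass: it is how an over-permissive pattern gets noticed.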




Program 2 Discussions


Questions on Program 2?



Discussions on the permute function


Be Careful With Scope

#!/usr/bin/perl
use strict;
use warnings;

my $x = 23;

print "value in main body is $x\n";
mysub($x);
print "value in main body is $x\n";
exit;

sub mysub {
    print "value in subroutine is $x\n";
    $x = 33;
}

Output:

value in main body is 23
value in subroutine is 23
value in main body is 33

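#!/usr/bin/perl
use strict;
use warnings;
{
    my $x = 23;

    print "value in main body is $x\n";
    mysub($x);
    print "value in main body is $x\n";
    exit;
}

sub mysub {
    print "value in subroutine is $x\n";
    $x = 33;
}

This will not compile: $x is declared with "my" inside the block, so it is not visible inside mysub, and "use strict" rejects the undeclared $x there.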


Be Careful With Scope (cont.)

#!/usr/bin/perl
use strict;
use warnings;
{
    my $x = 23;

    print "value in main body is $x\n";
    mysub($x);
    print "value in main body is $x\n";
    exit;
}

sub mysub {
    my($x) = @_;
    $x = 33;
    print "value in subroutine is $x\n";
}

Output:

value in main body is 23
value in subroutine is 33
value in main body is 23
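
If the subroutine is supposed to produce a new value for the caller, one common approach is to return it instead of relying on a shared variable. The following is a minimal sketch; the subroutine name add_ten is hypothetical and not from the lecture.

```perl
use strict;
use warnings;

# Hypothetical helper: works on a private copy and hands back the result.
sub add_ten {
    my ($value) = @_;    # private copy of the argument
    $value += 10;        # changes only the copy
    return $value;
}

my $x = 23;
my $y = add_ten($x);
print "x is still $x, y is $y\n";   # x is still 23, y is 33
```

The caller decides what to do with the result, so nothing outside the subroutine changes by surprise.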


Data Structures and Algorithm Efficiency


# An inefficient way to compute intersections

my @a = qw/ A B C D E F G H I J K X Y Z /;
my @b = qw/ Q R S A C D T U G H V I J K X Z /;
my @intersection = ();

for my $i (@a) {
    for my $j (@b) {
        if ($i eq $j) {
            push @intersection, $i;
            last;
        }
    }
}
print "@intersection\n";
exit;

Output:

A C D G H I J K X Z

Version 1: the algorithm is O(N^2), where N = size of the lists.


Data Structures and Algorithm Efficiency

# A better way to compute intersections

my @a = qw/ A B C D E F G H I J K X Y Z /;
my @b = qw/ Q R S A C D T U G H V I J K X Z /;
my @intersection = ();

# "mark" each item in @a
my %mark = ();
for my $i (@a) { $mark{$i} = 1 }

# intersection = any "marked" item in @b
for my $j (@b) {
    if (exists $mark{$j}) {
        push @intersection, $j;
    }
}
print "@intersection\n";
exit;

Output:

A C D G H I J K X Z

Version 2: the algorithm is O(N), where N = size of the lists.


Demonstration


Unix commands:

/usr/bin/time
head
diff
cmp

% wc -l list1 list2
   24762 list1
   12381 list2
   37143 total

% /usr/bin/time intersect1.pl list1 list2 > out1
   22.91 real   22.88 user   0.02 sys

% /usr/bin/time intersect2.pl list1 list2 > out2
   0.06 real   0.05 user   0.00 sys

22.88 / 0.05 ≈ 458, so the hash version used about 458 times less user CPU time.
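
A comparison like this can also be tried from inside Perl. The sketch below is not the intersect1.pl or intersect2.pl used in the demonstration: the subroutine names and the generated lists are assumptions made for illustration, and the timings will differ from machine to machine. It uses the core Time::HiRes module for sub-second timing.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Nested loops: compares each item of the first list with items of the second.
sub intersect_loops {
    my ($a_ref, $b_ref) = @_;
    my @result;
    for my $i (@$a_ref) {
        for my $j (@$b_ref) {
            if ($i eq $j) { push @result, $i; last; }
        }
    }
    return @result;
}

# Hash lookup: mark the first list, then scan the second list once.
sub intersect_hash {
    my ($a_ref, $b_ref) = @_;
    my %mark = map { $_ => 1 } @$a_ref;
    return grep { exists $mark{$_} } @$b_ref;
}

# Made-up data: 2000 ids in list 1; list 2 holds the 1000 even-numbered ids.
my @list1 = map { "id$_" } 1 .. 2000;
my @list2 = map { "id" . (2 * $_) } 1 .. 1000;

my $t0 = time;
my @r1 = intersect_loops(\@list1, \@list2);
my $t1 = time;
my @r2 = intersect_hash(\@list1, \@list2);
my $t2 = time;

printf "nested loops: %d items, %.4f s\n", scalar @r1, $t1 - $t0;
printf "hash lookup:  %d items, %.4f s\n", scalar @r2, $t2 - $t1;
```

Doubling the list sizes should roughly quadruple the nested-loop time while only about doubling the hash time, which is the practical meaning of O(N^2) versus O(N).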



Hashes and Efficiency


Hashes provide a very fast way to look up information associated with a set of scalar values (keys).

Examples:

Count how many times each word appears in a file
    Also: whether or not a certain word appeared in a file
Count how many times each codon appears in a DNA sequence
    Also: whether a given codon appears in a sequence
How many times an item appears in a given list
Intersections
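
As an illustration of the codon example, here is a minimal sketch that counts codons with a hash. The subroutine name count_codons and the input sequence are made up; the sketch assumes the sequence is read in the first reading frame and ignores any incomplete codon at the end.

```perl
use strict;
use warnings;

# Count how many times each codon appears, reading in the first frame.
sub count_codons {
    my ($dna) = @_;
    my %count;
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = substr($dna, $i, 3);
        $count{$codon}++;    # the hash key is the codon, the value is its count
    }
    return %count;
}

my %count = count_codons("ATGGCCATGTAA");   # made-up sequence
for my $codon (sort keys %count) {
    print "$codon: $count{$codon}\n";        # ATG: 2, GCC: 1, TAA: 1
}

# "Did this codon appear at all?" is a single lookup:
print "GGG appears\n" if exists $count{GGG};   # prints nothing for this input
```

The same pattern (split the input into items, then increment $count{$item}) also handles counting words in a file.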


Examples

1. Write a subroutine get_intersection(\@a, \@b) that returns the intersection of two lists.

2. Write a subroutine first_list_only(\@a, \@b) that returns the items that are in list @a but not in @b.

3. Write a subroutine unique(@a) that returns the unique items in list @a (that is, removes the duplicates).

4. Write a subroutine dups($n, @a) that returns a list of items that appear in @a at least $n times.
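
One possible set of solutions is sketched below; it is not an official key. All four use a hash for the lookup step. The choice to return items in order of first appearance, and to report each item from dups only once, is an assumption, since the exercises do not specify ordering.

```perl
use strict;
use warnings;

# Items of the second list that also occur in the first list.
sub get_intersection {
    my ($a_ref, $b_ref) = @_;
    my %in_a = map { $_ => 1 } @$a_ref;
    return grep { exists $in_a{$_} } @$b_ref;
}

# Items of the first list that do not occur in the second list.
sub first_list_only {
    my ($a_ref, $b_ref) = @_;
    my %in_b = map { $_ => 1 } @$b_ref;
    return grep { !exists $in_b{$_} } @$a_ref;
}

# Each distinct item once, in order of first appearance.
sub unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Distinct items that appear at least $n times.
sub dups {
    my ($n, @items) = @_;
    my %count;
    $count{$_}++ for @items;
    return grep { $count{$_} >= $n } unique(@items);
}

my @first  = qw/ A B C D /;
my @second = qw/ C D E /;
print join(" ", get_intersection(\@first, \@second)), "\n";   # C D
print join(" ", first_list_only(\@first, \@second)), "\n";    # A B
print join(" ", unique(qw/ x y x z y x /)), "\n";             # x y z
print join(" ", dups(2, qw/ x y x z y x /)), "\n";            # x y
```

Note that get_intersection and first_list_only take array references, so each list arrives as one argument, while unique and dups take a flat list.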



Sorting


sort LIST -- returns LIST sorted in string order

sort BLOCK LIST -- compares according to BLOCK

sort SUBNAME LIST -- compares according to the subroutine SUBNAME


Sorting Our First Attempt

#!/usr/bin/perl
use strict;
use warnings;
{
    my(@unsorted) = (17, 8, 2, 111);
    my(@sorted) = sort @unsorted;

    print "@unsorted\n";
    print "@sorted\n";

    exit;
}

Output:

17 8 2 111
111 17 2 8


The Comparison Operator

1. $a <=> $b returns 0 if equal, 1 if $a > $b, -1 if $a < $b

2. The "cmp" operator gives similar results for strings

3. $a and $b are special global variables: do NOT declare with "my" and do NOT modify.
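
A short sketch of the two operators on the same pair of values may help explain why the first sorting attempt printed 111 17 2 8. The variable names here are illustrative only.

```perl
use strict;
use warnings;

my $numeric = 2 <=> 10;          # compares as numbers
my $string  = "2" cmp "10";      # compares as strings, character by character
my $same    = "abc" cmp "abc";   # identical strings

print "$numeric\n";   # -1: numerically, 2 is less than 10
print "$string\n";    # 1: as strings, "2" sorts after "10" because "2" gt "1"
print "$same\n";      # 0: equal
```

Plain sort uses string comparison, so numbers need an explicit { $a <=> $b } block.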


Sorting Numerically

#!/usr/bin/perl
use strict;
use warnings;
{
    my(@unsorted) = (17, 8, 2, 111);
    my(@sorted) = sort { $a <=> $b } @unsorted;

    print "@unsorted\n";
    print "@sorted\n";

    exit;
}

Output:

17 8 2 111
2 8 17 111


Sorting Using a Subroutine

#!/usr/bin/perl
use strict;
use warnings;
{
    my(@unsorted) = (17, 8, 2, 111);
    my(@sorted) = sort numerically @unsorted;

    print "@unsorted\n";
    print "@sorted\n";

    exit;
}

sub numerically { $a <=> $b }

Output:

17 8 2 111
2 8 17 111


Sorting Descending

#!/usr/bin/perl
use strict;
use warnings;
{
    my(@unsorted) = (17, 8, 2, 111);
    my(@reversesorted) = reverse sort numerically @unsorted;

    print "@unsorted\n";
    print "@reversesorted\n";

    exit;
}

sub numerically { $a <=> $b }

Output:

17 8 2 111
111 17 8 2


Sorting DNA by Length

#!/usr/bin/perl
use strict;
use warnings;
{
    # Sorting strings:
    my @dna = qw/ TATAATG TTTT GT CTCAT /;

    ## Sort @dna by length:
    @dna = sort { length($a) <=> length($b) } @dna;
    print "@dna\n";    # Output: GT TTTT CTCAT TATAATG

    exit;
}

Output:

GT TTTT CTCAT TATAATG


Sorting DNA by Number of T’s (Largest First)

#!/usr/bin/perl
use strict;
use warnings;
{
    # Sorting strings:
    my @dna = qw/ TATAATG TTTT GT CTCAT /;

    # tr/Tt// with an empty replacement list only counts the T's
    @dna = sort { ($b =~ tr/Tt//) <=> ($a =~ tr/Tt//) } @dna;
    print "@dna\n";    # Output: TTTT TATAATG CTCAT GT

    exit;
}

Output:

TTTT TATAATG CTCAT GT


Sorting DNA by Number of T’s (Largest First) (Take 2)

#!/usr/bin/perl
use strict;
use warnings;
{
    # Sorting strings:
    my @dna = qw/ TATAATG TTTT GT CTCAT /;

    @dna = reverse sort { ($a =~ tr/Tt//) <=> ($b =~ tr/Tt//) } @dna;

    print "@dna\n";    # Output: TTTT TATAATG CTCAT GT

    exit;
}

Output:

TTTT TATAATG CTCAT GT



Sorting Strings Without Regard to Case

#!/usr/bin/perl
use strict;
use warnings;
{
    # Sort strings without regard to case:
    my(@unsorted) = qw/ mouse Rat HUMAN eColi /;
    my(@sorted) = sort { lc($a) cmp lc($b) } @unsorted;

    print "@unsorted\n";
    print "@sorted\n";
    exit;
}

Output:

mouse Rat HUMAN eColi
eColi HUMAN mouse Rat


Sorting Hashes by Value

#!/usr/bin/perl
use strict;
use warnings;
{
    my(%sales_amount) = ( auto=>100, kitchen=>2000, hardware=>200 );

    sub bysales { $sales_amount{$b} <=> $sales_amount{$a} }

    for my $dept (sort bysales keys %sales_amount) {
        printf "%s:\t%4d\n", $dept, $sales_amount{$dept};
    }

    exit;
}

Output (the gap after each colon is a tab):

kitchen:        2000
hardware:        200
auto:    100
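
When two hash values are equal, the comparison can fall through to a second key. The sketch below extends the sales example with a tie-break on department name; the "garden" entry is made up to create a tie and is not part of the original data.

```perl
use strict;
use warnings;

# "garden" is a hypothetical entry added only to create a tie at 200.
my %sales_amount = (auto => 100, kitchen => 2000, hardware => 200, garden => 200);

my @depts = sort {
    $sales_amount{$b} <=> $sales_amount{$a}    # larger sales first
        or
    $a cmp $b                                  # ties: alphabetical by name
} keys %sales_amount;

print "@depts\n";   # kitchen garden hardware auto
```

This works because <=> returns 0 for equal values, which is false, so "or" moves on to the cmp comparison.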



Review for Midterm BINF634


Material

Tisdall Chapters 1-9
Wall Chapter 5
Lecture notes

The exam will be open book and notes
You cannot work together on it
You cannot use outside material
You will have the full period to take the midterm
You will be asked to program


Some Example Questions


Given two DNA fragments contained in $DNA1 and $DNA2 how can we
concatenate these to make a third string $DNA3?






What does this line of code do?

$RNA =~ s/T/U/ig;



What does this statement do?


$revcom =~ tr/ACGT/TGCA/;
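
For context, a tr statement like this one usually appears as the second half of a reverse-complement computation. The sketch below shows one way the two steps fit together; the subroutine name revcom and the input string are assumptions, and only uppercase bases are handled.

```perl
use strict;
use warnings;

# Reverse complement of an uppercase DNA string.
sub revcom {
    my ($dna) = @_;
    my $revcom = reverse $dna;     # scalar context: reverses the characters
    $revcom =~ tr/ACGT/TGCA/;      # A<->T and C<->G, one character at a time
    return $revcom;
}

print revcom("ATGC"), "\n";   # GCAT
```

tr maps each character in the first list to the character at the same position in the second list, so all four substitutions happen in a single pass.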



What do these four lines do?

@bases = ('A', 'C', 'G', 'T');
$base1 = pop @bases;
unshift (@bases, $base1);
print "@bases\n\n";





What does this code snippet do if COND is true?

unless (COND) {
    # do something
}




What does this code fragment do?

$protein = join('', @protein);



What does this code fragment do?

$myfile = "myfile";
open(MYFILE, ">$myfile");



What does this code fragment do?

while($DNA =~ /a/ig){$a++}





What is the effect of using the command

use strict;


at the beginning of your program?





What is contained in the reserved variable $0 and in the array @ARGV?



What is the difference between “pass by value” and “pass by
reference” ?



What is a pointer and what does it mean to dereference a pointer?
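
As a study aid for this question and the previous one, here is a minimal sketch of the difference in Perl, where pointers are called references. The subroutine and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Receives a copy of the value; the caller's variable is untouched.
sub double_copy {
    my ($n) = @_;
    $n *= 2;
    return $n;
}

# Receives a reference; $$n_ref dereferences it to reach the caller's variable.
sub double_in_place {
    my ($n_ref) = @_;
    $$n_ref *= 2;
    return;
}

my $count = 21;
double_copy($count);
print "$count\n";             # 21: only the copy was doubled
double_in_place(\$count);     # backslash creates a reference to $count
print "$count\n";             # 42: changed through the reference

my @bases = qw/ A C G T /;
my $bases_ref = \@bases;      # reference to the whole array
print "${$bases_ref}[0] $bases_ref->[1] @$bases_ref\n";   # A C A C G T
```

The last line shows three ways to dereference an array reference: ${$ref}[i] and $ref->[i] for one element, and @$ref for the whole array.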



How do you invoke perl with the debugger?



Given an array @verbs what is going on here?


$verbs[rand @verbs]



For the Curious Regarding Data Structures and Their Implications

Niklaus Wirth, Algorithms + Data Structures = Programs, Prentice Hall, 1976.

Dated in terms of language (Pascal), but very well written and understandable.

