Recitation
12
Programming for Engineers in Python
Plan
Dynamic Programming
Coin Change problem
Longest Common Subsequence
Application to Bioinformatics
2
Teaching Survey
3
Please answer the teaching survey:
https://www.ims.tau.ac.il/Tal
/
This will help us to improve the course
Deadline:
4.2.12
Coin Change Problem
4
What
is the smallest number of coins I can use to make exact
change
?
Greedy solution: pick the largest coin first, until you reach
the change needed
In the US currency this works well:
Give change for
30
cents if you’ve got
1
,
5
,
10
, and
25
cent
coins:
25
+
5
→
2
coins
http://jeremykun.files.wordpress.com/
2012
/
01
/coins.jpg
The Sin of Greediness
5
What if you don’t have
5
cent coins?
You got
1
,
10
, and
25
Greedy solution:
25
+
1
+
1
+
1
+
1
+
1
→
6
coins
But a better solution
is:
10
+
10
+
10
→
3
coins!
So the greedy approach isn’t
optimal
The Seven Deadly Sins and the Four Last
Things by
Hieronymus
Bosch
http://en.wikipedia.org/wiki/File:Boschsevendeadlysins.jpg
Recursive Solution
6
Reminder
–
find the minimal # of coins needed to give exact
change with coins of specified values
Assume that we can use
1
cent coins so there is always
some
solution
Denote our coin list by c
1
, c
2
, …,
c
k
(c
1
=
1
)
k is the # of coins values we can use
Denote the change required by n
In the previous example:
n=
30
, k=
3
, c
1
=
1
, c
2
=
10
, c
3
=
25
Recursive Solution
7
Recursion Base:
If n=
0
then we need
0
coins
If k=
1
, c
1
=
1
, so we need n coins
Recursion Step:
If n<
c
k
we can’t use
c
k
→
We solve for n and c
1
,…,c
k

1
Otherwise, we can either use
c
k
or not use
c
k
If we use
c
k
→
we solve for n

c
k
and c
1
,…,
c
k
If we don’t use
c
k
→
we solve for n and c
1
,…,
c
k

1
Recursion Solution
8
def
coins_change_rec
(
cents_needed
,
coin_values
):
if
cents_needed
<=
0
:
# base
1
return
0
elif
len
(
coin_values
) ==
1
:
#
base
2
return
cents_needed
# assume that
coin_values
[
0
]==
1
elif
coin_values
[

1
] >
cents_needed
:
#
step
1
return
coins_change_rec
(
cents_needed
,
coin_values
[:

1
])
else
:
# step
2
s
1
=
coins_change_rec
(
cents_needed
,
coin_values
[:

1
] )
s
2
=
coins_change_rec
(
cents_needed

coin_values
[

1
],
coin_values
)
return
min
(s
1
, s
2
+
1
)
coins_rec.py
Repeated calls
9
We count how many times we call the recursive function for each set
of arguments:
calls = {}
def
coins_change_rec
(
cents_needed
,
coin_values
):
global
calls
calls[(
cents_needed
,
coin_values
)] =
calls.get
( (
cents_needed
,
coin_values
) ,
0
) +
1
…
>>>
print
'result
'
,
coins_change_rec
(
30
, (
1
,
5
,
10
,
25
))
result
2
>>>
print
'max
calls
'
,max
(
calls.values
())
max
calls
4
Dynamic Programing

Memoization
10
We want to store the values of calculation so we don’t repeat
them
We create a table called
mem
# of columns: # of cents needed +
1
# of rows: # of coin values +
1
The table is initialized with some illegal value
–
for example

1
:
mem
= [
[

1
for
y
in
range
(cents_needed+
1
)]
for
x
in
range
(
len
(
coin_values
))
]
Dynamic Programing

Memoization
11
For each call of the recursive function, we check if
mem
already has the answer:
if
mem
[
len
(
coin_values
)][
cents_needed
] ==

1
:
In case that it doesn’t (the above is
True
) we calculate it as
before, and we store the result, for example:
if
cents_needed
<=
0
:
mem
[
len
(
coin_values
)][
cents_needed
]
=
0
Eventually we return the value
return
mem
[
len
(
coin_values
)][
cents_needed
]
coins_mem.py
Dynamic Programing

Iteration
12
Another approach is to first build the entire matrix
This matrix holds the minimal number of coins we need to
get change for j cents using the first i coins (
c
1
, c
2
, …,
c
i
)
The solution will be
min_coins
[
k,n
]
–
the last element in the
matrix
This will save us the recursive calls, but will enforce us to
calculate all the values
apriori
Bottom

up
approach vs. the
top

down
approach of
memoization
Dynamic Programming approach
13
The point of this approach is that we have a recursive formula
to break apart a problem to sub problems
Then we can use different approaches to minimize the
number of calculations by storing the sub solutions in
memory
Bottom up

example matrix
14
Set n=
4
and k=
3
(coins are
1
,
2
, and
3
cents)
Base cases:
how many coins do I need to make change for zero cents?
Zero!
So
min_coins
[i,
0
]=
0
And how many pennies do I need to make
j
cents? Exactly
j
(we assumed we can use pennies)
So
min_coins
[
0
,j]=j
So the base cases give us:
0
1
2
3
4
0
?
?
?
?
0
?
?
?
?
Next
–
the recursion step
Bottom up

example matrix
15
For particular choice of
i,j
(but not i=
0
or j=
0
)
To determine
min_coins
[
i,j
]
–
the minimum # of coins to get
exact change of j using the first i coins
We can use the coin c
i
and add +
1
to
min_coins
[
i,j

c
i
] (only
valid if j>c
i
)
We can decide not to use
c
i
, therefore to use only c
0
,..,
c
i

1
,
and therefore
min_coins
[i

1
,j] .
So which way do we choose?
The one with the least coins!
min_coins
[
i,j
]
=
min(
min_coins
[
i,j

c
i
]
+
1
,
min_coins
[i

1
,j
])
Example matrix
–
recursion step
16
Set
n=
4
and
k=
3
(coins are
1
,
2
, and
3
cents)
So the base cases give us:
𝑀
=
0
1
2
3
4
0
1
1
2
2
0
1
1
1
2
M(
1
,
1
)=
1
M(
1
,
2
)=
1
M(
1
,
3
)=min(M(
1
,
1
)+
1
,M(
0
,
3
))=min(
2
,
2
)=
2
M(
1
,
4
)=min(M(
1
,
2
)+
1
, M(
0
,
4
))=min(
2
,
4
)=
2
etc…
coins_matrix.py
The code for the
matrix solution
and the idea is from
http://jeremykun.wordpress.com/
2012
/
01
/
12
/a

spoonful

of

python/
Longest Common Subsequence
17
Given two sequences (strings/lists) we want to find the
longest common subsequence
Definition
–
subsequence
: B is a subsequence of A if B can be
derived from A by removing elements from A
Examples
[
2
,
4
,
6
] is a subsequence of [
1
,
2
,
3
,
4
,
5
,
6
]
[
6
,
4
,
2
] is NOT
a subsequence of [
1
,
2
,
3
,
4
,
5
,
6
]
‘is’
is a
subsequence
of ‘
distance’
‘nice’
is NOT
a subsequence of
‘distance
’
Longest Common Subsequence
18
Given two subsequences (strings or lists) we want to find the
longest common subsequence:
Example for a LCS:
Sequence
1
:
H
U
MAN
Sequence
2
: C
H
I
M
P
AN
ZEE
Applications include:
BioInformatics
(next up)
Version Control
http://wordaligned.org/articles/longest

common

subsequence
The DNA
19
Our biological blue

print
A
sequence
made of four bases
–
A, G, C, T
Double strand:
A connects to T
G connects to C
Every triplet encodes for an
amino

acid
Example:
GAG
→
Glutamate
A chain of amino

acids is a
protein
–
the biological machine!
http://sips.inesc

id.pt/~nfvr/msc_theses/msc
09
b/
Longest common subsequence
20
The DNA changes:
Mutation: A
→
G, C
→
T, etc.
Insertion: AGC
→
A
T
GC
Deletion: AGC
→
A
‒
C
Given two non

identical sequences, we want to find the parts
that are common
So we can say how different they are
Which DNA is more similar to ours? The cat’s or the dog’s?
http://palscience.com/wp

content/uploads/
2010
/
09
/DNA_with_mutation.jpg
Recursion
21
An LCS of two sequences can be built from the
LCSes
of
prefixes of these sequences
Denote the sequences seq
1
and seq
2
Base
–
check if either sequence is empty:
If
len
(seq
1
) ==
0
or
len
(seq
2
) ==
0
:
return
[ ]
Step
–
build solution from shorter sequences:
If
seq
1
[

1
] == seq
2
[

1
]:
return
lcs
(seq
1
[:

1
],seq
2
[:

1
]) + [ seq
1
[

1
] ]
else
:
return
max
(
lcs
(seq
1
[:

1
],seq
2
),
lcs
(seq
1
,seq
2
[:

1
]),
key =
len
)
lcs_rec.py
Wasteful Recursion
22
For the inputs “MAN” and “PIG”, the calls are:
(
1
, ('', 'PIG'))
(
1
, ('M', 'PIG'))
(
1
, ('MA', 'PIG'))
(
1
, ('MAN', ''))
(
1
, ('MAN', 'P'))
(
1
, ('MAN', 'PI'))
(
1
, ('MAN', 'PIG'))
(
2
, ('MA', 'PI'))
(
3
, ('', 'PI'))
(
3
, ('M', 'PI'))
(
3
, ('MA', ''))
(
3
, ('MA', 'P'))
(
6
, ('', 'P'))
(
6
, ('M', ''))
(
6
, ('M', 'P'))
24
redundant calls!
http://wordaligned.org/articles/longest

common

subsequence
Wasteful Recursion
23
When comparing longer sequences with a small number of
letters the problem is worse
For example, DNA sequences are composed of A, G, T and C,
and are long
For
lcs
('ACCGGTCGAGTGCGCGGAAGCCGGCCGAA',
'GTCGTTCGGAATGCCGTTGCTCTGTAAA') we get an
absurd:
(('', 'GT'),
13
,
182
,
769
)
(('A', 'GT'),
13
,
182
,
769
)
(('A', 'G'),
24
,
853
,
152
)
(('', 'G'),
24
,
853
,
152
)
(('A', ''),
24
,
853
,
152
)
http://blog.oncofertility.northwestern.edu/wp

content/uploads/
2010
/
07
/DNA

sequence.jpg
DP Saves the Day
24
We saw the
overlapping sub problems
emerge
–
comparing the same sequences over and over again
W
e saw how we can find the solution from solution of sub
problems
–
a property we called
optimal substructure
Therefore we will apply a
dynamic programming
approach
Start with
top

down
approach

memoization
Memoization
25
We save results of function calls to refrain from calculating them again
def
lcs_mem
( seq
1
, seq
2
,
mem
=None ):
if
not
mem
:
mem
= { }
key = (
len
(seq
1
),
len
(seq
2
))
# tuples are immutable
if
key
not
in
mem
:
# result not saved yet
if
len
(seq
1
) ==
0
or
len
(seq
2
) ==
0
:
mem
[key
] = [
]
else
:
if
seq
1
[

1
] == seq
2
[

1
]:
mem
[key
]
=
lcs_mem
(seq
1
[:

1
]
, seq
2
[:

1
],
mem
)
+
[ seq
1
[

1
] ]
else
:
mem
[key
]
=
max
(
lcs_mem
(seq
1
[:

1
]
, seq
2
,
mem
),
lcs_mem
(
seq
1
, seq
2
[:

1
],
mem
), key=
len
)
return
mem
[key]
“maximum
recursion depth
exceeded”
26
We want to use our
memoized
LCS algorithm on two long DNA
sequences:
>>>
from
random
import
choice
>>>
def
base
():
…
return
choice(
'AGCT'
)
>>>
seq
1
=
str
([base()
for
x
in
range
(
10000
)])
>>>
seq
2
=
str
([base()
for
x
in
range
(
10000
)])
>>>
print
lcs
(seq
1
, seq
2
)
RuntimeError
: maximum recursion depth exceeded in
cmp
We need a different algorithm…
27
link
→
DNA Sequence Alignment
28
Needleman

Wunsch
DP Algorithm:
Python package:
http://
pypi.python.org/pypi/nwalign
On

line example:
http://
alggen.lsi.upc.es/docencia/ember/frame

ember.html
Code:
needleman_wunsch_algorithm.py
Lecture videos from TAU:
http://video.tau.ac.il/index.php?option=com_videos&view=
video&id=
4168
&Itemid=
53
Enter the password to open this PDF file:
File name:

File size:

Title:

Author:

Subject:

Keywords:

Creation Date:

Modification Date:

Creator:

PDF Producer:

PDF Version:

Page Count:

Preparing document for printing…
0%
Comments 0
Log in to post a comment