dynamic programming

Uploaded by disturbedtonganese, category: Biotechnology

2 Oct 2013


Recitation 12


Programming for Engineers in Python

Plan


Dynamic Programming


Coin Change problem


Longest Common Subsequence


Application to Bioinformatics



Teaching Survey

Please answer the teaching survey: https://www.ims.tau.ac.il/Tal/

This will help us to improve the course

Deadline: 4.2.12



Coin Change Problem

What is the smallest number of coins I can use to make exact change?

Greedy solution: keep picking the largest coin that does not exceed the remaining amount, until you reach the change needed

In the US currency this works well:

Give change for 30 cents if you've got 1, 5, 10, and 25 cent coins:

25 + 5: 2 coins

http://jeremykun.files.wordpress.com/2012/01/coins.jpg

The Sin of Greediness

What if you don't have 5 cent coins?

You've got 1, 10, and 25

Greedy solution: 25 + 1 + 1 + 1 + 1 + 1: 6 coins

But a better solution is: 10 + 10 + 10: 3 coins!

So the greedy approach isn't optimal

The Seven Deadly Sins and the Four Last Things by Hieronymus Bosch
http://en.wikipedia.org/wiki/File:Boschsevendeadlysins.jpg

Recursive Solution

Reminder: find the minimal # of coins needed to give exact change with coins of specified values

Assume that we can use 1 cent coins, so there is always some solution

Denote our coin list by c_1, c_2, …, c_k (c_1 = 1)

k is the # of coin values we can use

Denote the change required by n

In the previous example: n = 30, k = 3, c_1 = 1, c_2 = 10, c_3 = 25

Recursive Solution

Recursion base:

If n = 0 then we need 0 coins

If k = 1 then the only coin is c_1 = 1, so we need n coins

Recursion step:

If n < c_k we can't use c_k, so we solve for n and c_1, …, c_{k-1}

Otherwise, we can either use c_k or not use c_k:

If we use c_k, we solve for n - c_k and c_1, …, c_k, and add 1 coin to the result

If we don't use c_k, we solve for n and c_1, …, c_{k-1}

We take the smaller of the two results

Recursive Solution

    def coins_change_rec(cents_needed, coin_values):
        if cents_needed <= 0:  # base 1
            return 0
        elif len(coin_values) == 1:  # base 2
            return cents_needed  # assume that coin_values[0] == 1
        elif coin_values[-1] > cents_needed:  # step 1
            return coins_change_rec(cents_needed, coin_values[:-1])
        else:  # step 2
            s1 = coins_change_rec(cents_needed, coin_values[:-1])
            s2 = coins_change_rec(cents_needed - coin_values[-1],
                                  coin_values)
            return min(s1, s2 + 1)

coins_rec.py

Repeated calls

We count how many times we call the recursive function for each set of arguments:

    calls = {}

    def coins_change_rec(cents_needed, coin_values):
        global calls
        # coin_values must be a tuple so that it can be part of a dictionary key
        calls[(cents_needed, coin_values)] = \
            calls.get((cents_needed, coin_values), 0) + 1
        # ... the rest of the function is as before

    >>> print 'result', coins_change_rec(30, (1, 5, 10, 25))
    result 2
    >>> print 'max calls', max(calls.values())
    max calls 4


Dynamic Programming - Memoization

We want to store the values of calculations so we don't repeat them

We create a table called mem

# of columns: # of cents needed + 1

# of rows: # of coin values + 1

The table is initialized with some illegal value, for example -1:

    mem = [[-1 for y in range(cents_needed + 1)]
           for x in range(len(coin_values) + 1)]

Dynamic Programming - Memoization

For each call of the recursive function, we check if mem already has the answer:

    if mem[len(coin_values)][cents_needed] == -1:

In case it doesn't (the above is True) we calculate it as before, and we store the result, for example:

    if cents_needed <= 0:
        mem[len(coin_values)][cents_needed] = 0

Eventually we return the value:

    return mem[len(coin_values)][cents_needed]

coins_mem.py
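The fragments above show only parts of the memoized function. The following is a minimal sketch of how they might fit together; it is not the contents of coins_mem.py, and the name `coins_change_mem` and the `mem=None` default argument are assumptions made for illustration.

```python
def coins_change_mem(cents_needed, coin_values, mem=None):
    """Minimal number of coins; coin_values is a tuple with coin_values[0] == 1."""
    if mem is None:
        # rows: number of coin values + 1, columns: cents needed + 1
        mem = [[-1] * (cents_needed + 1) for _ in range(len(coin_values) + 1)]
    k = len(coin_values)
    if mem[k][cents_needed] == -1:  # result not stored yet
        if cents_needed == 0:  # base 1
            result = 0
        elif k == 1:  # base 2: only the 1 cent coin is left
            result = cents_needed
        elif coin_values[-1] > cents_needed:  # step 1: largest coin does not fit
            result = coins_change_mem(cents_needed, coin_values[:-1], mem)
        else:  # step 2: skip the largest coin, or use it once
            s1 = coins_change_mem(cents_needed, coin_values[:-1], mem)
            s2 = coins_change_mem(cents_needed - coin_values[-1], coin_values, mem)
            result = min(s1, s2 + 1)
        mem[k][cents_needed] = result
    return mem[k][cents_needed]

print(coins_change_mem(30, (1, 5, 10, 25)))  # 2
print(coins_change_mem(30, (1, 10, 25)))     # 3
```

The table is created once by the outermost call and passed down, so every pair (number of coin values, cents needed) is computed at most once.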

Dynamic Programming - Iteration

Another approach is to first build the entire matrix

This matrix holds the minimal number of coins we need to get change for j cents using the first i coins (c_1, c_2, …, c_i)

The solution will be min_coins[k,n], the last element in the matrix

This will save us the recursive calls, but will force us to calculate all the values a priori

Bottom-up approach vs. the top-down approach of memoization


Dynamic Programming approach

The point of this approach is that we have a recursive formula that breaks a problem apart into sub-problems

Then we can use different approaches to minimize the number of calculations by storing the sub-solutions in memory


Bottom up - example matrix

Set n = 4 and k = 3 (coins are 1, 2, and 3 cents)

In this example the coins are indexed from zero (c_0 = 1, c_1 = 2, c_2 = 3) and row i of the matrix uses the coins c_0, …, c_i, so the solution is the last element, min_coins[2,4]

Base cases:

How many coins do I need to make change for zero cents? Zero!

So min_coins[i,0] = 0

And how many pennies do I need to make j cents? Exactly j (we assumed we can use pennies)

So min_coins[0,j] = j

So the base cases give us:

    0 1 2 3 4
    0 ? ? ? ?
    0 ? ? ? ?

Next: the recursion step

Bottom up - example matrix

For a particular choice of i,j (but not i = 0 or j = 0)

To determine min_coins[i,j], the minimum # of coins to get exact change of j using the coins c_0, …, c_i:

We can use the coin c_i and add +1 to min_coins[i,j-c_i] (only valid if j >= c_i; otherwise only the next option is available)

We can decide not to use c_i, therefore to use only c_0, …, c_{i-1}, and therefore min_coins[i-1,j]

So which way do we choose?

The one with the least coins!

min_coins[i,j] = min(min_coins[i,j-c_i] + 1, min_coins[i-1,j])



Example matrix - recursion step

Set n = 4 and k = 3 (coins are 1, 2, and 3 cents)

Starting from the base cases, the recursion step fills in the rest of the matrix:

    M =
    0 1 2 3 4
    0 1 1 2 2
    0 1 1 1 2

M(1,1) = M(0,1) = 1 (the 2 cent coin doesn't fit)

M(1,2) = min(M(1,0)+1, M(0,2)) = min(1,2) = 1

M(1,3) = min(M(1,1)+1, M(0,3)) = min(2,3) = 2

M(1,4) = min(M(1,2)+1, M(0,4)) = min(2,4) = 2

etc…

coins_matrix.py

The code for the matrix solution and the idea is from http://jeremykun.wordpress.com/2012/01/12/a-spoonful-of-python/
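The bottom-up fill described on the previous slides can be sketched as follows. This is not the coins_matrix.py code from the linked post; the function name `min_coins_matrix` is an assumption, and rows are indexed from zero so that row i uses the coins c_0, …, c_i with c_0 = 1.

```python
def min_coins_matrix(cents_needed, coin_values):
    """Build the bottom-up table; coin_values[0] must be 1."""
    # Row 0 uses only the 1 cent coin: j cents need exactly j coins.
    table = [list(range(cents_needed + 1))]
    for i in range(1, len(coin_values)):
        row = [0] * (cents_needed + 1)  # column 0: zero cents need zero coins
        for j in range(1, cents_needed + 1):
            best = table[i - 1][j]  # option 1: do not use coin i
            if j >= coin_values[i]:  # option 2: coin i fits, use it once more
                best = min(best, row[j - coin_values[i]] + 1)
            row[j] = best
        table.append(row)
    return table

table = min_coins_matrix(4, (1, 2, 3))
for row in table:
    print(row)
# [0, 1, 2, 3, 4]
# [0, 1, 1, 2, 2]
# [0, 1, 1, 1, 2]
print(table[-1][-1])  # 2
```

The last element of the last row is the answer for the full coin list, and no recursive calls are made.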

Longest Common Subsequence

Given two sequences (strings/lists) we want to find the longest common subsequence

Definition - subsequence: B is a subsequence of A if B can be derived from A by removing elements from A, without changing the order of the remaining elements

Examples:

[2,4,6] is a subsequence of [1,2,3,4,5,6]

[6,4,2] is NOT a subsequence of [1,2,3,4,5,6]

'is' is a subsequence of 'distance'

'nice' is NOT a subsequence of 'distance'
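The definition can be checked mechanically. Below is a small sketch, with the hypothetical name `is_subsequence`, that scans A once and looks for the elements of B in order; it reproduces the four examples above.

```python
def is_subsequence(b, a):
    """Return True if b can be obtained from a by deleting elements."""
    remaining = iter(a)
    # `x in remaining` consumes the iterator up to the first match, so each
    # element of b must be found after the previous one, in the same order.
    return all(x in remaining for x in b)

print(is_subsequence([2, 4, 6], [1, 2, 3, 4, 5, 6]))  # True
print(is_subsequence([6, 4, 2], [1, 2, 3, 4, 5, 6]))  # False
print(is_subsequence('is', 'distance'))               # True
print(is_subsequence('nice', 'distance'))             # False
```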


Longest Common Subsequence

Given two sequences (strings or lists) we want to find the longest common subsequence

Example for an LCS:

Sequence 1: HUMAN

Sequence 2: CHIMPANZEE

The LCS is HMAN: the letters H, M, A, N appear in both sequences, in the same order

Applications include:

BioInformatics (next up)

Version Control

http://wordaligned.org/articles/longest-common-subsequence

The DNA

Our biological blueprint

A sequence made of four bases: A, G, C, T

Double strand:

A connects to T

G connects to C

Every triplet encodes an amino acid

Example: GAG → Glutamate

A chain of amino acids is a protein, the biological machine!

http://sips.inesc-id.pt/~nfvr/msc_theses/msc09b/
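The pairing rule and the triplet structure can be illustrated with a small sketch. The names `PAIRS`, `complement` and `triplets` are assumptions for illustration, and the sketch ignores strand direction; it only pairs bases position by position and cuts a strand into groups of three.

```python
# Base pairing from the slide: A connects to T, G connects to C.
PAIRS = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}

def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return ''.join(PAIRS[base] for base in strand)

def triplets(strand):
    """Split a strand into consecutive triplets, dropping a trailing partial one."""
    usable = len(strand) - len(strand) % 3
    return [strand[i:i + 3] for i in range(0, usable, 3)]

print(complement('GAGTTC'))  # CTCAAG
print(triplets('GAGTTC'))    # ['GAG', 'TTC']
```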

Longest common subsequence

The DNA changes:

Mutation: A → G, C → T, etc.

Insertion: AGC → ATGC

Deletion: AGC → AC

Given two non-identical sequences, we want to find the parts that are common

So we can say how different they are

Which DNA is more similar to ours? The cat's or the dog's?

http://palscience.com/wp-content/uploads/2010/09/DNA_with_mutation.jpg

Recursion

An LCS of two sequences can be built from the LCSes of prefixes of these sequences

Denote the sequences seq1 and seq2

Base - check if either sequence is empty:

    if len(seq1) == 0 or len(seq2) == 0:
        return []

Step - build the solution from shorter sequences:

    if seq1[-1] == seq2[-1]:
        return lcs(seq1[:-1], seq2[:-1]) + [seq1[-1]]
    else:
        return max(lcs(seq1[:-1], seq2), lcs(seq1, seq2[:-1]),
                   key=len)

lcs_rec.py

Wasteful Recursion

For the inputs "MAN" and "PIG", the calls are (number of calls, arguments):

    (1, ('', 'PIG'))
    (1, ('M', 'PIG'))
    (1, ('MA', 'PIG'))
    (1, ('MAN', ''))
    (1, ('MAN', 'P'))
    (1, ('MAN', 'PI'))
    (1, ('MAN', 'PIG'))
    (2, ('MA', 'PI'))
    (3, ('', 'PI'))
    (3, ('M', 'PI'))
    (3, ('MA', ''))
    (3, ('MA', 'P'))
    (6, ('', 'P'))
    (6, ('M', ''))
    (6, ('M', 'P'))

24 redundant calls!

http://wordaligned.org/articles/longest-common-subsequence
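The counts above can be reproduced by adding a call counter to the recursive function, in the same way as for the coin change problem. This is a sketch: the global dictionary `calls` mirrors the earlier coin example, and the variable names `total` and `distinct` are assumptions.

```python
calls = {}

def lcs(seq1, seq2):
    """Plain recursive LCS that also counts calls per argument pair."""
    calls[(seq1, seq2)] = calls.get((seq1, seq2), 0) + 1
    if len(seq1) == 0 or len(seq2) == 0:
        return []
    if seq1[-1] == seq2[-1]:
        return lcs(seq1[:-1], seq2[:-1]) + [seq1[-1]]
    return max(lcs(seq1[:-1], seq2), lcs(seq1, seq2[:-1]), key=len)

lcs('MAN', 'PIG')
total = sum(calls.values())   # all calls made
distinct = len(calls)         # different argument pairs
print(total)                  # 39
print(distinct)               # 15
print(total - distinct)       # 24 redundant calls
```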

Wasteful Recursion

When comparing longer sequences with a small number of letters the problem is worse

For example, DNA sequences are composed of A, G, T and C, and are long

For lcs('ACCGGTCGAGTGCGCGGAAGCCGGCCGAA', 'GTCGTTCGGAATGCCGTTGCTCTGTAAA') we get an absurd number of repeated calls (arguments, number of calls):

    (('', 'GT'), 13,182,769)
    (('A', 'GT'), 13,182,769)
    (('A', 'G'), 24,853,152)
    (('', 'G'), 24,853,152)
    (('A', ''), 24,853,152)

http://blog.oncofertility.northwestern.edu/wp-content/uploads/2010/07/DNA-sequence.jpg

DP Saves the Day

We saw the overlapping sub-problems emerge: comparing the same sequences over and over again

We saw how we can find the solution from the solutions of sub-problems, a property we called optimal substructure

Therefore we will apply a dynamic programming approach

Start with the top-down approach - memoization


Memoization

We save results of function calls to refrain from calculating them again

    def lcs_mem(seq1, seq2, mem=None):
        # test for None: "not mem" would also throw away the empty
        # dictionary that is passed down to the first recursive calls
        if mem is None:
            mem = {}
        key = (len(seq1), len(seq2))  # tuples are immutable
        if key not in mem:  # result not saved yet
            if len(seq1) == 0 or len(seq2) == 0:
                mem[key] = []
            else:
                if seq1[-1] == seq2[-1]:
                    mem[key] = lcs_mem(seq1[:-1], seq2[:-1], mem) + [seq1[-1]]
                else:
                    mem[key] = max(lcs_mem(seq1[:-1], seq2, mem),
                                   lcs_mem(seq1, seq2[:-1], mem), key=len)
        return mem[key]

“maximum recursion depth exceeded”

We want to use our memoized LCS algorithm on two long DNA sequences:

    >>> from random import choice
    >>> def base():
    ...     return choice('AGCT')
    >>> seq1 = ''.join([base() for x in range(10000)])
    >>> seq2 = ''.join([base() for x in range(10000)])
    >>> print lcs_mem(seq1, seq2)
    RuntimeError: maximum recursion depth exceeded in cmp

We need a different algorithm…
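One way to avoid the recursion limit is to fill the table bottom-up, as was done for the coin change matrix. The sketch below is an assumption about what such an algorithm could look like, not code from the recitation; the name `lcs_iter` is hypothetical.

```python
def lcs_iter(seq1, seq2):
    """Bottom-up LCS: fill a table of LCS lengths, then trace one LCS back."""
    n, m = len(seq1), len(seq2)
    # length[i][j] = length of an LCS of seq1[:i] and seq2[:j]
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if seq1[i - 1] == seq2[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])
    # Walk back from the bottom-right corner to recover one LCS.
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if seq1[i - 1] == seq2[j - 1]:
            result.append(seq1[i - 1])
            i, j = i - 1, j - 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]

print(lcs_iter('HUMAN', 'CHIMPANZEE'))  # ['H', 'M', 'A', 'N']
```

There are no recursive calls, so the recursion depth is not an issue, but the table has (len(seq1) + 1) × (len(seq2) + 1) entries, so time and memory grow with the product of the two lengths.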




DNA Sequence Alignment

Needleman-Wunsch DP Algorithm:

Python package: http://pypi.python.org/pypi/nwalign

On-line example: http://alggen.lsi.upc.es/docencia/ember/frame-ember.html

Code: needleman_wunsch_algorithm.py

Lecture videos from TAU: http://video.tau.ac.il/index.php?option=com_videos&view=video&id=4168&Itemid=53