# 10 Algorithms for Hard Problems

AI and Robotics

Oct 23, 2013


The algorithms introduced thus far in this course solve problems that are relatively straightforward and can be completed within a reasonable amount of time. However, some problems are harder to solve: problems for which no straightforward algorithm is known. This section introduces ideas on how such problems have been approached in the past.

10.1 Genetic Algorithms

Genetic algorithms (GAs) are search methods based on biological concepts. The basic operations of a genetic algorithm are simple: for the most part they involve copying strings, swapping substrings, or altering a single bit in a string. Yet their effects are powerful. This combination of simplicity and power is the main attraction of genetic algorithms.

10.1.1 Definitions

Chromosomes:

In biological terms, chromosomes are structures that contain codes that define an individual. For genetic algorithms a chromosome is a structure that fully defines a specific solution to a problem.

Genes:

In biological terms a gene is the code that controls a certain feature of an individual, such as our blood type. In genetic algorithms a gene can represent a specific part of a solution.

Alleles:

The value assigned to a gene.

Population:

In genetic algorithms a population is simply a set of chromosomes that represent different solutions to the same problem.

Generation:

A generation is the population of chromosomes that exists at time t. The next generation of this population is the set of chromosomes that exists at time t+1, and the ancestors of the population are any populations that appeared before time t.

Fitness:

The fitness of an individual indicates the "health" of that individual. Those that are more fit have a higher fitness level and are more likely to survive and reproduce. Genetic algorithms use a fitness function to test the goodness of a solution. The exact fitness function varies between applications because different applications have different measurements of fitness.

10.1.2 Basic Operations of a GA

There are three basic operators in genetic algorithms. The ideas behind these operators are very simple, yet they give genetic algorithms much of their power. Each of these operators has a specific purpose and is important in a different way. This section discusses the operation and use of these operators.

10.1.2.1 Reproduction

The reproduction operator can be considered the fundamental operator of genetic algorithms. In the biological world reproduction is the means by which individuals pass on their genetic material. Individuals who are better suited to an environment have a better chance of surviving to adulthood and reproducing, thereby transferring their genetic material to the next generation.

In nature, individuals of a population compete with other individuals for a limited amount of resources. This concept is very important because unlimited resources would allow population sizes to grow forever. Similarly, in genetic algorithms we must limit the size of our populations. Thus, while the size of our population remains relatively constant between generations, the quality of our solutions will generally increase. The reproduction operator helps to pick the better solutions to survive to the next generation.

The reproduction operator is extremely simple: it makes an exact copy of the parent. To illustrate how the reproduction operator works, let us look at the following chart. The Fitness column gives the fitness of each solution as calculated by the fitness function. The Relative Fitness column gives each solution's share of the total fitness.

Solution    Fitness    Relative Fitness (fitness(solution)/total)
A           20         0.333
B           10         0.167
C           15         0.25
D           10         0.167
E            5         0.0833

Total fitness = 60

Average fitness = 12

From the above values we set up what is called a weighted roulette wheel. The above table would result in a wheel like the one below:

[Figure: weighted roulette wheel with one slice for each of the solutions A to E]

Individuals get a slice of the wheel proportional to their relative fitness. We then "spin" the wheel five times (the same as the size of the population) to determine who will be reproduced. Since slice A is a lot bigger than slice E, the probability that A is chosen to survive is much higher than the probability that E is. Note that this scheme does not guarantee that A will survive or that E will die off. It merely makes it more probable that fit individuals will survive while less fit individuals die off.

Suppose we spin the above wheel 5 times and obtain: C A D C B. The total fitness of the population is now 15 + 20 + 10 + 15 + 10 = 70 and the average fitness is now 14. In general we can say that the reproduction operator works to improve the fitness of the entire population. Reproduction weeds out those solutions that aren't very good and keeps those that are good.
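The weighted roulette wheel described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `spin_wheel` and `reproduce` are made up for this example, and the fitness values are the ones from the table.

```python
import random

# Fitness values copied from the table above.
FITNESS = {"A": 20, "B": 10, "C": 15, "D": 10, "E": 5}


def spin_wheel(fitness, rng):
    """Pick one solution with probability fitness / total fitness."""
    total = sum(fitness.values())
    pointer = rng.uniform(0, total)  # where the wheel stops
    running = 0.0
    for name, value in fitness.items():
        running += value  # right-hand edge of this solution's slice
        if pointer <= running:
            return name
    return name  # guard against floating point round-off at the far edge


def reproduce(fitness, rng):
    """Spin the wheel once per member of the population."""
    return [spin_wheel(fitness, rng) for _ in fitness]


rng = random.Random()  # use random.Random(seed) for repeatable runs
next_generation = reproduce(FITNESS, rng)
print(next_generation)
# Average fitness of the new generation (12 for the original population).
print(sum(FITNESS[name] for name in next_generation) / len(next_generation))
```

Because the spins are random, the output changes from run to run. Over many runs the average fitness of the new generation tends to be above 12, but any single run may do worse.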

The selection method described above is called the roulette wheel method. Other selection schemes have been developed which yield better performance. These include the expected value model by De Jong, which helps to reduce the variance and scattering of the roulette wheel method. Another popular selection method is remainder stochastic sampling without replacement, proposed by Brindle.

It is obvious, however, that we need more than just the reproduction operator. Although the reproduction operator improves the fitness of the population as a whole, it cannot improve any of the individual solutions in a population. In other words, it lacks the ability to find any new solutions. In our example, the best that the operator could do alone would be to generate a population of all A's. A, however, might not be the best solution. It was simply the best solution in the original population. Clearly a method of generating better individuals is needed.

10.1.2.2 Cross Over

The cross over operator is the main operator responsible for producing new solutions to a problem in a structured manner. The operator combines traits of the two parents in their offspring. Thus, while the reproduction operator tries to improve the population as a whole, the cross over operator improves individual solutions.

The basic cross over operator is also very simple. The parents consist of two chromosomes. The result is also a pair of chromosomes. To illustrate how the cross over operator works, let us consider the following two strings as chromosomes.

s1 = ABCDEF

s2 = abcdef

The first thing we must do is choose a locus, the point where we cut the chromosomes. The locus can be chosen randomly. In our case any number between 1 and 5 can be used (there are 5 places where a 6 character string can be cut). Suppose we choose 2 as the locus. The chromosomes are then cut as follows:

AB | CDEF

ab | cdef

The next step is simply to exchange all the characters of the two chromosomes after the locus. The final result of cross over is as follows:

ABcdef and abCDEF
As we can see, there are now two new individuals exhibiting traits of both parents.

Although this process does not guarantee that the new individuals will do better than their parents, it does allow for the introduction of new individuals. These individuals are formed from solutions that we know have worked previously. Once there are new individuals in a population, reproduction works to eliminate those that are less fit and keep only those that are better suited.
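The single cut cross over described above can be sketched as follows. The function names `crossover` and `random_locus` are chosen for this example only.

```python
import random


def random_locus(length, rng):
    """Pick a cut point between 1 and length - 1, as in the text."""
    return rng.randint(1, length - 1)


def crossover(parent1, parent2, locus):
    """Swap everything after the locus between two equal-length chromosomes."""
    if len(parent1) != len(parent2):
        raise ValueError("chromosomes must have the same length")
    if not 1 <= locus < len(parent1):
        raise ValueError("locus must leave at least one gene on each side")
    child1 = parent1[:locus] + parent2[locus:]
    child2 = parent2[:locus] + parent1[locus:]
    return child1, child2


print(crossover("ABCDEF", "abcdef", 2))  # ('ABcdef', 'abCDEF')

# A randomly chosen locus, as would be used in practice.
locus = random_locus(6, random.Random())
print(locus, crossover("ABCDEF", "abcdef", locus))
```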

10.1.2.3 Mutation

The mutation operator plays a secondary role in genetic algorithms. In nature most mutations result in individuals who are less fit than their non-mutant counterparts. Mutation, however, can sometimes produce new traits that are desirable. In genetic algorithms, the role of mutation is that of a safeguard. Reproduction and cross over can sometimes eliminate useful genetic material. Mutation operates to recover that lost material.

The mutation operator is also very simple. For every chromosome in a population we determine whether that chromosome is to be mutated. This can be determined randomly, based on some preset mutation rate. Once we have determined that a chromosome is to be mutated, we simply pick a gene and mutate it. The actual mutation depends on the gene. For example, if your chromosomes are bit strings then the mutation operator would simply invert one of the bits. For the chromosomes of the decision tree, the mutation operator would either change the feature number to test or the value of the feature.

As stated above, the mutation operator prevents irrecoverable loss. Suppose that we are looking for the optimal encoding of a bit string and we obtain the following strings after a few generations:

1100100

1000101

1100001

1011000

We see that the column of second lowest order bits consists of all 0's. There is no way that the cross over and reproduction operators alone could generate a '1' in that column. We need to mutate that bit in order to recover the lost information.
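A minimal sketch of bit string mutation, assuming chromosomes are stored as Python strings of '0' and '1'. The names `flip_bit` and `mutate` and the rate parameter are illustrative choices, not part of the notes.

```python
import random


def flip_bit(chromosome, position):
    """Return a copy of the bit string with one bit inverted."""
    flipped = "1" if chromosome[position] == "0" else "0"
    return chromosome[:position] + flipped + chromosome[position + 1:]


def mutate(chromosome, rate, rng):
    """With probability `rate`, invert one randomly chosen bit."""
    if rng.random() >= rate:
        return chromosome  # this chromosome is not mutated
    return flip_bit(chromosome, rng.randrange(len(chromosome)))


# Index 5 (counting from the left, starting at 0) is the second lowest
# order bit of a 7 bit string, the column that was all 0's above.
print(flip_bit("1100100", 5))  # 1100110

population = ["1100100", "1000101", "1100001", "1011000"]
rng = random.Random()
print([mutate(c, 0.1, rng) for c in population])
```

With a low mutation rate most chromosomes pass through unchanged, which matches the secondary role that mutation plays.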

10.2 Heuristic Search

One way of looking at how to solve a problem is to treat it as a search through the different possible solutions. For example, recall the 8 queens problem. The solution to that problem searched the different configurations and returned the successful ones.

Some harder problems can be considered in the same manner. For example, consider the following problem. Suppose the dots below represent cities. Lines represent roads that connect the cities. The number with each line represents the length of the road.

What is the shortest path to get from the yellow dot to the green dot?

[Figure: map of cities (coloured dots) connected by roads. Road lengths shown: 10, 12, 11, 9, 22, 18, 25, 19, 17, 31, 8, 30, 13, 15, 27]

Between yellow and green there are quite a few possible paths. To find the shortest path, we could generate all possible paths and then pick the shortest one from that set. However, this is not necessarily the best way to do it. Even in our very oversimplified problem there are quite a few paths. The coloured lines below represent just a few of them.

[Figure: the map with a few of the possible paths drawn as coloured lines]

To find the optimal solution, there are several search strategies that we could perform. One thing we could do is perform a uniform cost search.


10.2.1 Uniform Cost Search

Our goal is to minimize the total distance from yellow to green. Thus, we look at the paths generated so far, and whichever is the shortest is the one we extend next.

1. Continue from pink because it is the minimal path so far.
2. The shortest path is now from yellow to orange, so continue from there.
3. Continue from red as it is the shortest so far.
4. Continue from top orange as it is now shorter.
5. From here expand from black.

[Figures: the search tree after each step, with each node labelled by the length of the path from yellow to that node]

Expanding black generates a path that reaches green. However, we can't stop yet, as we do not know that it is the shortest path. Its total cost is 59, which makes it the longest of the paths generated so far. We must keep expanding the other paths until every one of them is at least as long as 59 before we can stop. The major problem with uniform cost search is that it can be very inefficient, as the search tree can be very large.
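The procedure can be sketched with a priority queue that always hands back the shortest path generated so far. The map itself is not reproduced here, so the roads and lengths in `GRAPH` are assumptions chosen for illustration (they happen to give a shortest path of length 59); the function name is also hypothetical.

```python
import heapq

# Hypothetical road map: GRAPH[a][b] is the length of the road from a to b.
ROADS = [
    ("yellow", "pink", 10), ("yellow", "orange", 11), ("pink", "orange", 12),
    ("orange", "red", 9), ("orange", "blue", 22), ("red", "black", 8),
    ("black", "blue", 25), ("black", "green", 31), ("blue", "green", 30),
]
GRAPH = {}
for a, b, length in ROADS:
    GRAPH.setdefault(a, {})[b] = length
    GRAPH.setdefault(b, {})[a] = length  # roads can be travelled both ways


def uniform_cost_search(graph, start, goal):
    """Return (cost, path) of a shortest path, or None if there is none."""
    frontier = [(0, start, [start])]  # (path length so far, dot, path)
    expanded = {}                     # cheapest cost at which a dot was expanded
    while frontier:
        cost, node, path = heapq.heappop(frontier)  # shortest path so far
        if node == goal:
            # Every other path on the frontier is at least this long.
            return cost, path
        if node in expanded and expanded[node] <= cost:
            continue  # already expanded via a path that was no longer
        expanded[node] = cost
        for neighbour, length in graph[node].items():
            heapq.heappush(frontier, (cost + length, neighbour, path + [neighbour]))
    return None


print(uniform_cost_search(GRAPH, "yellow", "green"))
# (59, ['yellow', 'orange', 'red', 'black', 'green'])
```

Note that the search only stops when the goal is taken off the queue, not when it is first generated. That is the code equivalent of waiting until all other paths are at least as long.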


10.2.2 Guided search

Looking at the map, we know that pink is farther from green than orange is, so moving to pink as a first move is probably not a good idea. Consider the following table, which shows the straight line distance between any two dots.

[Table: straight line distances between the ten dots; rows and columns follow the same dot order]

 0   10   12   21   28   31   35   54   32   60
10    0   11   15   24   32   35   52   30   38
12   11    0    9   16   22   31   36   22   35
21   15    9    0    8   26   28   40   16   32
28   24   16    8    0   28   26   31   15   31
31   32   22   26   28    0   17   19   18   34
35   35   31   28   26   17    0   34   13   25
54   52   36   40   31   34   19    0   30   27
32   30   22   16   15   18   13   30    0   30
60   38   35   32   31   34   25   27   30    0

With this extra information, we can perform a heuristic search. A heuristic search is an informed search. In other words, we use the extra information to help us in our search.

10.2.3 Greedy search

The first kind of heuristic search is the greedy search. Our objective is to get to the green dot. Thus what we can do is start at the yellow dot and, from there, always move to the neighbouring dot that is closest to the green dot.

Orange is closer so choose it and continue from there.

[Figures: the search tree, with each dot labelled by its straight line distance to green: 60 and 38 at the first step, then 30 and 32]

Blue is closer so continue from blue.

With the heuristic search we can stop once we reach green, as we cannot get closer to green than green itself.

According to this algorithm we would choose the path yellow, orange, blue, green.

This may not be the best path but it is not a bad path. The kind of search just performed is called a greedy search. The idea is that at each stage of the search we pick the option that looks best at that moment.
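A sketch of this idea as a greedy best-first search, which keeps a queue ordered only by the estimated distance to the goal. The map and the straight line distances are not reproduced in these notes, so both `GRAPH` and `H` below are hypothetical values chosen for illustration, and the function names are made up for this example.

```python
import heapq

# Hypothetical road map (same shape of data as a real map would have).
ROADS = [
    ("yellow", "pink", 10), ("yellow", "orange", 11), ("pink", "orange", 12),
    ("orange", "red", 9), ("orange", "blue", 22), ("red", "black", 8),
    ("black", "blue", 25), ("black", "green", 31), ("blue", "green", 30),
]
GRAPH = {}
for a, b, length in ROADS:
    GRAPH.setdefault(a, {})[b] = length
    GRAPH.setdefault(b, {})[a] = length

# Assumed straight line distance from each dot to green.
H = {"yellow": 55, "pink": 50, "orange": 38, "red": 32,
     "blue": 30, "black": 28, "green": 0}


def greedy_search(graph, h, start, goal):
    """Always expand the dot that looks closest to the goal."""
    frontier = [(h[start], start, [start])]  # ordered by h only
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                heapq.heappush(frontier, (h[neighbour], neighbour, path + [neighbour]))
    return None


def path_length(graph, path):
    """Total road length along a path."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


path = greedy_search(GRAPH, H, "yellow", "green")
print(path, path_length(GRAPH, path))
# ['yellow', 'orange', 'blue', 'green'] 63
```

On this assumed map the greedy path has length 63, while a path of length 59 exists through red and black. Greedy search reaches the goal quickly but ignores the distance already travelled.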

Another search method is called an A* search. Uniform cost search finds the best path but is inefficient. Greedy search is more likely to be efficient but is not guaranteed to find the best path. A* search combines these two search strategies.

When doing a uniform cost search we look for the best so far. We will call this value g(n). g(n) is the cost function: it represents the cost of the path travelled so far from the start to node n. When doing a greedy search we choose the value that we think will be the best. Let this value be h(n). h(n) is the heuristic function: it represents an estimate of how close node n is to the goal.

A* search does the same kind of search as a greedy search but uses g(n)+h(n) as the guide for which node to expand next.

[Figure: the map with road lengths, with dots labelled by their straight line distance to green]

g(n) is the cost of travel so far. h(n) is the heuristic, the estimate of the remaining distance to the goal. To perform an A* search, combine the two. In the first step we choose orange, since its g(n)+h(n) is the lowest.

In the next step we continue from red. There is only one choice: go to black.

In the next step we continue from black. There are two choices, green and blue. Pick green as its total value is the lowest.

[Figures: the search tree after each step, with each node labelled g(n) + h(n): 10 + 60, 11 + 35, 20 + 32, 33 + 30, 28 + 8, 59 + 0, 53 + 30]

The A* search found the path yellow, orange, red, black, green, with a total length of 59.

Notice that this is the shortest path. If the heuristic is good, that is, if it never overestimates the remaining distance, A* will not only find a solution but will find the best solution.

Think of it this way. Our heuristic function (straight line distance from the goal) is the minimum distance one could travel between two points. Thus, from any point, the total cost of travelling to the goal through that point must be at least the cost of travel so far plus the straight line distance to the goal. This is why A* search can find the optimal solution.
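A sketch of A* that orders the queue by g(n) + h(n). As the map is not reproduced in these notes, the roads in `GRAPH` and the straight line distances in `H` are hypothetical, with `H` chosen so that it never overestimates the true remaining distance. The function name `a_star` is also an illustrative choice.

```python
import heapq

# Hypothetical road map.
ROADS = [
    ("yellow", "pink", 10), ("yellow", "orange", 11), ("pink", "orange", 12),
    ("orange", "red", 9), ("orange", "blue", 22), ("red", "black", 8),
    ("black", "blue", 25), ("black", "green", 31), ("blue", "green", 30),
]
GRAPH = {}
for a, b, length in ROADS:
    GRAPH.setdefault(a, {})[b] = length
    GRAPH.setdefault(b, {})[a] = length

# Assumed straight line distance to green; never larger than the road distance.
H = {"yellow": 55, "pink": 50, "orange": 38, "red": 32,
     "blue": 30, "black": 28, "green": 0}


def a_star(graph, h, start, goal):
    """Return (cost, path), expanding the dot with the lowest g(n) + h(n)."""
    frontier = [(h[start], 0, start, [start])]  # (g + h, g, dot, path)
    best_g = {}  # cheapest g at which each dot has been expanded
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in best_g and best_g[node] <= g:
            continue
        best_g[node] = g
        for neighbour, length in graph[node].items():
            g_next = g + length  # cost of travel so far
            f_next = g_next + h[neighbour]  # plus estimated remaining distance
            heapq.heappush(frontier, (f_next, g_next, neighbour, path + [neighbour]))
    return None


print(a_star(GRAPH, H, "yellow", "green"))
# (59, ['yellow', 'orange', 'red', 'black', 'green'])
```

On this assumed map A* expands only yellow, orange, red and black before reaching green, while still returning the shortest path.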

# 11 Cryptography

Encryption is the process of transforming a message so that only the intended recipient can understand the message. A good cryptosystem should have the following properties:

1. Secure against the designer (the designer will not be able to break a message created by the system any more easily than someone else)
2. Small key (small is a relative term)
3. Use simple operations to perform encryption and decryption
4. Allow error propagation
5. Messages can be expanded

11.1 One Time Pad

The One Time Pad is a system that is perfectly secure if used properly. This means that even if you tried every single possible key on an encrypted message, you could not identify the original message, because multiple "meaningful" messages would appear. To implement this system, characters are translated into a 5 bit binary equivalent:


A ----> 00000
B ----> 00001
C ----> 00010
D ----> 00011
E ----> 00100
F ----> 00101
G ----> 00110
H ----> 00111
I ----> 01000
J ----> 01001
K ----> 01010
L ----> 01011
M ----> 01100
N ----> 01101
O ----> 01110
P ----> 01111
...

The message M is then represented as an n-bit message. To use the one time pad, a random n-bit key K is required. The cipher text C is then generated by the expression:

$C = M \oplus K$

where $\oplus$ is defined as the XOR operator. In other words, each bit of M is XOR'd with the corresponding bit of K.

Example:

Message: CALL

M = 00010 00000 01011 01011

Let K be a randomly selected 20 bit key. Suppose K is:

K = 01011 00111 01010 01110

Then $C = M \oplus K$, thus:

M = 00010 00000 01011 01011
K = 01011 00111 01010 01110
C = 01001 00111 00001 00101

The recipient can decrypt the message by calculating $C \oplus K$:

C = 01001 00111 00001 00101
K = 01011 00111 01010 01110
M = 00010 00000 01011 01011

which decodes to C A L L.
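The worked example can be reproduced with a short Python sketch. It assumes messages contain only the letters A to Z and represents bits as strings of '0' and '1' for readability; the helper names are made up for this example.

```python
import secrets


def encode(text):
    """Translate letters into 5 bit groups (A = 00000, B = 00001, ...)."""
    return "".join(format(ord(ch) - ord("A"), "05b") for ch in text)


def decode(bits):
    """Translate 5 bit groups back into letters."""
    return "".join(chr(int(bits[i:i + 5], 2) + ord("A"))
                   for i in range(0, len(bits), 5))


def xor_bits(a, b):
    """XOR two bit strings of equal length, bit by bit."""
    if len(a) != len(b):
        raise ValueError("the key must be exactly as long as the message")
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def random_key(n):
    """Generate a fresh random n-bit key; it must never be reused."""
    return "".join(str(secrets.randbelow(2)) for _ in range(n))


message = encode("CALL")
key = "01011001110101001110"  # the key K from the example above
cipher = xor_bits(message, key)
print(cipher)                         # 01001001110000100101
print(decode(xor_bits(cipher, key)))  # CALL
```

The same `xor_bits` function both encrypts and decrypts, because XOR-ing with K twice returns the original bits.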

Very important features of the key K:

1. K must be as long as M (you can't just repeat a shorter key over and over).
2. K must only ever be used once.

If you use the key more than once then the system is basically a coherent running key cipher, which is breakable.

11.2 Computationally Secure Cryptosystems

Although the one time pad offers perfect security, its drawbacks (big keys that can only be used once) make it impractical for everyday encryption. A computationally secure cryptosystem is a system that can, in principle, be cracked by performing an exhaustive search, because there is a unique meaningful decipherment. The idea is that we make the task of finding the key so difficult that the search is impractical. There is no proof of security for computationally secure cryptosystems.

11.2.1 DES

Developed by IBM in the 1970s, DES uses permutations and transformations to achieve security. How it works:

Break the message M up into 64 bit blocks; the message is encrypted 64 bits at a time. The bits of each block are first permuted (their order is mixed up) and then the block is divided into two 32 bit halves. The encryption algorithm is then applied to the block 16 times, using different key blocks that are generated from the original key.

Each time:

$L_{i+1} = R_i$

$R_{i+1} = L_i \oplus f(R_i, K_i)$

The process is repeated 16 times to encrypt the message (i.e. the encrypted message is $R_{16}L_{16}$).

The key K is 64 bits (of which 8 are parity bits, leaving 56 bits of actual key). $K_i$ is a 48 bit key block generated by a key schedule. The key schedule determines what the key block will be, based on K and the current iteration.

To decrypt the message we simply take the message $R_{16}L_{16}$ and apply DES to it using $K_{16}$ as the first key, $K_{15}$ as the second key and so on.
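The round structure, and the fact that decryption is the same procedure with the key blocks in reverse order, can be demonstrated with a small sketch. This is not DES: `toy_f` and `ROUND_KEYS` are hypothetical stand-ins for the real function f and key schedule, and the permutations applied before and after the rounds are left out.

```python
MASK32 = 0xFFFFFFFF  # keep every half at 32 bits


def toy_f(half, key_block):
    """Placeholder round function; the real DES f is described below."""
    return ((half * 31 + key_block) ^ (half >> 3)) & MASK32


# Placeholder key blocks; real DES derives sixteen 48 bit blocks from K.
ROUND_KEYS = [(0x9E3779B9 * (i + 1)) & MASK32 for i in range(16)]


def feistel(left, right, round_keys, f):
    """Apply one round per key block, then output the halves swapped."""
    for key_block in round_keys:
        # New left half is the old right half; new right half is the
        # old left half XOR f(old right half, key block).
        left, right = right, left ^ f(right, key_block)
    return right, left  # the pair (R16, L16)


def encrypt(left, right):
    return feistel(left, right, ROUND_KEYS, toy_f)


def decrypt(left, right):
    # Same procedure, key blocks in reverse order.
    return feistel(left, right, list(reversed(ROUND_KEYS)), toy_f)


block = (0x01234567, 0x89ABCDEF)
cipher = encrypt(*block)
print([hex(x) for x in cipher])
print(decrypt(*cipher) == block)  # True
```

The round trip works for any round function, even one that cannot be inverted, because each round only XORs the output of f into one half.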

The actual function f is described as follows:

f is given a 32 bit message R and a 48 bit key block K. R first goes through an expansion function that makes it 48 bits; call the result E(R). We then compute $E(R) \oplus K$. This results in a 48 bit value. This value is then broken up into eight 6 bit blocks $B_1 B_2 \ldots B_8$. Functions $S_1, S_2, \ldots, S_8$ are applied to $B_1, B_2, \ldots, B_8$ respectively. Each of the S functions takes in a 6 bit block and produces a 4 bit output. Thus we end up with 32 bits of output in total. These 32 bits are then permuted one more time to yield the final result.

It should be noted that every detail of how DES works is known. There is no mystery in how the key blocks are calculated, how E works, or how any of the S boxes work. The idea is that even if you were given all of this, you would still have a hard time breaking the code without searching all possible keys. Assuming you can check 1 key/ s it would take 28 hours to find the correct key.

[Figure: one round of DES, in which $L_i$, $R_i$ and $K_i$ are combined through f to produce $L_{i+1}$ and $R_{i+1}$]

11.2.2 RSA

RSA (Rivest, Shamir, and Adleman) is a public key cryptosystem. This kind of cryptosystem is different from the other two presented because, instead of having one key which both the sender and recipient must know, it uses two keys $K_1$ and $K_2$. The idea is that the recipient keeps one key private and publishes the other key. To send a message, the sender uses the public key to encrypt the message. This key can ONLY be used to encrypt the message. It cannot be used to decrypt the message. The recipient then uses the private key to actually decrypt the message.

The system works in this manner:

1. Choose two large distinct primes $P_1$ and $P_2$.
2. Calculate $R = P_1 P_2$. Find $\phi(R)$, where $\phi(R)$ is Euler's function, which returns the number of values n between 1 and R such that $\gcd(n, R) = 1$. Because $P_1$ and $P_2$ are both primes, $\phi(R) = (P_1 - 1)(P_2 - 1)$.
3. Randomly choose a value e such that $1 < e < R$ and $\gcd(e, \phi(R)) = 1$.
4. Solve the linear congruence $de \equiv 1 \pmod{\phi(R)}$ (which gives you d).
5. d is kept secret and R and e are published.
6. Anyone who wishes to send a message M (numerically coded, $0 < M < R$) will encrypt the message by computing $C \equiv M^e \pmod{R}$.
7. C is sent.
8. To decrypt, calculate $C^d \pmod{R}$:

$C^d \equiv M^{ed} \equiv M^{1 + \phi(R)k} \equiv M (M^{\phi(R)})^k \equiv M \pmod{R}$

The last step uses Euler's theorem: $M^{\phi(R)} \equiv 1 \pmod{R}$ when $\gcd(M, R) = 1$.

If R is factored (that is, if the two primes that make up R are found) then the encryption can be broken.
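The steps above can be followed with deliberately tiny numbers. The sketch below is for illustration only: the primes 61 and 53, the exponent 17 and the message 65 are arbitrary choices, the function names are made up, and real keys use primes that are hundreds of digits long. It relies on Python 3.8 or later, where `pow(e, -1, phi)` computes a modular inverse.

```python
from math import gcd


def make_keys(p1, p2, e):
    """Return the public key (R, e) and the private exponent d."""
    r = p1 * p2
    phi = (p1 - 1) * (p2 - 1)  # Euler's function for a product of two primes
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to phi(R)")
    d = pow(e, -1, phi)  # solves d * e = 1 (mod phi(R))
    return (r, e), d


def encrypt(m, public_key):
    r, e = public_key
    return pow(m, e, r)  # C = M^e mod R


def decrypt(c, d, r):
    return pow(c, d, r)  # M = C^d mod R


public_key, d = make_keys(61, 53, 17)
r, e = public_key
print(r, d)                   # 3233 2753

cipher = encrypt(65, public_key)
print(cipher)                 # 2790
print(decrypt(cipher, d, r))  # 65
```

With numbers this small, R = 3233 can be factored back into 61 and 53 almost instantly, which gives an attacker everything needed to recompute d. The security of the system rests entirely on R being too large to factor.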