Genetic algorithms


2 Oct 2013


Brandon Andrews


What are genetic algorithms?


3 steps


Applications to Bioinformatics


Invented and published in 1975 by John Holland


Cells have DNA, which defines their properties


Reproduction crosses DNA from both parents, merging properties from each


During this step random mutations can occur


The organism's fitness is tested


This scores the organism against others based on criteria for survival


Essentially evolution


Selection step


Based on the calculated fitness


Reproduction step


Mutations


Strategies for crossing


Termination step


When the goal is met



1) Generate random properties (chromosomes) for N entities


2) Calculate their fitness and discard those that fall below the threshold


Fitness can be determined through a simulation


3) Randomly cross over pairs that survive the selection step


Also randomly choose properties and mutate them. This could be as simple as jittering them.


4) Go to step 2 until a goal is reached


Return the best set of properties


The goal could be anything


Normally the goal is to minimize or maximize the fitness function, checked after each step
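
A minimal Python sketch of steps 1 to 4, assuming each chromosome is a list of floats, a hypothetical `fitness` that is highest (0) when every gene equals 0.5, a threshold that keeps the better half of each generation, and a "goal" that is simply a fixed number of generations. The names `fitness` and `genetic_algorithm` and every parameter value are illustrative, not taken from the slides.

```python
import random

def fitness(chromosome):
    # Hypothetical fitness: higher is better, maximum 0 when every gene is 0.5.
    return -sum((g - 0.5) ** 2 for g in chromosome)

def genetic_algorithm(n=20, genes=5, generations=200, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: random properties (chromosomes) for N entities.
    population = [[rng.random() for _ in range(genes)] for _ in range(n)]
    for _ in range(generations):
        # Step 2: score everyone and discard those below the threshold
        # (assumed here: keep the better half).
        population.sort(key=fitness, reverse=True)
        survivors = population[: n // 2]
        # Step 3: randomly cross over surviving pairs, then mutate (jitter).
        children = []
        while len(survivors) + len(children) < n:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, genes)
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0, 0.05) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        # Step 4: go back to step 2 with the new generation.
        population = survivors + children
    # Return the best set of properties.
    return max(population, key=fitness)

best = genetic_algorithm()
```

Because the better half always survives unchanged, the best fitness in the population never gets worse from one generation to the next.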


Crossover probability: how often crossover happens


0% means no crossover: both parents are simply moved to the next step


100% means all of the parents are crossed and only their children are moved to the next step


The hope is that the good properties of both parents are merged, or that a good parent is preserved intact if it has no flaws that crossing with a partner could fix
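
To make the two extremes concrete, here is a small sketch of applying a crossover probability to one pair of parents. The helper `next_pair`, the one-point cut, and the string chromosomes are assumptions for illustration.

```python
import random

def next_pair(parent_a, parent_b, crossover_rate, rng):
    """Return the two individuals passed to the next step."""
    if rng.random() < crossover_rate:
        # Crossed: only the two children move on.
        cut = rng.randrange(1, len(parent_a))
        return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]
    # Not crossed: both parents are copied unchanged.
    return parent_a, parent_b

rng = random.Random(1)
print(next_pair("AAAA", "BBBB", 0.0, rng))  # ('AAAA', 'BBBB')
print(next_pair("AAAA", "BBBB", 1.0, rng))  # always crossed; cut point is random
```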


Mutation probability: the probability that part of the chromosome is changed after crossover


0% if none of it is changed


Not useful, since variety is needed to approach the best solution; otherwise you’re stuck with the initially generated properties


100% if all of it is changed


Not useful, since it negates the point of crossing and essentially becomes a random search


The idea is to keep the algorithm from getting stuck at a local maximum: mutations have a chance of producing small improvements
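
A sketch of per-gene mutation, assuming float genes that are "jittered" by a uniform random amount. The name `mutate` and the `jitter` width are hypothetical; setting `mutation_rate` to 0 or 1 reproduces the two extremes above.

```python
import random

def mutate(chromosome, mutation_rate, rng, jitter=0.1):
    """Jitter each gene independently with probability `mutation_rate`."""
    return [
        g + rng.uniform(-jitter, jitter) if rng.random() < mutation_rate else g
        for g in chromosome
    ]

rng = random.Random(2)
print(mutate([0.5, 0.5, 0.5], 0.0, rng))  # [0.5, 0.5, 0.5], nothing changes
print(mutate([0.5, 0.5, 0.5], 0.2, rng))  # some genes may be jittered
```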


Termination: when the expected error is low


Sometimes it’s hard to calculate an error, since the solution isn’t known


Or when the results stop decreasing (or stop increasing, depending on the problem) for a few iterations
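
Both stopping rules can be combined into one check. This sketch assumes `history` holds the best fitness of each generation, that higher is better, and that `target` is supplied only when the optimum is known; `should_stop`, `patience`, and `tolerance` are hypothetical names.

```python
def should_stop(history, patience=5, target=None, tolerance=1e-6):
    """history: best fitness per generation (higher is better, assumed)."""
    # Goal met: the error is low (only possible when the optimum is known).
    if target is not None and history and abs(target - history[-1]) <= tolerance:
        return True
    # Stagnation: no improvement over the last `patience` generations.
    if len(history) > patience:
        return max(history[-patience:]) <= max(history[:-patience])
    return False

print(should_stop([1, 2, 3, 3, 3, 3, 3, 3]))  # True: no gain in the last 5 generations
```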


It might be obvious, but genetic algorithms produce approximate solutions by design, since they iteratively optimize toward a solution rather than compute it exactly


The result is only as good as the fitness function, the number of iterations, and the crossover and mutation probabilities


Multiple Sequence Alignment


Initial generation


Random generation of candidate alignments from the given sequences
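
One simple way to build such a random alignment, assuming an alignment is a list of equal-length strings with `-` for gaps: pad each sequence to a common width by inserting gaps at random positions. The function name, the `extra_gaps` padding, and the population size of 10 are arbitrary choices for illustration.

```python
import random

def random_alignment(sequences, rng, extra_gaps=2):
    # Every row is padded to the same width with randomly placed gaps ('-').
    width = max(len(s) for s in sequences) + extra_gaps
    rows = []
    for seq in sequences:
        row = list(seq)
        for _ in range(width - len(seq)):
            row.insert(rng.randint(0, len(row)), "-")
        rows.append("".join(row))
    return rows

rng = random.Random(0)
population = [random_alignment(["ACGT", "AGT", "ACT"], rng) for _ in range(10)]
```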


Authors do not agree on the initial population size


Selection via tournament-style pairing, crossing the possible alignments
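
A sketch of tournament selection: each tournament draws `k` random candidates and the best-scoring one wins, and two tournaments give one pair of parents to cross. The `score` argument stands in for the alignment objective function; `k = 2` and the function names are assumptions.

```python
import random

def tournament_select(population, score, rng, k=2):
    # Draw k distinct candidates at random; the highest score wins.
    contestants = rng.sample(population, k)
    return max(contestants, key=score)

def select_parents(population, score, rng, k=2):
    # Two independent tournaments yield one pair of parents.
    return (tournament_select(population, score, rng, k),
            tournament_select(population, score, rng, k))
```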


The fitness function


“Sum-of-pairs” objective function (everyone uses a different one)
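
A minimal sum-of-pairs score, assuming a hypothetical scheme of +1 for a match, -1 for a mismatch, -2 for a residue against a gap, and 0 for two gaps. Since published objective functions differ, these values are placeholders.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed scores; real objectives differ

def pair_score(a, b):
    if a == "-" and b == "-":
        return 0  # a gap aligned with a gap is ignored
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def sum_of_pairs(alignment):
    # Sum the pair scores over every column and every pair of rows.
    return sum(pair_score(a, b)
               for column in zip(*alignment)
               for a, b in combinations(column, 2))

print(sum_of_pairs(["AC", "A-"]))  # -1: A/A matches (+1), C against a gap (-2)
```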


The survival rate is different for each alignment


Sum all alignment scores and express each alignment’s score as a percentage of that total


In short, better alignments have a higher chance of surviving
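
This is fitness-proportionate (roulette-wheel) selection. A sketch, assuming scores are non-negative with a positive total (raw sum-of-pairs scores can be negative and would need shifting first). The names `survival_shares` and `roulette_survivors` are hypothetical, and survivors are drawn with replacement using `random.Random.choices`.

```python
import random

def survival_shares(scores):
    # Each alignment's share of the summed scores (assumes non-negative scores).
    total = sum(scores)
    return [s / total for s in scores]

def roulette_survivors(population, scores, count, rng):
    # Draw survivors with replacement, weighted by their share of the total.
    return rng.choices(population, weights=survival_shares(scores), k=count)

print(survival_shares([1, 3]))  # [0.25, 0.75]
```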



Reproduction


Crossing uses a “one-point crossover”


Takes the first half of the first parent and crosses it with the second half of the second parent


AB|CD and EF|GH -> ABGH
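
A sketch of the midpoint cut, treating chromosomes as plain strings or lists like the ABCD/EFGH example. The function name is hypothetical. Crossing real alignment rows this way can drop or duplicate residues, so an alignment version also has to choose the cut in the second parent to keep every residue exactly once; this sketch does not handle that.

```python
def one_point_crossover(parent_a, parent_b):
    # Cut both parents in the middle: first half of A + second half of B.
    mid = len(parent_a) // 2
    return parent_a[:mid] + parent_b[mid:]

print(one_point_crossover("ABCD", "EFGH"))  # ABGH
```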


Or “point-to-point crossover”


A random index is chosen


ABC|D and EFG|H -> ABCH
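
The same operation with a random cut index, again on plain strings. The function name and the optional `cut` argument (used only to make the example reproducible) are assumptions.

```python
import random

def random_point_crossover(parent_a, parent_b, rng, cut=None):
    # Choose a random cut index in 1..len-1 unless one is given.
    if cut is None:
        cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

rng = random.Random(0)
print(random_point_crossover("ABCD", "EFGH", rng, cut=3))  # ABCH
print(random_point_crossover("ABCD", "EFGH", rng))         # AFGH, ABGH or ABCH
```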


Mutation


Remove a gap from, or insert a gap into, the alignment
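
Inserting a gap into only one row would make that row longer than the others, so this sketch swaps in a "gap shift": it removes one gap from a random row and re-inserts a gap at a random position in the same row, keeping the alignment rectangular. The operator and the name `gap_mutation` are assumptions, not necessarily the operator the slides refer to.

```python
import random

def gap_mutation(alignment, rng):
    # Pick a row that contains a gap, remove one gap, re-insert it elsewhere.
    rows = list(alignment)
    candidates = [i for i, row in enumerate(rows) if "-" in row]
    if not candidates:
        return rows  # no gaps to move
    i = rng.choice(candidates)
    row = list(rows[i])
    gap_positions = [j for j, c in enumerate(row) if c == "-"]
    del row[rng.choice(gap_positions)]         # remove a gap
    row.insert(rng.randint(0, len(row)), "-")  # insert a gap
    rows[i] = "".join(row)
    return rows
```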

