Genetic Algorithms and
Based on lecture by Dr. Steffen Schulze
A genetic algorithm is a heuristic method that operates on pieces of
information like nature does on genes in the course of evolution.
Individuals are represented by a linear string of letters of an alphabet
(in nature nucleotides, in genetic algorithms bits)
Individuals are allowed to
Depending on the generation replacement mode a subset of parents and
offspring enters the next reproduction cycle.
After a number of iterations the population consists of individuals that
are well adapted in terms of the fitness function.
It cannot be proven that the individuals of a final generation contain an
optimal solution for the objective encoded in the fitness function.
I. Initialise a population of individuals.
This can be done either randomly or with domain-specific background
knowledge to start the search with promising seed individuals.
(Where available, the latter is always recommended.)
Individuals are represented as a string of bits.
A fitness function must be defined that takes as input an
individual and returns a number (or a vector) that can be used as a
measure for the quality (fitness) of that individual.
The application should be formulated in a way that the desired
solution to the problem coincides with the most successful individual
according to the fitness function.
II. Evaluate all individuals of the initial population.
III. Generate new individuals.
The reproduction probability for an individual
is proportional to its relative fitness within the current generation.
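Fitness-proportional reproduction can be sketched as follows. This is a minimal illustration, not the lecture's implementation: it assumes non-negative fitness values where higher means better, and the function names and the example population are made up for the sketch.

```python
import random

def reproduction_probabilities(fitnesses):
    """Relative fitness of each individual within the current generation."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("fitness values must be non-negative with a positive sum")
    return [f / total for f in fitnesses]

def select_parent(population, fitnesses, rng=random):
    """Draw one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

# Hypothetical example: fitness is the number of 1-bits in each string.
population = ["0011", "0101", "1110", "1111"]
fitnesses = [2.0, 2.0, 3.0, 4.0]
probs = reproduction_probabilities(fitnesses)
parent = select_parent(population, fitnesses, random.Random(0))
```

When lower values are better (as for an energy), the values would first have to be transformed so that better individuals get larger weights.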
Mutation. Substitute one or more bits of an individual randomly by a new
value (0 or 1).
Variation. Change the bits in a way that the number encoded by them is
slightly incremented or decremented.
Crossover. Exchange parts (single bits or strings of bits) of one individual
with the corresponding parts of another individual. Originally, only
one-point crossover was performed, but theoretically one can
process up to L - 1 different crossover sites (with L as the length
of an individual).
[Figure: two-point crossover]
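The three operators described above (random bit substitution, a small increment or decrement of the encoded number, and exchange of parts between two individuals) might look like this on bit strings. The function names, the step size of one, and the clamping at the ends of the value range are assumptions made for this sketch.

```python
import random

def mutate(bits, n_sites=1, rng=random):
    """Replace n_sites randomly chosen bits by a new random value (0 or 1)."""
    out = list(bits)
    for i in rng.sample(range(len(out)), n_sites):
        out[i] = rng.choice("01")
    return "".join(out)

def variate(bits, rng=random):
    """Increment or decrement the encoded number by one (clamped, assumption)."""
    top = 2 ** len(bits) - 1
    value = int(bits, 2) + rng.choice((-1, 1))
    value = max(0, min(top, value))
    return format(value, "0{}b".format(len(bits)))

def crossover(a, b, n_points=1, rng=random):
    """Exchange alternating segments of a and b at n_points cut sites.

    A string of length L offers at most L - 1 distinct cut sites.
    """
    if len(a) != len(b):
        raise ValueError("individuals must have equal length")
    if not 1 <= n_points <= len(a) - 1:
        raise ValueError("n_points must be between 1 and L - 1")
    cuts = sorted(rng.sample(range(1, len(a)), n_points))
    child_a, child_b = [], []
    swap, prev = False, 0
    for cut in cuts + [len(a)]:
        seg_a, seg_b = a[prev:cut], b[prev:cut]
        if swap:  # every second segment is exchanged
            seg_a, seg_b = seg_b, seg_a
        child_a.append(seg_a)
        child_b.append(seg_b)
        swap, prev = not swap, cut
    return "".join(child_a), "".join(child_b)
```

With `n_points=1` this is the original one-point crossover; `n_points=2` gives two-point crossover.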
IV. Select individuals for the new parent generation.
1) The complete offspring is selected while all parents are discarded
(original genetic algorithm). This is motivated by the biological
model and is called total generation replacement.
2) The best individuals (from the old and the new generation) are selected.
This method is called elitist generation replacement.
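The two generation replacement modes can be contrasted in a few lines. This is only a sketch: the function names are hypothetical, and it assumes a scalar fitness where higher is better.

```python
def total_replacement(parents, offspring, size):
    """Total generation replacement: parents are discarded entirely."""
    return list(offspring)[:size]

def elitist_replacement(parents, offspring, fitness, size):
    """Elitist replacement: keep the best individuals from both generations."""
    pool = list(parents) + list(offspring)
    return sorted(pool, key=fitness, reverse=True)[:size]

# Hypothetical example with "number of 1-bits" as fitness.
count_ones = lambda bits: bits.count("1")
parents = ["1111", "0000"]
offspring = ["0011", "0001"]
next_total = total_replacement(parents, offspring, 2)
next_elitist = elitist_replacement(parents, offspring, count_ones, 2)
```

In this example the elitist mode keeps the strong parent `"1111"`, while total replacement loses it.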
V. Go back to step II until either a desired fitness value has been reached or
a predefined number of iterations has been performed.
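Put together, the five steps might be sketched as the loop below. It is an illustration under several assumptions that do not come from the lecture: random initialisation, one-point crossover, a single bit substitution per child, elitist replacement, a scalar fitness where higher is better, and hypothetical names and parameter defaults.

```python
import random

def run_ga(fitness, length=16, pop_size=20, max_iter=200, target=None, rng=random):
    # I. Initialise a random population of bit strings.
    pop = ["".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)]
    for _ in range(max_iter):
        # II. Evaluate all individuals.
        scores = [fitness(ind) for ind in pop]
        if target is not None and max(scores) >= target:
            break  # desired fitness value reached
        # III. Generate offspring; parents are drawn proportionally to fitness.
        weights = [s + 1e-9 for s in scores]  # avoid an all-zero weight vector
        offspring = []
        while len(offspring) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, length)      # one-point crossover
            child = list(a[:cut] + b[cut:])
            site = rng.randrange(length)        # substitute one random bit
            child[site] = rng.choice("01")
            offspring.append("".join(child))
        # IV. Elitist replacement: best of parents and offspring survive.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
        # V. The loop repeats until a stop criterion holds.
    return max(pop, key=fitness)

best = run_ga(lambda bits: bits.count("1"), target=16, rng=random.Random(0))
```

As stated above, nothing guarantees that the returned individual is optimal; the loop only stops on the target fitness or the iteration limit.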
The genetic algorithm is configured to
operate on numbers, not on bit strings as in the
original genetic algorithm.
The mathematical foundation of genetic
algorithms holds only for binary representations,
although some of the mathematical properties
are also valid for a floating point representation.
Binary representations run faster in many
An additional encoding/decoding process may be
required to map numbers onto bit strings.
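Such an encoding/decoding step could look like the following sketch, which maps a number from a fixed range onto a bit string of a chosen length and back. The linear quantisation and the function names are assumptions for illustration; other encodings are possible.

```python
def encode(value, lo, hi, n_bits):
    """Map a number in [lo, hi] onto an n_bits-long bit string."""
    if not lo <= value <= hi:
        raise ValueError("value outside the encodable range")
    levels = 2 ** n_bits - 1
    step = round((value - lo) / (hi - lo) * levels)
    return format(step, "0{}b".format(n_bits))

def decode(bits, lo, hi):
    """Map a bit string back to a number in [lo, hi]."""
    levels = 2 ** len(bits) - 1
    return lo + int(bits, 2) / levels * (hi - lo)

# Hypothetical example: an angle between -180 and 180 degrees in 8 bits.
code = encode(60.0, -180.0, 180.0, 8)
angle = decode(code, -180.0, 180.0)
```

The round trip is only exact up to the resolution of the bit string, here 360/255 degrees per step.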
Protein Structure Prediction
Cartesian 3D coordinates are not a good choice
-> representation by torsion angles
The frequency of each torsion angle in intervals of 10° was determined, and the
ten most frequently occurring intervals are made available for substitution of
individual torsion angles by the mutation operator.
At the beginning of the run, individuals were initialized
with either a completely extended conformation, where all
torsion angles are 180°, or by a random selection from the
ten most frequently occurring intervals of each torsion angle.
For the ω torsion angle the constant value of 180° was
used because of the rigidity of the peptide bond between
the atoms Ci and Ni+1.
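The two initialisation modes can be sketched as below. Several things here are assumptions and not taken from the lecture: only the backbone angles phi, psi and omega are modelled, an angle is set to the midpoint of the chosen 10° interval, and the interval table contains placeholder values rather than real statistics.

```python
import random

OMEGA = 180.0  # constant omega angle (rigid peptide bond)

# Placeholder lower bounds of 10-degree intervals; NOT real frequency data.
FREQUENT_INTERVALS = {
    "phi": [-160, -150, -140, -130, -120, -110, -100, -90, -70, -60],
    "psi": [-60, -50, -40, -30, 110, 120, 130, 140, 150, 160],
}

def extended_individual(n_residues):
    """Completely extended conformation: all torsion angles are 180 degrees."""
    return [{"phi": 180.0, "psi": 180.0, "omega": OMEGA} for _ in range(n_residues)]

def random_individual(n_residues, intervals=FREQUENT_INTERVALS, rng=random):
    """Pick each angle from one of its ten most frequent intervals."""
    individual = []
    for _ in range(n_residues):
        residue = {"omega": OMEGA}
        for name, lower_bounds in intervals.items():
            # Midpoint of the chosen interval (assumption of this sketch).
            residue[name] = rng.choice(lower_bounds) + 5.0
        individual.append(residue)
    return individual

extended = extended_individual(46)
randomised = random_individual(46, rng=random.Random(0))
```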
Generally molecules with
residues * approximately
atoms per residue =
degrees of freedom
Systems of equations with this number of variables
are analytically intractable today.
torsion angles per residue *
likely values per torsion angle) =
CHARMM energy function:
bond length potential (set to const)
bond angle potential
(set to const)
torsion angle potential
improper torsion angle potential
(set to const)
van der Waals pair interactions
(set to const)
interaction with the solvent
(set to const, i.e. in vacuum)
(Since there are no interactions with the solvent, there is not enough
force to drive the protein to a compact folded state.)
Simplified Energy Function
pseudo entropic term
Empirical relation between the number of residues and the diameter:
protein Crambin, 46 a.a.
Table 3. Steric Energies in the Last Generation
Table 2. R.m.s. Deviations to Native Crambin
Simple summation of different components has the disadvantage
that components with larger numbers would dominate the fitness
function whether or not they are important or of any significance
at all for a particular conformation.
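A small numerical illustration of this effect follows. The component names and all values are invented for the example; they are not results from the lecture.

```python
# Invented energy components for three hypothetical conformations.
conformations = {
    "A": {"torsion": 5.0, "electrostatic": -900.0},
    "B": {"torsion": 50.0, "electrostatic": -950.0},
    "C": {"torsion": 2.0, "electrostatic": -800.0},
}

def total_energy(components):
    """Simple summation of all components (lower is better)."""
    return sum(components.values())

def ranking(key):
    """Names of the conformations sorted from best (lowest) to worst."""
    return sorted(conformations, key=key)

rank_by_sum = ranking(lambda name: total_energy(conformations[name]))
rank_by_electrostatic = ranking(lambda name: conformations[name]["electrostatic"])
rank_by_torsion = ranking(lambda name: conformations[name]["torsion"])
```

Here the ranking by total energy is identical to the ranking by the electrostatic component alone, while the torsion component, whose values are much smaller in magnitude, has no influence on the order.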
In other words: a bad fitness function.
The genetic algorithm favoured
individuals with the lowest total energy,
which in this case was most easily
achieved by optimising the electrostatic contribution.
Instead of using separate phi psi value distributions, apply phi
Use secondary structure prediction algorithm (70% accuracy).
Specialised Genetic Operators
LOCAL TWIST (local conformation changes by performing the ring
closure algorithm for polymers)
This operator led to significant
improvements in prediction
accuracy and also to a
substantial decrease in
overall computation time.
r.m.s. only for verification
Vector Fitness Function
Candidate selection for the next generation:
If there is an individual that has better (i.e. lower) values in each fitness
component, then we take it. Continue until no unambiguously better
individuals are found.
Then remove the worst individuals, i.e. those with higher values in each
fitness component than any other individual.
The remaining set of individuals is heuristically reduced until the exact
number of individuals for the next generation is reached. This is done by
iteratively removing an individual with the worst fitness value in a randomly
selected fitness component.
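One possible reading of this three-stage selection is sketched below. The interpretation of "unambiguously better" as "lower in every component than every other remaining individual", the function names and the example vectors are assumptions of this sketch.

```python
import random

def better_in_all(a, b):
    """True if fitness vector a is lower than b in every component."""
    return all(x < y for x, y in zip(a, b))

def select_next_generation(vectors, size, rng=random):
    """Return the indices of the individuals kept for the next generation."""
    pool = list(range(len(vectors)))
    chosen = []

    def beats_all_others(i):
        return all(better_in_all(vectors[i], vectors[j]) for j in pool if j != i)

    def beaten_by_all_others(i):
        return all(better_in_all(vectors[j], vectors[i]) for j in pool if j != i)

    # Stage 1: take unambiguously better individuals while any exist.
    while pool and len(chosen) < size:
        best = next((i for i in pool if beats_all_others(i)), None)
        if best is None:
            break
        pool.remove(best)
        chosen.append(best)

    # Stage 2: remove individuals that are worse in every component.
    while len(pool) > 1 and len(chosen) + len(pool) > size:
        worst = next((i for i in pool if beaten_by_all_others(i)), None)
        if worst is None:
            break
        pool.remove(worst)

    # Stage 3: drop the worst individual in a randomly chosen component
    # until exactly `size` individuals remain.
    n_components = len(vectors[0]) if vectors else 0
    while len(chosen) + len(pool) > size:
        k = rng.randrange(n_components)
        pool.remove(max(pool, key=lambda i: vectors[i][k]))

    return chosen + pool

# Hypothetical two-component fitness vectors (lower is better).
vectors = [(1, 1), (2, 5), (5, 2), (9, 9), (3, 3)]
kept = select_next_generation(vectors, 3, random.Random(0))
```

In the example, individual 0 is taken in stage 1, individual 3 is removed in stage 2, and stage 3 randomly drops one of the two remaining extreme individuals.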
Tests on other proteins (LOCAL TWIST and r.m.s. fitness) also gave
close-to-native conformations (less than 3.0 Å).
Capability of Genetic Algorithm in General?
When an appropriate fitness function is applied, the genetic
algorithm achieves the desired results.
> Rms 6.27
decreased with rms
mislead the algorithm to non
I. Fitness vector
II. Fitness vector
+ constraints on the secondary structures
> Rms 4.36
Crambin 46 a.a.
Genetic algorithms proved to be an efficient search tool for 3-D representations
of proteins. For a 3-D protein model with a simple, additive force field as fitness
function and using a rather small population the genetic algorithm produced
individuals (i.e. protein conformations) of dissimilar topology but each
with highly optimized fitness values.
Given an appropriate fitness function the genetic algorithm application described
here finds the desired solution within only small deviations.
The major problem lies in the fitness function. If there were one or a set of
indicators that return 1 for "the object is a native protein conformation"
and 0 for "the object is not a native protein conformation",
one could expect the genetic algorithm approach to deliver reasonably accurate
results.
However, neither mathematical models nor empirical, semi-empirical or statistical
force fields are yet accurate enough to reliably discriminate native from
non-native conformations without additional constraints. Thus, the genetic algorithm
)optimal conformations in a different sense than that of
The same problem (fitness/scoring function) exists in the protein
docking problem. The correct transformation (within 3-5 Å) is found in
realistic time (in almost all cases). However, assigning a high score to the
native complex is a problematic task. We do not yet know a proper scoring
function.
Side Chain Placement