Gene Expression Programming

Nov 7, 2013

CS678 - Machine Learning

Grand Valley State University

by James Bund

GEP is like GA and GP

GA: Linear strings of fixed length


GP: Nonlinear entities of different sizes and shapes


GEP: Linear strings of fixed length and nonlinear entities of different sizes and shapes

GEP Algorithm

Genotype and Phenotype

GEP attempts to represent nature more faithfully.

In GA and GP, the genotype and the phenotype are the same.


Genotype: Genetic representation

Phenotype: Physical representation


Genotype -> Translation -> Phenotype

String chromosome -> Expression Tree


Fitness is a judgment of the phenotype

Reproduction involves the genotype

Encoding Genotype & Phenotype

$\sqrt{(a + b) \cdot (c - d)}$

01234567
Q*+-abcd

Genotype is a fixed length string.

Resulting genotype is guaranteed to translate into a valid
expression tree.
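
The translation from gene to expression tree can be sketched as below. This is a minimal illustration, assuming the gene is read left to right and the tree is filled level by level, with Q as the square root. The names ARITY, build_tree, to_infix and evaluate are hypothetical helpers, not part of the slides.

```python
import math

# Number of arguments per function symbol; Q is the square root.
# Any symbol not listed here is treated as a terminal (a, b, c, d, ...).
ARITY = {"Q": 1, "+": 2, "-": 2, "*": 2, "/": 2}

def build_tree(gene):
    """Fill the expression tree level by level, left to right."""
    symbols = iter(gene)
    root = (next(symbols), [])
    level = [root]
    while level:
        next_level = []
        for symbol, children in level:
            for _ in range(ARITY.get(symbol, 0)):
                child = (next(symbols), [])
                children.append(child)
                next_level.append(child)
        level = next_level
    return root  # symbols left in the iterator are simply never read

def to_infix(node):
    """Render the tree as a fully parenthesized infix string."""
    symbol, children = node
    if not children:
        return symbol
    if symbol == "Q":
        return f"sqrt({to_infix(children[0])})"
    left, right = (to_infix(child) for child in children)
    return f"({left} {symbol} {right})"

def evaluate(node, values):
    """Evaluate the tree for the given terminal values."""
    symbol, children = node
    if not children:
        return values[symbol]
    args = [evaluate(child, values) for child in children]
    if symbol == "Q":
        return math.sqrt(args[0])
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    return args[0] / args[1]

tree = build_tree("Q*+-abcd")
print(to_infix(tree))                                    # sqrt(((a + b) * (c - d)))
print(evaluate(tree, {"a": 1, "b": 3, "c": 6, "d": 2}))  # 4.0
```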

Open Reading Frames (ORF)

Biology: Start codons and Stop codons


GEP: Translation starts at the beginning of the coding sequence and stops when all the functions have been expressed, so there are usually non-coding regions at the end.

0123456789
Q*+-abcdef
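
One way to find where the ORF ends is to count how many symbols are still needed to complete the expression. The sketch below assumes the arities shown in ARITY. The function name orf_length is a hypothetical helper introduced for illustration.

```python
# Assumed arities; unlisted symbols are terminals with arity 0.
ARITY = {"Q": 1, "+": 2, "-": 2, "*": 2, "/": 2}

def orf_length(gene, arity=ARITY):
    """Return how many leading symbols of the gene are actually expressed."""
    needed = 1  # the root is always required
    for position, symbol in enumerate(gene):
        # Each symbol fills one open slot and opens as many slots as it has arguments.
        needed += arity.get(symbol, 0) - 1
        if needed == 0:
            return position + 1
    raise ValueError("gene ends before the expression is complete")

gene = "Q*+-abcdef"
end = orf_length(gene)
print(gene[:end], gene[end:])  # coding region, then non-coding region
```

For the gene above, the ORF covers positions 0 to 7, and "ef" is never expressed.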

Head and Tail Regions

t = h (n - 1) + 1


t = tail length

h = head length

n = maximum number of arguments to a function

012345678901234567890
+Q-/b*aaQbaabaabbaaab
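
The tail-length formula can be checked with a short sketch. It assumes a function set with at most two arguments and the terminals a and b, as in the gene above. FUNCTIONS, TERMINALS, tail_length, random_gene and orf_length are hypothetical names chosen for this illustration.

```python
import random

# Assumed function set (symbol -> number of arguments) and terminal set.
FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}
TERMINALS = "ab"

def tail_length(head_length, max_arity):
    """t = h (n - 1) + 1"""
    return head_length * (max_arity - 1) + 1

def random_gene(head_length, rng):
    """Head: functions or terminals. Tail: terminals only."""
    max_arity = max(FUNCTIONS.values())
    head_pool = list(FUNCTIONS) + list(TERMINALS)
    head = [rng.choice(head_pool) for _ in range(head_length)]
    tail = [rng.choice(TERMINALS) for _ in range(tail_length(head_length, max_arity))]
    return "".join(head + tail)

def orf_length(gene):
    """Number of leading symbols needed to complete the expression."""
    needed = 1
    for position, symbol in enumerate(gene):
        needed += FUNCTIONS.get(symbol, 0) - 1
        if needed == 0:
            return position + 1
    raise ValueError("gene ends before the expression is complete")

rng = random.Random(0)
gene = random_gene(10, rng)
print(len(gene), orf_length(gene))  # length is always 10 + 11 = 21
```

Even when every head position holds a two-argument function, the tail supplies exactly enough terminals, so orf_length never runs past the end of the gene.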

Multigenic Chromosomes

Hierarchical discovery technique

012345678012345678
Q*Q+bbaaa*-babaabb

Multigenic Chromosomes

012345678901201234567890120123456789012
IIAIca3aa2acuNNAOab2u3c31cAu12ua3112cac

Gene length: 13 = h + t

t = h (n - 1) + 1

h = 4, n = 3, t = 9
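
Splitting a multigenic chromosome into its genes follows directly from these lengths. The sketch below assumes all genes share the same head length and maximum arity. The name split_genes is a hypothetical helper.

```python
def split_genes(chromosome, head_length, max_arity):
    """Return a list of (head, tail) pairs, one per gene."""
    tail = head_length * (max_arity - 1) + 1  # t = h (n - 1) + 1
    gene_length = head_length + tail
    if len(chromosome) % gene_length != 0:
        raise ValueError("chromosome length is not a multiple of the gene length")
    genes = []
    for start in range(0, len(chromosome), gene_length):
        head = chromosome[start:start + head_length]
        tail_part = chromosome[start + head_length:start + gene_length]
        genes.append((head, tail_part))
    return genes

chromosome = "IIAIca3aa2acuNNAOab2u3c31cAu12ua3112cac"
for head, tail in split_genes(chromosome, head_length=4, max_arity=3):
    print(head, tail)
```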

Replication

Replication is copying without modification.

Mutation

012345678012345678012345678
-+-+abaaa/bb/ababb*Q*+aaaba

012345678012345678012345678
Q+-+abaaa/bbQababb*b*+aaaba

The resulting expression tree (ET) will be valid as long as the tail doesn't contain functions.
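
A mutation operator that respects this rule could look like the sketch below. It assumes a per-symbol mutation rate and the function and terminal sets shown. The name mutate and its parameters are hypothetical.

```python
import random

FUNCTIONS = "+-*/Q"  # assumed function set
TERMINALS = "ab"     # assumed terminal set

def mutate(chromosome, head_length, gene_length, rate, rng):
    """Mutate each symbol with probability `rate`, keeping tails function-free."""
    result = []
    for index, symbol in enumerate(chromosome):
        if rng.random() < rate:
            in_head = (index % gene_length) < head_length
            # Heads may receive any symbol; tails may only receive terminals.
            pool = FUNCTIONS + TERMINALS if in_head else TERMINALS
            symbol = rng.choice(pool)
        result.append(symbol)
    return "".join(result)

rng = random.Random(1)
parent = "-+-+abaaa/bb/ababb*Q*+aaaba"
child = mutate(parent, head_length=4, gene_length=9, rate=0.1, rng=rng)
print(parent)
print(child)
```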

Insertion Sequence (IS) Transposition

012345678901234567890012345678901234567890
*-+*a-+a*bbabbaabababQ**+abQbb*aabbaaaabba

012345678901234567890012345678901234567890
*-+*a-bba+babbaabababQ**+abQbb*aabbaaaabba

The position and the length of the IS must be determined.

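The resulting ET will be valid because only the head changes: the symbols upstream of the insertion site stay in place, those downstream shift toward the end of the head, the overflow is discarded, and the tail is untouched.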
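
The IS transposition in the example can be reproduced with the sketch below. The restriction that the insertion site is inside the head but not at the root is an assumption taken from Ferreira's description rather than from the slides. The name is_transpose and its parameters are hypothetical.

```python
def is_transpose(chromosome, gene_length, head_length,
                 source_start, is_length, target_gene, insert_at):
    """Copy an IS element into the head of the target gene.

    source_start: chromosome index where the IS element begins
    insert_at:    position inside the target head (assumed: not the root)
    """
    if not 1 <= insert_at < head_length:
        raise ValueError("insertion site must be inside the head, after the root")
    element = chromosome[source_start:source_start + is_length]
    start = target_gene * gene_length
    head = chromosome[start:start + head_length]
    # Insert the element, then cut the head back to its original length.
    new_head = (head[:insert_at] + element + head[insert_at:])[:head_length]
    return chromosome[:start] + new_head + chromosome[start + head_length:]

before = "*-+*a-+a*bbabbaababab" + "Q**+abQbb*aabbaaaabba"
# "bba" sits at positions 12-14 of the second gene, i.e. chromosome index 21 + 12.
after = is_transpose(before, gene_length=21, head_length=10,
                     source_start=33, is_length=3, target_gene=0, insert_at=6)
print(after)
```

The source gene keeps its copy of the element, and the chromosome length never changes.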

Root Insertion Sequence (RIS) Transposition

012345678901234567890012345678901234567890
-ba*+-+-Q/abababbbaaaQ*b/+bbabbaaaaaaaabbb

012345678901234567890012345678901234567890
-ba*+-+-Q/abababbbaaa+bbQ*b/+bbaaaaaaaabbb

The position and the length of the RIS must be determined.

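The resulting ET will be valid because the RIS is inserted at the root and only the head changes: the rest of the head shifts downstream, the overflow is discarded, and the tail is untouched.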
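
The RIS transposition in the example can be reproduced with the sketch below. It assumes, following Ferreira's description, that a RIS element always begins with a function. The name ris_transpose, its parameters and the FUNCTIONS set are hypothetical.

```python
FUNCTIONS = "+-*/Q"  # assumed function set

def ris_transpose(chromosome, gene_length, head_length,
                  gene_index, ris_start, ris_length):
    """Move a copy of a RIS element to the root of its gene.

    ris_start: position inside the gene's head where the element begins
    """
    start = gene_index * gene_length
    gene = chromosome[start:start + gene_length]
    if not 0 <= ris_start < head_length or gene[ris_start] not in FUNCTIONS:
        raise ValueError("a RIS element must start with a function in the head")
    element = gene[ris_start:ris_start + ris_length]
    # Prepend the element, then cut the head back to its original length.
    new_head = (element + gene[:head_length])[:head_length]
    return chromosome[:start] + new_head + chromosome[start + head_length:]

before = "-ba*+-+-Q/abababbbaaa" + "Q*b/+bbabbaaaaaaaabbb"
after = ris_transpose(before, gene_length=21, head_length=10,
                      gene_index=1, ris_start=4, ris_length=3)
print(after)
```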

Gene Transposition

012345678012345678012345678
*a-*abbab-QQ/aaabbQ+abababb

012345678012345678012345678
-QQ/aaabb*a-*abbabQ+abababb

The genes themselves are unchanged, but their order, and therefore the way they are linked, can change.
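
Gene transposition can be sketched as moving one whole gene to the front of the chromosome, which matches the example above. The name gene_transpose is a hypothetical helper.

```python
def gene_transpose(chromosome, gene_length, gene_index):
    """Move the gene at gene_index to the beginning of the chromosome."""
    genes = [chromosome[i:i + gene_length]
             for i in range(0, len(chromosome), gene_length)]
    moved = genes.pop(gene_index)
    return "".join([moved] + genes)

before = "*a-*abbab" + "-QQ/aaabb" + "Q+abababb"
print(gene_transpose(before, gene_length=9, gene_index=1))
```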

1-Point Recombination

012345678012345678
-b+Qbbabb/aQbbbaab
/-a/ababb-ba-abaaa

012345678012345678
-b+/ababb-ba-abaaa
/-aQbbabb/aQbbbaab

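The crossover point falls inside the first gene, so the second genes are exchanged whole and are not modified.

Both offspring will always be valid, because the parents share the same gene layout: head positions are only exchanged with head positions and tail positions with tail positions.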
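
The example above corresponds to a crossover point after position 2. A minimal sketch, with one_point_recombine as a hypothetical name:

```python
def one_point_recombine(parent1, parent2, point):
    """Swap everything downstream of the crossover point."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

parent1 = "-b+Qbbabb/aQbbbaab"
parent2 = "/-a/ababb-ba-abaaa"
for child in one_point_recombine(parent1, parent2, point=3):
    print(child)
```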

2-Point Recombination

0123456789001234567890
+*a*bbcccac*baQ*acabab
*cbb+cccbcc++**bacbaab

0123456789001234567890
+*a*bbccbcc++*Q*acabab
*cbb+ccccac*ba*bacbaab

Both offspring will always be valid.
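
In the example above the two crossover points are at chromosome indices 7 and 14, and the segment between them is exchanged. A minimal sketch, with two_point_recombine as a hypothetical name:

```python
def two_point_recombine(parent1, parent2, first, second):
    """Swap the segment between the two crossover points."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if not 0 <= first <= second <= len(parent1):
        raise ValueError("crossover points out of range")
    child1 = parent1[:first] + parent2[first:second] + parent1[second:]
    child2 = parent2[:first] + parent1[first:second] + parent2[second:]
    return child1, child2

parent1 = "+*a*bbcccac*baQ*acabab"
parent2 = "*cbb+cccbcc++**bacbaab"
for child in two_point_recombine(parent1, parent2, first=7, second=14):
    print(child)
```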

Gene Recombination

012345678012345678012345678
/aa-abaaa/a*bbaaab/Q*+aaaab
/-*/abbabQ+aQbabaa-Q/Qbaaba

012345678012345678012345678
/aa-abaaaQ+aQbabaa/Q*+aaaab
/-*/abbab/a*bbaaab-Q/Qbaaba
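
In the example above the second gene is exchanged between the two parents. A minimal sketch, with gene_recombine as a hypothetical name:

```python
def gene_recombine(parent1, parent2, gene_length, gene_index):
    """Exchange one entire gene between two chromosomes."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    start = gene_index * gene_length
    end = start + gene_length
    child1 = parent1[:start] + parent2[start:end] + parent1[end:]
    child2 = parent2[:start] + parent1[start:end] + parent2[end:]
    return child1, child2

parent1 = "/aa-abaaa/a*bbaaab/Q*+aaaab"
parent2 = "/-*/abbabQ+aQbabaa-Q/Qbaaba"
for child in gene_recombine(parent1, parent2, gene_length=9, gene_index=1):
    print(child)
```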

Block Stacking (SpellBot) Example

EQ(DO(MTT(TOS), NOT(TOS)), DO(MTS(NBN), NOT(NBN)))


012345601234560123456
DTNttttDSNnnnncNStcnc

EQ=E

DO=D

MTT=T

MTS=S

NOT=N

TOS=t

TCB=c

NBN=n
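
The symbol table above can be used to decode each gene back into the long form. The sketch below assumes the arities implied by the expression at the top of this slide (EQ and DO take two arguments; MTT, MTS and NOT take one). How the third gene enters the linked expression is not shown in the slides, so it is only decoded here. NAMES, ARITY and decode are hypothetical names.

```python
NAMES = {"E": "EQ", "D": "DO", "T": "MTT", "S": "MTS", "N": "NOT",
         "t": "TOS", "c": "TCB", "n": "NBN"}
# Assumed arities, inferred from the expression shown on the slide.
ARITY = {"E": 2, "D": 2, "T": 1, "S": 1, "N": 1}

def decode(gene):
    """Translate a gene into its long-form expression string."""
    symbols = iter(gene)
    root = (next(symbols), [])
    level = [root]
    while level:  # fill the tree level by level, left to right
        next_level = []
        for symbol, children in level:
            for _ in range(ARITY.get(symbol, 0)):
                child = (next(symbols), [])
                children.append(child)
                next_level.append(child)
        level = next_level
    return render(root)

def render(node):
    symbol, children = node
    if not children:
        return NAMES[symbol]
    return f"{NAMES[symbol]}({', '.join(render(child) for child in children)})"

chromosome = "DTNttttDSNnnnncNStcnc"
genes = [chromosome[i:i + 7] for i in range(0, len(chromosome), 7)]
decoded = [decode(gene) for gene in genes]
print(decoded)
print(f"EQ({decoded[0]}, {decoded[1]})")
```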

Advantages of GEP over GA and GP


GEP combines the two: its chromosomes are as easy to work with as those of GAs, while its expression trees have the complexity of GPs.


Generated phenotypes are always syntactically correct and
don’t need to be corrected.


The genetic operators don’t need to handle complex
language syntax.

GAs have less complexity in what they can express.


GPs are harder to work with because their operators must be smart enough to ensure syntactically correct offspring.

Multigenic Chromosome Benefits

Evolving components of a solution, not just whole solutions.


Separation of tasks and abstraction of details like in the
engineering done by humans.


Similar to:


Structured Programming


Functions


Object Oriented


Design Patterns

You Can Have Extra Genes

Best Fitness vs. Average Fitness

References

Gene Expression Programming: A New Adaptive Algorithm for Solving Problems
by Candida Ferreira
http://www.gene-expression-programming.com/webpapers/gep.pdf

Gene Expression Programming
http://www.gene-expression-programming.com

GeneXproTools
http://www.gepsoft.com