Genetic Algorithms


Class Project


Due at end of finals week


Essentially anything you want, so long
as it’s AI related and I approve


Any programming language you want


In pairs or individual


Email me by Wednesday, November 3

Projects


Implementing k-NN to Classify Bedform Stability Fields


Blackjack Using Genetic Algorithms


Computer game players: Go, Checkers, Connect Four, Chess, Poker


Computer puzzle solvers: Minesweeper, mazes


Pac-Man with intelligent monsters


Genetic algorithms:


blackjack strategy


Automated 20-questions player


Paper on planning


Neural network spam filter


Learning neural networks via GAs

Projects


Solving neural networks via backprop


Code decryptor using GAs


Box pushing agent (competing against an
opponent)

What didn’t work as well


Games that were too complicated: Risk, Yahtzee, Chess, Scrabble, Battle Simulation


Got too focused on making the game work


I sometimes had trouble running the game


Game was often incomplete


Didn’t have time to do enough AI


Problems that were too vague


Simulated ant colonies / genetic algorithms


Bugs swarming for heat (emergent intelligence never happened)


Finding paths through snow


AdaBoost on protein folding data


Couldn’t get boosting working right; needed more time on small datasets (spent lots of time parsing the protein data)

Reinforcement Learning


Game playing: So far, we have told the agent the value of a given board position.


How can the agent learn which positions are important?


Play a whole bunch of games, and receive a reward at the end (+ or −)


How to determine the utility of states that aren’t terminal states?

The setup: Possible game
states


Terminal states have a reward


Mission: Estimate utility of all possible game
states

What is a state?


For chess: a state is a combination of the position on the board and the location of the opponent’s pieces


Half of your transitions are controlled by
you (your moves)


Other half of your transitions are
probabilistic (depend on opponent)


For now, we assume all moves are
probabilistic (probabilities unknown)

Passive Learning


Agent learns by “watching”


Fixed probability of moving from one state
to another

Sample Results

Technique #1: Naive Updating


Also known as Least Mean Squares (LMS)
approach


Starting at home, obtain a sequence of states ending at a terminal state


Utility of terminal state = reward


Loop back over all other states


utility for state i = running average of all
rewards seen for state i
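
A minimal sketch of this running-average update in Python (the episode format and the state names are illustrative assumptions, not from the slides):

    from collections import defaultdict

    def naive_update(episodes):
        # episodes: list of (state_sequence, terminal_reward) pairs.
        total = defaultdict(float)   # sum of rewards observed from each state
        count = defaultdict(int)     # number of games that visited each state
        for states, reward in episodes:
            for s in states:
                # Every state on the path to the terminal state is credited
                # with the final reward; the utility estimate is the running
                # average of those rewards.
                total[s] += reward
                count[s] += 1
        return {s: total[s] / count[s] for s in total}

    # Two short games ending with rewards +1 and -1:
    games = [(["home", "a", "win"], +1.0), (["home", "b", "lose"], -1.0)]
    print(naive_update(games))  # home averages to 0.0; a and b keep +1 / -1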

Naive Updating Analysis


Works, but converges slowly


Must play lots of games


Ignores the fact that the utility of a state should depend on its successors

Technique #2: Adaptive
Dynamic Programming


Utility of a state depends entirely on its successor states


If a state has one successor, its utility should be the same as the successor’s


If a state has multiple successors, its utility should be the expected value of the successors’ utilities

Finding the utilities


To find all utilities, just solve the equations:

    U(i) = R(i) + Σ_j M(i, j) · U(j)

    (M(i, j) = probability of moving from state i to state j)


Set of linear equations, solvable


Changes each iteration as you learn
probabilities


Completely intractable for large problems:


For a real game, it means finding actual utilities of
all states
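
For a small problem with estimated transition probabilities, the system can be solved directly; a sketch with NumPy (the matrix layout and the toy chain are assumptions):

    import numpy as np

    def adp_utilities(M, R):
        # Solve U = R + M·U, i.e. (I - M)·U = R, as a linear system.
        # M[i, j] = estimated probability of moving from state i to j
        # (terminal states get all-zero rows); R[i] = reward of state i.
        n = len(R)
        return np.linalg.solve(np.eye(n) - M, R)

    # Toy 3-state chain: 0 -> 1 -> 2, where 2 is terminal with reward +1.
    M = np.array([[0., 1., 0.],
                  [0., 0., 1.],
                  [0., 0., 0.]])
    R = np.array([0., 0., 1.])
    print(adp_utilities(M, R))  # -> [1. 1. 1.]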

Technique #3: Temporal
Difference Learning


Want utility to depend on successors, but
want to solve iteratively


Whenever you observe a transition from i to j:

    U(i) ← U(i) + α · (R(i) + U(j) − U(i))

    α = learning rate


Difference between successive states = temporal difference


Converges faster than Naive updating
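
A sketch of the update rule above in Python (function and variable names are assumptions; no discount factor, matching the slides):

    def td_update(U, i, j, r, alpha=0.1):
        # One temporal-difference update for an observed move i -> j,
        # where r is the reward received in state i.
        U[i] += alpha * (r + U[j] - U[i])

    # Replay one observed game; repeated games propagate the terminal
    # reward backwards through the utility estimates.
    U = {"home": 0.0, "a": 0.0, "win": 1.0}
    for i, j in [("home", "a"), ("a", "win")]:
        td_update(U, i, j, r=0.0)
    print(U)  # {'home': 0.0, 'a': 0.1, 'win': 1.0}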

Active Learning


Probability of going from one state to
another now depends on action


ADP equations are now:

    U(i) = R(i) + max_a Σ_j M_a(i, j) · U(j)

    (M_a(i, j) = probability of moving from state i to state j when taking action a)
Active Learning


Active Learning with Temporal Difference Learning: works the same way (assuming you know where you’re going)

    U(i) ← U(i) + α · (R(i) + U(j) − U(i))    (the update rule is unchanged)


Also need to learn the probabilities, to eventually make a decision on where to go
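
One way to solve the active equations iteratively is by repeated sweeps, value-iteration style; a minimal sketch under assumed data structures (M_a holds the learned per-action transition probabilities as nested dicts):

    def active_adp_sweep(U, R, M_a, actions):
        # One sweep of U(i) = R(i) + max_a Σ_j M_a[a][i][j] · U(j).
        # Repeat sweeps until U stops changing.  Terminal states should
        # have all-zero transition rows, so U(terminal) = R(terminal).
        return {i: R[i] + max(sum(M_a[a][i][j] * U[j] for j in U)
                              for a in actions)
                for i in U}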

Exploration: where should the agent go to learn utilities?


Suppose you’re trying to learn optimal
game playing strategies


Do you follow best utility, in order to win?


Do you move around at random, hoping to
learn more (and losing lots in the process)?


Following best utility all the time can
get you stuck at an imperfect solution


Following random moves can lose a lot

Where should the agent go to learn utilities?


f(u,n) = exploration function


depends on the utility of the move (u) and the number of times the agent has tried it (n)


One possibility: instead of using utility to decide where to go, use

    f(u, n) = R+  if n < N_e,  otherwise u

    (R+ = optimistic estimate of the best possible reward; N_e = number of times to try a move before trusting its utility estimate)


Try a move a bunch of times, then eventually settle
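
A sketch of this optimistic exploration function and its use in move selection (R_PLUS, N_E, and the helper formats are assumptions):

    R_PLUS = 1.0   # optimistic guess at the best achievable reward
    N_E = 5        # tries required before trusting the utility estimate

    def f(u, n):
        # Pretend rarely-tried moves are great, so the agent explores
        # them; after N_E tries, fall back on the learned utility.
        return R_PLUS if n < N_E else u

    def choose_move(moves, U, tries):
        # moves: list of (action, expected_next_state) pairs; tries counts
        # how often each action has been taken (both formats assumed).
        return max(moves, key=lambda m: f(U.get(m[1], 0.0), tries.get(m[0], 0)))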

Q-learning


Alternative approach for temporal difference
learning


No need to learn transition probabilities (this is sometimes considered more desirable)


Instead, learn the “quality” of each (state, action) pair
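
A sketch of the Q-learning update: no transition model is needed, because the best Q-value at the successor state stands in for the expected value. The state/action names and the absence of a discount factor are assumptions:

    from collections import defaultdict

    def q_update(Q, i, a, j, r, actions, alpha=0.1):
        # One Q-learning step for the observed transition (i, a) -> j
        # with reward r.  No probabilities are learned: the max over the
        # successor's actions replaces the expectation in ADP.
        best_next = max(Q[(j, b)] for b in actions)
        Q[(i, a)] += alpha * (r + best_next - Q[(i, a)])

    Q = defaultdict(float)  # quality of each (state, action) pair
    q_update(Q, "home", "east", "win", r=1.0, actions=["east", "west"])
    print(Q[("home", "east")])  # 0.1, creeping toward the +1 reward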

Generalization in
Reinforcement Learning


Maintaining utilities for all seen states in a real game
is intractable.


Instead, treat it as a supervised learning problem


Training set consists of (state, utility) pairs


Or, alternatively, (state, action, q-value) triples


Learn to predict utility from state


This is a regression problem, not a classification problem


Radial basis function neural networks (hidden nodes are
Gaussians instead of sigmoids)


Support vector machines for regression


Etc…
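
A sketch of the supervised framing with the simplest possible regressor, linear least squares (the features and utility values below are made-up stand-ins for real game data):

    import numpy as np

    # Hypothetical training set: each row describes a state by a few
    # features (say, material balance and mobility); targets are the
    # utility estimates gathered during reinforcement learning.
    X = np.array([[ 2.0, 0.5],
                  [ 0.0, 1.0],
                  [-1.0, 0.2],
                  [ 3.0, 0.9]])
    u = np.array([0.8, 0.1, -0.5, 0.9])

    # Fit utility ≈ w·features + b by least squares; the slides suggest
    # fancier regressors (RBF networks, SVMs) for real problems.
    X1 = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    w, *_ = np.linalg.lstsq(X1, u, rcond=None)

    new_state = np.array([1.0, 0.7, 1.0])       # features plus bias term
    print(new_state @ w)                        # predicted utility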

Other applications


Applies to any situation where something is to be learned from reinforcement


Possible examples:


Toy robot dogs


Petz


That darn paperclip


“The only winning move is not to play”