# Genetic Algorithms

Artificial Intelligence and Robotics

Oct 29, 2013

Class Project

Due at end of finals week

Essentially anything you want, so long
as it’s AI related and I approve

Any programming language you want

In pairs or individual

Email me by Wednesday, November 3

Projects

Implementing k-NN to Classify Bedform Stability Fields

Blackjack Using Genetic Algorithms

Computer game players: Go, Checkers, Connect Four,
Chess, Poker

Computer puzzle solvers: Minesweeper, mazes

Pac-Man with intelligent monsters

Genetic algorithms: blackjack strategy

Automated 20-questions player

Paper on planning

Neural network spam filter

Learning neural networks via GAs

Projects

Solving neural networks via backprop

Code decryptor using GAs

Box pushing agent (competing against an
opponent)

What didn’t work as well

Too complicated games:

Risk, Yahtzee, Chess, Scrabble,
Battle Simulation

Got too focused on making the game work

I sometimes had trouble running the game

Game was often incomplete

Didn’t have time to do enough AI

Problems that were too vague

Simulated ant colonies / genetic algorithms

Bugs swarming for heat (emergent intelligence never happened)

Finding paths through snow

AdaBoost on protein folding data

Couldn’t get boosting working right, needed more time on
small datasets (spent lots of time parsing protein data)

Reinforcement Learning

Game playing: So far, we have told the
agent the value of a given board
position.

How can the agent learn which positions
are important?

Play a whole bunch of games, and receive
a reward at the end (+ or -)

How to determine utility of states that
aren’t ending states?

The setup: Possible game states

Terminal states have reward

Mission: Estimate utility of all possible game states

What is a state?

For chess: state is a combination of
position on board and location of
opponents

Half of your transitions are controlled by you

Other half of your transitions are
probabilistic (depend on opponent)

For now, we assume all moves are
probabilistic (probabilities unknown)

Passive Learning

Agent learns by “watching”

Fixed probability of moving from one state
to another

Sample Results

Technique #1: Naive Updating

Also known as Least Mean Squares (LMS)
approach

Starting at home, obtain sequence of
states to terminal state

Utility of terminal state = reward

loop back over all other states

utility for state i = running average of all
rewards seen for state i
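The running-average update above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function name `lms_update`, the state labels, and the assumption that the only reward arrives at the terminal state (with no discounting) are all choices made for this example.

```python
from collections import defaultdict

def lms_update(utilities, counts, episode, reward):
    """Fold one finished game into running-average utility estimates.

    episode: sequence of states visited, ending at the terminal state.
    reward:  reward received at the terminal state (assumed to be the
             only reward in the game).
    """
    for state in episode:
        counts[state] += 1
        # Incremental form of a running average of all rewards seen.
        utilities[state] += (reward - utilities[state]) / counts[state]

utilities = defaultdict(float)
counts = defaultdict(int)

# Two hypothetical games starting at "home": one won (+1), one lost (-1).
lms_update(utilities, counts, ["home", "a", "win"], +1.0)
lms_update(utilities, counts, ["home", "b", "lose"], -1.0)
```

After these two games, "home" has seen one win and one loss, so its estimate averages out to zero, while "a" and "b" each reflect the single game that passed through them.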

Naive Updating Analysis

Works, but converges slowly

Must play lots of games

Ignores that utility of a state should
depend on successor

Technique #2: Adaptive Dynamic Programming (ADP)

Utility of a state depends entirely on the
successor state

If a state has one successor, utility should
be the same

If a state has multiple successors, utility
should be expected value of successors

Finding the utilities

To find all utilities, just solve equations

Set of linear equations, solvable

Changes each iteration as you learn
probabilities

Completely intractable for large problems:

For a real game, it means finding actual utilities of
all states
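For a problem small enough to be tractable, the linear system can be solved directly. The sketch below assumes the equations take the common form U(i) = R(i) + sum over j of M[i][j] * U(j), where R and M (rewards and transition probabilities) are notation introduced here, and it uses a hand-written Gauss-Jordan solver only to stay dependency-free. The four-state game is hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Hypothetical game: 0 = home, 1 = middle, 2 = win, 3 = lose.
# transition[i][j] is the (learned) probability of moving from i to j;
# terminal states have no outgoing transitions.
transition = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.2],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
reward = [0.0, 0.0, 1.0, -1.0]

# Rearranged form of the utility equations: (I - M) U = R.
n = len(reward)
coeffs = [[(1.0 if i == j else 0.0) - transition[i][j] for j in range(n)]
          for i in range(n)]
utilities = solve_linear(coeffs, reward)
```

Here "home" has a single successor, so it ends up with the same utility as "middle", which in turn is the expected value of its two successors. As the probability estimates change, the system has to be rebuilt and solved again.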

Technique #3: Temporal Difference Learning

Want utility to depend on successors, but
want to solve iteratively

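Whenever you observe a transition from i to
j:

$U(i) \leftarrow U(i) + \alpha \, [R(i) + U(j) - U(i)]$

where $U(i)$ is the utility estimate for state i and $R(i)$ is the reward received in state i

$\alpha$ = learning rate

difference between successive states =
temporal difference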

Converges faster than Naive updating
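A temporal difference update fits in a single function. The sketch below assumes the usual undiscounted form, in which U(i) is nudged toward R(i) + U(j) by a learning rate `alpha`; the function name `td_update`, the default `alpha=0.1`, and the three-state game are illustrative assumptions.

```python
def td_update(utilities, rewards, i, j, alpha=0.1):
    """Nudge U(i) toward R(i) + U(j) after observing the transition i -> j."""
    u_i = utilities.get(i, 0.0)
    u_j = utilities.get(j, 0.0)
    utilities[i] = u_i + alpha * (rewards.get(i, 0.0) + u_j - u_i)

rewards = {"win": 1.0}
utilities = {"win": 1.0}  # utility of a terminal state = its reward

# Replay the same hypothetical game many times, updating on every transition.
for _ in range(200):
    episode = ["home", "a", "win"]
    for i, j in zip(episode, episode[1:]):
        td_update(utilities, rewards, i, j)
```

Because every game here ends in a win, the estimates for "a" and "home" both drift toward 1, with "home" lagging slightly because it only learns through "a".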

Active Learning

Probability of going from one state to
another now depends on action

ADP equations are now:
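One standard way to write these equations is sketched below. The symbols are assumptions, not taken from the slides: $R(i)$ is the reward in state $i$, and $M^a_{ij}$ is the probability of moving from state $i$ to state $j$ when action $a$ is taken.

$$U(i) = R(i) + \max_a \sum_j M^a_{ij} \, U(j)$$

The passive case is the same equation without the max, since there is only one fixed set of transition probabilities.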

Active Learning

Active Learning with Temporal Difference
Learning: works the same way (assuming
you know where you’re going)

Also need to learn probabilities to
eventually make decision on where to go

Exploration: where should
agent go to learn utilities?

Suppose you’re trying to learn optimal
game playing strategies

Do you follow best utility, in order to win?

Do you move around at random, hoping to
learn more (and losing lots in the process)?

Following best utility all the time can
get you stuck at an imperfect solution

Following random moves can lose a lot

Where should agent go to
learn utilities?

f(u,n) = exploration function

depends on utility of move (u), and number of
times that agent has tried it (n)

One possibility: instead of using utility to
decide where to go, use f(u,n)

Try a move a bunch of times, then eventually
settle
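One simple exploration function with this "try it a few times, then settle" behaviour returns an optimistic value until a move has been tried often enough, and the real utility estimate afterwards. The sketch below is one possible choice, not necessarily the one used in the course; `optimistic_reward`, `min_tries`, and the move names are hypothetical.

```python
def exploration_value(u, n, optimistic_reward=1.0, min_tries=5):
    """f(u, n): stay optimistic until a move has been tried min_tries times."""
    return optimistic_reward if n < min_tries else u

def choose_move(utility, tries):
    """Pick the move with the highest exploration value f(u, n)."""
    return max(utility, key=lambda m: exploration_value(utility[m], tries[m]))

utility = {"left": 0.4, "right": 0.2}
tries = {"left": 10, "right": 2}

first_choice = choose_move(utility, tries)   # "right" is still under-explored
tries["right"] = 5
later_choice = choose_move(utility, tries)   # both explored: utility decides
```

The optimistic value should be at least as large as the best reward the game can give, otherwise well-explored good moves would crowd out untried ones.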

Q-learning

Alternative approach for temporal difference
learning

No need to learn probabilities: considered
more desirable sometimes

Instead, looking for “quality” of (state, action)
pair
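The Q-learning update can be sketched as follows. This assumes the common undiscounted tabular form, where the quality of a (state, action) pair is moved toward the observed reward plus the best quality available in the next state. The function name `q_update`, the default `alpha`, and the state and action labels are illustrative assumptions.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, next_actions, alpha=0.1):
    """Move Q(state, action) toward reward + max over a' of Q(next_state, a')."""
    # A terminal next_state has no actions, so its best quality is taken as 0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + best_next - q[(state, action)])

q = defaultdict(float)

# Hypothetical experience: "go" from "s" ends the game with reward +1,
# then "step" from "start" leads to "s" with no immediate reward.
q_update(q, "s", "go", 1.0, "end", [])
q_update(q, "start", "step", 0.0, "s", ["go"])
```

No transition probabilities appear anywhere in the update: the table is driven entirely by the transitions the agent actually experiences.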

Generalization in
Reinforcement Learning

Maintaining utilities for all seen states in a real game
is intractable.

Instead, treat it as a supervised learning problem

Training set consists of (state, utility) pairs

Or, alternatively, (state, action, q-value) triples

Learn to predict utility from state

This is a regression problem, not a classification problem

Radial basis function neural networks (hidden nodes are
Gaussians instead of sigmoids)

Support vector machines for regression

Etc…
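To make the regression framing concrete, the sketch below fits a plain linear model to (state, utility) pairs by stochastic gradient descent. A linear model is swapped in here only because it is short; it stands in for the radial basis function networks or support vector regressors named above. The two-number state encoding and the training pairs are made up for illustration.

```python
def predict(w, b, features):
    """Predicted utility for a state described by a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, features)) + b

def fit_linear(samples, lr=0.05, epochs=2000):
    """Fit utility ~ w . features + b by stochastic gradient descent."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for features, utility in samples:
            err = predict(w, b, features) - utility
            w = [wi - lr * err * xi for wi, xi in zip(w, features)]
            b -= lr * err
    return w, b

# Hypothetical training set: each state is encoded as
# (my material, opponent material), paired with a learned utility.
samples = [
    ((1.0, 0.0), 0.5),
    ((0.0, 1.0), -0.5),
    ((1.0, 1.0), 0.0),
    ((0.0, 0.0), 0.0),
]
w, b = fit_linear(samples)

# The fitted model can now estimate utility for a state it never saw.
unseen_estimate = predict(w, b, (0.5, 0.0))
```

The point of the exercise is the last line: the agent no longer needs a table entry for every state, only a way to turn a state into features.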

Other applications

Applies to any situation where
an agent has to learn from
reinforcement

Possible examples:

Toy robot dogs

Petz

That darn paperclip

“The only winning move is not to play”