A presentation by Matthew Dilts

AI and Robotics

Oct 23, 2013



To solve problems that would take a long time
to solve manually, e.g. what's the best strategy
in a certain game?

Learning AI can adapt to conditions that
cannot be anticipated prior to a game's release
(such as individual players' tastes and play
styles).

Until recently, the lack of precedent for the
successful application of learning in a
mainstream top-rated game meant that the
technology was unproven and hence perceived as
being high risk. (Software companies are
generally risk averse.)

Learning algorithms are frequently associated
with techniques such as neural networks and
genetic algorithms, which are difficult to apply
in-game due to their relatively low efficiency.

Faking It

Indirect Adaptation

Direct Adaptation

Supervised Learning

Unsupervised Learning

Simply degrade an AI that performs very well through
the addition of random errors, then over time reduce
the number of random errors.



The ‘rate of learning’ can be carefully controlled and
specified prior to release, as can the behavior of the AI
at each stage of its development.


The state of the AI at any point in time is independent
of the details of the interaction of the player with the
game, simplifying debugging and testing.



However, this doesn't actually solve any of the
problems that would be solved by an actual learning AI.
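The "faking it" approach above can be sketched as an expert policy degraded by a scheduled error rate; the action names and decay schedule below are illustrative assumptions, not from the slides:

```python
import random

def fake_learning_policy(expert_action, actions, error_rate):
    """Pick the expert's action, but with probability `error_rate`
    substitute a random action to simulate an unskilled AI."""
    if random.random() < error_rate:
        return random.choice(actions)
    return expert_action

# The 'rate of learning' is a fixed schedule, independent of the player.
error_rate = 0.5
for encounter in range(10):
    action = fake_learning_policy("headshot", ["headshot", "miss", "hide"],
                                  error_rate)
    error_rate *= 0.8  # the AI appears to improve after every encounter
```

Because the schedule is fixed in advance, the AI's apparent skill at any point is fully predictable, which is exactly the testing benefit described above.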

The AI agent gathers information, which is then used by a
“conventional” AI layer to adapt the agent’s behavior.

Example: calculate optimal camping locations in an FPS, then hand
them over to the AI layer.


The information about the game world upon which the changes
in behavior are based can often be extracted very easily and
reliably, resulting in fast and effective adaptation.

Since changes in behavior are made by a conventional AI layer,
they are well defined and controlled, and hence easy to debug and
test.

It requires both the information to be learned and the changes in
behavior that occur in response to it to be defined a priori by the
AI designer.
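The camping-location example might be sketched like this: the learning layer only gathers statistics, while the predefined behavior change is applied by the conventional layer. The grid cells and kill data are hypothetical:

```python
from collections import Counter

# Learning layer: gather statistics about where players die
# (hypothetical recorded kill positions on a grid).
death_cells = [(3, 4), (3, 4), (7, 1), (3, 4), (7, 1)]
kill_counts = Counter(death_cells)

# Conventional AI layer: the behavior change is defined a priori --
# camp the cell with the most recorded kills.
camping_spot, _ = kill_counts.most_common(1)[0]
print(camping_spot)  # (3, 4)
```

Note that both what is learned (kill locations) and how behavior responds (camping) were fixed by the designer, which is the limitation stated above.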


Using learning algorithms to adapt an agent’s
behavior directly, usually by testing modifications
to it in the game world to see if it can be improved.

Consider a game with no built-in AI whatsoever,
which evolves rules for controlling AI agents as
the game is played. Such a system would be direct
adaptation in its purest form.

This type of learning closely mimics human
learning and is a bright-eyed, idealistic way to
think about learning, but it is not very applicable in
its most general form.

Direct adaptation is the ultimate form of AI


All the behaviors developed by the AI agents
would be learned from their experience in the
game world, and would therefore be
unconstrained by the preconceptions of the AI designer.

The evolution of the AI would be open ended in
the sense that there would be no limit to the
complexity and sophistication of the rule sets, and
hence the behaviors that could evolve.

A measure of the agent’s performance must be
developed that reflects the real aim of learning and the
role of the agent in the game.

Each agent’s performance must be evaluated over a
substantial period of time to minimize the impact of
random events on the measured performance.

Too many evaluations are likely to be required for each
agent’s performance to be measured against a
representative sample of human opponents.

The lack of constraints on the types of behavior that
can develop makes it impossible to guarantee that the
game would continue to be playable once adaptation had
begun. Testing for such cases would be difficult.

Direct adaptation works best when applied to specific
subsets of an overall AI goal.

Incorporate as much prior knowledge as possible.

Design a good performance measure (this can be extremely
difficult), because:

Many alternative measures of apparently equal merit often
exist, requiring an arbitrary choice to be made between them.

The most obvious or logical measure of performance might
produce the same value for wide ranges of parameter
values, providing little guidance as to how to choose between them.
Carelessly designed performance measures can encourage
undesirable behavior, or introduce locally optimal behavior.

Learn by Optimization

Search for sets of parameters that make the agent
perform well in the game world.

Learn by Reinforcement

Learn the relationship between an action taken by
the agent in a particular state of the game world
and the performance of the agent.

Learn by Imitation

Imitate a player, or allow the player to
provide performance assessments.
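Learning by optimization can be sketched as a simple hill climb over one agent parameter. The fitness function here is a stand-in assumption; a real game would evaluate the agent over many encounters:

```python
import random

def fitness(aggression):
    # Stand-in performance measure with a peak at aggression = 0.6.
    # In a real game this would be measured over many encounters.
    return -(aggression - 0.6) ** 2

def hill_climb(steps=200, step_size=0.05):
    """Learn by optimization: repeatedly test small modifications to the
    agent's parameter and keep the ones that perform at least as well."""
    aggression = random.random()
    best = fitness(aggression)
    for _ in range(steps):
        candidate = min(1.0, max(0.0, aggression + random.uniform(-step_size, step_size)))
        score = fitness(candidate)
        if score >= best:
            aggression, best = candidate, score
    return aggression

print(round(hill_climb(), 2))  # converges near 0.6
```

This also illustrates the locally-optimal-behavior pitfall below: hill climbing only escapes a local peak if some larger modification is occasionally tried.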



Avoid Locally Optimal Behaviors

Behaviors that, while not the best possible, can’t be
improved upon by making small changes.

Minimize Dependencies

Multiple learning processes can depend on each other.

An AI with both a ‘location’ learner and a ‘weapon choice’
learner, for example: the best weapon may depend on the
chosen location, and vice versa.



Avoid Overfitting

Overfitting means an agent has adapted its behavior to a
very specific set of states of the game world and performs
poorly in other states.

Explore and Exploit

Should the AI explore new strategies or repeat what it
already knows?
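The explore/exploit trade-off is often handled with an epsilon-greedy rule: mostly repeat the best-known strategy, occasionally try another. A minimal sketch, with made-up strategy names and value estimates:

```python
import random

def choose_strategy(value_estimates, epsilon=0.1):
    """Epsilon-greedy: with probability epsilon explore a random
    strategy, otherwise exploit the best-known one."""
    if random.random() < epsilon:
        return random.choice(list(value_estimates))       # explore
    return max(value_estimates, key=value_estimates.get)  # exploit

values = {"rush": 0.3, "camp": 0.7, "flank": 0.5}
print(choose_strategy(values, epsilon=0.0))  # camp
```

Tuning epsilon is the whole question posed above: higher values learn faster but repeat known-good behavior less often.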

Indirect adaptation works well in-game
because the agent’s behavior is determined by
an AI layer defined in the design phase of the game.

Direct adaptation can be performed during a
game’s development, but in-game it is limited to
specific, well-defined problems, and only
when guided by a well-thought-out heuristic.

Handle learning opportunities efficiently

When do we calculate what has been learned?

Generate effective novel behaviors

The AI should experiment plausibly and effectively.

Be robust with respect to randomness.

Be aware that sometimes random choices will
work well just by sheer luck.

Require minimal computational resources

Typically the AI has access to only a very small
proportion of a machine’s resources.

Potentially, NPCs could have adaptive AI. Why don’t they?

Adaptive AI takes too long to learn and, in general,
the search space of behaviors is too large and
complex to be explored quickly and efficiently.

Solution: dynamic scripting. An adaptive
mechanism that uses domain knowledge to restrict
the search space enough to make real
adaptation a reality, while ensuring that ALL NPC
behaviors are always plausible.

We can also go back and consider how dynamic
scripting addresses each item on the “requirements” page as well.

At a tactical level, the number of possible states and actions in a
game can be huge, while the number of learning opportunities is
often relatively small. In these circumstances, reinforcement
learning is unlikely to achieve acceptable results unless the size of
the state-action space can be dramatically reduced. This is the
approach of dynamic scripting.

This is different from plain reinforcement learning because (although
reinforcement learning may be involved) it is a strategy to
reduce the number of state-actions that can be utilized, and it
can also adapt more easily to the strategy in use by the opponent.

Now, how do we actually go about implementing this?

The first step in creating an application of dynamic
scripting is to create the rulebase from which it will
build its scripts.

Example rules: use a melee attack against the closest
enemy, or use a special ability at a certain time on a
certain target.

Using the many rules defined for each NPC, we then
create our script: the strategy the NPCs plan on
using against their foes.

We also need to set priorities so the NPC can determine
the order in which to take its actions. Pulling out a
weapon, for example, should be higher priority than
attacking, since you need to do it first.

Choose rules in a random or weighted
fashion before an encounter with the player.
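Weighted rule selection might look like the following sketch; the rulebase entries, priorities, and weights are illustrative assumptions:

```python
import random

# Hypothetical rulebase: (rule name, priority, weight).
rulebase = [
    ("draw weapon",   1, 10.0),
    ("melee nearest", 2, 25.0),
    ("special move",  2,  5.0),
    ("retreat",       3, 15.0),
]

def build_script(rulebase, size):
    """Pick `size` distinct rules, weighted by their current weights,
    then order the script by priority (lower numbers run first)."""
    pool = list(rulebase)
    script = []
    for _ in range(min(size, len(pool))):
        weights = [w for _, _, w in pool]
        chosen = random.choices(pool, weights=weights)[0]
        pool.remove(chosen)   # rules appear at most once per script
        script.append(chosen)
    return sorted(script, key=lambda rule: rule[1])

print([name for name, _, _ in build_script(rulebase, 3)])
```

Because selection is weighted rather than uniform, rules that earn higher weights over time dominate the scripts without ever being guaranteed.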

Choosing the right size for the script is important.

If you set the size too small, the AI won’t be very
complex and won’t seem very smart.

If you set the size too large, the AI might spend too
much time doing high priority tasks and won’t
have any time to do the lower priority ones. This
would be a problem, for example, for our fighter
mentioned earlier if a vast amount of potential
high priority tasks were defined.

Dynamic scripting requires a fitness function that will assign
a value to the results of an encounter that indicates how well
the dynamically scripted AI performed.

Evaluate the “fitness” value of the encounter afterwards.
Based on these evaluations, we’ll change the weights on our
potential rules that we used to create the script.

This fitness function is game specific and needs to be
carefully designed by the developers in advance.

Consider individual performance and also team
performance. If one NPC’s performance was terrible, but
team performance was great, maybe its performance wasn’t
so terrible after all? Then again, we can’t consider only team
performance, because maybe its performance is dragging
the rest of the team down and they could do even better.
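One simple way to blend the two, as a hedged sketch (the 50/50 blend is an arbitrary assumption):

```python
def encounter_fitness(individual_score, team_score, blend=0.5):
    """Blend individual and team performance so neither is considered
    in isolation. Scores are assumed normalized to [0, 1]."""
    return blend * individual_score + (1 - blend) * team_score

# An NPC that fought poorly on a winning team still earns moderate fitness.
print(encounter_fitness(0.2, 0.9))  # 0.55
```

Shifting `blend` toward 1 rewards lone-wolf play; toward 0 it lets weak NPCs hide behind strong teammates, the exact tension described above.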

Update the weights of each ruleset as needed
after implementing the fitness function.

There are many possibilities for the design of
the update formula; the book has its favorite, but it
doesn’t matter much which you use.

The main goal here is to make sure that your
dynamically learning NPCs are generating
variability, exploring new behaviors to adapt to
the player’s strategy, while still utilizing strategies
that work.
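A weight update in this spirit might look like the sketch below; the baseline, bounds, and adjustment magnitude are illustrative assumptions, not the book's exact formula:

```python
def update_weights(weights, used_rules, fitness, baseline=0.5,
                   max_adjust=5.0, w_min=1.0, w_max=50.0):
    """Shift weight toward rules used in encounters that beat the
    baseline fitness and away from rules in encounters that fell short.
    Total weight is (approximately) conserved across the rulebase."""
    adjust = max_adjust * (fitness - baseline)
    new = dict(weights)
    for rule in used_rules:
        new[rule] = min(w_max, max(w_min, new[rule] + adjust))
    # Redistribute the change over every rule so the total is conserved.
    delta = (sum(weights.values()) - sum(new.values())) / len(new)
    return {r: min(w_max, max(w_min, w + delta)) for r, w in new.items()}

weights = {"melee": 20.0, "special": 10.0, "retreat": 10.0}
weights = update_weights(weights, ["melee"], fitness=0.9)
print(weights["melee"] > 20.0)  # True: the rewarded rule gained weight
```

Conserving the total means rewarding one rule implicitly penalizes the others, which keeps unused rules from quietly dominating.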

In games we don’t want to create the ultimate AI that defeats
the player every time. Not everyone is that good.

Therefore we do the following:

When the computer loses a fight, change the weights such
that the computer focuses on successful behaviors instead of
experimenting with new ones.

When the computer wins a fight, change the weights such
that the computer focuses on varying and experimental
strategies instead of ones that it knows are successful
against the particular player.

If these two things happen correctly, the player will find
themselves facing an AI that beats them roughly 50% of the
time.
Supervised learning is a machine learning
technique for creating a function from training
data. The training data consist of pairs of input
objects (typically vectors), and desired outputs.
The output of the function can be a continuous
value, or can predict a class label of the input
object. The task of the supervised learner is to
predict the value of the function for any valid
input object after having seen a number of training
examples (i.e. pairs of input and target output). To
achieve this, the learner has to generalize from the
presented data to unseen situations in a
"reasonable" way.

Unsupervised learning is a type of machine
learning where manual labels of inputs are not
used. It is distinguished from supervised
learning approaches, which learn how to
perform a task, such as classification or
regression, using a set of human-prepared
examples.
In reality these 4 different strategies
intermingle. You can have direct supervised,
indirect unsupervised, etc.

An example: supervised direct adaptation
might be a self-learning AI that the
developers run for several days’
worth of run/test time before releasing the game to the
public, to figure out some of the best strategies
for their AI to implement.

The concept: many video games, or games in
general, boil down to extremely overcomplicated
versions of rock-paper-scissors. In a fighting game
you have abilities such as kick/punch/block, and
many abilities are good at countering others.

If an AI can predict what the player is going to do
more reliably, it can win more reliably.

Or if an opponent always does the same thing in
an FPS game (moves to pick up his favorite
weapon, gets a …, gets some armor, then goes
to his favorite hiding spot, in that order), we can
more reliably counteract his plans.

Basic Idea: if you had to predict what the next
number would be in the following string
sequence, how would you do it?


Find the longest earlier substring that matches the tail
end of the string, then figure out what comes
after that. What…? The answer to the example
might help elaborate.


It’s 1. The substring is 01101 and the following
number is 1.
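The naive approach can be sketched as follows; this is the straightforward version whose run time is discussed next:

```python
def predict_next(history):
    """Find the longest earlier substring matching the tail of `history`
    and return the symbol that followed it (naive scan)."""
    n = len(history)
    for length in range(n - 1, 0, -1):        # try the longest suffix first
        tail = history[n - length:]
        # Search only strictly earlier occurrences (end bound excludes
        # the tail matching itself).
        idx = history.find(tail, 0, n - 1)
        if idx != -1 and idx + length < n:
            return history[idx + length]      # the symbol that came after
    return None

print(predict_next("0110101"))  # prints 0
```

Here the tail "101" last occurred earlier at index 2, and the symbol following that occurrence was "0".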

Problem: O(N^2) run time on this algorithm.

The solution is to instead update our
knowledge base of match sizes incrementally
as each new symbol arrives.

Rock-Paper-Scissors Program:

Compute all matching substrings, not just the
longest one.

Prediction Value = Length(S)/

Take the sum of the prediction value for each time
the same length occurred.

The idea here is that smaller string matches that
occurred recently may be more reasonable than
longer ones in the past and string matches that
have occurred multiple times are much more
reasonable than ones that have only occurred once
or twice.
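A sketch combining all matches: since the exact prediction-value formula is cut off on the slide, the length-divided-by-age weighting here is an assumption in the same spirit (recent and repeated matches count for more):

```python
from collections import defaultdict

def predict_weighted(history):
    """Score every earlier substring that matches a suffix of `history`,
    weighting each match by length / age. The exact slide formula is
    truncated, so this particular weighting is an assumption."""
    n = len(history)
    votes = defaultdict(float)
    for length in range(1, n):
        tail = history[n - length:]
        start = 0
        while True:
            # Only strictly earlier occurrences (end bound excludes the tail).
            idx = history.find(tail, start, n - 1)
            if idx == -1:
                break
            age = n - (idx + length)   # how long ago the successor appeared
            votes[history[idx + length]] += length / age
            start = idx + 1            # allow overlapping repeat matches
    return max(votes, key=votes.get) if votes else None

print(predict_weighted("0101"))  # prints 0
```

Summing a vote per occurrence means a short pattern seen many times can outweigh a long pattern seen once, which matches the intuition above.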

There are several different methods of machine
learning one can use for various effects in-game.
Any good examples of this happening? Yes.

Black and White is an excellent example of a
game which uses mostly supervised indirect
adaptation.



There are many simple methods to utilize
machine learning in a game without eating up
computational resources.

Although it has been done before, the AI that exists in
many new games is pathetically boring, and a few
simple tweaks would make mountains out of molehills
with learning AI that can adapt to, counteract, and
learn strategies used by the player. I would soil myself
if AI ever learned some nonsense like this from …

Should anyone in this class ever make a computer
game, consider using some simple machine learning
concepts to improve it greatly.