Soar One-hour Tutorial
John E. Laird
University of Michigan
March 2009
http://sitemaker.umich.edu/soar
laird@umich.edu
Supported in part by DARPA and ONR
Tutorial Outline
1. Cognitive Architecture
2. Soar History
3. Overview of Soar
4. Details of Basic Soar Processing and Syntax
   – Internal decision cycle
   – Interaction with external environments
   – Subgoals and meta-reasoning
   – Chunking
5. Recent extensions to Soar
   – Reinforcement Learning
   – Semantic Memory
   – Episodic Memory
   – Visual Imagery
How can we build a human-level AI?
[Figure: levels of description for human intelligence, from tasks (calculus, history, reading, Sudoku, shopping, driving, talking on a cell phone, learning) down through brain structure and neural circuits to neurons, set beside the corresponding levels for computers: programs, computer architecture, logic circuits, electrical circuits. The final build overlays a cognitive architecture: symbolic long-term memories (procedural, semantic, episodic) with their learning mechanisms (chunking, reinforcement learning, semantic learning, episodic learning), symbolic short-term memory, a decision procedure, perception, action, imagery, and appraisals, connected to the body.]
Cognitive Architecture
Fixed mechanisms underlying cognition
  – Memories, processing elements, control, interfaces
  – Representations of knowledge
  – Separation of fixed processes and variable knowledge
  – Complex behavior arises from composition of simple primitives
Purpose:
  – Bring knowledge to bear to select actions to achieve goals
Not just a framework
  – BDI, NN, logic & probability, rule-based systems
Important constraints:
  – Continual performance
  – Real-time performance
  – Incremental, on-line learning
[Figure: Architecture + Knowledge = behavior directed at Goals within a Task Environment.]
Common Structures of many Cognitive Architectures
[Figure: short-term memory connected to procedural long-term memory and declarative long-term memory, with perception, action, action selection, goals, procedure learning, and declarative learning.]
Different Goals of Cognitive Architecture
• Biological plausibility: Does the architecture correspond to what we know about the brain?
• Psychological plausibility: Does the architecture capture the details of human performance in a wide range of cognitive tasks?
• Functionality: Does the architecture explain how humans achieve their high level of intellectual function?
  – Building human-level AI
Short History of Soar
[Figure: timeline from 1980 to 2005 along two threads, functionality and modeling. Pre-Soar roots: problem spaces, production systems, heuristic search. Milestones: multi-method, multi-task problem solving; subgoaling; chunking; UTC; natural language; HCI; external environment; integration; large bodies of knowledge; teamwork; real application; virtual agents; learning from experience, observation, instruction; new capabilities.]
Distinctive Features of Soar
• Emphasis on functionality
  – Take engineering and scaling issues seriously
  – Interfaces to real-world systems
  – Can build very large systems in Soar that exist for a long time
• Integration with perception and action
  – Mental imagery and spatial reasoning
• Integrates reaction, deliberation, meta-reasoning
  – Dynamically switching between them
• Integrated learning
  – Chunking, reinforcement learning, episodic & semantic
• Useful in cognitive modeling
  – Expanding this is the emphasis of many current projects
• Easy to integrate with other systems & environments
  – SML efficiently supports many languages and inter-process communication
System Architecture
[Figure: layered system architecture, from the application down to the kernel.]
• Application (any language)
• SWIG Language Layer: wrapper for Java/Tcl (not needed if the app is in C++)
• ClientSML: encodes/decodes function calls and responses in XML (C++)
• SML: Soar Markup Language
• KernelSML: encodes/decodes function calls and responses in XML (C++)
• gSKI: higher-level interface (C++)
• Soar Kernel: Soar 9.0 Kernel (C)
Soar Basics
• Operators: Deliberate changes to internal/external state
• Activity is a series of operators controlled by knowledge:
  1. Input from environment
  2. Elaborate current situation: parallel rules
  3. Propose and evaluate operators via preferences: parallel rules
  4. Select operator
  5. Apply operator: Modify internal data structures: parallel rules
  6. Output to motor system
[Figure: an agent in a real or virtual world selects and applies an operator, moving to a new state.]
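The six steps above can be illustrated with a toy Python model. This is a sketch under stated assumptions, not Soar's kernel: working memory is a set of (identifier, attribute, value) triples, rules are plain Python functions, and Soar's symbolic preferences are reduced to summed numeric scores. The `Agent` class and its fields are hypothetical names chosen for this example. Elaboration keeps firing all rules until nothing new is added, loosely mirroring parallel rule firing to quiescence.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy model of the decision cycle (not Soar's real kernel).
# Working memory: a set of (identifier, attribute, value) triples.

@dataclass
class Agent:
    wm: set = field(default_factory=set)
    # Elaboration rules: frozenset(wm) -> set of new WMEs
    elaborations: list = field(default_factory=list)
    # Proposal/evaluation rules: frozenset(wm) -> set of (operator, numeric value)
    proposals: list = field(default_factory=list)
    # Application rules: operator name -> (frozenset(wm) -> set of WMEs to add)
    applications: dict = field(default_factory=dict)

    def elaborate(self) -> None:
        # Fire all elaboration rules together until quiescence (no new WMEs).
        while True:
            snapshot = frozenset(self.wm)
            new = set()
            for rule in self.elaborations:
                new |= rule(snapshot)
            new -= self.wm
            if not new:
                return
            self.wm |= new

    def decide(self) -> Optional[str]:
        # Sum the values rules give each operator and pick the best one;
        # ties go to the alphabetically first operator for determinism.
        scores: dict = {}
        snapshot = frozenset(self.wm)
        for rule in self.proposals:
            for op, value in rule(snapshot):
                scores[op] = scores.get(op, 0.0) + value
        if not scores:
            return None  # Soar would reach an impasse here
        return max(sorted(scores), key=lambda op: scores[op])

    def cycle(self, percepts: set) -> Optional[str]:
        self.wm |= percepts        # 1. input
        self.elaborate()           # 2. elaborate current situation
        op = self.decide()         # 3-4. propose/evaluate, then select
        if op is not None:         # 5. apply the selected operator
            self.wm |= self.applications[op](frozenset(self.wm))
        return op                  # 6. output the chosen action


agent = Agent(
    elaborations=[lambda wm: {("s1", "hungry", True)}
                  if ("s1", "energy", "low") in wm else set()],
    proposals=[lambda wm: {("eat", 1.0)} if ("s1", "hungry", True) in wm else set(),
               lambda wm: {("wait", 0.1)}],
    applications={"eat": lambda wm: {("s1", "ate", True)},
                  "wait": lambda wm: set()},
)
chosen = agent.cycle({("s1", "energy", "low")})
```

Here the elaboration rule derives `hungry` from the input, which lets the `eat` proposal outscore `wait`; the application rule then modifies working memory.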
Basic Soar Architecture
[Figure: procedural long-term memory (production memory) and symbolic short-term memory (working memory), with the decision procedure, chunking, and perception and action connected to the body. The processing cycle runs Input, Elaborate State, Propose Operators, Evaluate Operators, Select Operator (Decide), Elaborate Operator, Apply Operator (Apply), Output.]
Soar 101: Eaters
[Figure: an Eater in a grid world with moves available to the North, South, and East, and the cycle Input, Propose Operator, Select Operator, Apply Operator, Output.]
Propose Operator:
  If cell in direction <d> is not a wall,
  --> propose operator move <d>
Select Operator:
  If operator <o1> will move to a bonus food and operator <o2> will move to a normal food,
  --> operator <o1> > <o2>
  Resulting preferences: North > East, South > East, North = South
  If operator <o1> will move to an empty cell
  --> operator <o1> <
  Resulting preferences: North > East, South < (worst)
Apply Operator:
  If an operator is selected to move <d>
  --> create output move-direction <d>
Output: move-direction North
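The three Eaters rules can be mimicked in Python to show how proposal, comparison, and application fit together. This is a hypothetical sketch, not Soar code: the cell encoding (`"wall"`, `"bonusfood"`, `"normalfood"`, `"empty"`) and the function names are assumptions, and remaining ties are broken alphabetically only to keep the example deterministic.

```python
# Hypothetical encoding of an Eater's neighbourhood: direction -> cell content.
RANK = {"bonusfood": 2, "normalfood": 1, "empty": 0}

def propose(cells):
    # "If cell in direction <d> is not a wall --> propose operator move <d>"
    return [d for d, content in cells.items() if content != "wall"]

def select(cells):
    ops = propose(cells)
    if not ops:
        return None
    # Worst preference: moves into empty cells lose to any food move.
    candidates = [d for d in ops if cells[d] != "empty"] or ops
    # Better preference: bonus food > normal food.
    best = max(RANK[cells[d]] for d in candidates)
    # Remaining ties are indifferent (=); pick alphabetically in this sketch.
    return sorted(d for d in candidates if RANK[cells[d]] == best)[0]

def output(direction):
    # "If an operator is selected to move <d> --> create output move-direction <d>"
    return {"move": {"direction": direction}}
```

With North and South holding bonus food and East normal food, `select` reproduces North > East, South > East, North = South and returns one of the tied moves.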
Example Working Memory
[Figure: block A stacked on block B, which sits on a table.]
(s1 ^block b1 ^block b2 ^table t1)
(b1 ^color blue ^name A ^ontop b2 ^size 1
^type block ^weight 14)
(b2 ^color yellow ^name B ^ontop t1 ^size 1
^type block ^under b1 ^weight 14)
(t1 ^color gray ^shape square
^type table ^under b2)
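Working memory is a graph.
All working memory elements must be “linked” directly or indirectly to a state.
[Figure: the same structure drawn as a graph rooted at S1, with ^block edges to b1 and b2, a ^table edge to t1, and b2's attributes (^color yellow, ^name B, ^size 1, ^type block, ^weight 14, ^under, ^ontop) drawn as labeled edges.]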
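The linkage requirement can be checked mechanically. The sketch below stores the slide's working memory as Python triples and follows identifier-valued elements from the state with a breadth-first search; the function names are hypothetical and only illustrate the graph view, not how Soar enforces the constraint internally.

```python
from collections import defaultdict, deque

# The blocks-world working memory from the slide, as (identifier, attribute, value) triples.
WMES = [
    ("s1", "block", "b1"), ("s1", "block", "b2"), ("s1", "table", "t1"),
    ("b1", "color", "blue"), ("b1", "name", "A"), ("b1", "ontop", "b2"),
    ("b1", "size", 1), ("b1", "type", "block"), ("b1", "weight", 14),
    ("b2", "color", "yellow"), ("b2", "name", "B"), ("b2", "ontop", "t1"),
    ("b2", "size", 1), ("b2", "type", "block"), ("b2", "under", "b1"),
    ("b2", "weight", 14),
    ("t1", "color", "gray"), ("t1", "shape", "square"), ("t1", "type", "table"),
    ("t1", "under", "b2"),
]

def reachable_ids(wmes, state="s1"):
    """Identifiers reachable from `state` by following identifier-valued WMEs."""
    identifiers = {ident for ident, _, _ in wmes}
    edges = defaultdict(list)
    for ident, _attr, value in wmes:
        if value in identifiers:   # the value is itself an identifier: a graph edge
            edges[ident].append(value)
    seen, queue = {state}, deque([state])
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def unlinked(wmes, state="s1"):
    """WMEs whose identifier is not connected to the state."""
    reachable = reachable_ids(wmes, state)
    return [w for w in wmes if w[0] not in reachable]
```

Every element on the slide is reachable from `s1`; an orphan such as `("x9", "color", "red")` would be reported by `unlinked`.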
Soar Processing Cycle
[Figure: the processing cycle (Input, Elaborate State, Propose Operators, Evaluate Operators, Select Operator, Elaborate Operator, Apply Operator, Output), with rules firing in the elaboration, proposal, evaluation, and application phases. When the decision cannot be made, an impasse creates a subgoal that runs its own copy of the same cycle.]
TankSoar
[Figure: TankSoar map showing the red tank's shield, borders (stone), walls (trees), a health charger, a missile pack, the blue tank being hit ("Ouch!"), an energy charger, and the green tank's radar.]
Soar 103: Subgoals
[Figure: the top-level cycle (Input, Propose Operator, Compare Operators, Select Operator, Apply Operator, Output). Applying the selected abstract operator leads to a subgoal that runs its own cycle to choose among more primitive operators.]
• If enemy not sensed, then wander: the Wander subgoal chooses between Move and Turn.
• If enemy is sensed, then attack: the Attack subgoal chooses Shoot.
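The way an abstract operator such as Wander or Attack leads to a subgoal can be sketched in Python. This is a hypothetical illustration: operators without a direct application are treated as causing an impasse, and a substate then proposes a more primitive operator. The `blocked` percept and all function names are assumptions made for the example.

```python
# Hypothetical operator hierarchy for the TankSoar example.
PRIMITIVES = {"move", "turn", "shoot"}

def propose_top(percepts):
    # "If enemy is sensed, then attack"; otherwise "wander".
    return "attack" if percepts.get("enemy_sensed") else "wander"

def propose_sub(op, percepts):
    # Knowledge available in the substate created for an abstract operator.
    if op == "wander":
        return "turn" if percepts.get("blocked") else "move"
    if op == "attack":
        return "shoot"
    raise ValueError(f"no substate knowledge for {op}")

def decide(percepts):
    """Return the operator stack from the top state down to a primitive action."""
    stack = [propose_top(percepts)]
    while stack[-1] not in PRIMITIVES:   # impasse: no rule applies this operator
        stack.append(propose_sub(stack[-1], percepts))
    return stack
```

The returned stack plays the role of the goal hierarchy: each entry below the first exists only because the operator above it could not be applied directly.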
TacAir-Soar [1997]
Controls simulated aircraft in real-time training exercises (>3000 entities)
Flies all U.S. air missions
Dynamically changes missions as appropriate
Communicates and coordinates with computer- and human-controlled planes
Large knowledge base (8000 rules)
No learning
TacAir-Soar Task Decomposition
Execute Mission
  Fly-route
  Ground Attack
  Fly-Wing
  Intercept
    Achieve Proximity
    Employ Weapons
      Get Missile LAR
      Select Missile
      Get Steering Circle
      Sort Group
      Launch Missile
        Lock Radar
        Lock IR
        Fire-Missile
        Wait-for Missile-Clear
    Search
    Execute Tactic
    Scram
Example proposal rules at each level:
• Execute Mission: If instructed to intercept an enemy, then propose intercept.
• Intercept: If intercepting an enemy and the enemy is within range and ROE are met, then propose employ-weapons.
• Employ Weapons: If employing-weapons and a missile has been selected and the enemy is in the steering circle and LAR has been achieved, then propose launch-missile.
• Launch Missile: If launching a missile and it is an IR missile and there is currently no IR lock, then propose lock-IR.
>250 goals, >600 operators, >8000 rules
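The four example proposal rules can be read as a chain in which each rule tests the current goal plus features of the situation. The sketch below encodes them in Python under stated assumptions: the situation keys (`instructed_to_intercept`, `roe_met`, and so on) are hypothetical names for the conditions on the slide, and only one path through the hierarchy is modeled.

```python
def propose(goal, s):
    """Return the sub-operator proposed in the substate of `goal`, or None."""
    if goal == "execute-mission" and s["instructed_to_intercept"]:
        return "intercept"
    if goal == "intercept" and s["enemy_in_range"] and s["roe_met"]:
        return "employ-weapons"
    if (goal == "employ-weapons" and s["missile_selected"]
            and s["in_steering_circle"] and s["lar_achieved"]):
        return "launch-missile"
    if goal == "launch-missile" and s["missile_type"] == "IR" and not s["ir_lock"]:
        return "lock-IR"
    return None

def goal_stack(s, top="execute-mission"):
    """Descend the hierarchy while some rule proposes a sub-operator."""
    stack = [top]
    while (sub := propose(stack[-1], s)) is not None:
        stack.append(sub)
    return stack


situation = dict(instructed_to_intercept=True, enemy_in_range=True, roe_met=True,
                 missile_selected=True, in_steering_circle=True, lar_achieved=True,
                 missile_type="IR", ir_lock=False)
```

If any condition fails (for example, the ROE are not met), the descent stops at the goal whose rule did not fire.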
Impasse/Substate Implications:
• A substate is really a meta-state that allows the system to reflect
• Substate = goal to resolve the impasse
  – Generate operator
  – Select operator (deliberate control)
  – Apply operator (task decomposition)
• All basic problem-solving functions are open to reflection
  – Operator creation, selection, application, state elaboration
• The substate is where knowledge to resolve the impasse can be found
• Hierarchies of substates/subgoals arise through recursive impasses
Tie Subgoals and Chunking
[Figure: in Eaters, North, South, and East are proposed with no preferences to choose among them, producing a tie impasse. In the subgoal, evaluate-operator is applied to each candidate: North = 10, South = 10, East = 5. The evaluations yield North > East, South > East, North = South.]
• Chunking creates a rule that applies evaluate-operator (= 10)
• Chunking creates rules that create preferences based on what was tested
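The effect of chunking on a tie impasse can be sketched as caching: the first time a candidate is evaluated in the substate, the result is stored as a "rule" keyed only on what the evaluation tested (here, the content of the target cell), so later decisions skip deliberation. This is a hypothetical illustration, not Soar's chunking algorithm; the scoring table (bonus food = 10, normal food = 5, matching the slide's North = 10, East = 5) and the class name are assumptions.

```python
VALUE = {"bonusfood": 10, "normalfood": 5, "empty": 0}  # hypothetical scoring

class TieResolver:
    def __init__(self):
        self.chunks = {}       # learned "rules": tested cell content -> value
        self.evaluations = 0   # how often deliberate evaluation was needed

    def evaluate_in_substate(self, content):
        # Deliberate evaluation (here a table lookup); cache the result as a
        # chunk that tests only the cell content, not the direction.
        self.evaluations += 1
        self.chunks[content] = VALUE[content]
        return self.chunks[content]

    def choose(self, cells):
        values = {}
        for direction, content in cells.items():
            if content in self.chunks:       # a chunk fires: no deliberation
                values[direction] = self.chunks[content]
            else:
                values[direction] = self.evaluate_in_substate(content)
        return max(sorted(values), key=values.get)
```

Because the cached result tests only the cell content, the value learned while evaluating North also applies to South, which illustrates how the generality of the learned rule follows from what the reasoning examined.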
Chunking Analysis
• Converts deliberate reasoning/planning to reaction
• Generality of learning based on generality of reasoning
  – Leads to many different types of learning
  – If reasoning is inductive, so is learning
• Soar only learns what it thinks about
• Chunking is impasse-driven
  – Learning arises from a lack of knowledge
Extending Soar
• Learn from internal rewards
  – Reinforcement learning
• Learn facts
  – What you know
  – Semantic memory
• Learn events
  – What you remember
  – Episodic memory
• Basic drives and …
  – Emotions, feelings, mood
• Non-symbolic reasoning
  – Mental imagery
• Learn from regularities
  – Spatial and temporal clusters
[Figure: the extended Soar architecture, adding reinforcement learning, semantic memory and semantic learning, episodic memory and episodic learning, visual imagery, an appraisal detector, and clustering to procedural memory, chunking, symbolic short-term memory, the decision procedure, and perception and action connected to the body.]
Theoretical Commitments
Stayed the Same
• Problem Space Computational Model
• Long-term & short-term memories
• Associative procedural knowledge
• Fixed decision procedure
• Impasse-driven reasoning
• Incremental, experience-driven learning
• No task-specific modules
Changed
• Multiple long-term memories
• Multiple learning mechanisms
• Modality-specific representations & processing
• Non-symbolic processing
  – Symbol generation (clustering)
  – Control (numeric preferences)
  – Learning control (reinforcement learning)
  – Intrinsic reward (appraisals)
  – Aid memory retrieval (WM activation)
  – Non-symbolic reasoning (visual imagery)
Reinforcement Learning
Shelly Nason
RL in Soar
1. Encode the value function as operator evaluation rules with numeric preferences.
2. Combine all numeric preferences for an operator dynamically.
3. Adjust value of numeric preferences with experience.
[Figure: perception updates the internal state; the value function drives action selection and action; reward is used to update the value function.]
The Q-function in Soar
The value function is stored in rules that test the state and operator, and create numeric preferences.
sp {rl-rule
   (state <s> ^operator <o> +)
   …
-->
   (<s> ^operator <o> = 0.34)}
Operator Q-value = the sum of all numeric preferences.
Selection: epsilon-greedy, or Boltzmann
O1: {.34, .45, .02} = .81
O2: {.25, .11, .12} = .48
O3: {-.04, .14, -.05} = .05
epsilon-greedy: With probability ε the agent selects an action at random. Otherwise the agent takes the action with the highest expected value. [Balance exploration/exploitation]
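The summation and the two selection policies can be sketched in Python using the slide's numbers. This is an illustration of the standard techniques, not Soar's implementation; `PREFS`, `q_value`, `epsilon_greedy`, and `boltzmann` are hypothetical names, and the temperature value is arbitrary.

```python
import math
import random

# Numeric preferences contributed by RL rules to each operator (from the slide).
PREFS = {"O1": [.34, .45, .02], "O2": [.25, .11, .12], "O3": [-.04, .14, -.05]}

def q_value(op):
    """Q-value of an operator = sum of all its numeric preferences."""
    return sum(PREFS[op])

def epsilon_greedy(ops, epsilon, rng):
    # Explore with probability epsilon, otherwise exploit the highest Q-value.
    if rng.random() < epsilon:
        return rng.choice(ops)
    return max(ops, key=q_value)

def boltzmann(ops, temperature, rng):
    # Choose with probability proportional to exp(Q / temperature).
    weights = [math.exp(q_value(op) / temperature) for op in ops]
    return rng.choices(ops, weights=weights, k=1)[0]


ops = list(PREFS)
rng = random.Random(0)
greedy_pick = epsilon_greedy(ops, epsilon=0.0, rng=rng)   # always exploits
soft_pick = boltzmann(ops, temperature=0.1, rng=rng)
```

Lowering the Boltzmann temperature concentrates probability on O1; raising it makes the choice closer to uniform, which is another way to trade exploration against exploitation.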
Updating operator values
Sarsa update:
$$Q(s,O_1) \leftarrow Q(s,O_1) + \alpha\,\bigl[r + \gamma\,Q(s',O_2) - Q(s,O_1)\bigr]$$
where
• Q(s,O1) = sum of numeric prefs for O1 = .33, from R1(O1) = .20, R2(O1) = .15, R3(O1) = -.02
• r = reward = .2
• Q(s′,O2) = sum of numeric prefs of the selected operator (O2) = .11
• α = .1 (learning rate), γ = .9 (discount rate)
.1 × [.2 + .9 × .11 - .33] = .1 × (-.031) ≈ -.003
The update is split evenly among the three rules contributing to O1: ≈ -.001 each.
R1 = .199, R2 = .149, R3 = -.021
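The worked example can be reproduced in a few lines of Python. This sketch assumes, as the slide states, that the update is divided evenly among the rules that contributed to O1; `sarsa_update` is a hypothetical helper, and α = .1, γ = .9 are the example's values rather than fixed Soar defaults.

```python
def sarsa_update(rule_values, reward, q_next, alpha=0.1, gamma=0.9):
    """Return the updated rule values and the total change to Q(s, O1).

    rule_values: numeric preferences of the rules that fired for O1.
    q_next: Q(s', O2), the summed preferences of the next selected operator.
    """
    q = sum(rule_values)                            # Q(s, O1)
    delta = alpha * (reward + gamma * q_next - q)   # alpha * TD error
    share = delta / len(rule_values)                # split evenly among the rules
    return [v + share for v in rule_values], delta


new_values, delta = sarsa_update([.20, .15, -.02], reward=.2, q_next=.11)
```

The total change is about -.0031, so each rule moves by about -.001 and the new Q(s,O1) is about .327.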
Results with Eaters
[Figure: total score (0 to 1200) against move number (1 to 289) for the Figure 2a rule agent, a random agent, and agents labeled After 5, After 10, After 15, and After 20.]
RL TankSoar Agent
[Figure: average margin of victory (-20 to 60) over successive games (1 to 171).]
Semantic Memory
Yongjia Wang
Memory Systems
Memory
  Long Term Memory
    Declarative
      Semantic Memory
      Episodic Memory
    Procedural
      Perceptual Representation System
      Procedural Memory
  Short Term Memory
    Working Memory
Declarative Memory Alternatives
• Working Memory
  – Keep everything in working memory
• Retrieve dynamically with rules
  – Rules provide asymmetric access
  – Data chunking to learn (complex)
• Separate Declarative Memories
  – Semantic memory (facts)
  – Episodic memory (events)
Basic Semantic Memory Functionalities
• Encoding
  – What to save?
  – When to add a new declarative chunk?
  – How to update knowledge?
• Retrieval
  – How is the cue placed and matched?
  – What are the different types of retrieval?
• Storage
  – What are the storage structures?
  – How are they maintained?
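Semantic Memory Functionalities
[Figure: working memory and semantic memory, illustrating Save (structures such as A, B, and the complex structure C, D, E, F are copied into semantic memory), Expand (retrieve the substructure of an identifier), Feature Match Retrieval (a cue on the state retrieves a matching structure, or NIL if none matches), Update with Complex Structure, AutoCommit, and Remove-No-Change.]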
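A minimal sketch of save, expand, and feature-match retrieval follows. It assumes each stored structure is a flat attribute/value dictionary under an identifier and that, when several structures match a cue, the first in sorted identifier order is returned; that tie-breaking rule and the `SemanticMemory` class are assumptions of this example, not how Soar's semantic memory chooses among matches.

```python
class SemanticMemory:
    def __init__(self):
        self.store = {}   # identifier -> dict of attribute -> value

    def save(self, ident, attrs):
        # Encoding: a later save of the same identifier overwrites (updates) it.
        self.store[ident] = dict(attrs)

    def retrieve(self, cue):
        """Feature match: first structure containing every cue feature, else None (NIL)."""
        for ident in sorted(self.store):
            attrs = self.store[ident]
            if all(k in attrs and attrs[k] == v for k, v in cue.items()):
                return ident, dict(attrs)
        return None

    def expand(self, ident):
        # Return the stored substructure of an identifier, or None if unknown.
        attrs = self.store.get(ident)
        return None if attrs is None else dict(attrs)


sm = SemanticMemory()
sm.save("A", {"name": "A", "color": "blue", "ontop": "B"})
sm.save("B", {"name": "B", "color": "yellow"})
```

A cue such as `{"color": "yellow"}` retrieves B, an unmatched cue returns `None`, and expanding A's `ontop` value follows the link to B's structure.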
Episodic Memory
Andrew Nuxoll
Episodic vs. Semantic Memory
• Semantic Memory
  – Knowledge of what we “know”
  – Example: what state the Grand Canyon is in
• Episodic Memory
  – History of specific events
  – Example: a family vacation to the Grand Canyon
Characteristics of Episodic Memory: Tulving
• Architectural:
  – Does not compete with reasoning.
  – Task independent
• Automatic:
  – Memories created without deliberate decision.
• Autonoetic:
  – Retrieved memory is distinguished from sensing.
• Autobiographical:
  – Episode remembered from own perspective.
• Variable Duration:
  – The time period spanned by a memory is not fixed.
• Temporally Indexed:
  – Rememberer has a sense of when the episode occurred.
Episodic Memory: Current Implementation
[Figure: long-term procedural memory (production rules) and working memory, with input, output, a cue, and the retrieved episode; episodic learning stores episodes in a separate episodic memory.]
• Encoding
  – Initiation: when the agent takes an action.
  – Content: the entire working memory is stored in the episode.
• Storage
  – Episode structure: episodes are stored in a separate memory.
• Retrieval
  – Initiation/Cue: the cue is placed in an architecture-specific buffer.
  – Retrieval: the closest partial match is retrieved.
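The implementation choices above can be sketched as follows, assuming working memory is a set of triples and that "closest partial match" means the episode sharing the most elements with the cue. Breaking ties in favor of the most recent episode is an assumption of this sketch, and `EpisodicMemory`, `record`, and `retrieve` are hypothetical names rather than Soar's interface.

```python
class EpisodicMemory:
    def __init__(self):
        self.episodes = []   # snapshots of working memory, in temporal order

    def record(self, working_memory):
        # Encoding: called whenever the agent acts; stores all of working memory.
        self.episodes.append(frozenset(working_memory))

    def retrieve(self, cue):
        """Return (index, episode) sharing the most elements with the cue.

        Ties go to the most recent episode; returns None if nothing overlaps.
        """
        best, best_score = None, 0
        for i, episode in enumerate(self.episodes):
            score = len(set(cue) & episode)
            if score > 0 and score >= best_score:   # >= lets later episodes win ties
                best, best_score = i, score
        return None if best is None else (best, self.episodes[best])


em = EpisodicMemory()
em.record({("s1", "x", 1), ("s1", "y", 2)})
em.record({("s1", "x", 1), ("s1", "y", 3)})
em.record({("s1", "z", 9)})
```

The returned index doubles as a temporal index, so a caller can step to the next episode after a match.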
Cognitive Capability: Virtual Sensing
• Retrieve prior perception that is relevant to the current task
• Tank recursively searches memory
  – Have I seen a charger from here?
  – Have I seen a place where I can see a charger?
Virtual Sensors Results
[Figure: average number of moves (0 to 250) over subsequent searches (1 to 19), comparing the average for a random agent with an agent using episodic memory.]
Cognitive Capability: Action Modeling
[Figure: an Eater choosing among North, South, and East.]
1. Agent attempts to choose a direction.
2. Agent’s knowledge is insufficient: impasse.
3. Evaluate moving in each available direction.
4. Create a memory cue.
5. Episodic retrieval: retrieve the best matching memory.
6. Retrieve next memory: retrieve the next memory.
7. Use the change in score to evaluate the proposed action (e.g., Move North = 10 points).
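The steps above can be sketched in Python. In this hypothetical model, each episode is a dict with a set of features, the action taken, and the score at that moment; the predicted value of an action is the score change between the best-matching episode in which that action was taken and the episode that followed it. The episode format and function names are assumptions, and ties go to the most recent match.

```python
def predict_score_change(episodes, situation, action):
    """Score change after the best-matching past episode where `action` was taken."""
    best_i, best_overlap = None, -1
    for i, ep in enumerate(episodes[:-1]):     # a following episode must exist
        if ep["action"] != action:
            continue
        overlap = len(ep["features"] & situation)
        if overlap >= best_overlap:            # most recent wins ties
            best_i, best_overlap = i, overlap
    if best_i is None:
        return None                            # no relevant memory
    return episodes[best_i + 1]["score"] - episodes[best_i]["score"]

def choose_direction(episodes, situation, directions):
    estimates = {d: predict_score_change(episodes, situation, d) for d in directions}
    known = {d: v for d, v in estimates.items() if v is not None}
    return max(sorted(known), key=known.get) if known else None


episodes = [
    {"features": {"food-north"}, "action": "north", "score": 0},
    {"features": {"empty-east"}, "action": "east", "score": 10},
    {"features": {"food-north"}, "action": "south", "score": 10},
    {"features": set(), "action": "east", "score": 10},
]
```

With `{"food-north"}` as the current situation, remembering that moving North once gained 10 points makes North the preferred direction.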
Episodic Memory: Multi-Step Action Projection [Andrew Nuxoll]
• Learn tactics from prior success and failure
  – Fight/flight
  – Back away from enemy (and fire)
  – Dodging
[Figure: margin of victory (-30 to 40) over successive games (1 to 171), with the average margin of victory.]
Episodic Memory Enables Cognitive Capabilities
• Sensing
  – Detect Changes
  – Detect Repetition
  – Virtual Sensing
• Reasoning
  – Model Actions
  – Use Previous Successes/Failures
  – Model the Environment
  – Manage Long Term Goals
  – Explain Behavior
• Learning
  – Retroactive Learning
  – Allows Reanalysis Given New Knowledge
  – “Boost” other Learning Mechanisms
Mental Imagery and Spatial Reasoning
Scott Lathrop
Sam Wintermute
See AGI Talks
WHAT IS VISUAL IMAGERY?
VISUAL IMAGERY
• VISUAL-SPATIAL
  – Location, orientation
  – Sentential, quantitative representations
  – Linear algebra and computational geometry algorithms
  – Sentential/Algebraic algorithms
• VISUAL-DEPICTIVE
  – Shape, color, topology, spatial properties
  – Depictive, pixel-based representations
  – Image algebra algorithms
  – Depictive/Ordinal algorithms
Where can you put A next to I?
Spatial Problem Solving with Mental Imagery
[Scott Lathrop & Sam Wintermute]
[Figure: Soar exchanges qualitative descriptions with a spatial scene, which holds quantitative descriptions of environmental objects. The scene reports qualitative descriptions of object relationships to Soar, and Soar sends qualitative descriptions of new objects in relation to existing objects. Example with objects A, I, and obstacle O: (on A I); (imagine_left_of A I) creates image A′, and (intersect A′ O) holds; (imagine_right_of A I) yields (no_intersect A′); then (move_right_of A I).]
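The imagine-then-test loop in the example can be sketched with axis-aligned rectangles. This is a hypothetical illustration, not the Soar imagery system: `Box`, `imagine_left_of`, `imagine_right_of`, `intersect`, and `place_next_to` are names chosen for this sketch (the predicate names echo the slide), and objects are reduced to bounding boxes. The point it shows is the division of labor: the geometry is computed quantitatively, while the decision uses only the qualitative result (intersect or no intersect).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    # Axis-aligned rectangle: lower-left corner (x, y), width w, height h.
    x: float
    y: float
    w: float
    h: float

def intersect(a, b):
    # True if the rectangles overlap (touching edges do not count).
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h

def imagine_left_of(a, ref):
    # Image of `a` placed immediately to the left of `ref`, aligned at the bottom.
    return Box(ref.x - a.w, ref.y, a.w, a.h)

def imagine_right_of(a, ref):
    # Image of `a` placed immediately to the right of `ref`, aligned at the bottom.
    return Box(ref.x + ref.w, ref.y, a.w, a.h)

def place_next_to(a, ref, obstacles):
    """Try each side in turn; return (side, image) for the first collision-free image."""
    for side, imagine in (("left", imagine_left_of), ("right", imagine_right_of)):
        image = imagine(a, ref)
        if not any(intersect(image, o) for o in obstacles):
            return side, image
    return None


A = Box(0, 5, 1, 1)     # object to be placed
I = Box(5, 0, 1, 2)     # reference object
O = Box(3.5, 0, 1, 1)   # obstacle to the left of I
```

With the obstacle O on the left of I, the left image intersects O, so the sketch settles on the right side, matching the sequence on the slide.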
Upcoming Challenges
• Continued refinement and integration
• Integrate with complex perception and motor systems
• Adding/learning lots of world knowledge
  + Language, Spatial, Temporal Reasoning, …
• Scaling up to large bodies of knowledge
  – Build up from instruction, experience, exploration, …
Soar Community
• Soar Website
  – http://sitemaker.umich.edu/soar
• Soar Workshop every June in Ann Arbor
  – June 22-26, 2009
• Soar-group
  – http://lists.sourceforge.net/lists/listinfo/soar-group
  – Low traffic
Thanks to
Funding Agencies: NSF, DARPA, ONR
Ph.D. students: Nate Derbinsky, Nicholas Gorski, Scott Lathrop, Robert Marinier, Andrew Nuxoll, Yongjia Wang, Samuel Wintermute, Joseph Xu
Research Programmers: Karen Coulter, Jonathan Voigt
Continued inspiration: Allen Newell
Challenges in Cognitive Architecture Research
• Dynamic taskability
  – Pursue novel tasks
• Learning
  – Always learning, learning in unexpected and unplanned ways (wild learning)
  – Transition from programming to learning by imitation, instruction, experience, reflection, …
• Natural language
  – Active area but much left to do.
• Social behavior
  – Interaction with humans and other entities
• Connect to the real world
  – Cognitive robotics with long-term existence
• Applications
  – Expand domains and problems
  – Putting cognitive architectures to work
• Connect to unfolding research on the brain, psychology, and the rest of AI.