Soar One-hour Tutorial


John E. Laird

University of Michigan

March 2009

http://sitemaker.umich.edu/soar

laird@umich.edu

Supported in part by DARPA and ONR


Tutorial Outline

1. Cognitive Architecture
2. Soar History
3. Overview of Soar
4. Details of Basic Soar Processing and Syntax
   - Internal decision cycle
   - Interaction with external environments
   - Subgoals and meta-reasoning
   - Chunking
5. Recent Extensions to Soar
   - Reinforcement Learning
   - Semantic Memory
   - Episodic Memory
   - Visual Imagery

How can we build a human-level AI?

[Figure, built up over three slides: Learning and a set of tasks (Calculus, History, Reading, Sudoku, Shopping, Driving, Talking on a cell phone) sit above two possible substrates: Neurons / Neural Circuits / Brain Structure, and Programs / Computer Architecture / Logic Circuits / Electrical Circuits.]

[Figure: the Soar cognitive architecture: symbolic long-term memories (procedural, semantic, episodic) with their learning mechanisms (chunking, reinforcement learning, semantic learning, episodic learning); a symbolic short-term memory with decision procedure, imagery, and appraisals; and perception and action connecting to the body.]

Cognitive Architecture

Fixed mechanisms underlying cognition:
- Memories, processing elements, control, interfaces
- Representations of knowledge
- Separation of fixed processes and variable knowledge
- Complex behavior arises from composition of simple primitives

Purpose: bring knowledge to bear to select actions to achieve goals.

Not just a framework (cf. BDI, NN, logic & probability, rule-based systems). Important constraints:
- Continual performance
- Real-time performance
- Incremental, on-line learning

[Figure: the architecture, together with knowledge and goals, acting in a task environment.]

Common Structures of Many Cognitive Architectures

[Figure: short-term memory linked to procedural long-term memory and declarative long-term memory; perception feeds in, and action selection, driven by goals, produces action; procedure learning and declarative learning update the long-term memories.]

Different Goals of Cognitive Architecture

- Biological plausibility: Does the architecture correspond to what we know about the brain?
- Psychological plausibility: Does the architecture capture the details of human performance in a wide range of cognitive tasks?
- Functionality: Does the architecture explain how humans achieve their high level of intellectual function?
  - Building human-level AI

Short History of Soar

[Timeline, 1980-2005: pre-Soar roots (problem spaces, production systems, heuristic search); parallel functionality and modeling threads; multi-method, multi-task problem solving; subgoaling; chunking; UTC; natural language; HCI; external environment integration; large bodies of knowledge; teamwork; real applications; virtual agents; learning from experience, observation, and instruction; new capabilities.]

Distinctive Features of Soar

- Emphasis on functionality
  - Takes engineering and scaling issues seriously
  - Interfaces to real-world systems
  - Can build very large systems in Soar that exist for a long time
- Integration with perception and action
  - Mental imagery and spatial reasoning
- Integrates reaction, deliberation, and meta-reasoning
  - Dynamically switches between them
- Integrated learning
  - Chunking, reinforcement learning, episodic & semantic
- Useful in cognitive modeling
  - Expanding this is the emphasis of many current projects
- Easy to integrate with other systems & environments
  - SML efficiently supports many languages, inter-process communication

System Architecture

Layers, from the kernel up to the application:
- Soar 9.0 Kernel (C)
- gSKI: higher-level interface (C++)
- KernelSML: encodes/decodes function calls and responses in XML (C++)
- SML (Soar Markup Language): connects the two SML layers
- ClientSML: encodes/decodes function calls and responses in XML (C++)
- SWIG language layer: wrapper for Java/Tcl (not needed if the application is in C++)
- Application (any language)

Soar Basics

- Operators: deliberate changes to internal/external state
- Activity is a series of operators controlled by knowledge (sketched in code below):
  1. Input from environment
  2. Elaborate current situation: parallel rules
  3. Propose and evaluate operators via preferences: parallel rules
  4. Select operator
  5. Apply operator: modify internal data structures: parallel rules
  6. Output to motor system

[Figure: an operator takes the agent in a real or virtual world from its current state to a new state, and so on.]
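To make the cycle concrete, here is a minimal, hypothetical Python sketch of one pass through the loop. All of the method names (add_input, fire_elaboration_rules, and so on) are illustrative assumptions, not the actual Soar API:

    # Hypothetical skeleton of one Soar decision cycle (not the real API).
    def decision_cycle(agent, env):
        agent.add_input(env.sense())                    # 1. input
        agent.fire_elaboration_rules()                  # 2. elaborate (parallel rules)
        candidates = agent.propose_operators()          # 3. propose and evaluate
        prefs = agent.evaluate_operators(candidates)    #    operators via preferences
        operator = agent.select_operator(prefs)         # 4. decide
        agent.apply_operator(operator)                  # 5. apply (parallel rules)
        env.act(agent.get_output())                     # 6. output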

Basic Soar Architecture

[Figure: production memory (symbolic long-term procedural memory, grown by chunking) and working memory (symbolic short-term memory) sit between perception and action to the body. The decision procedure drives the cycle: Input → Elaborate State → Propose Operators → Evaluate Operators → Select Operator (Decide) → Apply Operator → Output.]

Soar 101: Eaters

Cycle: Input → Propose Operator → Select Operator → Apply Operator → Output.

Propose (parallel rules):
  If the cell in direction <d> is not a wall, propose operator move <d>.

Evaluate via preferences (parallel rules):
  If operator <o1> will move to a bonus food and operator <o2> will move to a normal food, then <o1> > <o2>.
  If operator <o1> will move to an empty cell, then <o1> < (worse).

Resulting preferences here: North > East, South > East, North = South.

Apply:
  If an operator is selected to move <d>, create output move-direction <d> (here: move-direction North).
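Selection then resolves these symbolic preferences. A tiny illustrative Python sketch, assuming only the preferences above (this is not Soar's actual resolution algorithm): better (">") preferences eliminate dominated operators, and the mutually indifferent ("=") survivors are chosen at random:

    import random

    def select(operators, better):
        # Drop any operator that some other operator is strictly better than.
        dominated = {loser for winner, loser in better}
        candidates = [o for o in operators if o not in dominated]
        return random.choice(candidates)   # indifferent survivors: pick any

    better = {("North", "East"), ("South", "East")}
    print(select(["North", "South", "East"], better))  # -> North or South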

Example Working Memory

(s1 ^block b1 ^block b2 ^table t1)
(b1 ^color blue ^name A ^ontop b2 ^size 1
    ^type block ^weight 14)
(b2 ^color yellow ^name B ^ontop t1 ^size 1
    ^type block ^under b1 ^weight 14)
(t1 ^color gray ^shape square
    ^type table ^under b2)

Working memory is a graph. All working memory elements must be "linked" directly or indirectly to a state.

[Figure: block A on block B on table; the graph rooted at S1 with ^block edges to b1 and b2 and a ^table edge to t1; b2's attributes shown: ^color yellow, ^name B, ^size 1, ^type block, ^weight 14, plus ^under and ^ontop links.]
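One way to picture this graph in code: identifiers map to lists of (attribute, value) pairs, where a value is either a constant or another identifier. A minimal Python sketch of the slide's memory, with a check that every element is linked to the state (illustrative only; wm and reachable are hypothetical names):

    # Working memory from the slide as an identifier -> [(attr, value)] graph.
    wm = {
        "s1": [("block", "b1"), ("block", "b2"), ("table", "t1")],
        "b1": [("color", "blue"), ("name", "A"), ("ontop", "b2"),
               ("size", 1), ("type", "block"), ("weight", 14)],
        "b2": [("color", "yellow"), ("name", "B"), ("ontop", "t1"),
               ("size", 1), ("type", "block"), ("under", "b1"), ("weight", 14)],
        "t1": [("color", "gray"), ("shape", "square"),
               ("type", "table"), ("under", "b2")],
    }

    def reachable(state):
        # Collect every identifier linked directly or indirectly to the state.
        seen, stack = set(), [state]
        while stack:
            ident = stack.pop()
            if ident in seen:
                continue
            seen.add(ident)
            stack.extend(v for _, v in wm.get(ident, [])
                         if isinstance(v, str) and v in wm)
        return seen

    print(reachable("s1"))  # {'s1', 'b1', 'b2', 't1'} (order may vary)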

Soar Processing Cycle

[Figure: the decision cycle (Input → Elaborate State → Propose Operators → Evaluate Operators → Select Operator → Apply Operator → Output), each phase driven by rules. When the cycle cannot proceed, an impasse arises and a subgoal is created, which runs its own copy of the same cycle.]

TankSoar

[Screenshot of the TankSoar environment: borders (stone), walls (trees), a health charger, an energy charger, a missile pack, the red tank's shield, the green tank's radar, and a blue tank ("Ouch!").]

Soar 103: Subgoals

[Figure: in the top-level cycle, the rule "If enemy not sensed, then wander" proposes and selects the wander operator. Wander cannot be applied by a single rule, so a substate is created whose own propose/compare/select/apply cycle chooses operators such as move and turn.]

[Figure: likewise, "If enemy is sensed, then attack" selects the attack operator, and the substate implements it with operators such as shoot.]

TacAir-Soar [1997]

Controls simulated aircraft in real-time training exercises (>3,000 entities):
- Flies all U.S. air missions
- Dynamically changes missions as appropriate
- Communicates and coordinates with computer- and human-controlled planes
- Large knowledge base (8,000 rules)
- No learning

TacAir-Soar Task Decomposition

[Figure: operator hierarchy. Execute Mission decomposes into Fly-Route, Ground Attack, Fly-Wing, and Intercept; Intercept into Achieve Proximity, Employ Weapons, Search, Execute Tactic, and Scram; Employ Weapons into Get Missile LAR, Select Missile, Get Steering Circle, Sort Group, and Launch Missile; Launch Missile into Lock Radar, Lock IR, Fire-Missile, and Wait-for-Missile-Clear.]

Example rules down one branch:
- If instructed to intercept an enemy, then propose intercept.
- If intercepting an enemy, and the enemy is within range, and ROE are met, then propose employ-weapons.
- If employing weapons, and a missile has been selected, and the enemy is in the steering circle, and LAR has been achieved, then propose launch-missile.
- If launching a missile, and it is an IR missile, and there is currently no IR lock, then propose lock-IR.

>250 goals, >600 operators, >8,000 rules

Impasse/Substate Implications

- A substate is really a meta-state that allows the system to reflect
- Substate = goal to resolve the impasse
  - Generate operator
  - Select operator (deliberate control)
  - Apply operator (task decomposition)
- All basic problem-solving functions are open to reflection
  - Operator creation, selection, application; state elaboration
- The substate is where knowledge to resolve the impasse can be found
- Hierarchies of substates/subgoals arise through recursive impasses

Tie Subgoals and Chunking

[Figure: move North, South, and East are all proposed, creating a tie impasse. In the resulting substate, evaluate-operator(North), evaluate-operator(South), and evaluate-operator(East) are selected in turn, producing evaluations North = 10, South = 10, East = 5, which yield the preferences North > East, South > East, North = South and resolve the tie.]

Chunking creates (a rough code analogy follows):
- a rule that applies evaluate-operator
- rules that create preferences based on what was tested
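Loosely, the effect resembles memoizing the results of look-ahead: once the substate has computed an evaluation, a learned rule produces it directly the next time the same situation arises. A rough, hypothetical Python analogy (memoization, not Soar's actual chunking mechanism):

    # Rough analogy only: a learned chunk behaves like a cached result.
    chunks = {}

    def evaluate(state, operator):
        key = (state, operator)                        # what the subgoal tested
        if key not in chunks:                          # impasse: no knowledge yet
            chunks[key] = look_ahead(state, operator)  # deliberate substate work
        return chunks[key]                             # afterwards: fires like a rule

    def look_ahead(state, operator):
        # Stand-in for simulating the operator inside the substate.
        return {"North": 10, "South": 10, "East": 5}[operator]

    print(evaluate("s1", "North"))  # 10 (computed once, then cached)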

Chunking Analysis

- Converts deliberate reasoning/planning to reaction
- Generality of learning is based on generality of reasoning
  - Leads to many different types of learning
  - If reasoning is inductive, so is learning
- Soar only learns what it thinks about
- Chunking is impasse-driven
  - Learning arises from a lack of knowledge

Extending Soar

- Learn from internal rewards: reinforcement learning
- Learn facts (what you know): semantic memory
- Learn events (what you remember): episodic memory
- Basic drives and emotions, feelings, mood: appraisals
- Non-symbolic reasoning: mental imagery
- Learn from regularities: spatial and temporal clusters

[Figure: the extended architecture adds reinforcement learning to procedural memory, semantic memory with semantic learning, episodic memory with episodic learning, visual imagery, an appraisal detector, and clustering, alongside the original symbolic memories, decision procedure, chunking, perception, action, and body.]

Theoretical Commitments

Stayed the same:
- Problem Space Computational Model
- Long-term & short-term memories
- Associative procedural knowledge
- Fixed decision procedure
- Impasse-driven reasoning
- Incremental, experience-driven learning
- No task-specific modules

Changed:
- Multiple long-term memories
- Multiple learning mechanisms
- Modality-specific representations & processing
- Non-symbolic processing:
  - Symbol generation (clustering)
  - Control (numeric preferences)
  - Learning control (reinforcement learning)
  - Intrinsic reward (appraisals)
  - Aiding memory retrieval (WM activation)
  - Non-symbolic reasoning (visual imagery)

Reinforcement Learning

Shelly Nason

RL in Soar

1. Encode the value function as operator evaluation rules with numeric preferences.
2. Combine all numeric preferences for an operator dynamically.
3. Adjust the values of numeric preferences with experience.

[Figure: perception feeds the internal state; the value function scores actions for action selection; reward drives updates to the value function.]

The Q-function in Soar

The value function is stored in rules that test the state and operator, and create numeric preferences:

sp {rl-rule
   (state <s> ^operator <o> +)
-->
   (<s> ^operator <o> = 0.34)}

Operator Q-value = the sum of all numeric preferences for that operator:

O1: {.34, .45, .02} = .81
O2: {.25, .11, .12} = .48
O3: {-.04, .14, -.05} = .05

Selection: epsilon-greedy or Boltzmann. Epsilon-greedy (sketched below): with probability ε the agent selects an action at random; otherwise it takes the action with the highest expected value. [Balances exploration and exploitation.]
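A minimal Python sketch of epsilon-greedy selection over the summed numeric preferences above (illustrative only; the names prefs and epsilon_greedy are assumptions, not Soar internals):

    import random

    # Q-value of each operator = sum of its rules' numeric preferences.
    prefs = {"O1": [0.34, 0.45, 0.02],
             "O2": [0.25, 0.11, 0.12],
             "O3": [-0.04, 0.14, -0.05]}
    q = {op: sum(vals) for op, vals in prefs.items()}  # O1=.81, O2=.48, O3=.05

    def epsilon_greedy(q, epsilon=0.1):
        # With probability epsilon explore at random, else exploit the best.
        if random.random() < epsilon:
            return random.choice(list(q))
        return max(q, key=q.get)

    print(epsilon_greedy(q))  # usually O1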

Updating Operator Values

Sarsa update:

Q(s,O1) ← Q(s,O1) + α[r + λ·Q(s′,O2) − Q(s,O1)]

where:
- Q(s,O1) = sum of the numeric preferences for O1: R1(O1) = .20, R2(O1) = .15, R3(O1) = −.02, so Q(s,O1) = .33
- r = reward = .2
- Q(s′,O2) = sum of the numeric preferences of the next selected operator (O2) = .11

TD error: .2 + .9 × .11 − .33 ≈ −.03. The update is split evenly among the rules contributing to O1, −.01 each (see the sketch below):

R1 = .19, R2 = .14, R3 = −.03
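For concreteness, a small Python sketch of the split update using the slide's numbers (the learning rate is folded in as 1 here so the totals match the slide; illustrative, not Soar's implementation):

    # Sarsa-style update, split evenly across the rules that created
    # numeric preferences for the selected operator O1 (slide's numbers).
    rules = {"R1": 0.20, "R2": 0.15, "R3": -0.02}   # numeric preferences for O1
    q_s  = sum(rules.values())                      # Q(s,O1)  = 0.33
    q_s2 = 0.11                                     # Q(s',O2) = 0.11
    r, discount = 0.2, 0.9

    delta = r + discount * q_s2 - q_s               # TD error, about -0.03
    share = delta / len(rules)                      # about -0.01 per rule
    updated = {name: round(v + share, 2) for name, v in rules.items()}
    print(updated)  # {'R1': 0.19, 'R2': 0.14, 'R3': -0.03}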

Results with Eaters

[Chart: Total Score vs. Move #; series include Random, a rule-coded baseline, and RL agents after 5, 10, 15, and 20 training runs.]

RL TankSoar Agent

[Chart: Average Margin of Victory vs. Successive Games for the RL TankSoar agent.]

Semantic Memory

Yongjia Wang

Memory Systems

[Taxonomy: Memory divides into Long-Term Memory and Short-Term Memory. Long-Term Memory divides into Declarative (Semantic Memory, Episodic Memory) and Procedural (Perceptual Representation System, Procedural Memory). Short-Term Memory contains Working Memory.]

Declarative Memory Alternatives

- Working memory only
  - Keep everything in working memory
  - Retrieve dynamically with rules
  - Rules provide asymmetric access
  - Data chunking to learn (complex)
- Separate declarative memories
  - Semantic memory (facts)
  - Episodic memory (events)

Basic Semantic Memory Functionalities

- Encoding
  - What to save?
  - When to add a new declarative chunk?
  - How to update knowledge?
- Retrieval
  - How is the cue placed and matched?
  - What are the different types of retrieval?
- Storage
  - What are the storage structures?
  - How are they maintained?

Semantic Memory Functionalities

[Figure: working memory / semantic memory interactions: Save stores working-memory structures as declarative chunks; a Cue is feature-matched against memory for Retrieval (returning NIL when nothing matches); Expand retrieves the children of a chunk; further operations include updating with complex structure, auto-commit, and remove-no-change.]
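A minimal Python sketch of feature-based cue matching (the names memory, retrieve, and the chunk format are assumptions for illustration, not the actual mechanism): a cue is a set of attribute-value pairs, with None acting as "any value":

    # Hypothetical sketch of semantic-memory retrieval by feature match.
    memory = [
        {"id": "A", "relation": "on", "under": "B"},
        {"id": "C", "relation": "on", "under": "D"},
    ]

    def retrieve(cue):
        # A chunk matches if it has every cue attribute, with equal values
        # wherever the cue specifies one (None acts as a wildcard).
        for chunk in memory:
            if all(attr in chunk and (val is None or chunk[attr] == val)
                   for attr, val in cue.items()):
                return chunk
        return None  # NIL: no match

    print(retrieve({"relation": "on", "under": "B"}))  # chunk A
    print(retrieve({"relation": "below"}))             # None (NIL)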

Episodic Memory

Andrew Nuxoll


Episodic vs. Semantic Memory

- Semantic memory: knowledge of what we "know"
  - Example: what state the Grand Canyon is in
- Episodic memory: history of specific events
  - Example: a family vacation to the Grand Canyon

Characteristics of Episodic Memory (Tulving)

- Architectural: does not compete with reasoning; task-independent.
- Automatic: memories are created without a deliberate decision.
- Autonoetic: a retrieved memory is distinguished from sensing.
- Autobiographical: an episode is remembered from one's own perspective.
- Variable duration: the time period spanned by a memory is not fixed.
- Temporally indexed: the rememberer has a sense of when the episode occurred.

Episodic Memory: Current Implementation

[Figure, built up over several slides: working memory (with input, output, cue, and retrieved buffers) alongside long-term procedural memory (production rules); episodic learning records episodes into a separate episodic memory.]

- Encoding
  - Initiation: when the agent takes an action.
  - Content: the entire working memory is stored in the episode.
- Storage
  - Episode structure: episodes are stored in a separate memory.
- Retrieval
  - Initiation/cue: the cue is placed in an architecture-specific buffer.
  - Retrieval: the closest partial match is retrieved (see the sketch below).
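A minimal Python sketch of closest-partial-match retrieval, assuming episodes are stored as sets of working-memory elements (the names episodes and retrieve are illustrative; the real system also weighs factors such as recency, which is omitted here):

    # Hypothetical sketch of episodic retrieval by closest partial match.
    episodes = [
        {"t": 1, "wm": {("see", "charger"), ("at", "cell-3")}},
        {"t": 2, "wm": {("see", "wall"), ("at", "cell-4")}},
        {"t": 3, "wm": {("see", "charger"), ("at", "cell-7"), ("health", "low")}},
    ]

    def retrieve(cue):
        # Return the episode sharing the most elements with the cue.
        return max(episodes, key=lambda ep: len(cue & ep["wm"]))

    print(retrieve({("see", "charger"), ("health", "low")})["t"])  # -> 3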

Cognitive Capability: Virtual Sensing

- Retrieve prior perception that is relevant to the current task
- The tank recursively searches memory:
  - Have I seen a charger from here?
  - Have I seen a place where I can see a charger?

Virtual Sensors Results

[Chart: Average Number of Moves vs. Subsequent Searches, comparing a random baseline (Average Random) against Episodic Memory.]


Cognitive Capability: Action Modeling

The agent attempts to choose a direction, but its knowledge is insufficient → impasse. It then evaluates moving in each available direction (North, South, East) via episodic memory:

1. Create a memory cue.
2. Episodic retrieval: retrieve the best matching memory.
3. Retrieve the next memory.
4. Use the change in score to evaluate the proposed action (e.g., Move North = 10 points).

Episodic Memory: Multi-Step Action Projection [Andrew Nuxoll]

- Learn tactics from prior success and failure
  - Fight/flight
  - Back away from enemy (and fire)
  - Dodging

[Chart: Average Margin of Victory vs. Successive Games.]

Episodic Memory Enables Cognitive Capabilities

- Sensing
  - Detect changes
  - Detect repetition
  - Virtual sensing
- Reasoning
  - Model actions
  - Use previous successes/failures
  - Model the environment
  - Manage long-term goals
  - Explain behavior
- Learning
  - Retroactive learning
  - Allows reanalysis given new knowledge
  - "Boost" other learning mechanisms

Mental Imagery and Spatial Reasoning

Scott Lathrop
Sam Wintermute

(See AGI talks.)



WHAT IS VISUAL IMAGERY?

- VISUAL-SPATIAL
  - Location, orientation
  - Sentential, quantitative representations
  - Linear algebra and computational geometry algorithms
- VISUAL-DEPICTIVE
  - Shape, color, topology, spatial properties
  - Depictive, pixel-based representations
  - Image algebra algorithms

Visual imagery spans both: sentential/algebraic algorithms and depictive/ordinal algorithms.

Spatial Problem Solving with Mental Imagery [Scott Lathrop & Sam Wintermute]

Example problem: Where can you put A next to I?

[Figure: Soar exchanges information with a spatial scene: quantitative descriptions of environmental objects come in; Soar receives qualitative descriptions of object relationships and issues qualitative descriptions of new, imagined objects in relation to existing ones.]

The reasoning, in qualitative predicates:
(on A I)
(imagine_left_of A I)  →  (intersect A′ O)
(imagine_right_of A I) →  (no_intersect A′)
(move_right_of A I)
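A minimal Python sketch of the quantitative side of this exchange: imagine A placed to the left of I, test for intersection with O, and fall back to the right placement. The axis-aligned boxes and the function names are illustrative assumptions; the real system uses richer geometry:

    # Hypothetical sketch of "imagine A next to I" with boxes (x, y, w, h).
    def intersects(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    def imagine_left_of(a, i):
        ix, iy, _, _ = i
        _, _, aw, ah = a
        return (ix - aw, iy, aw, ah)      # A' placed flush against I's left

    def imagine_right_of(a, i):
        ix, iy, iw, _ = i
        _, _, aw, ah = a
        return (ix + iw, iy, aw, ah)      # A' placed flush against I's right

    I, O, A = (4, 0, 2, 2), (2, 0, 2, 2), (0, 5, 2, 2)
    a_left = imagine_left_of(A, I)
    if intersects(a_left, O):             # (intersect A' O)
        a_right = imagine_right_of(A, I)  # (imagine_right_of A I)
        assert not intersects(a_right, O) # (no_intersect A')
        print("move_right_of A I ->", a_right)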

Upcoming Challenges

- Continued refinement and integration
- Integrate with complex perception and motor systems
- Adding/learning lots of world knowledge
  - Language, spatial, and temporal reasoning, ...
- Scaling up to large bodies of knowledge
- Building up from instruction, experience, exploration, ...

Soar Community

- Soar website: http://sitemaker.umich.edu/soar
- Soar Workshop every June in Ann Arbor (June 22-26, 2009)
- Soar-group mailing list (low traffic):
  http://lists.sourceforge.net/lists/listinfo/soar-group

Thanks to

Funding agencies: NSF, DARPA, ONR

Ph.D. students: Nate Derbinsky, Nicholas Gorski, Scott Lathrop, Robert Marinier, Andrew Nuxoll, Yongjia Wang, Samuel Wintermute, Joseph Xu

Research programmers: Karen Coulter, Jonathan Voigt

Continued inspiration: Allen Newell

Challenges in Cognitive Architecture Research

- Dynamic taskability
  - Pursue novel tasks
- Learning
  - Always learning, in unexpected and unplanned ways ("wild" learning)
  - Transition from programming to learning by imitation, instruction, experience, reflection, ...
- Natural language
  - An active area, but much is left to do
- Social behavior
  - Interaction with humans and other entities
- Connecting to the real world
  - Cognitive robotics with long-term existence
- Applications
  - Expand domains and problems
  - Putting cognitive architectures to work
- Connecting to unfolding research on the brain, psychology, and the rest of AI