Using Computational Cognitive Models for Better Human-Robot Collaboration




Alan C. Schultz

J. Gregory Trafton

Nick Cassimatis



Navy Center for Applied Research in Artificial Intelligence

Naval Research Laboratory

Peer-to-peer collaboration in Human-Robot Teams


Not interested in a general, unified grand theory of cognition for solving the whole problem

We already know how to be mobile, avoid collisions, etc.

Approach: be informed by cognitive psychology

Study human-human collaboration

Determine important high-level cognitive skills

Build computational cognitive models of these skills

ACT-R, SOAR, EPIC, Polyscheme…

Use the computational models as the reasoning mechanism on the robot for high-level cognition

Cognitive Science as Enabler

Cognitive Robotics


Hypothesis:

A system using human-like representations and processes will enable better collaboration with people than a computational system that does not

Similar representations and reasoning mechanisms make it easier for humans to work with the system; they are more compatible

For close collaboration, systems should act "naturally"

i.e., not do or say something in a way that detracts from the interaction/collaboration with the human

The robot should accommodate humans, not the other way around

Solving tasks from "first principles"

Humans are good at solving some tasks; let's leverage that human ability

Cognitive Skills


Appropriate knowledge representations

Spatial representations for spatial reasoning

Adapting the representation to the problem-solving method

Problem solving

Navigation routing with constraints (e.g., remaining hidden); see the sketch after this list

Learning

Learning to recognize and anticipate others' behaviors

Learning the characteristics of others' capabilities

Vision

Object permanence and tracking (Cassimatis et al., 04)

Recognizing gestures

Natural language/gestures (Perzanowski et al., 01)
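To make "routing with constraints" concrete, here is a minimal sketch, not the system described in the talk: a plain Dijkstra search over a grid in which cells visible to a seeker carry an extra cost, so the cheapest route tends to remain hidden. The grid, the visibility set, and the penalty weight are all illustrative assumptions.

```python
import heapq

def plan(free_cells, start, goal, visible, hide_penalty=5.0):
    """Shortest path over free_cells; cells in `visible` cost extra to enter."""
    frontier = [(0.0, start, [start])]   # (cost so far, cell, path)
    done = set()
    while frontier:
        cost, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if cell in done:
            continue
        done.add(cell)
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in free_cells and nxt not in done:
                # Entering a watched cell is possible but heavily discouraged.
                step = 1.0 + (hide_penalty if nxt in visible else 0.0)
                heapq.heappush(frontier, (cost + step, nxt, path + [nxt]))
    return None

free = {(x, y) for x in range(5) for y in range(5)}
watched = {(2, y) for y in range(4)}      # a corridor the seeker can see
print(plan(free, (0, 0), (4, 4), watched))  # detours through the unwatched gap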

Cognitive Skills


Perspective-Taking

Spatial (Trafton et al., 2005)

Social (Breazeal et al., 2006)

Spatial reasoning

People use metric information implicitly but use and think in qualitative terms much more frequently (Trafton et al., 2006)

Spatial referencing/language (Skubic et al., 04)

Temporal reasoning

Predicting how long something will take

Anticipation

What does a person need, and why?

Hide and Seek

(Trafton & Schultz, 2004, 2006)


A lot of knowledge about space is required

A "good" hider needs visual and spatial perspective taking to find the good hiding places (a large amount of spatial knowledge is needed)



Development of Perspective-Taking

Children start developing (very, very basic) perspective-taking ability around age 3-4

Huttenlocher & Presson, 1979; Newcombe & Huttenlocher, 1992; Wallace, Alan, & Tribol, 2001

In general, 3-4 year old children do not have a particularly well developed sense of perspective taking

Case Study: Hide and Seek

Age 3½

Game  Hiding Location           Hiding Type
1     eyes closed               "can't see me if I can't see you"
2     out in the open           understanding the rules of the game (suggestion: don't hide out in the open)
3     under piano               under
4     in laundry room           containment (room)
--- break ---
5     under piano               under
6     in laundry room           containment (room)
7     in bathroom               containment (room)
8     in her room               containment (room)
9     under chair               under
10    behind bedroom door       containment or behind
11    under chair               under
12    under covers              under or containment
13    under covers              under or containment
14    in bathroom               containment
15    under glass coffee table  under

Elena did not have perspective-taking ability

Left/right errors

She played hide and seek by learning pertinent qualitative features of objects

She constructed knowledge about hiding that is object-specific

Hide and Seek Cognitive Model


Created a cognitive model of Elena learning to play hide and seek, using ACT-R (Anderson et al., 93, 95, 98, 05)

Correctly models Elena's behavior at 3½ years of age

Learns and refines hiding behavior based on interactions with a "teacher"

Learns production strength based on the success and failure of hiding behavior (sketched below)

Learns ontological or schematic knowledge about hiding

It's bad to hide behind something that's clear

It's good to hide behind something that is big enough

Knows about the relative locations of objects (behind, in front of) and adds knowledge about those relationships; the model has only a syntactic notion of spatial relationships
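For flavor, here is a minimal sketch in the spirit of ACT-R's success/failure-driven utility learning over competing productions. It is not the actual model; the production names, learning rate, and the simulated "teacher" probabilities are illustrative assumptions.

```python
import random

class Production:
    """One competing hiding strategy with a learned utility (production strength)."""
    def __init__(self, name, utility=0.0):
        self.name = name
        self.utility = utility

    def update(self, success, rate=0.2):
        # Move utility toward +1 on success, toward -1 on failure.
        target = 1.0 if success else -1.0
        self.utility += rate * (target - self.utility)

def choose(productions, noise=0.3):
    # Noisy selection: higher-utility productions fire more often.
    return max(productions, key=lambda p: p.utility + random.gauss(0.0, noise))

def found_by_teacher(p):
    # Hypothetical stand-in for the game; "under" hiding escapes detection most.
    p_escape = {"hide-under": 0.8, "hide-behind": 0.5, "hide-in-room": 0.6}
    return random.random() > p_escape[p.name]

productions = [Production("hide-under"), Production("hide-behind"),
               Production("hide-in-room")]
for game in range(50):
    p = choose(productions)
    p.update(success=not found_by_teacher(p))

for p in sorted(productions, key=lambda p: -p.utility):
    print(f"{p.name}: {p.utility:.2f}")
```

The schematic knowledge the model learns (e.g., "bad to hide behind something that is clear") would act as a filter on candidate hiding places before a production of this kind fires.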

Hybrid Cognitive/Reactive Architecture

Robot Hide and Seek

[Video]

Using the cognitive model of hiding (after learning) to reason about what makes a good hiding place in order to seek.

The computational cognitive model of hiding makes deliberative (high-level cognitive) decisions and models learning.

A reactive layer of the hybrid architecture handles mobility and sensor processing.
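A minimal sketch of this hybrid arrangement, under the assumption of a slow deliberative loop and a fast reactive loop; all class and field names here are hypothetical, not the NRL architecture.

```python
import math

class ReactiveLayer:
    """Fast loop: mobility and collision avoidance from raw sensing."""
    def step(self, sensors, goal):
        if sensors["range_ahead"] < 0.5:          # obstacle close: turn in place
            return {"turn": 0.6, "forward": 0.0}
        dx, dy = goal[0] - sensors["x"], goal[1] - sensors["y"]
        heading_error = math.atan2(dy, dx) - sensors["theta"]
        return {"turn": 0.5 * heading_error, "forward": 0.3}

class CognitiveLayer:
    """Slow loop: the cognitive model deliberates about where to hide or seek."""
    def __init__(self, hiding_model):
        self.model = hiding_model

    def pick_goal(self, hiding_places):
        # Rank candidate hiding places by the learned model's "goodness".
        return max(hiding_places, key=self.model.goodness)

def control_loop(cognitive, reactive, read_sensors, send_command,
                 hiding_places, ticks=200, deliberate_every=20):
    goal = None
    for t in range(ticks):
        sensors = read_sensors()
        if t % deliberate_every == 0:             # deliberation is infrequent
            goal = cognitive.pick_goal(hiding_places)
        send_command(reactive.step(sensors, goal))  # reaction runs every tick
```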
How important is perspective taking?

(Trafton et al., 2005)


Analyzed a corpus of NASA training tapes

Space Station Mission 9A

Two astronauts working in full suits in a neutral-buoyancy facility; a third, remote person participates

Standard protocol analysis techniques; transcribed 8 hours of utterances and gestures (~4000 instances)

Use of spatial language (up, down, forward, in between, my left, etc.) and commands

Research questions:

What frames of reference are used?

How often do people switch frames of reference?

How often do people take another person's perspective?

Spatial language in space

Results

Frame of Reference    Example                      % Utterances
Exocentric            Go straight zenith ("up")    7%
Egocentric            Turn to my left              15%
Addressee-centered    Turn to your left            10%
Deictic               Put it over there [points]   5%
Object-centered       Put it on top of the box     63%


How frequently do people switch their frame of reference?

45% of the time (consistent with Franklin, Tversky, & Coon, 1992)

How often do people take other people's perspective (or force others to take theirs)?

25% of the time

Perspective Taking and Changing Frames of Reference

[Video]

Notice the mixing of perspectives: exocentric (down), object-centered (down under the rail), addressee-centered (right hand), and exocentric again (nadir), all in one instruction!

Notice the "new" term developed collaboratively: mystery hand rail

"Bob, if you come straight down from where you are, uh, and uh kind of peek down under the rail on the nadir side, by your right hand, almost straight nadir, you should see the uh…"


Perspective Taking


Perspective taking is critical for collaboration.

How do we model it? (ACT-R, Polyscheme…)

I'll show several demos of our current progress on spatial perspective taking

But first, a scenario:

"Please hand me the wrench"

Perspective taking in human interactions


How do people usually resolve ambiguous references that involve different spatial perspectives? (Clark, 96)

Principle of least effort (which implies least joint effort)

All things being equal, agents try to minimize their effort

Principle of joint salience

The ideal solution to a coordination problem among two or more agents is the solution that is most salient, prominent, or conspicuous with respect to their current common ground

In less simple contexts, agents may have to work harder to resolve ambiguous references (a sketch of salience-based resolution follows)
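A hedged sketch of how joint salience might be operationalized for the wrench scenario: among objects matching the noun, prefer one in the agents' common ground (approximated here as mutual visibility), breaking ties by least joint effort. Every function name and field below is an illustrative assumption, not the paper's model.

```python
# Resolve "hand me the wrench" by joint salience over candidate objects.
def resolve_reference(noun, candidates, robot_sees, human_sees):
    matches = [c for c in candidates if c["type"] == noun]
    def salience(obj):
        in_common_ground = obj["id"] in robot_sees and obj["id"] in human_sees
        # Mutually visible objects dominate; nearer objects break ties
        # (least joint effort: minimize the reaching required).
        return (1 if in_common_ground else 0, -obj["distance_to_human"])
    return max(matches, key=salience) if matches else None

candidates = [
    {"id": "w1", "type": "wrench", "distance_to_human": 0.4},
    {"id": "w2", "type": "wrench", "distance_to_human": 0.9},
]
# The robot senses both wrenches, but the human can see only w2 (w1 occluded).
print(resolve_reference("wrench", candidates, {"w1", "w2"}, {"w2"}))  # -> w2
```

This mirrors the cone demo later in the talk: the robot can sense two cones, but only the one the human can also see is in common ground, so that is the referent the robot should choose.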

[Figure: the three visuospatial buffers. Focal: object identification; Manipulative: grasping & tracking; Configural: navigation]
Perspective Taking: A tale of two systems


ACT-R/S (Schunn & Harrison, 2001)

Our perspective-taking system using ACT-R/S is described in Hiatt et al., 2003

Three integrated visuospatial buffers (see the sketch below):

Focal: object identification; non-metric geon parts

Manipulative: grasping/tracking; metric geons

Configural: navigation; bounding boxes

Polyscheme (Cassimatis)

A computational cognitive architecture where:

Mental simulation is the primitive

Many AI methods are integrated

Our perspective taking using Polyscheme is described in Trafton et al., 2005
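A minimal sketch of the three-buffer idea as a data structure, with hypothetical types rather than the ACT-R/S implementation: each buffer holds the representation suited to its task, from coarse bounding boxes for navigation to metric shape for grasping.

```python
from dataclasses import dataclass, field

@dataclass
class FocalBuffer:            # object identification: non-metric geon parts
    geon_parts: list = field(default_factory=list)   # e.g. ["cylinder", "brick"]

@dataclass
class ManipulativeBuffer:     # grasping/tracking: metric geons
    geons: list = field(default_factory=list)        # (shape, pose, size) tuples

@dataclass
class ConfiguralBuffer:       # navigation: coarse bounding boxes
    bounding_boxes: list = field(default_factory=list)  # (xmin, ymin, xmax, ymax)

@dataclass
class VisuoSpatialStore:
    """The three integrated buffers, queried by different cognitive skills."""
    focal: FocalBuffer = field(default_factory=FocalBuffer)
    manipulative: ManipulativeBuffer = field(default_factory=ManipulativeBuffer)
    configural: ConfiguralBuffer = field(default_factory=ConfiguralBuffer)
```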

Robot Perspective Taking

[Video]

Human can see one cone; robot can sense two cones

(Fong et al., 06)

Summary


Having representations and reasoning similar or compatible to a human's facilitates human-robot collaboration

We've developed computational cognitive models of high-level human cognitive skills as reasoning mechanisms for robots

Open questions:

Scaling up; combining many such skills

What are the important skills?

Which skills are built upon others?

Shameless Advertisement


ACM/IEEE Second International Conference on Human-Robot Interaction

Washington DC, March 9-11, 2007

With the HRI 2007 Young Researchers Workshop, March 8, 2007

Single track, highly multi-disciplinary

Robotics, Cognitive Science, HCI, Human Factors, Cognitive Psychology…

Submission deadline: August 31, 2006

www.hri2007.org

A Dynamic Auditory Scene


Everyday auditory scenes are VERY noisy


Fans


Alarms/Telephones


Traffic


Weather


People


Auditory Perspective Taking


Information Kiosk


Robot uses speech to relay information to an interested human listener

Given the auditory scene, can the person understand what the robot is saying?

If not, what actions can the robot take to improve intelligibility and knowledge transfer?

Allow a robot to use its knowledge of the environment, both a priori and sensed, to predict what a human can hear and effectively understand (a sketch follows).
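A hedged sketch of that prediction, assuming free-field propagation and a fixed signal-to-noise threshold for intelligibility; the functions, source levels, and the 15 dB threshold are illustrative assumptions, not the deployed system.

```python
import math

def spl_at(source_db, source_pos, listener_pos, ref_dist=1.0):
    """Free-field attenuation: level drops ~6 dB per doubling of distance."""
    d = max(ref_dist, math.dist(source_pos, listener_pos))
    return source_db - 20 * math.log10(d / ref_dist)

def noise_at(listener_pos, noise_sources):
    # Sum incoherent sources in power, then convert back to dB.
    power = sum(10 ** (spl_at(db, pos, listener_pos) / 10)
                for db, pos in noise_sources)
    return 10 * math.log10(power) if power > 0 else 0.0

def intelligible(robot_db, robot_pos, listener_pos, noise_sources, snr_needed=15):
    """Predict whether the robot's speech clears the noise at the listener."""
    snr = (spl_at(robot_db, robot_pos, listener_pos)
           - noise_at(listener_pos, noise_sources))
    return snr >= snr_needed

vents = [(70, (3.0, 1.0)), (65, (0.0, 4.0))]   # (dB SPL at 1 m, position)
print(intelligible(72, (0.0, 0.0), (1.5, 0.0), vents))
```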


Stealth Bot

Robot uses its awareness of the auditory environment to hide from people and/or machines

The robot knows its own acoustic signature

It predicts how each action or location will be heard by the listener, and selects the best choice

An Example of Adaptation:

Robot Speech Interface


Adjust word usage depending on noise levels (see the policy sketch below)

Use shorter words with higher recognition rates

Ask questions to verify understanding; repeat yourself

Change the quality of the speech sounds

Adapt voice volume and pitch to overcome local noise levels (Lombard speech)

Emphasize difficult words

Don't talk during loud noises

Reposition oneself

Vary the proximity to the listener

Face the listener as much as possible

Move to a different location if all else fails
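A small illustrative policy, an assumption rather than the deployed interface, tying these adaptations to the predicted SNR at the listener (e.g., as estimated by the intelligibility sketch earlier):

```python
def choose_adaptation(snr_db):
    """Map predicted signal-to-noise ratio (dB) to a speech adaptation."""
    if snr_db >= 15:
        return "speak normally"
    if snr_db >= 8:
        return "raise volume and pitch (Lombard speech); emphasize hard words"
    if snr_db >= 3:
        return "use shorter, high-recognition words; verify understanding"
    return "pause, face the listener, or move closer before speaking"

for snr in (20, 10, 5, -2):
    print(snr, "->", choose_adaptation(snr))
```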

Information Kiosk

Overhead microphone array

Tracks local sound levels

Localizes interfering sources

Guides the vision system to new users

Stereo vision

Tracks the user's position in real time

Actions

Raise speaking volume relative to the user's distance and the level of ambient noise

Pause during loud sounds or speech interruptions

Rotate the robot to face users

Reposition the robot if noise levels become too high

Acoustic Perspective

Noise maps

Combine knowledge of sound sources to build maps (see the sketch at the end of this section)

Measured volume/frequency levels

Source locations/directionality

Walls and environmental features

Multiple maps can be built and combined in real time

Modifying action based on the noise map

Seeking noisy hiding places so that the robot can best observe its target without being detected

Masking its particular acoustic signature

[Figure: after exploring the area inside the square, 3 air vents are localized by the robot]

[Figure: 4 sources are combined as omnidirectional sources, without environmental reflections]
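To make the noise-map idea concrete, here is a minimal sketch that grids the space and sums localized omnidirectional sources with free-field falloff and no reflections, matching the simplifications above; the levels, positions, and grid parameters are illustrative assumptions.

```python
import math

def noise_map(sources, width, height, cell=0.5):
    """sources: list of (level_db_at_1m, (x, y)). Returns a 2D grid of dB."""
    nx, ny = int(width / cell), int(height / cell)
    grid = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            p = (i * cell, j * cell)
            power = 0.0
            for db, pos in sources:
                d = max(0.1, math.dist(pos, p))          # clamp near-field
                power += 10 ** ((db - 20 * math.log10(d)) / 10)
            grid[j][i] = 10 * math.log10(power) if power else 0.0
    return grid

# e.g. three air vents localized while exploring
vents = [(70, (1.0, 1.0)), (68, (4.0, 2.0)), (65, (2.0, 4.0))]
m = noise_map(vents, width=5.0, height=5.0)
# A stealth robot would prefer cells whose mapped level already exceeds its
# own acoustic signature, masking its sound from the listener.
```

Separate maps (for instance, one per source type) can be built the same way and combined cell by cell in power, consistent with the real-time combination mentioned on the slide.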