Applied Informatics, Faculty of Technology
& CITEC – Central Lab Facilities
Performance Assessment and System Design
in Human Robot Interaction
Sven Wachsmuth
Bielefeld University
May 2011
"... supercomputing power is already on the order of estimated human brain capacity, but intelligent or human-simulating machines do not yet exist ..."
[futurememes.blogspot.com/2010/04]
What are the FLOPs of cognitive systems?
"... [Richard Murphy] says they've designed the benchmark [Graph 500] to spur both researchers and industry toward mastering architectural problems of next-generation supercomputers."
Beyond FLOPs ...
Limits of benchmarking
● Evaluation and benchmarking is an inherently multi-dimensional problem (how to define progress?)
● Benchmarks significantly influence the design of system architectures
● Evaluation metrics do not necessarily make us aware of architectural bottlenecks
● Benchmarks do not capture the richness of applications
[Perona, ICCV Workshop, 2007]
Benchmarks need to be scalable
Limits of offline datasets
● Ground truth is not always easy to capture
● Image datasets ignore the acquisition step (sensing)
● Image datasets ignore the relevance of results
● Offline processing ignores system aspects
● Focus on experimental studies
  – Need for live systems
  – Need for live users/interaction partners
Human-Robot Interaction
● Human-Robot Interaction scenarios
  – Home-tour (navigation tasks / human-initiative teaching)
  – Curious robot (manipulation tasks / mixed-initiative learning)
  – Museum guide (assistive tasks / robot-initiative explanation)
Challenges in defining benchmarks
● How to measure progress?
  – Multi-dimensionality
  – System complexity
  – Small datasets
● How to define ground truth?
  – User behavior is highly variable
● How to prevent architectural bottlenecks?
  – Tests are task and platform specific
Evaluation criteria / interacting levels
● Human:
  – User experience / user performance
● System:
  – Task performance
● Architecture:
  – Reliability / robustness
  – Simplicity
● Components:
  – Accuracy / efficiency
Overview of methodologies (each level)
Interaction between levels
● Systemic Interaction Analysis (SinA)
[Figure: SinA analysis cycle: define a prototypical script of the task, identify deviation patterns, identify causes for deviation patterns (system and interaction level), and estimate the impact of deviation patterns; based on expectation-driven annotation of video data and system logging, statistical analysis, system analysis, and judging of results; findings feed back into component and architectural changes.]
Lohse, M., Hanheide, M., Pitsch, K., Rohlfing, K. J., and Sagerer, G. (2009). "Improving HRI design by applying Systemic Interaction Analysis (SinA)", Interaction Studies (Special Issue: Robots in the Wild: Exploring HRI in Naturalistic Environments), 10(3), John Benjamins Publishing Company, pp. 299-324.
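As a minimal sketch of the core idea (not the SinA toolchain itself), comparing a logged interaction against a prototypical task script and collecting candidate deviation patterns could look as follows; the event names and the simple in-order matching rule are illustrative assumptions.

# Hedged sketch (Python, not the SinA toolchain): compare a logged interaction
# against a prototypical task script and collect candidate deviation patterns,
# as in the analysis cycle above. Event names and the matching rule are
# illustrative assumptions.
def find_deviations(prototype, observed):
    """Return (position, event) pairs that deviate from the prototypical script order."""
    deviations = []
    expected = iter(prototype)
    next_expected = next(expected, None)
    for step, event in enumerate(observed):
        if event == next_expected:
            next_expected = next(expected, None)
        else:
            deviations.append((step, event))  # candidate deviation pattern
    return deviations

prototype = ["greet", "instruct", "follow", "confirm"]
observed = ["greet", "instruct", "repeat_instruction", "follow", "confirm"]
print(find_deviations(prototype, observed))  # -> [(2, 'repeat_instruction')]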
Statistical Analysis of ELAN files in Matlab (SALEM)
Hanheide, M., Lohse, M., and Dierker, A. (2010). "SALEM – Statistical AnaLysis of Elan files in Matlab", Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, 7th Int'l Conf. on Language Resources and Evaluation (LREC), Malta, pp. 121-123.
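SALEM itself is a Matlab toolbox; as a rough illustration of the same idea, the sketch below reads the aligned annotations of one ELAN tier directly from an .eaf file and reports simple duration statistics in Python. The file name and tier name are hypothetical.

# Illustrative sketch only (SALEM is Matlab; this re-creates the idea in Python):
# read aligned annotations of one ELAN tier and report simple duration statistics.
import xml.etree.ElementTree as ET
from statistics import mean

def tier_durations(eaf_path, tier_id):
    """Durations (seconds) of all aligned annotations on one ELAN tier."""
    root = ET.parse(eaf_path).getroot()
    # Map time-slot ids to millisecond values.
    slots = {ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
             for ts in root.iter("TIME_SLOT") if ts.get("TIME_VALUE")}
    durations = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            t1 = slots.get(ann.get("TIME_SLOT_REF1"))
            t2 = slots.get(ann.get("TIME_SLOT_REF2"))
            if t1 is not None and t2 is not None:
                durations.append((t2 - t1) / 1000.0)
    return durations

durs = tier_durations("trial01.eaf", "user_task")  # hypothetical file and tier name
if durs:
    print(len(durs), "annotations, mean", round(mean(durs), 2), "s, total", round(sum(durs), 1), "s")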
How to measure progress?
● Benchmarking questions (a small metric sketch follows below):
  – Did the overall number of different problem-related tasks decrease?
  – Did the percentage of time the users spent on problem-related tasks (compared to social and functional tasks) decrease?
  – Did the mean duration of problem-related tasks decrease?
  – Did the handling of problem-related tasks improve?
  – When did the problem-related tasks occur in the task structure?
Siepmann, F., Lohse, M., and Wachsmuth, S. "Towards robot architectures for user-driven system design" (in preparation).
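The first three questions reduce to simple counts over annotated task episodes; a minimal sketch, assuming each episode is annotated with a category ("problem", "social", "functional") and a duration in seconds:

# Hedged sketch (not from the slides): metrics behind the first three benchmarking
# questions, computed over annotated task episodes (category, duration_in_seconds).
from statistics import mean

def problem_task_metrics(episodes):
    problem = [d for cat, d in episodes if cat == "problem"]
    total_time = sum(d for _, d in episodes)
    return {
        "n_problem_tasks": len(problem),
        "time_share": sum(problem) / total_time if total_time else 0.0,  # vs. social/functional
        "mean_duration": mean(problem) if problem else 0.0,
    }

# Compare two system iterations: did the problem-related effort decrease?
v1 = [("problem", 12.0), ("social", 5.0), ("functional", 30.0), ("problem", 8.0)]
v2 = [("problem", 6.0), ("social", 7.0), ("functional", 32.0)]
print(problem_task_metrics(v1))
print(problem_task_metrics(v2))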
Challenges in defining benchmarks
● How to measure progress?
  – Multi-dimensionality
  – System complexity
  – Small datasets
● How to define ground truth?
  – User behavior is highly variable
● How to prevent architectural bottlenecks?
  – Tests are task and platform specific
Social cues in teaching scenarios
● Valence judgment from non-verbal cues (facial expressions)
● Reduction to the evaluation of a single skill
● How to provoke natural user behavior?
[Lang et al., ROMAN, 2009]
Uncertain ground truth in HRI
● Human judgements (without sound)
● 44 judges, 88 video sequences of 11 subjects (see the labeling sketch below)
[Figure: judgement results for success videos vs. failure videos]
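When ground truth is derived from human judgements like these, one simple option (illustrative only, not the study's actual procedure) is majority voting with an agreement threshold; videos without sufficient agreement stay unlabeled:

# Illustrative sketch, not the procedure used in the study: derive a label per video
# from judge votes by majority vote, but only when agreement is high enough.
from collections import Counter

def majority_label(votes, min_agreement=0.75):
    """votes: list of 'success'/'failure' judgements for one video."""
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    return (label if agreement >= min_agreement else None), agreement

votes = ["success"] * 30 + ["failure"] * 14   # e.g. 44 judges on one video sequence
print(majority_label(votes))                  # (None, 0.68...): too uncertain for ground truth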
Assistance in real applications
● Supporting cognitively disabled persons in ADLs (epilepsy, autism, learning disorders, hemiparesis)
● Cooperation with the Bodelschwinghsche Anstalten Bethel
● WOZ study (23 trials including 7 users): teeth cleaning
  – Feedback by audio/video prompts
Individual reaction behavior
● WOZ study: user reactions to prompts
  – wizard (WIZ) vs. caregiver (CG)
  – audio (A) vs. audio/video (A/V)
Challenges in defining benchmarks
● How to measure progress?
  – Multi-dimensionality
  – System complexity
  – Small datasets
● How to define ground truth?
  – User behavior is highly variable
● How to prevent architectural bottlenecks?
  – Tests are task and platform specific
Scalability and transfer of system frameworks and skills
[Figure: timeline 1993-2011 of scenarios (interactive manipulation, service robotics, task assistance, tutoring/receptionist/motivation) and of the frameworks and skills transferred across them (DACS, ASR, Dialog; multi-modal anchoring, person attention; XCF, Active Memory; task state pattern, dialog framework; BonSAI; social feedback; working memory; application design)]
Scalability in competitions
● RoboCup@Home
  – Graz, 2009
  – Singapore, 2010
RoboCup@Home
Desired abilities
● Navigation
● Fast and easy setup
● Object recognition
● Object manipulation
● Recognition of humans
● Human-robot interaction
● Speech recognition
● Gesture recognition
● Robot applications
● Ambient intelligence
Tests
● Robot inspection
● Follow me
● Go get it
● Who is who
● Open challenge
● Enhanced who is who
● General purpose service robot
● Shopping mall
● Demo challenge
● Final
RoboCup@Home
Tests are not completely pre-specified ...
● Open Challenge allows a free performance
● General Purpose test includes a task specification
● Shopping Mall test includes a real, unknown environment
● Demo Challenge focuses on application domains
Points are given for (partial) task completion within a time limit; a small scoring sketch follows below.
Judging is (partially) subjective!
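As an illustration only (the subtask names and point values are placeholders, not the official RoboCup@Home score tables), scoring partial task completion under a time limit can be as simple as:

# Placeholder sketch of partial-completion scoring under a time limit; subtask
# names and point values are invented for illustration, not taken from the rulebook.
def test_score(completed_subtasks, points, elapsed_s, time_limit_s):
    """Sum the points of completed subtasks; award nothing after the time limit."""
    if elapsed_s > time_limit_s:
        return 0
    return sum(points[s] for s in completed_subtasks)

points = {"reach_location": 200, "grasp_object": 400, "deliver_object": 400}
print(test_score(["reach_location", "grasp_object"], points, elapsed_s=280, time_limit_s=300))  # 600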
System development is an implicit part of the competition
● Team effort of 10-12 people
● Major team change from 2009 to 2010
● Large number of modules
● Limited computing power
● Prototyping of tasks
● Short evaluation cycles
● Robot needs to perform instantly
Conclusions
● Benchmarking cognitive systems is inherently multi-dimensional (there is no FLOPs measure)
● Evaluation needs to be based on live systems (performance is not characterized by offline error rates)
● System frameworks and skills significantly profit from transfer to other scenarios and platforms
● System integration and evaluation is costly (there is no free lunch)
● Internal system analysis and external interaction analysis need to be coupled
Conclusions
● Benchmarking tasks should not be over-specified
● Human behavior is shaped by the system response (human input cannot be standardized)
● Ground truth needs to be defined by the setup (otherwise it might be ill-defined)
● Human behavior is highly individual (there is no "average user")
● Competitions in HRI are inherently not completely fair, but they are good for research
Thanks to a lot of people ...