Using a simulated user to explore human-robot interfaces


Nov 14, 2013


Simuser to explore HRI



Human-robot interfaces (HRI) can be difficult to use. We examine urban search and rescue (USR) robots as an example. We present here a theory of their use based on a simulated user written in the ACT-R cognitive modeling language. The model, using a simulated eye and hand, interacts directly with an unmodified and simple teleoperation task of maneuvering in an environment to avoid other moving objects. The model user also performs a secondary task. In addition to describing the knowledge the human operator must have and the aspects of the task that will be difficult for the operator, the model makes quantitative predictions about how the speed of the robot influences the quality of the navigation and performance on the secondary task. These results are examples of the types of outputs available from a model user. As the model now interacts with the USR simulator using only the bitmap, it should be widely applicable to testing other simulators and actual robots. The model already suggests why human-robot interfaces are difficult to use and where they can be improved.

Index Terms

cognitive model, ACT-R, human-robot interfaces



In the future, robots might become completely autonomous and act largely independently. However, such a level of independence has not yet been achieved and is in some cases simply undesirable. Many of the tasks that robots face today, like exploration, reconnaissance, and surveillance, will continue to require supervision. Furthermore, people often do not have enough confidence in a completely autonomous robot to let it operate independently. So the extent to which robots will be integrated into our society will largely depend on their ability to communicate with humans in understandable and friendly modalities.

Despite its importance, a general theory of human-robot interface use seems to be lacking. Many human-robot interfaces do not even respect the most fundamental HCI principles. In this paper, we present the beginnings of a theory that indicates the issues that make human-robot interfaces difficult to use. Concurrently, we present a quantitative tool in the form of a simulated user that can be used to identify problems associated with human-robot interface use. Specifically, we introduce a methodology in which a cognitive model autonomously exercises human-robot interfaces, indicating ways to improve the interface and laying bare problems that can serve as starting points for a general theory of human-robot interface use.

One of the reasons that there does not seem to be a general theory of human-robot interface use is the complexity of the task domain, which is reflected in the diversity of types of human-robot interactions. An application that illustrates this well is robot-assisted Urban Search and Rescue (USR). USR involves the detection and rescue of victims from urban structures like collapsed buildings. Because of the extreme physical and perceptual demands of USR, these applications are usually mixed-initiative human-robot interactions, in which a human operator and a robot interact in some manner to produce adequate performance. This means that it might be optimal for the robot to exhibit a fair amount of autonomy in some situations, for instance, when navigating in a confined space using its own sensors. However, other situations might require human intervention: an operator may have to assist in freeing a robot because its sensors do not provide enough information for autonomous recovery. And yet further interventions, some only imagined, such as providing medication to trapped survivors, will legally require a human operator. This illustrates how, with enhanced robot autonomy, the role of the operator could often shift from control to monitoring and diagnosis.

Member, IEEE, and R. ST., Member, IEEE
Manuscript for Special Issue on Human-Robot Interaction
IEEE Transactions on Systems, Man and Cybernetics

There are several reasons why principles from HCI are missing from many human-robot systems. First of all, the task domain of human-robot systems is more complex and diverse, making it very hard to meet the needs of diverse users or come up with a general metaphor. Furthermore, these systems are typically more expensive than regular commercial software packages. At the same time, they are not built as often as regular software, and when they are built, it is usually not by people trained in HCI. Currently, USR robots are directly driven by operators. As they become more autonomous, these problems will become more complex. What is needed is a way to test and improve these interfaces.



In this section, we introduce a cross-platform architecture in which a cognitive model simulates user performance. Specifically, we introduce a simulated user, consisting of a cognitive model and a pair of simulated eyes and hands, that can be applied to sample human-robot interfaces (or, with additional knowledge, to any other interface for that matter). Ultimately, the intention is to provide a quantitative tool to guide the design process of human-robot interfaces. This tool will enable designers to apply psychological theories in real time, providing a simulated user that acts like, and interacts with the same interface as, a real user.

A cognitive model forms the cognition of our simulated user. A cognitive model is a theory of human cognition realized as a running computer program. It produces human-like performance in that it takes time, commits errors, deploys strategies, and learns. It presents a means of applying cognitive psychology data and theory to HCI problems in real time and in an interactive environment. We have developed a system consisting of the cognitive architecture ACT-R and a simulated eyes and hands suite called Segman. This system can be applied to virtually any type of interface running on any operating system. We will begin by describing the parts that make up the system and then provide a demonstration. Subsequently, we discuss how this system can be applied as a simulated user to explore human-robot interaction, and how it supports explanations of users' behavior and evaluation of interfaces.


The ACT-R architecture

The ACT-R architecture integrates theories of cognition, visual attention, and motor movement. It has been applied successfully to higher-level cognition phenomena, such as modeling scientific reasoning, differences in working memory, and skill acquisition, to name but a few. Recently, it has also been applied successfully to a number of HCI issues.


ACT-R makes a distinction between two types of long-term knowledge: declarative and procedural. Declarative knowledge is factual and holds information like “2 + 3 = 5” or “George Bush is the president of the USA”. The basic units of declarative knowledge are chunks, which are schema-like structures, effectively forming a propositional network. Procedural knowledge consists of production rules that encode skills and take the form of condition-action pairs. Production rules correspond to specific goals or subgoals, and mainly retrieve and change declarative knowledge.
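To make the distinction concrete, the sketch below encodes a chunk as a typed set of slot-value pairs and a production as a condition-action pair that tests the goal and retrieves a fact from declarative memory. It is a minimal Python illustration under our own assumptions, not ACT-R syntax (ACT-R models are written in Lisp); the class names, slot names, and the `retrieve-sum` rule are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Chunk:
    """Schema-like unit of declarative knowledge: a type plus slot-value pairs."""
    name: str
    chunk_type: str
    slots: Dict[str, object] = field(default_factory=dict)

    def matches(self, pattern: Dict[str, object]) -> bool:
        # Matches if every slot named in the pattern holds the requested value.
        return all(self.slots.get(k) == v for k, v in pattern.items())


def retrieve(memory: List[Chunk], pattern: Dict[str, object]) -> Optional[Chunk]:
    """Return the first chunk in declarative memory that matches the pattern."""
    return next((c for c in memory if c.matches(pattern)), None)


@dataclass
class Production:
    """Condition-action pair: the condition tests the goal, the action changes it."""
    name: str
    condition: Callable[[Chunk], bool]
    action: Callable[[Chunk, List[Chunk]], None]


def fill_sum(goal: Chunk, memory: List[Chunk]) -> None:
    # Action side: retrieve the matching addition fact and write its sum into the goal.
    fact = retrieve(memory, {"addend1": goal.slots["arg1"],
                             "addend2": goal.slots["arg2"]})
    if fact is not None:
        goal.slots["answer"] = fact.slots["sum"]


memory = [Chunk("two-plus-three", "addition-fact",
                {"addend1": 2, "addend2": 3, "sum": 5})]
goal = Chunk("goal", "add", {"arg1": 2, "arg2": 3, "answer": None})
rule = Production("retrieve-sum",
                  condition=lambda g: g.chunk_type == "add" and g.slots["answer"] is None,
                  action=fill_sum)

if rule.condition(goal):          # condition side: does the goal still need an answer?
    rule.action(goal, memory)     # action side: retrieval changes the goal chunk
print(goal.slots["answer"])       # prints 5
```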

Besides the symbolic procedural and declarative components, ACT-R also has a subsymbolic component that determines the use of the symbolic knowledge. Each symbolic construct, be it a production or a chunk, has subsymbolic parameters associated with it that reflect its past use. In this way, the system keeps track of the usefulness of the symbolic information. Which information is currently available in the declarative memory module is partially determined by the odds that a particular piece of information will be used in that context.
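The text does not spell out the subsymbolic equations. In the standard ACT-R formulation, a chunk's base-level activation is B = ln(sum over j of t_j^(-d)), where t_j is the time since each past use and the decay d is conventionally 0.5; activation acts as the log odds that the chunk will be needed, and retrieval succeeds with a probability that is a logistic function of activation relative to a threshold. The sketch assumes those equations; the threshold and noise values are illustrative, not parameters of the model described here.

```python
import math
from typing import Sequence


def base_level_activation(ages: Sequence[float], decay: float = 0.5) -> float:
    """ACT-R base-level learning: B = ln(sum_j t_j ** -d).

    `ages` are the times (in seconds) since each past use of the chunk.
    """
    if not ages or any(t <= 0 for t in ages):
        raise ValueError("ages must be non-empty and positive")
    return math.log(sum(t ** -decay for t in ages))


def retrieval_probability(activation: float, threshold: float = 0.0,
                          noise_s: float = 0.25) -> float:
    """Logistic probability that activation exceeds the retrieval threshold.

    threshold and noise_s are illustrative values, not fitted parameters.
    """
    return 1.0 / (1.0 + math.exp(-(activation - threshold) / noise_s))


# A chunk used often and recently versus one used once, long ago.
frequent = base_level_activation([2.0, 10.0, 30.0, 60.0])
rare = base_level_activation([600.0])
print(round(frequent, 3), round(rare, 3))
print(retrieval_probability(frequent) > retrieval_probability(rare))  # True
```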

An important aspect of the ACT-R architecture is that models created in it predict human behavior both qualitatively and quantitatively: each covert step of cognition (production firing, retrieval from declarative memory, procedural knowledge application) and each overt action (mouse click, moving visual attention) has a latency associated with it that is based on psychological theories and data. For instance, taking a cognitive action, firing a production rule, takes 50 ms (modulated by other factors such as practice), and the time needed to move a mouse is calculated using Fitts' law. In this way, the system provides a way to apply psychological knowledge in real time.

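As a worked example of how these latencies add up, the sketch below charges 50 ms per production firing and computes mouse movement time with a Welford-style variant of Fitts' law, T = b * log2(D/W + 0.5), clipped to a minimum time. The coefficient b = 0.1 s and the 0.1 s floor are assumptions chosen for illustration, and the sketch treats all steps as strictly serial, which ignores the parallelism between modules discussed in the next section.

```python
import math
from typing import List, Tuple

PRODUCTION_FIRE_S = 0.050   # default time to fire one production rule


def fitts_time(distance: float, width: float, coeff_s: float = 0.1,
               min_time_s: float = 0.1) -> float:
    """Welford-style Fitts' law: b * log2(D / W + 0.5), never below min_time_s.

    coeff_s and min_time_s are illustrative assumptions.
    """
    if width <= 0:
        raise ValueError("target width must be positive")
    return max(min_time_s, coeff_s * math.log2(distance / width + 0.5))


def serial_task_time(n_productions: int, moves: List[Tuple[float, float]]) -> float:
    """Total time if firings and mouse moves happen strictly one after another."""
    return n_productions * PRODUCTION_FIRE_S + sum(fitts_time(d, w) for d, w in moves)


# Hypothetical step: three productions plus one 400-pixel move to a 20-pixel target.
print(f"{serial_task_time(3, [(400, 20)]):.3f} s")   # 0.586 s
```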

The perceptual-motor buffers

A schematic of the current implementation of the theory, ACT-R 5.0, is shown in Figure 1. At the heart of the architecture is a production system, which represents central cognition and interacts with a number of buffers. These buffers represent the information that the system is currently acting on: the Goal buffer contains the present goal of the system, the Declarative buffer contains the declarative knowledge that is currently available, and the perceptual and motor buffers indicate the state of the perceptual and motor modules (busy or free, and their contents). The communication between central cognition and the buffers is regulated by production rules. As mentioned, production rules are condition-action pairs: the first part of a production rule, the condition side, typically tests whether certain declarative knowledge (in the form of a chunk) is present in a certain buffer. The second part, the action side, then sends a request to a buffer to change the current goal, retrieve knowledge from a module such as declarative memory, or perform some action.
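The match-fire cycle just described can be sketched in a few lines: on each cycle the first production whose condition side matches the buffers fires, its action side modifies a buffer, and the clock advances by 50 ms. This is a hypothetical simplification; real ACT-R chooses among matching productions by utility rather than by list order, and the rule names below only echo those in the Figure 6 protocol.

```python
from typing import Callable, Dict, List, Optional, Tuple

Buffers = Dict[str, Optional[dict]]   # buffer name -> chunk (as a slot dict) or None
Rule = Tuple[str, Callable[[Buffers], bool], Callable[[Buffers], None]]

CYCLE_S = 0.050   # time for one production firing


def run(buffers: Buffers, rules: List[Rule], max_cycles: int) -> List[str]:
    """Repeatedly fire the first rule whose condition side matches the buffers."""
    trace: List[str] = []
    clock = 0.0
    for _ in range(max_cycles):
        matched = next((r for r in rules if r[1](buffers)), None)
        if matched is None:
            break                       # nothing matches: the model halts
        name, _, action = matched
        action(buffers)                 # action side: modify or request buffers
        clock += CYCLE_S
        trace.append(f"Time {clock:.3f}: {name} Fired")
    return trace


def goal_step(step: str) -> Callable[[Buffers], None]:
    """Build an action that rewrites the goal buffer's 'step' slot."""
    def act(b: Buffers) -> None:
        b["goal"] = {"step": step}
    return act


def goal_is(step: str) -> Callable[[Buffers], bool]:
    """Build a condition that tests the goal buffer's 'step' slot."""
    return lambda b: b["goal"] is not None and b["goal"].get("step") == step


rules: List[Rule] = [
    ("Perceive-Environment", goal_is("perceive"), goal_step("decide")),
    ("Decide-Action", goal_is("decide"), goal_step("act")),
    ("Cruising", goal_is("act"), goal_step("perceive")),
]

for line in run({"goal": {"step": "perceive"}, "retrieval": None}, rules, 4):
    print(line)
```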

The perceptual and motor buffers allow the model to “look” at an interface and manipulate objects in that interface. The perceptual buffer builds a representation of the display in which each object is represented by a feature. Productions can send commands to the perceptual buffer to direct attention to an object on the screen and create a chunk in declarative memory that represents that object and its location on the screen. The production system can then send commands, initiated by a production rule, to the motor buffer to manipulate these objects.

Central cognition and the various buffers run in parallel with one another, but each of the perceptual and motor buffers is serial (with a few rare exceptions) and can only contain one chunk of information. This means that the production system might retrieve a chunk from declarative memory while the perceptual buffer shifts attention and the motor buffer moves the mouse. We will mainly concentrate on the motor and perceptual buffers, which are most relevant for our purpose.

Figure 1. ACT-R 5 system diagram. The production system and buffers run in parallel, but each component is itself serial. The graded areas indicate the novel functionality provided by SEGMAN that overrides the original perceptual-motor functionality of ACT-R 5, which is indicated by the dashed lines.
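The combination of parallel modules that are each serial can be sketched as a set of resources that each record when they become free again. In this hypothetical sketch a request to a busy module is simply refused; the request durations are illustrative values.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Module:
    """A serial resource: it works on one request at a time."""
    name: str
    busy_until: float = 0.0

    def request(self, now: float, duration: float) -> bool:
        """Accept the request if the module is free at `now`, else refuse it."""
        if now < self.busy_until:
            return False                # still busy with an earlier request
        self.busy_until = now + duration
        return True


modules: Dict[str, Module] = {n: Module(n) for n in ("retrieval", "visual", "motor")}

# One production at t = 0.05 s issues three requests; the modules then work in parallel.
t = 0.05
print(modules["retrieval"].request(t, 0.10))        # True: retrieve a chunk
print(modules["visual"].request(t, 0.085))          # True: shift attention
print(modules["motor"].request(t, 0.30))            # True: move the mouse
# A second attention shift 50 ms later finds the visual module still busy.
print(modules["visual"].request(t + 0.05, 0.085))   # False
```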


Segman and ACT-R

ACT-R 5 in its current release interacts with interfaces using a Perceptual-Motor buffer (ACT-R/PM), which includes tools for creating interfaces and annotating existing interfaces in Macintosh Common Lisp so that models can see and interact with objects in the interface. This allows most models to interact in some way with most interfaces that are written in that language, and allows all models to interact with all interfaces written with the special tools.

For our simulations, we developed a more general version of ACT-R/PM, which provides ACT-R 5 direct access to an interface, thus removing the need for a specific interface creation tool. This is done by extending ACT-R/PM with the Segman suite.

As Figure 1 shows, Segman takes pixel-level input from the screen (i.e., the screen bitmap), runs the bitmap through image processing algorithms, and builds a structured representation of the screen. This representation is then passed to ACT-R through the ACT-R/PM theory of visual perception (i.e., the perceptual buffer). ACT-R/PM moderates what is visible and how long it takes to see and recognize objects. Segman can also generate mouse and keyboard inputs to manipulate objects on the screen. This functionality is called through the ACT-R/PM theory of motor output, but we have extended the output results to work with any Windows interface. This is done by creating very primitive events (click icon, select button, etc.), which are implemented as functions at the operating system level. As such, they are indistinguishable from human-generated events.
Currently, we have a fully functional system that runs under Windows 98 and 2000.
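The text does not describe Segman's image processing in detail. As a rough illustration of turning a bitmap into a structured representation, the sketch below groups 4-connected pixels of identical color into objects with a color, a size, and a bounding box. The algorithm, function name, and toy road frame are assumptions for illustration, not Segman's actual implementation or API.

```python
from collections import deque
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # RGB color


def segment(bitmap: List[List[Pixel]], background: Pixel) -> List[Dict]:
    """Group 4-connected pixels of identical color into objects with bounding boxes."""
    h, w = len(bitmap), len(bitmap[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or bitmap[y][x] == background:
                continue
            color = bitmap[y][x]
            queue, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while queue:                       # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and not seen[ny][nx] and bitmap[ny][nx] == color:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys, xs = [p[0] for p in pixels], [p[1] for p in pixels]
            objects.append({"color": color, "size": len(pixels),
                            "bbox": (min(xs), min(ys), max(xs), max(ys))})
    return objects


G, W, K = (0, 128, 0), (255, 255, 255), (0, 0, 0)   # grass, stripe, road surface
frame = [
    [G, K, W, K, G],
    [G, K, W, K, G],
    [G, K, K, K, G],
]
for obj in segment(frame, background=G):
    print(obj)
```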






We will now describe an implementation of our system called DUMAS (pronounced [doo ‘maa]), which stands for Driver User Model in ACT-R & Segman. DUMAS drives a car in a Java-implemented game, which was downloaded from the Web. For the simulations reported below, no changes were made to the game.

We chose the 3D driver game for several reasons. First, it has a direct interface, in that the operator directs the car using the keyboard. This perspective is often referred to as “inside-out” driving, because the operator feels as if she is inside the vehicle and looking out, and it is a common method for vehicle or robot teleoperation. Second, driving behavior is a prototypical example of real-time, interactive decision making in an interactive environment, and as such is comparable to many teleoperated robot tasks. Third, the source code is extensible, which means aspects of the environment (e.g., slow or fast driving) and the interface (e.g., bigger or smaller buttons) can be manipulated in a controlled fashion. Because the code is Java, this can be done on multiple platforms. And finally, because we did not write it, it helps to show the generality of this approach.

Models of driving have been targets of research for decades (the analysis of Gibson and Crooks in 1938 provides one of the earliest examples; see Bellet and Tattegrain for a concise historical overview from a cognitive ergonomics perspective). The hierarchical risk model of van der Molen and Botticher is a representative example of recent models. Driving can be seen as structured into strategic, tactical, and operational levels. Moving up the hierarchy, each level describes an increasingly abstract set of behaviors that govern choices at the level below it.

At the strategic level, planning activity takes place, such as the choice of route and travel speed. At the tactical level, decisions encompass more concrete, situation-dependent maneuvers, such as lane changing, passing, and so forth. The operational level describes skilled but routine activities, such as steering and acceleration.
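A minimal sketch of the three-level hierarchy, assuming each level passes a constraint down to the next: the strategic level sets a travel speed, the tactical level picks a maneuver and a speed cap within that plan, and the operational level produces steering and pedal commands. All thresholds, gains, and names are hypothetical and are not taken from van der Molen and Botticher's model.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Situation:
    speed_limit: float      # property of the chosen route (strategic input)
    lead_car_gap_m: float   # distance to traffic ahead (tactical input)
    lane_offset_m: float    # signed offset from lane center, + = right (operational input)
    speed: float            # current speed, same units as speed_limit


def strategic(s: Situation) -> float:
    """Plan a travel speed for the route (90% of the limit is an arbitrary choice)."""
    return 0.9 * s.speed_limit


def tactical(s: Situation, target_speed: float) -> Tuple[str, float]:
    """Pick a maneuver and a speed cap that stays within the strategic plan."""
    if s.lead_car_gap_m < 30.0:
        return "slow behind traffic", 0.5 * target_speed
    return "cruise", target_speed


def operational(s: Situation, speed_cap: float) -> Dict[str, object]:
    """Routine control: steer back toward the lane center and set the pedal."""
    if s.speed < speed_cap - 1.0:
        pedal = "accelerate"
    elif s.speed > speed_cap + 1.0:
        pedal = "brake"
    else:
        pedal = "hold"
    return {"steer": -0.5 * s.lane_offset_m, "pedal": pedal}


def drive_step(s: Situation) -> Tuple[str, Dict[str, object]]:
    """One pass down the hierarchy: each level constrains the one below it."""
    target = strategic(s)
    maneuver, cap = tactical(s, target)
    return maneuver, operational(s, cap)


print(drive_step(Situation(speed_limit=100, lead_car_gap_m=80, lane_offset_m=0.4, speed=70)))
```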



The different levels of abstraction represent different demands on the cognitive, perceptual, and motor abilities of the driver. For example, feedback from assistive technology such as ABS or power steering is provided at the operational level through haptic channels, often imperceptibly. Feedback for travel speed, in contrast, requires some cognitive activity at the strategic level, to interpret speedometer readings. If the feedback channels for these different activities were reversed (e.g., if the driver had to interpret a numerical value to determine power steering assist), their usability would be seriously impaired. Many task domains in HRI, in particular urban search and rescue, share this layered structure.


Current implementation of the model

We set out to let the model perform some standard tasks, like staying on track, avoiding traffic, and increasing or decreasing speed. At this point, the model can start the game by clicking the mouse on the game window, accelerate by pushing the “A” key, brake by pushing the “Z” key, and steer by using the left and right arrow keys.

Perceptual processing in the model is based on observations from the literature on human driving, as is common for other driving models. Land and Horwood's study of driving behavior describes a “double-model” of steering, in which a region of the visual field relatively far away from the driver (about 4 degrees below the horizon) provides information about road curvature, while a closer region (7 degrees below the horizon) provides position-in-lane information. Attention to the visual field at an intermediate distance, 5.5 degrees below the horizon, provides a balance of this information, resulting in the best performance.
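To apply these angles to a screen image, they have to be converted to pixel rows. Assuming a pinhole-camera view, a point theta degrees below the horizon appears f * tan(theta) pixels below the horizon row, where f = (H/2) / tan(FOV/2) is the focal length in pixels for a view H pixels high. The 480-pixel height, 40-degree vertical field of view, and horizon row used below are assumptions; the game's actual camera geometry is not given in the text.

```python
import math


def row_below_horizon(angle_deg: float, horizon_row: float,
                      screen_height_px: int, vertical_fov_deg: float) -> float:
    """Screen row lying `angle_deg` below the horizon in a pinhole-camera view."""
    focal_px = (screen_height_px / 2) / math.tan(math.radians(vertical_fov_deg / 2))
    return horizon_row + focal_px * math.tan(math.radians(angle_deg))


# Assumed display geometry: 480-px-high view, 40-degree vertical field, horizon at row 240.
for label, angle in (("far (curvature)", 4.0), ("intermediate", 5.5), ("near (position)", 7.0)):
    row = row_below_horizon(angle, 240, 480, 40)
    print(f"{label:16s} {angle:4.1f} deg -> row {row:.1f}")
```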

The visual interface of the 3D Driver Game, which is the same interface the model uses, is shown in Figure 2. The default procedure for perception in the model is as follows. The model computes position information by detecting the edges of the road and the stripes down the center of the road. The stripes are combined into a smoothed curve to provide the left boundary of the lane, while the right edge of the road directly gives the right lane boundary. The model computes the midpoint between these two curves at 5.5 degrees below the horizon. This point, offset by a small amount to account for the left-hand driving position
of the car, is treated as the goal. If the center of the visual field (the default focal point) is to the left of this point, the model must steer to the right, otherwise to the left.
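The procedure can be sketched as follows, mapping the decision onto the game's key bindings given earlier. Assumptions: the stripe and road-edge positions have already been extracted for the same screen rows, a moving average stands in for the model's smoothed curve, and the driver offset and dead zone are hypothetical values. The function steers toward the goal point.

```python
from typing import List, Optional, Sequence

# Key bindings of the 3D driving game as described in the text; the arrow-key
# names are placeholders for whatever a key-event layer expects.
KEYS = {"accelerate": "A", "brake": "Z", "left": "Left", "right": "Right"}


def smooth(xs: Sequence[float], window: int = 3) -> List[float]:
    """Moving-average smoothing of stripe positions (stand-in for the model's curve)."""
    half = window // 2
    out = []
    for i in range(len(xs)):
        seg = xs[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def steering_key(stripe_xs: Sequence[float], right_edge_xs: Sequence[float],
                 row_index: int, focal_x: float,
                 driver_offset_px: float = -5.0,
                 dead_zone_px: float = 2.0) -> Optional[str]:
    """Choose a steering key from lane boundaries sampled at the same screen rows.

    row_index is the sampled row closest to 5.5 degrees below the horizon.
    driver_offset_px shifts the goal to account for the left-hand driving
    position; its sign and size here are assumptions.
    """
    left = smooth(stripe_xs)[row_index]        # left lane boundary (center stripes)
    right = right_edge_xs[row_index]           # right lane boundary (road edge)
    goal_x = (left + right) / 2 + driver_offset_px
    error = goal_x - focal_x
    if abs(error) <= dead_zone_px:
        return None                            # close enough: no steering key
    return KEYS["right"] if error > 0 else KEYS["left"]  # steer toward the goal point


print(steering_key([300, 304, 296, 300], [420, 430, 440, 450], row_index=2, focal_x=320))
```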

Perceptual processing in the model has limitations. For example, it is not entirely robust: determining the center of the lane can break down if the road is curving off too fast in one direction. Segman can also return other information that it has extracted. For example, it can determine road curvature from more distant points, as is done in models of human driving. However, this has not yet led to improved performance in this simulation environment.

Figure 2: Two snapshots of the driving environment.

In its current form, the model has problems staying on the road because its visual cognition is not yet perfect. The amount of change in speed depends on how the visual environment is changing. At the moment, the model takes snapshots of the whole visual scene to determine its actions. It detects changes by recording the locations of specific points in the visual field and then measuring the distance they move from one snapshot to the next. It turns out that this is not a good way to handle visual flow. Suppose that at time t1 the model analyzes the road, records the data for estimating visual flow, and determines that steering in one direction or another is appropriate. At time t2 some steering command is issued, and the simulated car moves in that direction. At time t3 or later the road is again analyzed so that flow can be computed, but at this point the action of the model has produced changes in the visual field, independent of the changes that would have occurred otherwise. This contribution needs to be accounted for, or the car might end up braking every time it steers.
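One way to account for this contribution is to subtract the displacement that the model's own steering command is expected to cause before interpreting the remaining flow. The sketch below assumes a simple linear ego-motion model in which steering right shifts the scene a fixed number of pixels to the left; the function names and the px_per_steer value are hypothetical, and the text indicates that this correction is still to be made in the model.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (x, y) screen position of a tracked point


def residual_flow(before: List[Point], after: List[Point],
                  steer: int, px_per_steer: float) -> List[Point]:
    """Point displacements between snapshots minus the shift caused by our own steering.

    steer is -1 (left), 0 (none), or +1 (right). Assumed ego-motion model:
    steering right shifts the whole scene px_per_steer pixels to the left.
    """
    expected_dx = -steer * px_per_steer
    return [(ax - bx - expected_dx, ay - by)
            for (bx, by), (ax, ay) in zip(before, after)]


def mean_horizontal_flow(flow: List[Point]) -> float:
    """Average magnitude of horizontal motion that the model did not cause itself."""
    return sum(abs(dx) for dx, _ in flow) / len(flow)


before = [(300.0, 300.0), (340.0, 310.0)]
after = [(290.0, 302.0), (330.0, 313.0)]   # scene moved 10 px left after steering right
print(mean_horizontal_flow(residual_flow(before, after, steer=0, px_per_steer=10.0)))  # 10.0
print(mean_horizontal_flow(residual_flow(before, after, steer=1, px_per_steer=10.0)))  # 0.0
```

Without the correction (steer=0), the model would read its own steering as a 10-pixel change in the scene.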

The model still represents a rather restricted model of driving behavior. Whereas previous driving models use more than 40 rules, the complete behavior of DUMAS is currently determined by only 20 production rules. Foremost, this reflects that the production system of DUMAS does not yet use the full range of motor capabilities offered by the ACT-R architecture through the ACT-R/PM theory. Nevertheless, the demonstrations below illustrate that even a relatively simple ACT-R model already produces behavior that is fully in line with more established models of driving.


Two demonstrations

We provide two example analyses of the 3D driver game interface. The first assesses the influence of speed on the ability to drive; the second examines how multitasking influences driving. These demonstrations are really existence proofs. They are examples of the type of measures that would be helpful in testing and designing more advanced human-robot interfaces. In order to increase the realism of our simulation, we will need to expand the perceptual-motor capabilities of the ACT-R model. However, even though the model at this point only simulates constraints in cognitive functioning, it is able to simulate realistic driving behavior. Figure 3 shows a screenshot of the desktop during a simulation run. On the left is a GNU Emacs window, in which a trace of the cognitive model appears (which we discuss later). The top right of the figure shows the debug window of the Allegro Lisp package.

DUMAS starts the game autonomously by clicking on the game window shown in the bottom right of the figure. Note that the model “knows” where the game window resides on the desktop. By limiting the attention of the model to the position and dimensions of the game, we create a virtual bounding box on the desktop. Next, the model accelerates, drives at a constant speed, and slows down if necessary (e.g., in a sharp curve). Because at this point the model cannot pass, a run typically ends when the model hits traffic in its own lane. A run can also end when the model commits an error and runs off the road.

Figure 3: Screen capture showing a GNU Emacs window on the left, an Allegro Lisp window in the top right corner, and the driving game in the bottom right corner.

Figure 4 shows the results of the model simulation on two dependent measures: lateral deviation, which shows the position of the car with respect to the center of the right lane, and total driving time in minutes. These should be seen as example measures. The model and architecture can provide other measures such as working memory load, spare capacity in the processor, and learning.
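The two measures can be computed from per-run logs. The sketch below assumes a hypothetical log of (time in seconds, lateral offset) samples for each run and uses the mean absolute offset as the lane-deviation measure; the exact deviation statistic behind the figures is not stated in the text, so this choice is an assumption.

```python
from statistics import mean
from typing import List, Tuple

Sample = Tuple[float, float]   # (time in seconds, lateral offset from the lane center)


def run_measures(samples: List[Sample]) -> Tuple[float, float]:
    """Mean absolute lane deviation and driving time (minutes) for one run."""
    deviation = mean(abs(offset) for _, offset in samples)
    minutes = (samples[-1][0] - samples[0][0]) / 60.0
    return deviation, minutes


def condition_summary(runs: List[List[Sample]]) -> Tuple[float, float]:
    """Average both measures over all runs of one condition (e.g., 10 runs)."""
    per_run = [run_measures(r) for r in runs]
    return mean(d for d, _ in per_run), mean(m for _, m in per_run)


runs = [
    [(0.0, 0.5), (30.0, -1.5), (60.0, 1.0)],   # hypothetical run 1
    [(0.0, 0.0), (120.0, 2.0)],                # hypothetical run 2
]
print(condition_summary(runs))   # (1.0, 1.5)
```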


In the speed demonstration, DUMAS completed three sets of 10 runs, one at low, one at medium, and one at high speed, in the 3D Driving Game. We looked at the influence of speed on total driving time and on the amount of lane deviation, which reflects the ability of the model to stay on its ideal line of driving. Lane deviation is commonly used in driving studies to measure the influence of factors such as multitasking and drug use. Figure 4 summarizes the results.


The left panel of Figure 4 shows that the model predicts that average lane deviation will increase as speed increases, which is in line with experimental data and previous models. The model needs a certain amount of time to update its representation of the environment, mainly determined by constraints built into the ACT-R model. As a result, the distance between steering adjustments increases as speed increases, thus leading to larger lane deviation.

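A back-of-the-envelope version of this argument: if one perceive-decide-act loop costs roughly three 50-ms productions (as with the Perceive Environment, Decide Action, and steering or cruising steps in the Figure 6 protocol), the distance covered between two steering adjustments grows linearly with speed, and so does the lateral drift produced by a given heading error. Treating the speedometer values as metres per second and assuming a 2-degree heading error are illustrative choices only.

```python
import math
from typing import Tuple

CYCLE_S = 3 * 0.050   # perceive, decide, act: three 50-ms productions (assumed loop)


def drift_per_update(speed: float, heading_error_deg: float,
                     cycle_s: float = CYCLE_S) -> Tuple[float, float]:
    """Distance travelled and lateral drift accumulated between two steering updates."""
    distance = speed * cycle_s
    lateral = distance * math.sin(math.radians(heading_error_deg))
    return distance, lateral


# Representative speedometer values for Slow, Medium, and Fast, treated as m/s.
for speed in (15.0, 22.5, 32.5):
    d, lat = drift_per_update(speed, heading_error_deg=2.0)
    print(f"speed {speed:5.1f}: {d:5.2f} m between updates, {lat:.3f} m lateral drift")
```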
The right panel of Figure 4 shows how total average driving time, measured in minutes, drops significantly as speed increases. The errors the model makes explain this: because more distance passes between two steering adjustments, the chance of accidents also increases. In the Slow condition, DUMAS had only 3 accidents, compared to 7 and 10 in the Medium and Fast conditions, respectively.


Figure 4. Speed Demonstration: Lane deviation (in degrees, left panel) and total driving time (in minutes, right panel) of DUMAS as a function of speed (Slow, Medium, Fast). Slow corresponds to a driving speed within the range of 15…, medium 20-25, and fast 30-35, as measured on the speedometer in the simulation.

In the multitasking demonstration, we illustrate how dividing the model’s attention produces the same effect as increasing speed. In essence, the model’s performance is determined by the speed and accuracy with which it reacts and adapts to the environment. As a consequence, anything that diverts attention from driving will affect performance. More precisely, the time between updating moments will increase, leading to behavior that is less adapted to the environment.

To simulate the influence of anxiety as a dual task, we added useless knowledge to the system designed to interfere with driving. Specifically, we added simple rules to the model’s procedural knowledge that can fire any time while the model is driving. This simulates the influence of distracting thoughts, as well as the effects of reduced working memory: due to the serial nature of rule firing in ACT-R, whenever one of the useless rules fires, it delays the execution of the relevant driving productions. As a result, performance will be more error prone.
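A minimal simulation of this interference, assuming each driving update needs three driving productions and that on any 50-ms cycle a worry rule wins instead with a fixed probability p_worry (a stand-in for ACT-R's utility-based conflict resolution). Because rule firing is serial, the expected time per update rises from 3 x 50 ms to 3 x 50 ms / (1 - p_worry). The probability value is hypothetical.

```python
import random


def mean_update_interval(p_worry: float, steps_per_update: int = 3,
                         cycle_s: float = 0.050, updates: int = 10_000,
                         seed: int = 1) -> float:
    """Average time between driving updates when worry rules can win any cycle.

    Each cycle fires exactly one production (serial rule firing). With
    probability p_worry a useless "worry" rule fires instead of the next
    driving rule, so the driving step has to wait for another cycle.
    """
    rng = random.Random(seed)
    clock = 0.0
    for _ in range(updates):
        done = 0
        while done < steps_per_update:
            clock += cycle_s
            if rng.random() >= p_worry:   # a driving rule fired this cycle
                done += 1
    return clock / updates


standard = mean_update_interval(0.0)
worried = mean_update_interval(0.25)
print(f"{standard:.3f} s vs {worried:.3f} s")   # analytic expectation: 0.150 vs 0.200
```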


We compared the slow speed condition from the speed demonstration (Standard) to a condition in which the model drove at the same speed but was bothered by “obtrusive” thoughts (Worried). Figure 5 shows the results for the same set of dependent measures.

The left panel shows that the model predicts that average lane deviation increases when the model is worried. This confirms data generated with more complex driver models that show how a secondary task affects performance. The second measure further confirms this. The right panel of Figure 5 shows how total average driving time, measured in minutes, drops significantly in the Worried condition due to an increase in the number of accidents.

A very useful aspect of the ACT-R model is that it also generates a protocol of behavioral output, illustrating how separate parts of a complex behavior like driving unfold over time. Figure 6 depicts a test run of the model, starting with the “go” production and ending with a crash. For illustrative purposes, we chose a particularly short run. As the figure shows, the protocol indicates what behavior (steering, cruising, “thinking about the World Cup” as worry) is taking place at what time. This protocol gives insight into the behavior, in that it shows the sequence and timing of behavior and can also indicate critical points in a behavior. Furthermore, it can be compared to the behavior of human subjects as a further validation of the model, or to gain further insight into a complex behavior such as driving.
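Because the protocol has a regular line format, its timing can be analyzed automatically, for example to see which production occupied the time just before a crash. The parser below is a hypothetical helper that pairs each production's Selected and Fired events in lines formatted like those of Figure 6; the sample lines are copied from the end of that protocol.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

LINE = re.compile(r"Time\s+(\d+\.\d+):\s+(.+?)\s+(Selected|Fired)\s*$", re.IGNORECASE)


def time_per_production(trace: Iterable[str]) -> Dict[str, float]:
    """Sum, per production, the time between its Selected and Fired events."""
    selected_at: Dict[str, float] = {}
    totals: Dict[str, float] = defaultdict(float)
    for line in trace:
        m = LINE.match(line.strip())
        if not m:
            continue                       # skip annotations such as "GOING TO THE LEFT"
        t, name, event = float(m.group(1)), m.group(2), m.group(3).lower()
        if event == "selected":
            selected_at[name] = t
        elif name in selected_at:
            totals[name] += t - selected_at.pop(name)
    return dict(totals)


trace = """Time 0.850: Decide Action Selected
Time 0.900: Decide Action Fired
Time 0.900: Cruising Selected
Time 2.400: Cruising Fired
Time 2.400: Perceive Environment Selected
Time 2.450: Perceive Environment Fired
Time 2.450: Crashing Selected""".splitlines()

for name, seconds in time_per_production(trace).items():
    print(f"{name:22s} {seconds:.3f} s")
```

In this excerpt, the 1.5 s spent in a single Cruising production stands out as the stretch during which the model did not update its view of the road before crashing.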


Subtasks in human-robot interaction.

Even though DUMAS is still in its beginning stages and more work needs to be done, it already illustrates many of the issues that a theory of human-robot interface use will have to face. More specifically, it allows us to identify a set of subtasks that appear relevant to human-robot interface use. What are these?

Figure 5: Lane deviation (in degrees) and total driving time (in minutes) of DUMAS in the Standard and Worried conditions.



Figure 6. Protocol generated by DUMAS during a run in the multitasking demonstration.

1. Visual orientation: Visual input is undoubtedly the most important source of information in driving. Nevertheless, the human visual system seems badly equipped for a task like driving: we only see sharply in a small center of the visual field, and acuity drops significantly towards the periphery. As a result, eye movements, in the form of saccades, are needed to construct an integrated field of vision for larger scenes. To accomplish this, a driver needs a theory of where to look and of what features are important in the visual field.

Time 0.000: Go Selected
Time 0.050: Go Fired
Time 0.050: Perceive Environment Selected
Time 0.100: Perceive Environment Fired
Time 0.100: Decide Action Selected
Time 0.150: Decide Action Fired
Time 0.150: Steer Right Selected
. . .
GOING TO THE RIGHT>>>>>>>>>>>>>>>>
Time 0.350: Steer Right Fired
Time 0.350: Perceive Environment Selected
Time 0.400: Perceive Environment Fired
Time 0.400: Decide Action Selected
Time 0.450: Decide Action Fired
Time 0.450: Cruising Selected
Time 0.500: Cruising Fired
Time 0.500: Perceive Environment Selected
Time 0.550: Perceive Environment Fired
Time 0.550: Decide Action Selected
Time 0.600: Decide Action Fired
Time 0.600: Steer Left Selected
<<<<<<<<<<<<<<<GOING TO THE LEFT
Time 0.650: Steer Left Fired
Time 0.650: Perceive Environment Selected
Time 0.700: Perceive Environment Fired
Time 0.700: Decide Action Selected
Time 0.750: Decide Action Fired
Time 0.750: Thinking about wc Selected
“Thinking about the World Cup”
Time 0.800: Thinking about wc Fired
Time 0.800: Perceive Environment Selected
Time 0.850: Perceive Environment Fired
Time 0.850: Decide Action Selected
Time 0.900: Decide Action Fired
Time 0.900: Cruising Selected
Time 2.400: Cruising Fired
Time 2.400: Perceive Environment Selected
Time 2.450: Perceive Environment Fired
Time 2.450: Crashing Selected
<<<Writing data to data.txt>>>>>>
Time 2.500: Crashing Fired

The domain of visual orientation is probably where collaboration between human and robot will be most intense. Robots continue to be poor at high-level perceptual functions, like object recognition and situation assessment, which means the human operator will still play an important role. However, USR applications have illustrated that it is often not an easy task for an operator to infer complex features from certain environments (e.g., hot spots, cavities, and voids). For instance, when maneuvering through a small and dark shaft, it is difficult to discern any features. As a result, it becomes hard to perform certain tasks like identifying victims. Furthermore, the human operator can remove ambiguities that arise from the limited visual capabilities of robots, by providing information that enables the visual system to adapt to the situation at hand.
2. Speed control and steering: Based on information coming from the visual system, the model has to decide which speed would be appropriate. This means the user model has to have a theory that determines the optimal speed in a given situation. Once the optimal speed is determined, motor procedures have to be performed to manipulate the appropriate controls. So the process of controlling speed will be determined to a great extent by the constraints of other parts of the system, including control by the operator.

A constraint that typically arises in teleoperated navigation is that of communications time lag, which often becomes a problem when navigating quickly. Our speed and multitasking demonstrations already indicated how important it is to keep the time intervals between updating moments as short as possible. In USR applications, the time that passes between operating the controls on an interface and the robot actually reacting creates an additional communications time lag. The impact of this additional constraint can routinely be added to our system. This model can start to help quantify tradeoffs between speed and accuracy.



The occurrence of course corrections is another interesting and unexpected parallel between our simple game demo and USR applications. In the driving game, situations would occur in which the system would “overreact” to a change in the environment, usually a curve, and would brake to a complete halt. Subsequently, the system would go into a recurring sequence of overcorrections, and the run had to be either terminated or the model had to be helped by the human running the simulation. The same occurs in USR applications when the operator overshoots a desired location, sending the robot into the same type of “repetitive cycle of overcorrection”. This is a known problem in human operator control. It is thus pleasing and useful to see a simulated user exhibit the same maladaptive behavior.
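The overcorrection cycle can be reproduced with a very simple delayed controller, in which each correction is proportional to an error that was observed some steps earlier: e[k+1] = e[k] - K * e[k-d]. The delay d stands in for the perceive-decide-act cycle plus any communications lag. This is a generic control sketch, not the DUMAS model; the gains and step counts are illustrative.

```python
from typing import List


def correct_with_delay(initial_error: float, gain: float, delay_steps: int,
                       steps: int = 20) -> List[float]:
    """Apply, at each step, a correction proportional to an error seen delay_steps ago.

    e[k+1] = e[k] - gain * e[k - delay_steps]; the delay stands for the update
    cycle plus any communications lag (a simplified, hypothetical controller).
    """
    errors = [initial_error] * (delay_steps + 1)   # no correction before the first look
    for _ in range(steps):
        errors.append(errors[-1] - gain * errors[-1 - delay_steps])
    return errors


calm = correct_with_delay(1.0, gain=0.3, delay_steps=1)
overshoot = correct_with_delay(1.0, gain=1.5, delay_steps=1)
print(f"gain 0.3: final error {calm[-1]:+.6f}")
print(f"gain 1.5: final error {overshoot[-1]:+.1f}")   # oscillates with growing amplitude
```

With a one-step delay, a gain of 0.3 lets the error die out, while a gain of 1.5 makes each correction overshoot so that the error swings back and forth with growing amplitude. Raising delay_steps at a fixed gain has a similar destabilizing effect, which is one way to explore the communications time lag discussed above.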

3. Multi-tasking: Most process models of driving use very efficient and continuous processes to perceive the environment and control the car. In contrast, our implementation uses a discrete updating mechanism, reflecting the discrete nature of the ACT-R model, in which each step takes a fixed amount of time. This has two consequences. First, our model does not produce optimal behavior but rather aims at simulating human behavior; its performance is determined by the speed and accuracy with which it reacts and adapts to the environment. Second, our model allows assessing the influence of a secondary task, in other words, the effects of multi-tasking. If the model has to divide its attention between two tasks, each of those tasks will suffer. Specifically, the time between updating moments might increase, leading to poorer performance.
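Because ACT-R fires one production at a time, the slowdown caused by a secondary task can be approximated by counting production cycles. The sketch below assumes ACT-R's default 50 ms production time, a fixed number of productions per driving update, and a secondary task that inserts a burst of productions after every k-th update. The production counts and function names are hypothetical, and the sketch ignores perceptual and motor module timing, which in ACT-R can run in parallel with cognition.

```python
PRODUCTION_TIME_S = 0.05  # ACT-R's default time for one production to fire


def driving_update_times(drive_productions, secondary_productions, secondary_every, n_updates):
    """Return the times (s) at which successive driving updates complete.

    Both tasks share one serial production stream: a driving update costs
    drive_productions cycles, and after every `secondary_every`-th update the
    secondary task inserts a burst of secondary_productions cycles
    (secondary_every = 0 means there is no secondary task).
    """
    clock = 0.0
    times = []
    for update in range(1, n_updates + 1):
        clock += drive_productions * PRODUCTION_TIME_S  # perceive, decide, act
        times.append(clock)
        if secondary_every and update % secondary_every == 0:
            clock += secondary_productions * PRODUCTION_TIME_S  # secondary task takes the stream
    return times


def intervals(times):
    """Gaps between consecutive update times."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    alone = intervals(driving_update_times(3, 0, 0, 6))
    dual = intervals(driving_update_times(3, 4, 1, 6))
    print("driving alone:", [round(t, 2) for t in alone])
    print("with secondary task:", [round(t, 2) for t in dual])
```

With three productions per driving update the interval is about 150 ms; inserting a four-production secondary burst after every update stretches it to about 350 ms, more than doubling the time during which the vehicle runs on its last command.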


Our system allows exploring the relationship between the attentional constraints of the human operator, which are captured by the cognitive model, and the operated object, be it a simulated car or robot. As such, it is well suited to explore the operation of multiple robots through one user interface. It becomes possible to map the resources (attention, visual and motor capabilities) of the human operator to the performance of the robots in real time. Our system can indicate, in a quantitative way, which adjustments to the interface would lead to better performance. As such, it could also indicate at what point robot autonomy could effectively compensate for human operator constraints.
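As a first approximation, the question of how many robots one operator can handle can be framed as a round-robin scheduling problem. In the sketch below, `service_time_s` (how long the operator needs to attend to and correct one robot, the kind of quantity a cognitive model could estimate) and `neglect_tolerance_s` (how long a robot can run unattended before performance suffers) are hypothetical parameters, and the strict round-robin policy is an assumption rather than part of the system described here.

```python
import math


def max_robots(service_time_s: float, neglect_tolerance_s: float) -> int:
    """Largest number of robots one operator can serve in strict rotation.

    With n robots visited round-robin, each robot waits (n - 1) * service_time_s
    between visits, which must not exceed neglect_tolerance_s.
    """
    if service_time_s <= 0 or neglect_tolerance_s < 0:
        raise ValueError("service time must be positive and tolerance non-negative")
    return math.floor(neglect_tolerance_s / service_time_s) + 1


if __name__ == "__main__":
    print(max_robots(2.0, 6.0))  # hypothetical: 2 s per visit, 6 s of tolerable neglect
    print(max_robots(1.0, 6.0))  # the same robots with an interface that halves service time
```

If servicing a robot takes 2 s and each robot can safely be left alone for 6 s, the sketch allows at most four robots; halving the service time, for example through a better interface, raises that to seven.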

Some aspects of human-robot interaction have not yet been directly addressed in the driving game, but are important enough to be included in future simulations. These are:

4. Navigation: A problem that arises with direct interfaces is that they provide poor contextual cues, which leads to less situation awareness. Using our system, it should be possible to investigate how map building by the user is related to aspects of the user interface and to characteristics of the user (expert or novice, high or low working memory capacity, and so on).

5. The influence of the user interface on performance: In human-robot interactions, the interface plays a secondary role. The user usually aims at completing one or more primary tasks and uses the interface to achieve her goals. The present approach is well suited to exploring how changes in the interface will affect the user's performance on the primary tasks.

6. The level of expertise of the user: The time needed to learn to operate a USR robot is an important factor affecting its uptake. By varying the knowledge of the task in the cognitive model, one could vary the degree of skill or expertise of the user and see how this affects performance. Expertise affects the design of the interface as well, as one would like experts to have great flexibility and control, while novices should be guarded from making large and costly mistakes. By using the current system, one could explore the relationship between user interface design and level of performance in a direct and quantitative way.

Our model that interacts with a simulated USR-type task provides several suggestions about what makes human-robot interfaces difficult to use. These implications arise from each aspect of the model. Several problems in perception and eye-hand coordination arise from the simulated eye and hand. Additional problems can be noted from the cognitive model of this task. The cognitive modeling language, as it implements a unified theory of cognition, makes several further suggestions.
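Much of the eye-hand timing in models of this kind rests on Fitts's law (cited in the references), which predicts that pointing time grows with the distance to a target and shrinks with its size. The sketch below uses the common Shannon formulation; ACT-R's motor module uses its own variant and parameters, so the intercept and slope here are placeholders chosen only for illustration.

```python
import math


def fitts_movement_time(distance: float, width: float, a: float = 0.05, b: float = 0.1) -> float:
    """Predicted time (s) to move the hand (or mouse) onto a target.

    Uses the Shannon formulation of Fitts's law, MT = a + b * log2(D / W + 1).
    The intercept a and slope b are illustrative, not fitted values.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be non-negative and width positive")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty


if __name__ == "__main__":
    # The same 400-pixel reach to a large button versus a small icon.
    print(f"{fitts_movement_time(400, 50):.3f} s")
    print(f"{fitts_movement_time(400, 10):.3f} s")
```

Under these assumptions, shrinking a 50-pixel target to 10 pixels at the same distance adds about 0.2 s per click, time that a simulated user driving a robot cannot spend on steering updates.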

The cognitive model of the USR tasks predicts that USR tasks are difficult because they contain several difficult subtasks, and because the tasks interact. The model currently has to do several tasks at once: driving, noticing objects, and so on. These tasks alone are not difficult, but they interact with each other, competing to use the same resources, rule firings, and buffer contents. If these tasks were in different modalities, they would degrade each other's performance less. For example, having the model notice trees by saying something would be less disruptive than clicking on them.
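The claim that tasks in different modalities interfere less can be illustrated with a toy scheduler in the spirit of ACT-R's modules: each module handles one request at a time, while the steps of each task run in order. The module names loosely follow ACT-R's (visual, procedural, manual, vocal), but the step durations and the round-robin interleaving policy are hypothetical.

```python
def makespan(tasks):
    """Time (s) until all tasks finish.

    Each task is a list of (module, duration_s) steps that must run in order.
    Each module serves one step at a time; tasks take turns claiming modules
    step by step (a simple round-robin policy assumed for illustration).
    """
    module_free = {}                  # module -> time it next becomes free
    task_ready = [0.0] * len(tasks)   # task -> time its previous step finished
    for i in range(max(len(steps) for steps in tasks)):
        for k, steps in enumerate(tasks):
            if i < len(steps):
                module, duration = steps[i]
                start = max(task_ready[k], module_free.get(module, 0.0))
                module_free[module] = task_ready[k] = start + duration
    return max(task_ready)


# Hypothetical step durations, in seconds.
STEER = [("visual", 0.1), ("procedural", 0.05), ("manual", 0.3)]
REPORT_BY_CLICK = [("visual", 0.1), ("procedural", 0.05), ("manual", 0.3)]
REPORT_BY_VOICE = [("visual", 0.1), ("procedural", 0.05), ("vocal", 0.3)]

if __name__ == "__main__":
    print(f"click: {makespan([STEER, REPORT_BY_CLICK]):.2f} s")
    print(f"voice: {makespan([STEER, REPORT_BY_VOICE]):.2f} s")
```

With these made-up durations, reporting a tree by clicking finishes both tasks in about 0.75 s, because the click must wait for the steering movement to free the hand; reporting it vocally finishes in about 0.55 s.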

The simulated eye of the model, based on Segman, explains where several difficulties for users come from. Segman was able to work fairly well on its early tasks, as its authors noted explicitly, because the computer screen is a relatively benign environment for vision: edges are crisp, colors and shapes are not ambiguous, and there are a limited number of object types. USR environments have essentially none of these properties. The control screens of these systems are easy to see, but the addition of a forward-looking (indeed any perspective) video display adds further complexity to the vision processing. Other reports agree that humans have problems understanding the video displays on USR interfaces.

In addition to posing a harder vision problem because of more ambiguous stimuli, the video displays are also noisier and of poorer quality than the rest of the display, including the controls. These effects may cause more problems for the model than for the human, as humans have better vision systems, but humans too have problems recognizing objects when the display is noisy.

Models will have some difficulty with vision processing of the video signal for some time. These difficulties are also being explored in the general computer vision community. The problems that arise from trying to understand the image as a human would suggest that there are many more problems to be solved, and that humans have difficulties with these displays as well. This may be a place for computers to help users, and work on augmented reality would have a useful role here. Some of the earliest work in psychology showed that object naming was more difficult than word recognition. Screens that described or labeled ambiguous objects would help the model and are likely to help humans in this area. Such a system could help by holding visual state (going up stairs, on the third floor). These systems could also help by numbering ambiguous objects. Videos of workers at the World Trade Center search and rescue effort (available from ) suggest that it would be useful to have objects numbered for discussion by the operators.
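Numbering ambiguous objects is one place where a small amount of machine processing could help both the model and human operators. As a hedged illustration, the sketch below numbers connected regions of "object" pixels in a binary bitmap and computes where each number could be drawn. It assumes the hard part, deciding which pixels belong to objects, has already been done; it is a textbook connected-component labelling, not the segmentation used by Segman.

```python
from collections import deque


def number_objects(bitmap):
    """Number the 4-connected regions of truthy pixels in a 2-D grid.

    Regions are numbered 1, 2, ... in reading order of their first pixel
    (top to bottom, left to right). Returns {number: [(row, col), ...]}.
    """
    rows = len(bitmap)
    cols = len(bitmap[0]) if rows else 0
    seen = set()
    objects = {}
    for r in range(rows):
        for c in range(cols):
            if not bitmap[r][c] or (r, c) in seen:
                continue
            region, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and bitmap[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects[len(objects) + 1] = region
    return objects


def label_positions(objects):
    """Centroid (row, col) of each numbered region, where its label could be drawn."""
    return {
        n: (sum(y for y, _ in px) / len(px), sum(x for _, x in px) / len(px))
        for n, px in objects.items()
    }


if __name__ == "__main__":
    grid = ["##..#",
            "##..#",
            ".....",
            "..##."]
    bitmap = [[ch == "#" for ch in row] for row in grid]
    print(label_positions(number_objects(bitmap)))
```

An interface could draw each number at its region's centroid, giving operators, and a simulated user, a shared name for an object they would otherwise struggle to describe.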

The model currently cannot recognize emergent features, for example, trees that make up a line. This skill appears to be an interaction between vision and cognition, and it is not straightforward to include it in a model. Similar inferential skills are necessary in USR to recognize implications such as hot walls and unnatural positions of objects. People have difficulty doing this, for recognizing the implications of objects in perception appears to be an important component of expertise.
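To make concrete what recognizing an emergent feature such as a line of trees involves, the sketch below checks whether a set of already-recognized object positions is roughly collinear. This is a simple geometric test assumed for illustration, with a hypothetical tolerance parameter. It deliberately sidesteps the harder question the text raises, namely how vision and cognition decide which objects to group in the first place.

```python
import math
from itertools import combinations


def forms_line(points, tolerance):
    """True if every point lies within `tolerance` of the line through the two
    points that are farthest apart. Fewer than three points never count as a line."""
    if len(points) < 3:
        return False
    (x1, y1), (x2, y2) = max(combinations(points, 2), key=lambda pair: math.dist(*pair))
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        return False  # all points coincide
    for x0, y0 in points:
        # Perpendicular distance from (x0, y0) to the line through the endpoints.
        distance = abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / length
        if distance > tolerance:
            return False
    return True


if __name__ == "__main__":
    trees = [(0, 0), (1, 1.02), (2, 1.98), (3, 3)]
    print(forms_line(trees, tolerance=0.1))
```

The geometric test is trivial once object positions are known; the difficulty for a model, and arguably for novices, lies in noticing that the grouping is worth testing at all.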

These vision problems are compounded when considering multiple robots controlled by a single operator. It may be quite useful for computer vision algorithms to preprocess what they can and then provide a verbal description of each robot's progress ("going through woods"). A military commander might like to see what their subordinates are doing occasionally, but if they saw everything, they would be overwhelmed. Verbal reports in that case help manage attention and reduce cognitive demands.

As the task increases in complexity, the model will need mental maps of the world, which are difficult for people to create without external memory aids. Unless operators have a clipboard, paper, and pen with them, they have to hold the mental map of the world in their heads. Providing support for world views in the interface would help users navigate their world.

The model's complexity and unfolding representation offer a theoretically based prediction of situation awareness. That is, the model's mental map of the world at a point in time can be compared to the world at that point in time. It will be difficult, and probably not useful, to assign a number to this comparison, but qualitative summaries and full descriptions of the matches and mismatches give a detailed, meaningful measure of the model's awareness of the situation. The model (its knowledge and strategies) and the interface it uses to run robots can be modified to improve situation awareness. What is likely to happen is that some aspects can be easily improved, but that memory decay and limits to attention will hinder representing the entire world in the model. The designer will then have to decide which aspects of the situation should be highlighted for the model, and thus for the user.
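In line with the point that a single number is not very useful, the comparison between the model's mental map and the world can be sketched as a qualitative report. The sketch assumes both the mental map and the ground truth are dictionaries from object names to 2-D positions, and uses a hypothetical distance tolerance to decide whether a remembered position is close enough; the class and function names are illustrative.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AwarenessReport:
    correct: list = field(default_factory=list)    # known, at about the right place
    misplaced: list = field(default_factory=list)  # known, but believed to be elsewhere
    missing: list = field(default_factory=list)    # in the world, absent from the mental map
    spurious: list = field(default_factory=list)   # in the mental map, not in the world


def compare_maps(mental_map, world, tolerance=1.0):
    """Qualitative comparison of a mental map with ground truth.

    Both arguments map object names to (x, y) positions; `tolerance` is the
    largest position error still counted as correct.
    """
    report = AwarenessReport()
    for name, true_position in world.items():
        if name not in mental_map:
            report.missing.append(name)
        elif math.dist(mental_map[name], true_position) <= tolerance:
            report.correct.append(name)
        else:
            report.misplaced.append(name)
    report.spurious = [name for name in mental_map if name not in world]
    return report


if __name__ == "__main__":
    world = {"tree1": (0, 0), "tree2": (5, 5), "victim": (9, 1)}
    mental_map = {"tree1": (0.5, 0), "tree2": (8, 5), "car": (3, 3)}
    print(compare_maps(mental_map, world))
```

Running such a comparison at several points in a run would show which kinds of objects the model tends to lose or misplace, and hence which aspects of the situation the interface might highlight.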

There are difficulties specifying all the tasks that users would do with a USR robot. Previous reviews provide a short list of tasks, including navigation and noticing. The current set of tasks suggests that the operators, in addition to their routine tasks, are also doing many novel tasks. Generally, performing tasks that require problem solving requires more expertise and is more error prone than well-practiced behavior. These effects may be due to the lack of practice with USR interfaces, but they may also indicate a fundamental effect of the domain. This effect suggests that models and human operators should practice the simple tasks to increase the resources available for more complicated tasks, and that they should practice more complicated tasks to support transfer of skills between the complex tasks.

The model we presented and the cognitive modeling architecture it is created in, ACT-R, make several predictions about why this task is difficult for humans. We note a few here.

Learning how to use an HR interface is currently important for the acceptance of robots in urban search and rescue. USR robots are currently just another tool, with restrictions on training time. If their use becomes more pervasive, or specialists emerge to use USR robots as dogs are used in USR, then this may change.

Learning in the ACT-R theory, as presented by Anderson, is sensitive to several factors that are usually absent in human-robot interfaces or have bad values. Feedback, its quality and amount, may be the most important factor. HRI interfaces often provide poor feedback as to the robot's location and the effects of commands. While the current model does not learn, it would have difficulties learning when commands do not always work in a uniform manner, when the interpretation of the actions is hindered by imperfect perception, and when feedback is not provided. The feedback is also often delayed, which hinders learning.
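The effect of feedback on learning can be illustrated with the difference-learning rule that later versions of ACT-R (ACT-R 6 onward) use for production utilities, U ← U + α(R − U). The 1998 theory cited here used a different formulation based on estimated probability of success and cost, so this is a stand-in rather than the model's own mechanism. Trials without feedback are represented as None and produce no update; the learning rate α = 0.2 and the reward values are hypothetical.

```python
def learn_utility(rewards, alpha=0.2, initial=0.0):
    """Utility estimate after applying U <- U + alpha * (R - U) for each reward.

    A reward of None stands for a trial with no feedback, which leaves U unchanged.
    """
    utility = initial
    for reward in rewards:
        if reward is not None:
            utility += alpha * (reward - utility)
    return utility


if __name__ == "__main__":
    full = [10] * 10            # the command works and the operator sees it every time
    sparse = [10, None] * 5     # feedback only on every other trial
    inconsistent = [10, 0] * 5  # the same command works only half the time
    for label, rewards in (("full", full), ("sparse", sparse), ("inconsistent", inconsistent)):
        print(f"{label}: {learn_utility(rewards):.2f}")
```

With these hypothetical values the estimate reaches about 8.93 after ten trials with full feedback, but only about 6.72 when feedback arrives on every other trial; with inconsistent outcomes it keeps moving up and down rather than settling.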

The nature of a mixed-initiative interface can make it harder to use. There are more aspects of the state to keep track of, including who (agent or human) has which tasks and the state of progress of the other agent, as well as context switching and communication. Models created in the ACT-R architecture can do these tasks, but the models predict that these activities will take time, attention, and working memory.

Finally, models that operate robots interact in a less direct way than models that use more typical interfaces. The models have to keep a distal representation separate from the interface: they represent a separate world that is not the interface, but is built from the interface. Models of more direct interfaces can rely on the interface to hold, and be, the state of their world.


Based on the results of this model, we can draw several general lessons for urban search and rescue robots, and for the design of their interfaces in particular.

Models can use HRI interfaces

The model presented here shows that models of users can already provide insights and summaries useful for creating better human-robot interfaces. The various levels of the model provide suggestions for improving the interface we studied, and by analogy provide suggestions for many USR robot interfaces.

The model's parameters can be varied to represent individual differences in the operator, such as working memory capacity and knowledge. The impact of these differences on performance can be examined, as shown in figures 4 and 5. These results suggest that the speed of the robot can influence its drivability, with increased speed not always leading to better performance. Differences in the interface can also be examined.

Massive application and reuse is becoming possible

Our model is not complete and is not a perfect user in many ways. It is, however, uniquely positioned to rapidly expand the scope of behavior it can represent and to be applied to further interfaces.

Having the model interact with an off-the-shelf interface by reading a bitmap and generating input events directly means that the model can now start to be applied to virtually any human-robot interface. Interfaces that run under the Windows operating system can be examined directly. Interfaces that run under the Macintosh system can be examined with the Windows to Macintosh display tool VNC. Interfaces that run under the X Windows system can be examined using the Xceed Windows utility for displaying X-generated displays under Windows. As noted above, there are limitations in fonts and object recognition, but these are now approachable problems.

Creating the model within a cognitive architecture provides an approach for including more aspects of behavior quickly, as well as several further advantages when creating such a large model. There are many people creating models of human behavior using the ACT-R architecture. Some of these models are of behaviors not of interest or of use for modeling a user of HR interfaces. There are enough models, however, that we were able to build upon existing models rather than create our model entirely from scratch. This is theoretically pleasing because it offers an additional audience for these models, as well as other users and scientists to test the user model in formal and informal ways.

There are several models that we can already point to as candidates for directly extending our model. Candidates include St. Amant's model of autonomous exploration of interfaces, Ritter's model of telephone, and Salvucci's model of telephone dialing while driving. St. Amant's model of interface exploration may be extendible to include victim search in the environment, a common task for urban search and rescue robots.

Working within a common cognitive modeling architecture makes the resources of the architecture available to people interested in understanding the model. In the case of ACT-R, these include an online tutorial, summer schools, programming interfaces, a mailing list to get help with technical problems, and a manual.

The model predicts that human-robot interfaces are sensitive to many factors, including graphics, the processing speed of the human operator relative to the robot they are manipulating, the knowledge and processing in the robot and user, and the user's eye-hand coordination. Improving an interface that relies on this many factors is difficult without tools to help keep track of these factors and their relationships. Keeping track of these factors in complex environments, such as operating multiple robots, understanding how their control structures will be robust to perturbations, and predicting how many robots an operator can easily manipulate, will be particularly difficult.

To a certain extent, this knowledge of users, and of how to support users with interfaces, can be passed to designers by having them read books on Human-Computer Interaction. This approach is useful, and model users like the one presented here can also help by providing a system-level summary of users that will be interesting and informative to HRI designers.

The model explored here showed that it is not an artifact or coincidence that current human-robot interfaces are difficult to use. Our theory of HRI use, DUMAS, suggests these interfaces are difficult for many reasons. The difficulties range from the perceptual issues that must be addressed, to the relatively high-level cognition and problem solving in new situations that must be supported, to the relatively large amount of knowledge that is required. The model of vision that interprets the bitmap directly suggests that one reason the task is difficult is that recognizing objects in real-world bitmaps is itself difficult. This result suggests that better display hardware, and augmented reality in particular, would help make HR interfaces easier to use. As these user models become easier to create, they will be able to provide feedback directly to interface designers more routinely. In the meantime, example models like this can summarize behavior with human-robot interfaces, noting what makes human-robot interfaces difficult to use so that designers can study these problems and avoid them.


This work was sponsored by the Space and Naval Warfare Systems Center San Diego, grant number N66001
411F. The
content of the information does not necessarily reflect the position or the policy of the Government
, and no official endorsement
should be assumed.

DIRK VAN ROOY is a postdoctoral researcher at the new interdisciplinary School of Information Sciences and Technology at Penn State. He earned his PhD, MS and BS in Computational Psychology from the Free University of Brussels.

FRANK E. RITTER (M '82) helped start the new interdisciplinary School of Information Sciences and Technology at Penn State. He earned his PhD in AI and Psychology and an MS in psychology from CMU, and a BSEE from the University of Illinois. He is on the editorial board of Human Factors and the steering committee of the Society for the AI and the Simulation of Behaviour.

ROBERT ST. AMANT is an associate professor in the Computer Science Department at North Carolina State University. He co-directs the IMG Lab, which performs research in the areas of intelligent user interfaces, multimedia, and graphics. He earned a Ph.D. in computer science in 1996 from the University of Massachusetts, Amherst, and a B.S. in electrical engineering and computer science from the Johns Hopkins University in 1985.



T.W. Fong and C. Thorpe, Vehicle Teleoperation interfaces. Autonomous Robots, 2001, p. 9.

L.S. Lopes, et al., Sentience in robots: applications and challenges. IEEE Intelligent Systems [see also IEEE Expert], no. 5, p. 66.

R. Murphy, J. Casper, M. Micire, and J. Hyams, Mixed-initiative Control of Multiple Heterogeneous Robots for, in IEEE Transactions on Robotics and Automation, 2002.

F.E. Ritter and J.H. Larkin, Using process models to summarize sequences of human actions. Human-Computer Interaction, 1994, no. 3&4, p. 345.

F.E. Ritter and R.M. Young, Embodied models as simulated users: Introduction to this special issue on using cognitive models to improve interface design. International Journal of Human-Computer Studies, 2001, p. 1.

M.J. Schoelles and W.D. Gray, Argus Prime: Modeling emergent microstrategies in a complex simulated task, in Proceedings of the Third International Conference on Cognitive Modeling, 2000, Veenendal, NL: Universal Press.

J.R. Anderson and C. Lebiere, The Atomic Components of Thought. 1998, Mahwah, NJ: Lawrence Erlbaum Associates.

R. St. Amant and M.O. Riedl, A perception/action substrate for cognitive modeling in HCI. International Journal of Human-Computer Studies, 2001, p. 15.

J.R. Anderson, M. Matessa, and C. Lebiere, ACT-R: a theory of higher level cognition and its relation to visual. Human-Computer Interaction, 1997, 12(4), p. 439.

D. Kieras and D.E. Meyer, An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 1997, p. 391.

Schunn and J.R. Anderson, The generality/specificity of expertise in scientific reasoning. Cognitive Science, 1999, p. 337.

M.C. Lovett, L.M. Reder, and C. Lebiere, Modeling individual differences in a digit working memory task, in Proceedings of the Conference of the Cognitive Science Society, 1997, Mahwah, NJ: Erlbaum, p. 460.

J.R. Anderson, J.M. Fincham, and S. Douglass, The role of examples and rules in the acquisition of a cognitive skill. Journal of Experimental Psychology: Learning, Memory and Cognition, 1997, p. 932.

D.D. Salvucci, Predicting the effects of in-car interface use on driver performance: An integrated model approach. International Journal of Human-Computer Studies, 2001.

M.D. Byrne, ACT-R/PM and menu selection: Applying a cognitive architecture to HCI. International Journal of Human-Computer Studies, 2001, p. 41.

P.M. Fitts, The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 1954, p. 381.

M. Antoniotti and A. Göllü, SHIFT and SmartAHS: A Language for Hybrid Systems Engineering, Modeling, and, in Proceedings of the USENIX Conference of Domain Specific Languages, 1997, Santa Barbara, CA.

J. Aasman, Implementations of car-driver behaviour and psychological models, in Road User Behavior: Theory and, J.A. Rothengatter and R.A. Bruin, Editors, 1988, Van Gorcum: Assen.

J.J. Gibson and L.E. Crooks, A theoretical field-analysis of automobile. American Journal of Psychology, 1938, p. 453.

T. Bellet and H. Tattegrain, A framework for representing driving knowledge. International Journal of Cognitive Ergonomics, 1999, no. 1, p. 3.

van der Molen and M.T. Bötticher, A hierarchical risk model for traffic participants. Ergonomics, 1998, no. 4.

M.F. Land and J. Horwood, Which parts of the road guide steering? Nature, 1995, p. 339.

M.F. Land and D.N. Lee, Where we look when we steer. Nature, 1994, p. 742.

H.W.J. Robbe, Marijuana use and driving. Journal of the International Hemp Association, 1994, 1: p. 44-48.

F.E. Ritter, M. Avraamides, and I.G. Councill, An approach for accurately modeling the effects of behavior, in Proceedings of the 11th Computer Generated Forces Conference, 2002, Orlando, FL: U. of Central Florida, p. 29.

B.L. Hills, Vision, visibility and perception in driving. Perception, 1980, p. 183.

P. Milgram, et al., Applications of Augmented Reality for Human-Robot Communication, in IROS'93: Int'l Conf. on Intelligent Robots and Systems, 1993, Japan, p. 1467.

R. Clark, Asimov's laws of robotics: implications for information technology. IEEE Computer, 1994, 26, 27 (12, 1).

T. Takahashi, et al., robot interface by verbal and nonverbal communication, in RSJ International Conference on Intelligent Robots and Systems, 1998.

C. Wickens, A. Mavor, R. Purusurum, and B. McGee, The Future of Traffic Control. 1998, Washington, D.C.: National Academy of Sciences.

C.D. Wickens, S. Gordon, and Y. Liu, An introduction to human factors engineering. 1998, New York: Addison-Wesley Longman, Inc.

B. Cheng and T. Fujioka, A Hierarchical Driver Model, in IEEE Conference on Intelligent Transportation Systems, 1997, ITSC: IEEE.

J. Casper, Robot Interactions during the Robot-Assisted Urban Search and Rescue Response at the World Trade Center, in Computer Science and Engineering, 2002, USF.

G.A. Klein, primed decisions, in Advances in Man-Machine Systems Research, W.B. Rouse, Editor, 1989, JAI: Greenwich, CT, p. 47.

M.K. Singley and J.R. Anderson, Transfer of Cognitive Skill. 1989, Cambridge, MA: Harvard University Press.

F.E. Ritter, A role for cognitive architectures: Guiding user interface design, in Proceedings of the Seventh Annual ACT-R Workshop, 2000, Department of Psychology, Carnegie-Mellon University, p. 85.