Visual Exploration and Analysis of Human-Robot Interaction Rules

Hui Zhang and Michael J. Boyles
Pervasive Technology Institute, Indiana University, USA
ABSTRACT
We present a novel interaction paradigm for the visual exploration, manipulation and analysis of human-robot interaction (HRI) rules; our development is implemented using a visual programming interface and exploits key techniques drawn from both information visualization and visual data mining to facilitate the interaction design and knowledge discovery process. HRI is often concerned with manipulations of multi-modal signals, events, and commands that form various kinds of interaction rules. Depicting, manipulating and sharing such design-level information is a compelling challenge. Furthermore, the closed loop between HRI programming and knowledge discovery from empirical data is a relatively long cycle. This, in turn, makes design-level verification nearly impossible to perform in an earlier phase. In our work, we exploit a drag-and-drop user interface and visual languages to support depicting responsive behaviors from social participants when they interact with their partners. For our principal test case of gaze-contingent HRI interfaces, this permits us to program and debug the robots' responsive behaviors through a graphical data-flow chart editor. We exploit additional program manipulation interfaces to provide still further improvement to our programming experience: by simulating the interaction dynamics between a human and a robot behavior model, we allow researchers to generate, trace and study the perception-action dynamics with a social interaction simulation to verify and refine their designs. Finally, we extend our visual manipulation environment with a visual data-mining tool that allows the user to investigate interesting phenomena such as joint attention and sequential behavioral patterns from multiple multi-modal data streams. We have created instances of HRI interfaces to evaluate and refine our development paradigm. As far as we are aware, this paper reports the first program manipulation paradigm that integrates visual programming interfaces, information visualization, and visual data mining methods to facilitate designing, comprehending, and evaluating HRI interfaces.
Keywords: Information Visualization, Visual Data Mining, Joint Attention, Human-Robot Interaction.
1. INTRODUCTION
HRI [1] is the interdisciplinary study of interaction dynamics between humans and robots. One fundamental goal of HRI is to develop the principles and algorithms for robot systems that make them capable of effective interaction with humans. Many facets of HRI research focus on the design and analysis of interaction rules that will allow a robot to track humans so that better engagement can be maintained.
These rules, of the form E, C ⇒ A, are expressed through an event part (E), a condition part (C) and an action part (A). The informal meaning in the domain of robot perception and action is that when an event (e.g., an asynchronous sensory event, or a timer expiration event) is recognized and the specific condition computationally holds, then the robot performs the programmed action. While these rules can eventually be implemented with code blocks containing various control statements (such as the IF-THEN and SWITCH statements commonly found in most textual programming languages), achieving intelligent HRI rules requires a mix of rule programming, property manipulation, and visual debugging to follow the rule triggering. This quickly becomes very difficult to perform in a traditional textual programming environment.
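As a minimal illustration of this E, C ⇒ A form, the following sketch shows how an event, a condition, and an action could be bound together and fired. It is our own exposition rather than the system's implementation, and all names are illustrative.

# Minimal sketch (not the system's implementation) of an E, C => A rule:
# an event triggers evaluation of a condition, and the action runs only
# if the condition holds.  All names here are illustrative.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Rule:
    event: str                           # E: e.g. "human_gaze_changed" or "timer_expired"
    condition: Callable[[Any], bool]     # C: predicate over the current perceptual state
    action: Callable[[Any], None]        # A: command sent to the robot

    def fire(self, event_name: str, state: Any) -> None:
        # E, C => A: react only to the matching event, and only if C holds.
        if event_name == self.event and self.condition(state):
            self.action(state)

# Example: when a human gaze event arrives and the gaze is on an object,
# have the robot attend to the same object.
follow_attention = Rule(
    event="human_gaze_changed",
    condition=lambda s: s["human_roi"] in ("red", "green", "blue"),
    action=lambda s: print(f"robot: look at {s['human_roi']}"),
)
follow_attention.fire("human_gaze_changed", {"human_roi": "red"})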
Recent advances in visual languages, information visualization methods, and visual data mining algorithms make it possible for researchers to design, implement and improve HRI rules in a more expressive and powerful interface. By exploiting such tools and methodologies, we feel that we can make a significant contribution to these classes of problems, whose intuitive, non-symbolic comprehension needs to be built upon hybrid approaches of visualization, navigation, and interaction in the specific information space. The challenge here is to find a mechanism that is simple enough to allow the structuring and relating of rules, yet powerful enough to allow following, comprehending, and refining the triggering relationships among rules.

Further author information: (Send correspondence to H.Z.)
H.Z.: E-mail: huizhang@iu.edu
M.J.B.: E-mail: mjboyles@iu.edu
In this paper we take the first steps towards this goal by introducing a novel interaction paradigm for the design, comprehension and evaluation of HRI rules.
2. RELATED WORK
There is a rich literature on employing HRI to understand the ground truth of social interaction. Kipp programmed and investigated an avatar's reactive gaze-attending rules based on the human participant's current state in an interview [2]. Morency's work proposed using several sequential probabilistic models to select multi-modal features from a speaker (e.g., prosody, gaze and spoken words) to make decisions on visual back-channel cues (e.g., head nods) in HRI. Nakano built an engagement estimation rule by analyzing gaze transition patterns from users [3]. There has also been growing interest in applying intelligent computational algorithms in data analysis to understand the perception-action dynamics in HRI studies (see, e.g., [4, 5]).
In parallel to the above-cited work focusing on understanding human behaviors by systematically manipulating interaction rules in HRI studies, there are also many approaches that employ visualization and visual languages to compose complex robot behavior using simple visual descriptions. This work is largely motivated by the intrinsic limitations of textual programming interfaces in depicting and manipulating interaction rules for HRI studies. For example, Li's Graphical State Space Programming environment combines a software visualization technique and a tool for robot programming [6]. Sangpetch developed a graphical storyboard representation for visualizing and recording a robot's behavior over time [7] to make the robot's behavior comprehensible to its users. Diprose designed a visual language (Ruru) that enables novice programmers to create simple robot behaviors [8].
While most HRI studies are concerned with the manipulation of events and actions to generate interaction rules, the whole development cycle of such undertakings is quite time consuming. Understanding and analyzing rule triggering is intrinsically limited to empirical results; this makes design-level verification nearly impossible to perform in an earlier phase. Furthermore, depicting and manipulating rules that contain multiple sensory events and commands in HRI is a compelling challenge with textual programming languages. The question thus raised is whether it is possible to create an HRI rule programming environment which is both expressive and powerful.
Figure 1. Transforming HRI designs in our meta-level user interface into experimental studies: (1) drag and drop variables, commands, and functional units from a toolbox to a graphical flow-chart editor; (2) explore, design, and compose program objects into rules of various kinds; (3) construct executable interaction flows with our program manipulation paradigm; (4) trace and investigate the perception-action dynamics between a human and a robot behavior model, and refine behavior designs inspired by visualization; (5) acquire empirical data in HRI studies including a robot with programmed interaction rules, and gain insights that lead to the next round of behavioral study.
3. OVERVIEW OF VISUAL EXPLORATION AND ANALYSIS SCENARIO
The motivation for building visual systems to explore and manipulate complex information can be traced back to the cognitive literature. One benefit of visual languages is that they take advantage of the synthesis capability of the human perception system and the power of computational algorithms (see, e.g., [9] about the dual brain). While our ultimate challenge is to understand the perception-action dynamics by analyzing empirical results from HRI studies, it is still very useful, perhaps even cognitively necessary, to create a program manipulation paradigm that exploits visual languages and visual computing methods to serve as a meta-level user interface between HRI rule design and experimental HRI studies.
Our key ideas of the overall visual programming scenario, exemplified by Cassell's Study-Model-Build-Test development cycle [10], can be summarized as follows:
1. Represent event-based variables and action commands as manageable visual forms. Examples of variables include event variables derived from the social agents' eye gaze, hand movement, and speech. Examples of action commands include a robot head turn and text-to-speech commands.
2. Design the robot's interaction rules (i.e., moment-by-moment interactive behavioral patterns). These rules combine to form the robot's attending strategies in human-robot interactions.
3. Implement the interaction model of human-robot interactions.
4. Evaluate cognitive models at the design level. This is possible if we can also program an adaptive human model, and then fully test the interaction dynamics by coupling the human and the robot model.
5. Acquire real data from formal experiments using HRI interfaces with human participants, embodied agents, and interaction strategies. New knowledge inferred from empirical results leads to a new round of behavioral study.
The rest of this paper will describe how the above logical steps can be implemented in an integrated environment that allows the user to depict interaction rules with visual languages, validate cognitive models in the earliest possible phase by examining the free-flow behaviors between simulated agent models, gain insights from empirical results, and apply new knowledge to the next round of behavioral study in an informative closed loop (see Figure 1). Our paper starts from the familiar visual programming idea of dragging and dropping program objects into various contexts to create interaction rules. It then extends that idea, for design-level verification purposes, to an interactive program manipulation paradigm capable of generating, tracing, and visualizing the interaction dynamics between simulated agents. Having established the mechanisms, we proceed to a family of information visualization and visual data mining methods. Exploiting these methods allows us to explore, integrate, and analyze multi-stream Region-Of-Interest (ROI) variables. All these combine to provide a framework to study families of significant problems in behavioral studies such as joint attention acquisition and real-time adaptive behaviors.
Figure 2. Visual programming of a "following human's visual attention" rule. (a) A snapshot from the human's first-person view expresses the rule's functionality: when the human gazes at an attentional object, the robot will also attend to the same object. The yellow cross-hair indicates the attentional object the human gazed at from his first-person view. At this moment, the robot is engaged and visually attending to the same attentional object. (b) The corresponding rule graph programmed in our environment.
4. IMPLEMENTATION METHODS
In this section, we describe the interaction models and visualization methods used to develop our visual programming paradigm and user interfaces. Our techniques are based on a variety of prior art, including visual programming interfaces focusing on flow-chart editors (see, e.g., [6, 11, 12]) and the exploration, visualization, and pattern mining of multi-modal data streams in social interactions (see, e.g., [4, 13, 14]). Relevant end-user robot programming languages include, e.g., [6-8], and knowledge discovery approaches applied to HRI empirical results include, e.g., the work of [15], [16] and [4].
However, we have found many aspects of HRI programming and interaction design to be unique, and thus we adopted customized hybrid approaches. For example, interaction rules must be depicted in a programming environment that is easy to use and expressive. Such an environment must contain mechanisms that enable the user to comprehend programs and program fragments, to compose complex programs from simpler primitives, and most importantly, to easily verify the rules' logic by interactively following the triggering relationships and to verify the rules' temporal properties by visualizing the precise timing of the interaction dynamics. Distinct from other programming environments which have addressed these concerns individually, we believe it is essential to find a unified program manipulation paradigm that supports the above features simultaneously.
The basic modeling methods, components, and features characterizing our interface are summarized in the following subsections.
4.1 Visual Programming of Robots' Interaction Rules
Our basic interaction model exploits a visual programming interface that implements drag-and-drop support for a predefined set of graphical symbols. The user can apply program objects by dragging them from a toolbox to a graphical flow-chart editor and set the values of the typed parameters via a set of type interactors such as pre-populated pull-down menus and text fields. Relationships between program objects are depicted by drawing a link from one object's outgoing connection pin to another's incoming connection pin.
A rule graph is the basic relational unit in our visual programming model. A rule graph consists of the following three types of program objects (a minimal data-structure sketch follows the list):
• Event. Events are rendered as interactive visual forms representing sensory signals or timer expiration signals that can be captured in the robot's perceptual interface. Primitive events provided in the toolbox can be combined with others to generate customized, more complex events.
• Action. Actions have a visual syntax that allows manipulations through their combination of textual and pictorial forms. Users can set values for parameters in the interactive form or in the property grid interface associated with each action command.
• Trigger. Users explicitly draw directed links to define the triggering relationships between an event and an action. A trigger can also be drawn from one action to another, in order to group a set of actions that will be invoked in a specified sequential order.
Figure 2 shows one such interaction rule applied in our programming environment. In this example, the rule is programmed to acquire human-robot joint attention in real-time gaze-contingent interaction. The IF part (represented as a pink interactive form) and THEN part (represented by a blue interactive form) are dragged from the toolbox and dropped in our data-flow chart editor. Their relationship is defined by a link explicitly drawn from the event node to the action node. The informal meaning of this rule is that the robot should visually attend to the same object when interacting with the human participant.
Properties of these program objects have two representations: major properties are displayed in the visual form, while others are drawn in a graphical property grid when the corresponding program object is selected. For example, the robot gaze action node in Figure 2 allows the user to configure a human gaze filter which takes into account not only the current human gaze information but also the previous gaze information in the last 30 data points. The robot then responds to reliable and relatively stable human gaze but ignores abrupt and sporadic gaze points.
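One plausible reading of such a filter is a sliding window over the last 30 gaze samples with a majority vote; the 30-sample window comes from the text above, while the voting logic and threshold are our assumptions.

# Minimal sketch of the kind of gaze filter described above: the robot
# responds only to gaze targets that are stable over the last 30 samples.
# The majority-vote logic and the threshold are assumptions.
from collections import Counter, deque

class GazeFilter:
    def __init__(self, window=30, min_fraction=0.8):
        self.samples = deque(maxlen=window)
        self.min_fraction = min_fraction

    def update(self, roi):
        """Add the current human gaze ROI; return a stable ROI or None."""
        self.samples.append(roi)
        if len(self.samples) < self.samples.maxlen:
            return None                       # not enough history yet
        target, count = Counter(self.samples).most_common(1)[0]
        if count / len(self.samples) >= self.min_fraction:
            return target                     # stable gaze: robot may respond
        return None                           # sporadic gaze: ignore

f = GazeFilter()
for roi in ["red"] * 30:
    stable = f.update(roi)
print(stable)   # -> red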
Programming processes often involve the composition of complex events and commands from primitive programming objects. While only a predefined set of primitive program objects is provided in our toolbox, the user can employ a spatial metaphor of nested containers to construct composite events and commands. As illustrated in Figure 3, the user can drop primitive programming objects into a composition container, which, in turn, can be dropped into another nested container to construct even more complex events or commands. This feature simplifies the depiction of complex visual symbols and their manipulation. Additionally, it enables a non-traditional way for us to organize, maintain, and reuse a customized rule graph.
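As a hedged sketch, the composite rule of Figure 3 could be expressed by combining primitive events inside an "any of" container; the combinator and names below are illustrative rather than the system's syntax.

# Sketch of composite events built from primitives, mirroring the nested
# composition containers described above.
class Event:
    def __init__(self, name, test):
        self.name, self.test = name, test
    def holds(self, state):
        return self.test(state)

class AnyOf(Event):
    # A composition container: fires when any of its child events fires.
    def __init__(self, *children):
        super().__init__(" OR ".join(c.name for c in children),
                         lambda s: any(c.holds(s) for c in children))

# Primitives for the composite rule of Figure 3.
attends_elsewhere = Event("human_attends_different_object",
                          lambda s: s["human_roi"] != s["robot_roi"])
timer_expired = Event("timer_2s_expired", lambda s: s["elapsed"] >= 2.0)

composite = AnyOf(attends_elsewhere, timer_expired)
print(composite.name,
      composite.holds({"human_roi": "red", "robot_roi": "red", "elapsed": 2.5}))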
Figure 3. A sample composite rule: if the human is visually attending to a different attentional object, or if a 2-second timer expiration event fires.
4.2 Visual Debugging of Robots' Interaction Rules
Visual programs use interactive visualization to represent program fragments, organize program structures, and manipulate data flows. These representations, however, are just static representations of the program unless they provide some kind of feedback during program execution. HRI is specifically concerned with systematically studying the exact temporal relationships of events and actions in complex dynamic behavioral flows. Only viewing these program structures in a graphical editor does little to make them more comprehensible than viewing the same program in a traditional medium like paper. Our paradigm extends the framework of visual programming with a set of program debugging tools that help the end-user verify his/her designs by adding perception by manipulation.
In practice, when designing and programming a complex interaction rule, one often hopes for a visual debugging mechanism to follow the triggering relationships among events and actions by advancing step by step through the interaction flow. Our design allows the user to trace the interaction flow visually and interactively, accompanied by a real-time variable watcher.
As shown in Figure 4, once triggering relationships are defined for graphical programming objects placed in the flow-chart editor, we can evaluate and execute the visual program recursively from the root node. In our example, the root node is a human ROI generator, which generates random human ROI streams for debugging purposes. We like to think of this as a static human model, as it is not sensitive to the robot model's real-time behavior. Our programming environment enhances the user's ability to step through the interaction flow to follow the triggering relationships. The code path executed thus far in debugging mode is highlighted by coloring the nodes' boundaries. Various variables are displayed and can therefore be evaluated at each step.
Even more signicant for comprehension than the moment-by- moment variable evaluation is the visualization ability of
the resultant event-based data streams.Figure 4 provides such an example:the interaction rule being programmed asks the
robot to either follow human's attentional object  or loo k at the human's face if the human gazes at the robot's face.
Sensory event streams are transformed into ROI streams and represented as sequences of color-coded bars.Their size
corresponds to their temporal duration while their color is keyed to ROI values.In this example,the robot ROI stream is
exactly a right-shifted version of the human's.The interac tion rule asks the robot to follow both the human's attention s on
objects and on robot's face.Note that the right-shifted eff ect is caused by the gaze ltering mechanism.
Figure 4. Designing and debugging a "looking back" interaction rule. The robot always follows the human's attention under this attending rule. (top) Stepping over statements. (middle) Watching global variables in real time. (bottom) The final variable streams are visualized as ROI streams. The robot ROI stream is a right-shifted version of the human's due to the gaze filtering mechanism (red, green and blue represent the three objects, while yellow represents the face).
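To make the bar representation concrete, the following sketch collapses per-frame ROI samples into (ROI, duration) segments and builds the robot stream as a right-shifted copy of the human stream; the run-length encoding is an assumed detail of the visualization.

# Sketch: turn a sampled ROI variable into the bar-like ROI stream shown in
# Figure 4 by merging consecutive identical samples into (roi, duration) segments.
from itertools import groupby

def to_roi_stream(samples, dt=1 / 30):
    """Collapse per-frame ROI samples into (roi, duration_in_seconds) bars."""
    return [(roi, sum(1 for _ in run) * dt) for roi, run in groupby(samples)]

human = ["red"] * 30 + ["face"] * 15 + ["green"] * 30
shift = 10                                   # delay introduced by the gaze filter, in samples
robot = ["none"] * shift + human[:-shift]    # right-shifted copy of the human stream

print(to_roi_stream(human))
print(to_roi_stream(robot))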
4.3 Simulating the Perception-Action Dynamic Flow
The graphical ow-chart editor and visual debugging tool ca n lower the threshold of programming for end-users and
in principle are sufcient to allow the user to explore inter action rules and design various attending strategies of a robot.
However,there are many interesting phenomena that have to be studied in the perception-action loops where two interacting
agents are sensitive to each other and are capable of generating adaptive behaviors.
Our next task is to form such a loop by putting together a human and a robot behavior model in our visual programming
environment,to simulate the perception-action dynamics in the traditional HRI studies.By doing so,our interface assists
in examining and investigating the interaction rules in a more naturalistic and dynamic context.
4.3.1 Programming Adaptive Robot Behavior Models
In our principal test case, the HRI study focuses on an important looking behavior in human-robot interactions: gazing at each other's face. Face gaze and mutual gaze between two social partners are critical to smooth human-human interaction. Therefore it is important to investigate at what moments and in what ways a robot should look at a human user's face as a response to the human's gaze behavior.
Figure 5 and Figure 6 provide two HRI rule samples with different ways for the robot to generate gaze behaviors corresponding to the human participants' real-time gaze behaviors. The "looking back" rule in Figure 4 asks the robot to always follow the human's attention. Consequently, the robot's eye contact is both initialized and terminated by the human. In the "no face-looking" rule (see Figure 5), eye contact is never established. In the "more face-looking" rule, as illustrated in Figure 6, the robot attempts to initialize eye contact (triggered by a timer expiration event) in addition to following human-initialized eye contact. These HRI rules can potentially be used to study, at the micro-behavioral level, how humans' responsive gaze patterns might differ. This class of research questions can easily be answered with the visual programming system presented here by systematically manipulating HRI rules.
Figure 5. Designing a "no face-looking" interaction rule. The robot never looked at the participant's face and instead always attended to the target object that the participant gazed at. This rule served as a baseline for how frequently the human will look at the robot's face while the robot never looks back. The robot ROI stream is a right-shifted version of the human's, but without any looks to the human's face (shown as yellow). Red, green and blue represent the three objects.
Figure 6. Designing a "more face-looking" interaction rule. In addition to responding to the participant's face looks, the robot initializes additional face looks. The program structure here contains all the rule graphs implemented in Figure 4; additionally, we now trigger a "robot gazes at human" action upon a timer expiration event. The resultant robot ROI stream is a right-shifted version of the human's plus more face looks initialized by the robot. Again, red, green and blue represent the three objects and yellow represents the face.
4.3.2 Rule-based Visual Programming of Adaptive Human Behavior Models
Just as we can program robot models with attending strategies, we can also program an adaptive human model to interact with the robot model in a simulated perception-action loop. Putting an adaptive human model in the loop makes it possible to follow the triggering relationships among rules across interacting agents in a dynamic context.
As shown in Figure 7, our environment provides a rule-based visual programming interface for the user to program an adaptive human model. We would like the human model to lead the communication when interacting with robot models. Therefore we enable the user to sketch an ROI stream baseline for the human model; so far this is not very different from the random human ROI generator, but it can be manipulated graphically. Our rule-based visual programming interface also provides a rule editing panel that allows programming and submitting individual attending strategies as part of the adaptive human model. These rules will be triggered at specific moments and will overwrite the baseline ROI stream. With these interfaces we can implement an adaptive human model that is not only capable of executing the baseline teaching strategy, but is also sensitive to the real-time behaviors generated from the robot model.
Figure 7 shows the screen image of our rule-based visual programming interface for manipulating the human model's attending strategies. The user first selects and drops the color-coded ROI variable icons into the baseline text field to define the attending-strategy baseline. The baseline example in Figure 7 says the human model will first gaze at the red object for a certain amount of time, then the green object, and lastly the red object. The size of these ROI bars corresponds to the desired duration of each ROI event. The user can then depict the types of attending strategies that supplement the ROI baseline. For instance, in Figure 7, the IF section indicates an interactive behavior of "human attends to red object; robot attends to blue object", and the THEN section defines an action of "human attends to blue object first, then looks at the robot's face, and finally continues to track the red object". Combining the IF and THEN parts, we are programming a gaze re-engaging rule that will be triggered when the robot model is not visually attending to the human model.
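A rough sketch of how such an overwrite rule for the human model might be encoded is given below; the structure and names are our assumptions for exposition.

# Hedged sketch of the re-engaging rule described above for the human model:
# when its condition holds, its action sequence overwrites the baseline ROI stream.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class HumanRule:
    condition: Callable[[str, str], bool]    # (human_roi, robot_roi) -> bool
    actions: List[Tuple[str, float]]         # ordered sequence of (roi, duration_in_seconds)

re_engage = HumanRule(
    condition=lambda human_roi, robot_roi: human_roi == "red" and robot_roi == "blue",
    actions=[("blue", 1.0),      # first follow the robot's object of interest
             ("face", 0.5),      # then look at the robot's face
             ("red", 2.0)],      # finally return to the original target
)

if re_engage.condition("red", "blue"):
    print(re_engage.actions)     # these bars would overwrite the baseline stream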
Figure 7. Rule-based visual programming interface for adaptive human behavior models. We allow the depiction of a baseline ROI stream and a set of individual attending rules in a unified graphical interface.
4.3.3 Simulating the Perception-Action Dynamics with a Human and a Robot Behavior Model
Putting the Robot and Human Behavior Models Together. The human behavior model which demonstrates this idea is based on the baseline we define in Figure 7. We supplement that baseline with the two individual rules in Figure 8. The two individual rules are mainly concerned with how the human should re-engage the robot through face looks when the robot is not attending to the human's attention. Next, we should prepare a robot model that can be engaged in the interaction, but ideally can also be un-engaged at times so that the human model will trigger the re-engaging strategies at those moments. To implement this, we exploit the look-back robot model in Figure 4 with an engagement level introduced (e.g., engaged in 10%, 50% or 90% of total interaction time). When a robot model is fully engaged in the interaction, it shares visual attention with the human model. When a robot is not engaged, it will not visually attend to the same attentional object. Instead, it will either look at a different object or even at something that is not interesting at all.
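The following sketch captures the idea of the look-back robot model with an engagement level; the per-step sampling scheme is our assumption, and only the notion of being engaged for a given fraction of the interaction comes from the text.

# Sketch of a robot gaze model with an engagement level: when engaged it
# shares the human's attention, otherwise it attends elsewhere or to nothing.
import random

def robot_gaze(human_roi, engagement=0.5, objects=("red", "green", "blue")):
    """Return the robot's ROI for one step, given the human's current ROI."""
    if random.random() < engagement:
        return human_roi                                  # engaged: share attention
    # Not engaged: attend to some other object, or to nothing interesting.
    candidates = [o for o in objects if o != human_roi] + ["none"]
    return random.choice(candidates)

random.seed(0)
print([robot_gaze("red", engagement=0.9) for _ in range(10)])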
Figure 8. Two individual rules programmed for the human model. (left) If the robot is visually attending to a different object, the human should try to re-engage the robot by first looking at the object of the robot's interest, then looking at the robot's face, and finally coming back to the object of his original interest. (right) If the robot is not attending to anything interesting, the human will first look at the robot's face and then track the object of his original interest.
The Perception-Action Dynamics. Figure 9 provides an example of the resulting perception-action dynamics between the human and robot models described in the preceding section. Of the two parallel streams, the first is the human model's eye gaze, which indicates which object the human attends to. The second stream is from the robot model's gaze. We highlight two momentary interactive behaviors from these parallel data streams to illustrate the kinds of interactive patterns that can be generated. The first sequential pattern starts with the situation that the robot attends to a different object (step 1). Next, the human checks the robot's gaze status and follows the robot's attention to the same object (step 2). Then the human looks at the robot's face (step 3), which, in turn, causes the robot to look back (step 4). Finally, the human returns to his original attentional target (step 5), and the robot follows (step 6).
The second sequential pattern starts with the moment that the robot loses the human's attention and does not look at anything interesting (step 1). Next, the human looks at the robot's face (step 2) and the robot looks back (step 3). The human then comes back to the object of his original interest (step 4), and the robot now reaches to that same object (step 5).
The goal of building the human and robot models is to provide a mechanism for us to apply various interaction rules in the simulated perception-action dynamics of two interacting agent models. This methodology allows us to quickly apply various HRI rules in a simulated interaction loop for design-level verification purposes.
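A minimal sketch of such a simulated perception-action loop, with toy stand-ins for both models, might look as follows; both step functions are illustrative simplifications.

# Minimal sketch of the simulated perception-action loop: at each tick the
# human model emits a gaze ROI (baseline plus a re-engaging behavior) and the
# robot model responds to it.
import random

def human_step(t, baseline, robot_roi):
    roi = baseline[t % len(baseline)]
    if robot_roi not in (roi, "face"):       # robot not attending: try to re-engage
        return "face"                        # look at the robot's face
    return roi

def robot_step(human_roi, engagement=0.7):
    if human_roi == "face" or random.random() < engagement:
        return "face" if human_roi == "face" else human_roi
    return "none"                            # disengaged: not looking at anything interesting

random.seed(1)
baseline = ["red"] * 5 + ["green"] * 5       # the human model's teaching baseline
robot_roi = "none"
for t in range(10):
    human_roi = human_step(t, baseline, robot_roi)
    robot_roi = robot_step(human_roi)
    print(t, human_roi, robot_roi)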
Figure 9. An example of perception-action dynamics between such a human and robot model. The two sequential patterns show that the re-engaging rules programmed in our human model can track the robot's attention and re-engage the robot by generating a sequence of eye gaze events.
5. FORMING A CLOSED LOOP BETWEEN HRI DESIGNS AND EXPERIMENTAL STUDIES
In this section, we describe a set of interactive visualization and visual data mining tools we integrated into our framework in order to analyze the integration of multiple sensory events and to find sequential behavioral patterns in empirical data collected from experimental HRI studies. Extracting temporal information (such as the exact timing and duration of interesting phenomena) allows us to evaluate and compare different HRI rules using empirical results. It also allows us to refine our visual programming environment by applying the findings on temporal properties and reliably identified human behavioral patterns.
Figure 10. An overview of our HRI system structure. A participant and the robot sat across a table and interacted with each other in a shared environment. The human teacher attempted to teach the robot a set of (artificial) objects, and the robot generated gaze-contingent responsive behaviors based on the real-time detection of the human teacher's attention and the rules programmed by researchers. In addition, multiple data streams were recorded for visualization and analysis of this joint task.
5.1 Transforming HRI Designs to Experimental Studies
We have used our visual programming interface to create several instances of HRI studies that focus on real-time adaptive human behaviors when people interact with robots exhibiting different gaze behaviors. As shown in Figure 10, our visual programming environment converts the visual representation of the robot's interaction rules into a Python-based gaze-contingent action module that forms part of the human-robot interaction system.
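The generated module itself is not shown in the paper; as a hedged illustration, a gaze-contingent action module of this kind could be organized as a polling loop that dispatches the programmed rules. All interfaces below are hypothetical stand-ins, not the actual generated code.

# Hypothetical sketch of a generated gaze-contingent action module: poll the
# filtered human-gaze ROI and dispatch the rules exported from the editor.
import time

def run_gaze_contingent_module(get_human_roi, send_robot_gaze, rules, hz=30, steps=None):
    period, n = 1.0 / hz, 0
    while steps is None or n < steps:
        human_roi = get_human_roi()           # filtered ROI from the eye tracker
        for rule in rules:                    # rules produced by the flow-chart editor
            target = rule(human_roi)
            if target is not None:
                send_robot_gaze(target)       # e.g. a robot head-turn command
                break
        time.sleep(period)
        n += 1

# Demo with stand-in sensing/actuation and a single "follow attention" rule (Figure 2).
follow = lambda roi: roi if roi in ("red", "green", "blue") else None
run_gaze_contingent_module(lambda: "red", print, [follow], steps=3)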
One such experiment built on our framework (see, e.g., Figure 1) was a language learning task in which a human teacher was asked to teach the robot a set of object names in a shared environment. The human needed to engage the robot and attract the robot's attention to the target object of his own interest, and then teach the robot the object names (pseudo-English words were used, e.g., "bosa"). This joint task allowed participants and the robot to naturally interact with each other without any constraint on what they had to do or what they had to say. In order to teach the robot object names, human participants actively played the role of teacher and freely generated multi-modal behaviors to attract the robot's attention. Behaviors included eye contact, pointing to and manipulating objects in the shared environment, and speaking. The interaction itself was free-flowing, allowing participants to generate naturalistic behaviors. Raw action data streams in human-robot interactions were recorded as temporal sequences. Each temporal item is represented as (r, [l, u]), where [l, u] defines a non-empty interval and r denotes the symbolic ROI associated with this interval. Figure 11 shows an example of multi-modal data streams collected in one such HRI experimental study. The three streams are derived from raw action data from both the user and the agent. The first is the ROI stream from the robot's eye gaze, indicating which object the robot attends to (e.g., gazing at one of the three objects or looking up toward the human's face). The second is the ROI stream (over the three objects and the robot's face) from the human user's gaze. The third encodes which object the human is manipulating and moving during the interaction.
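A direct rendering of this (r, [l, u]) representation might look as follows; the class name and fields are illustrative.

# Sketch of the temporal-item representation (r, [l, u]) described above:
# a symbolic ROI r attached to a non-empty time interval [l, u].
from dataclasses import dataclass

@dataclass
class TemporalItem:
    roi: str        # symbolic ROI, e.g. "red", "green", "blue", "face"
    l: float        # interval start (seconds)
    u: float        # interval end (seconds), with u > l

    @property
    def duration(self) -> float:
        return self.u - self.l

robot_gaze_stream = [TemporalItem("red", 0.0, 1.3),
                     TemporalItem("face", 1.3, 1.8),
                     TemporalItem("red", 1.8, 2.2)]
print([item.duration for item in robot_gaze_stream])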
5.2 Integrating Event-based Variable Streams
Just as we allow the creation and customization of complex events and commands from primitives, we allow the end-user to create and manipulate custom ROI event streams. Figure 12 provides the screen image of assigning desired colors to ROI variables in our user interface. By manipulating the color assignment and filtering out those ROIs that are not interesting, an end-user can create various projections of the original data stream. Figure 13(a)(b) provides examples of the creation of two new ROI streams which focus on the face-looking events from the human and the robot participants by assigning white to the ROIs of the three objects (originally red, green, and blue).
Figure 11. Examples of multi-modal data streams collected from the experimental HRI study instance.
Figure 12. Assigning desired colors to ROI variables.
Our framework also allows an end-user to overlay data streams by dragging a rendered data stream on top of another data stream. Our system performs a binary XOR operation when two streams are overlaid. By XORing two ROI event streams, we can integrate two sensory events to find the joint attention over these two sensory channels (the XOR produces true if the two pixels being overlaid are of different colors, and false if they are of the same color). For example, in Figure 13(c) we can overlay the newly derived robot face-looking stream and human face-looking stream to identify the human-robot face-to-face coordination moments. Similarly, we can overlay the human eye gaze ROI stream and the human hand movement ROI stream to obtain a new eye-hand coordination stream, which indicates that the human gazes at an object he/she also manipulates with his/her hands. Finally, we can integrate the human-robot joint attention event stream by overlaying the eye-hand coordination stream and the face-to-face coordination stream.
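The following sketch implements the per-sample XOR described above on a frame-wise representation of two color-coded streams; the frame-wise form is an assumed simplification of the rendered stream pictures.

# Sketch of the stream-overlay operation: a per-sample XOR that is True where
# the two color-coded streams differ and False where they agree.
def xor_overlay(stream_a, stream_b):
    """Return a boolean stream: True where the two ROI samples differ."""
    return [a != b for a, b in zip(stream_a, stream_b)]

human_face_looking = ["white", "white", "face", "face", "white"]
robot_face_looking = ["white", "face", "face", "face", "white"]
print(xor_overlay(human_face_looking, robot_face_looking))
# -> [False, True, False, False, False]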
5.3 Extracting Key Temporal Properties from Multi-modal Data Streams
Even more interesting in HRI data streams is the quantitative timing information, including, e.g., the typical duration of a sensory event as well as sequential behavioral pattern information. Such data-driven analysis has the potential to provide useful means to understand the multi-modal interactions in HRI, and by doing so we can quantitatively evaluate and compare our interaction rule designs. Moreover, many interesting findings about these behavioral patterns lead to the refinement of our visual programming interfaces. We use these findings in our rule-based visual programming interface to make the human models closer to the real world.
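As a small illustration of one kind of quantitative timing extraction, the sketch below computes the typical duration of each ROI event type from an interval stream; the sequential-pattern mining itself relies on a dedicated quantitative temporal mining algorithm, described next, and is not reproduced here.

# Sketch: typical (mean) duration per ROI event type in an interval stream.
from collections import defaultdict
from statistics import mean

def typical_durations(stream):
    """stream: list of (roi, l, u) tuples; returns the mean duration per ROI."""
    by_roi = defaultdict(list)
    for roi, l, u in stream:
        by_roi[roi].append(u - l)
    return {roi: mean(durations) for roi, durations in by_roi.items()}

robot_gaze = [("red", 0.0, 1.3), ("face", 1.3, 1.8), ("red", 1.8, 2.2)]
print(typical_durations(robot_gaze))   # roughly {'red': 0.85, 'face': 0.5}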
Modern computational algorithms have made such knowledge discovery possible. Guyet developed an intelligent algorithm to identify continuous interval-based events using a quantitative temporal mining approach [14]. His approach exploits an Apriori algorithm to generate candidate sequences as a first step; an MCMC sampling method and an expectation-maximization (EM) algorithm are then applied to select candidate patterns. Our interface integrates Guyet's QTempIntMiner algorithm for data analysis and pattern mining.
• Quantitative Timing Information. Figure 14 shows one of the most reliable patterns of human-robot eye gaze events detected by the algorithm: a 1.3-second robot-initialized object-looking event, followed by a 0.5-second face-to-face coordination, after which the human attends to the same attentional object with a 0.4-second object look.
Figure 13. Integrating ROI event streams by overlaying stream pictures: (a) robot face-looking; (b) human face-looking; (c) face-to-face coordination: (robot face-looking) XOR (human face-looking); (d) human eye-hand coordination: (human eye gaze) XOR (human hand movement); (e) human-robot joint attention: (human eye-hand coordination) XOR (face-to-face coordination).
Figure 14. The most reliable sequential pattern found when integrating the human eye gaze stream, the human-robot face-to-face coordination stream, and the human-robot joint attention stream. The pattern shows a 1.3-second robot eye gaze event, followed by a 0.5-second human-robot face-to-face coordination. Approximately 1 second after the face-to-face coordination, both the human and the robot reach the same attentional target.
• Sequential Relationships Between Events. Complicated temporal patterns such as adaptive behavior patterns can be identified with the algorithm as well. Figure 15(a) provides an example of an interesting interactive behavioral pattern. The informal meaning of the identified pattern is that when the robot learner is not visually attending to the human teacher, the human will first initiate a face look at the robot, followed by a gaze event at the same attentional object, and then use his hand to manipulate that object in the hope of re-engaging the robot's attention. In Figure 15(b) we verify this pattern finding by highlighting the several moments in the visualized data streams that exhibit this interactive behavioral pattern.
The key temporal properties and sequential patterns revealed can not only advance our understanding of human cognition and learning, but also provide quantitative evidence that we can directly incorporate in our HRI rule design interfaces. Most of the temporal properties in our visual programming interfaces are based on these empirical results as the default parameters. The visual rule-based programming interface for modeling human behaviors also exploits many patterns discovered from empirical studies. New insights and hypotheses gleaned from the empirical data combine to contribute to the next round of interaction rule design.
Figure 15. An example of an interactive behavioral pattern. (a) The sequential pattern starts with a situation when the robot is not engaged; the human then re-engages the robot by first looking at the robot's face, followed by a joint attention, and then manipulating the object in his hand to try to attract the robot's attention. (b) Instances of the sequential pattern are identified in the multimodal stream visualization.
6. CONCLUSION AND FUTURE WORK
This paper proposes a novel visual programming environment to facilitate designing, manipulating and evaluating HRI interfaces. The idea is to combine a visual programming interface with visual data mining methods. We have developed a prototype system with several critical features to facilitate HRI designs. First, we break a complex HRI design goal into manageable steps by describing rules in a graphical way. Second, we visualize not only the program structures, but also all intermediate and final results of interaction flows. Such visualization allows researchers to trace and follow the triggering relationships among rules. Third, we developed ways to visualize and analyze the event variables in interaction dynamics separately and integrally. Our visual programming environment allows users to easily formulate new ideas and interaction designs based on quantifiable data obtained through visualization, simulation, and analysis. Our next step is to conduct an evaluation of our prototype system. We also plan to extend our visual programming paradigm to support human-multi-robot interactions.
REFERENCES
[1] Hosoda, K., Sumioka, H., Morita, A., and Asada, M., "Acquisition of human-robot joint attention through real-time natural interaction," in [Intelligent Robots and Systems, 2004 (IROS 2004), Proceedings of the 2004 IEEE/RSJ International Conference on], 3, 2867-2872 (Sept.-Oct. 2004).
[2] Kipp, M. and Gebhard, P., "IGaze: Studying reactive gaze behavior in semi-immersive human-avatar interactions," in [Proceedings of the 8th International Conference on Intelligent Virtual Agents], IVA '08, 191-199, Springer-Verlag, Berlin, Heidelberg (2008).
[3] Nakano, Y. I. and Ishii, R., "Estimating user's engagement from eye-gaze behaviors in human-agent conversations," in [Proceedings of the 15th International Conference on Intelligent User Interfaces], IUI '10, 139-148, ACM, New York, NY, USA (2010).
[4] Fricker, D., Zhang, H., and Yu, C., "Sequential pattern mining of multimodal data streams in dyadic interactions," in [Development and Learning (ICDL), 2011 IEEE International Conference on], 2, 1-6 (Aug. 2011).
[5] Zhang, H., Fricker, D., Smith, T. G., and Yu, C., "Real-time adaptive behaviors in multimodal human-avatar interactions," in [International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction], ICMI-MLMI '10, 4:1-4:8, ACM, New York, NY, USA (2010).
[6] Li, J., Xu, A., and Dudek, G., "Graphical state space programming: A visual programming paradigm for robot task specification," in [Robotics and Automation (ICRA), 2011 IEEE International Conference on], 4846-4853 (May 2011).
[7] Sangpetch, A., "Visualizing Robot Behavior with Self-Generated Storyboards," Master's thesis, School of Computer Science, Carnegie Mellon University (2005).
[8] Diprose, J., "End user robot programming via visual languages," in [Visual Languages and Human-Centric Computing (VL/HCC), 2011 IEEE Symposium on], 229-230 (Sept. 2011).
[9] Shu, N. C., "Visual programming: perspectives and approaches," IBM Syst. J. 38, 199-221 (June 1999).
[10] Cassell, J., "Body language: Lessons from the near-human," in J. Riskin (ed.), Genesis Redux: Essays in the History and Philosophy of Artificial Intelligence, 346-374 (2007).
[11] Repenning, A. and Ambach, J., "Tactile programming: a unified manipulation paradigm supporting program comprehension, composition and sharing," in [Visual Languages, 1996, Proceedings, IEEE Symposium on], 102-109 (Sept. 1996).
[12] Sim, Y.-S., Lim, C.-S., Moon, Y.-S., and Park, S.-H., "Design and implementation of the visual programming environment for the distributed image processing," in [Image Processing, 1996, Proceedings, International Conference on], 1, 149-152 (Sept. 1996).
[13] Yu, C., Zhong, Y., Smith, T., Park, I., and Huang, W., "Visual mining of multimedia data for social and behavioral studies," in [Visual Analytics Science and Technology, 2008 (VAST '08), IEEE Symposium on], 155-162 (Oct. 2008).
[14] Guyet, T. and Quiniou, R., "Mining temporal patterns with quantitative intervals," in [Data Mining Workshops, 2008 (ICDMW '08), IEEE International Conference on], 218-227 (Dec. 2008).
[15] Yu, C., Schermerhorn, P., and Scheutz, M., "Adaptive eye gaze patterns in interactions with human and artificial agents," ACM Trans. Interact. Intell. Syst. 1, 13:1-13:25 (Jan. 2012).
[16] Breazeal, C. and Scassellati, B., "Infant-like social interactions between a robot and a human caregiver," Adapt. Behav. 8, 49-74 (January 2000).