A biologically inspired Solution for the Robotic Catching Problem




Dr. Pedro J Sanz


Technical Report

Nov. 2000

Institute for Real-Time Computer Systems

Technische Universität München



I am indebted to a number of people and institutions who made this research stay possible.

First of all, I thank the people of the Institute for Real-Time Computer Systems at the Technical University of Munich, and very especially Dipl.-Ing. Mike Sorg, who has been my closest collaborator during all my time here, and the head of the Institute, Prof. Dr.-Ing. Georg Färber, who offered me this opportunity at the right time.

I am also grateful to Dr. Thomas Schenk, of the Department of Neurology at the University Hospital Munich-Grosshadern (Ludwig-Maximilians University), for taking the time to explain to me several neurological details of how people catch.

Finally, my gratitude to the Spanish “Ministerio de Educacion y Cultura” (Ministry of Education and Culture), for its



Since a bilateral research agreement, signed for two years (2000 and 2001), already exists between the two institutions, the “Institute for Real-Time Computer Systems” in Munich (Germany) and the “Robotic Intelligence Lab” in Castellon (Spain), the main purpose of my stay here was to strengthen this collaboration.

The underlying long-term objective of this bilateral collaboration is to build a complete robot system (robot arm, vision system, etc.) capable of grasping moving objects in the real world, using vision both for perception and for guiding the robot's actions. In the research literature this is known as the “catching” problem.

To tackle this very complex problem, we start from previous experience gained in both laboratories: (1) visually guided grasping in static 2D scenarios, at the Spanish lab, and (2) the MinERVA project at the German institute. In summary, we can say that:


The Spanish lab was more oriented toward the “grasping determination” problem for unknown static objects, guaranteeing stability, using 2D vision. And,


The German institute was more interested in finding robust solutions, first for the control of grasping guided by stereo vision and later for tracking strategies using active contours.

Moreover, regarding the specific implementation work, there are two theses in progress, one on each side of this bilateral project: that of Gabriel Recatala, at the Spanish lab, and that of Mike Sorg, at the German institute. As Gabriel's thesis co-advisor, I am all the more interested in making progress on this project.

Hence, during this stay I have tried to clarify the state of the art of the catching problem, with the aim of establishing a robust bridge between both theses and so advancing faster toward the long-term objective stated above. This task required studying and analysing both sides of the problem: the biological inspiration coming from behavioural neuroscience, and the technological approach.















APPENDIX. “The Basic Eye Movements”




What is the “catching” problem?

Basically, it is the grasping of moving objects in the real world.

Looking for biological inspiration to solve this problem, we can establish an analogy between two different points of view:

The Natural Side. The Neurological approach.


The Artificial Side. The Robotics approach.

Objective: to obtain a theoretical framework (from the neurological approach) from which to derive a preliminary computational model of robotic catching.


The Neurological Approach.

The question to answer is: how do humans do it?

To answer this question, only the most recent contributions, well supported by psychophysical experiments, are taken into account.

Hong [Hong, 95] suggests the following description, referring to the catching of flying objects:

“There are two fundamental approaches to catching. One approach is to simply calculate an intercept point, move to it before the object arrives, wait, and close at the appropriate time. Humans catch most light objects in this manner, using our visual sense and our sense of touch to determine when to close our hand. Another approach is to match the trajectory of the object in order to grasp the object with less impact and to allow for more time for grasping. Humans do not catch purely in this manner very often. For heavier or faster moving objects, humans often catch using a combination of the two approaches. Humans move towards the path of the object but do not fully match the velocity of the object”.

Note that implementing the first approach in a robot would greatly benefit from a “palm” in the robot hand, which could stop the object temporarily and so relax the extreme sensitivity to the timing of closure. This difficulty is the main reason for considering only the second approach in our proposal (shown in section


Some of the main operations involved in the catching problem could be the following:

Select the visual target.

Attention: control the gaze towards interesting points in visual space.

Movement detection.

Adaptive shaping of the hand to the target.

Gross approach. Tracking: prediction of where the object will be at the next instant. This is a purely ballistic interception problem.

Fine approach. Online trajectory correction. Can it be explained on the basis of retinal positional error?

Target contact, in an appropriate manner to achieve a stable grasp.


Paillard & Beaubaton [Paillard & Beaubaton, 78] suggest a functional distinction in the way visual information is used to control reaching and grasping movements. They proposed partitioning the visual information used for planning, triggering, and guiding arm reaching and grasping movements into three main separate channels:

An “identification channel” for selecting and steering specific motor pathways that orient and shape the hand grip in accordance with the size, form, and orientation of the object to be efficiently grasped.

A “localization channel” for triggering the motor program of reaching in the right direction and extending it according to the target localization. And,

An “adjustment channel” to feed the corrective feedback loops that guide the directional transport of the hand and its smooth homing in on the target, with the fine visual adjustment of the grasping.

An interesting open question is whether these different channels can be processed in parallel and, if so, how to implement them in a robotic system.

On the other hand, another important question arises: what happens to visual information after it has been processed by the primary visual cortex (V1)?

Ungerleider and Mishkin [Ungerleider & Mishkin, 82] identified two broad “streams” of projections: a ventral stream projecting eventually to the inferior temporal (IT) cortex, and a dorsal stream projecting to the posterior parietal (PP) cortex (see Fig. 1).

Figure 1

The major routes of visual input into the dorsal and ventral streams. The diagram of the macaque brain on the right of the figure shows the approximate routes of the cortico-cortical projections from the primary visual cortex to the posterior parietal and the inferotemporal cortex, respectively. LGNd: lateral geniculate nucleus, pars dorsalis; Pulv: pulvinar; SC: superior colliculus.

In 1982, Ungerleider and Mishkin argued that the two streams of visual processing play different but complementary roles in the perception of incoming visual information. According to their original account, the ventral stream plays a critical role in the identification and recognition of objects, while the dorsal stream mediates the localization of those same objects. Some have referred to this distinction in visual processing as one between object vision and spatial vision, 'what' versus 'where'. Apparent support for this idea came from work with monkeys. Lesions of the inferior temporal cortex produced deficits in the animal's ability to discriminate between objects on the basis of their visual features but did not affect their performance on a spatially demanding 'landmark' task. Conversely, lesions of the posterior parietal cortex produced deficits in performance on the landmark task but did not affect object discrimination learning.

Although the evidence available at the time fitted well with Ungerleider and Mishkin's proposal, recent findings from a broad range of studies in both humans and monkeys are more consistent with a distinction not between subdomains of perception, but between perception on the one hand and the guidance of action on the other [Milner & Goodale, 98]. Moreover, although we have emphasized the separation of the dorsal and ventral streams, there are of course multiple connections between them, and indeed adaptive goal-directed behaviour in humans and other primates must depend on a successful integration of their complementary contributions. Thus, the execution of a goal-directed action might depend on dedicated control systems in the dorsal stream, but the selection of appropriate goal objects and of the action to be performed depends on the perceptual machinery of the ventral stream. One of the important questions that remains to be answered is how the two streams interact, both with each other and with other brain regions, in the production of purposive behaviour.

Hence, visual information from V1 is divided along two streams:

A medial “Action” or “Where” stream, concerned with the spatial relationships between objects for the unconscious guidance of movements. And,

A lateral “What” stream, concerned with conscious object recognition and perception.

The What stream gets its input primarily from P (parvocellular) ganglion cells in the fovea, and the Action/Where stream gets its input from M (magnocellular) ganglion cells in the peripheral retina.

Some remarks about attention are the following [web

It allows us to focus on a specific location (or stimulus).

It can be viewed as a flashlight that is directed at different aspects of an object.

Attending to different features causes activity in different cortical regions.

The superior colliculus (SC) causes the eyes and head to turn towards an interesting visual object: the “visual grasp reflex”.

In every visual process (identification, localization, gaze control, tracking, etc.), the role played by eye movements, and by head movements in combination with them, will be decisive. See “Appendix A” for a review of all the possible human eye movements.

Another important question concerns the transformation needed between vision (i.e. the eyes) and the arm. Two possible coordinate frames [web-1] appear:

Areas in the ventral stream tend to encode locations in allocentric coordinates.

Areas in the dorsal stream tend to encode locations in egocentric coordinates.

Allocentric coordinates state an object's location with respect to some other object (e.g. a table in terms of its location in a room, or a feature of a face in terms of its location on the face). They can also be thought of as object-centred (e.g. the room can be thought of as an object).

Egocentric coordinates state where the object is relative to you (note that “you” can be your body, your head or your

The best way to appreciate the difference between the two is to consider what happens when someone moves within a room (e.g. to the right), as shown in Fig. 2.
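The difference can be made concrete with a small 2D sketch (our own illustration, not taken from the cited sources). It assumes a room frame in which the observer's position and heading are known, and an egocentric frame with +x pointing forward and +y to the observer's left; the function name is hypothetical. The table's allocentric coordinates never change, while its egocentric coordinates change as soon as the observer moves.

```python
import math


def allocentric_to_egocentric(obj_xy, observer_xy, observer_heading):
    """Express a room-fixed (allocentric) 2D point in the observer's frame.

    observer_heading is the angle (radians) of the observer's forward axis in
    the room frame. In the returned egocentric frame, +x points forward and
    +y points to the observer's left.
    """
    dx = obj_xy[0] - observer_xy[0]
    dy = obj_xy[1] - observer_xy[1]
    c, s = math.cos(observer_heading), math.sin(observer_heading)
    # Rotate the room-frame offset by -heading to align it with the observer.
    return (c * dx + s * dy, -s * dx + c * dy)


table = (2.0, 3.0)          # allocentric: fixed in the room
facing = math.pi / 2        # observer faces the room's +y axis
print(allocentric_to_egocentric(table, (0.0, 0.0), facing))  # approx. (3.0, -2.0)
print(allocentric_to_egocentric(table, (1.0, 0.0), facing))  # approx. (3.0, -1.0)
```

After a one-metre step to the right, the table is still 3 m ahead but now only 1 m to the right (negative y), although its room coordinates are unchanged.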


























Figure 2.

An example showing the differences between allocentric (right) and egocentric (left) coordinates when someone moves towards the right in a room.

To clarify the use of “coordinate transformations”, consider the following example, adapted from T. Vilis.

Suppose that you are sitting at your desk and want to push an object to the left. Which muscle do you contract? It depends on your initial position. If you are in a close position, you flex your wrist; if not, you extend your wrist. To choose which, the CNS (central nervous system) needs to know where the wrist is relative to you. This integrated sense of position is computed from proprioceptive information passed from the primary sensory cortex to higher-order areas and then to association areas.

At the same time, you need to know where the object is relative to you. Visual information coming up the 'where' or dorsal stream provides this. Comparing where the object and your wrist are relative to you allows the CNS to generate the appropriate movement (see Fig. 3).

Figure 3.

Adapted from J. L. Driesen [web-3], “Cortical control of movement”. Note the necessary connection between the region where the object is represented (V1) and the region where the position of the wrist is represented.



The Robotics Approach.

With present technology, what kind of experimental setup is best suited to guarantee good performance in the catching problem?

At a minimum, good reliability is needed in both the vision system and the robot arm, including the gripper.

Reviewing recent work on this problem, we would recommend:

Active vision system.

Normally a stereo pair mounted on a 2-DoF platform (pan-tilt or

Robot arm. Redundant, with 7 DoF, very fast (acceleration higher than g), and with a gripper (e.g. a two-jaw parallel gripper) also capable of closing very fast (in less than 0.5 s).

In view of the knowledge gained by analysing some key aspects of human catching, the list of tasks necessary to design our robot catching system must include the following:


A vision system capable of providing the required visual attention, including target locating and tracking, by integrating shape, colour, depth and movement

Controllers and robust tracking methods for the active vision system to stably track fast-moving objects.

Cross-calibration between the vision system and the arm.

Path prediction for the target, and

Path generation for the arm to intercept the target.




To identify quickly the key points involved in implementing an artificial catching system, we present only one well-known, challenging case: “Adaptive Robotic Catching of Free-Flying Objects”, developed by the “Vision and Touch Guided Manipulation Group” at the MIT Artificial Intelligence Lab & Nonlinear Systems Lab [web

The project, headed by Prof. Slotine and funded by Fujitsu, Furukawa Electric, and the Sloan Foundation, aims to accomplish real-time robust catching of free-flying objects. Initially focusing on spherical balls of various sizes, the group is now experimenting with various objects of unknown aerodynamic characteristics, such as sponge balls, long cylindrical cans, and paper airplanes.

The following is a short review of the most relevant devices in the experimental setup of this project: the arm, the wrist, the hand and the vision system.

The Whole Arm Manipulator (see figure 4)

The MIT Whole Arm Manipulator (WAM) arm is a very fast, force-controllable robot arm designed in Dr. Salisbury's group at the AI Lab. The concept of "Whole Arm Manipulation" was originally aimed at enabling robots to use all of their surfaces to manipulate and perceive objects in the environment. Central to this concept (and to the group's design efforts in general) has been a focus on controlling the forces of interaction between robots and the environment. To permit this, the WAM arm employs novel cable transmissions which are stiff, low-friction and backdrivable. This in turn permits a lightweight design. To achieve good bandwidth in force control while in contact with the environment, the arm's design maximizes the lowest resonant frequency of the system and employs an impedance matching ratio between motor and arm masses. This also enables the arm to achieve high accelerations while moving in free space.

Prof. Slotine and his students have developed the robot's system architectures and control algorithms for tasks requiring rapid and accurate free-space motion, fast motion-vision coordination (as in robotic catching), or precise force-controlled interactions with the environment.

Figure 4.
The MIT Whole Arm Manipulator

The Talon (see figure 5)

A new wrist-hand mechanism has been developed to replace a previous forearm-mounted system. The new wrist-hand, known as the Talon, provides 3 additional powered degrees of freedom: one for grasping forces and two for orientation. The motors for the device are located in the forearm to minimize end-effector mass and maximize its workspace. The grasping mechanism consists of a group of 2 fingers which move against a group of 3 fingers, such that the two groups mesh together while encircling objects. The inner finger surfaces are serrated to provide high contact friction against rough (rock) surfaces, and curved to help capture both large and small objects. The fingers may deflect compliantly to accommodate object geometry, and finger deflections may be sensed to monitor the grasp state. The group has also studied the design of a miniature end-effector suitable for grasping small rocks and cylindrical objects. Similar in spirit to the Talon, the new miniature end-effector uses slightly different kinematics to enlarge its feasible grasping volume.

Figure 5.
The Talon.


The Fast Eye Gimbals (see figure 6)

A more recent component of this system is the active vision system, which consists of two high-resolution colour CCD cameras with 50 mm focal length lenses mounted on two-degree-of-freedom gimbals. They have used cameras with a narrow field of view to give higher-resolution images of typical objects. This implies, however, that the cameras have to be actuated in pan and tilt so that they can cover broad scenes, leading to an active vision system and an associated trade-off between controller precision and image resolution (narrowness of field of view).

The actuators they have implemented were designed in their lab and are known as the Fast Eye Gimbals (FEGs). The FEGs provide directional positioning for the cameras using a drive mechanism similar to that of the WAM.

The two joints are cable-driven and have ranges of motion of ±90 degrees and ±45 degrees in the base and upper joint axes, respectively.

These two FEGs are currently mounted at strategic positions on ceiling rafters (see figure 7), with a wide baseline for higher accuracy using stereo vision methods.
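Why a wide baseline helps can be seen from the textbook depth relation for an idealized, rectified stereo pair with focal length f (in pixels), baseline b and disparity d. The verged, independently pointed FEG cameras only approximate this geometry, so the relation should be read as a rule of thumb, not as the measured accuracy of the MIT system:

```latex
Z = \frac{f\,b}{d}
\qquad\Longrightarrow\qquad
\delta Z \approx \left|\frac{\partial Z}{\partial d}\right| \delta d
        = \frac{f\,b}{d^{2}}\,\delta d
        = \frac{Z^{2}}{f\,b}\,\delta d
```

For a fixed disparity error δd, the depth error grows with the square of the distance Z and shrinks in inverse proportion to the baseline b, so doubling b roughly halves δZ.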

Figure 6.

The Fast Eye Gimbals (FEGs).

The independent nature of the FEGs allows each one to be positioned at a different location, in order to vary the baseline or the orientation of the coordinate frame, and makes it easy to add further cameras to provide additional perspectives.

This system uses low-cost vision processing hardware for simple information extraction. Each camera signal is processed independently on vision boards designed by other members of the MIT AI Laboratory (the Cognachrome Vision Tracking System). These vision boards provide the centre of area, major axis, number of pixels, and aspect ratio of the colour-keyed image. The two Fast Eye Gimbals make it possible to locate and track fast, randomly moving objects using "Kalman-like" filtering methods that assume no fixed model for the behaviour of the motion. Independently of the tracking algorithms, they use least-squares techniques to fit polynomial curves to prior object location data to determine the future path. With this knowledge in hand, they can calculate a path for the WAM that matches the object's trajectory, in order to accomplish the catch and a smooth post-catching deceleration of the object and the WAM.
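The least-squares idea can be sketched as follows. This is our own minimal illustration, not the MIT group's code: it assumes a quadratic (constant-acceleration) model fitted independently to each Cartesian axis, time-stamped 3D positions already delivered by the stereo system, and no weighting of the measurements. The function names are hypothetical.

```python
def fit_quadratic(ts, xs):
    """Least-squares fit of x(t) = c0 + c1*t + c2*t**2; returns [c0, c1, c2]."""
    n = 3
    # Normal equations (A^T A) c = A^T x, where each row of A is [1, t, t**2].
    ata = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    atx = [sum(t ** i * x for t, x in zip(ts, xs)) for i in range(n)]
    # Solve by Gaussian elimination with partial pivoting.
    m = [row + [b] for row, b in zip(ata, atx)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (m[r][n] - sum(m[r][k] * c[k] for k in range(r + 1, n))) / m[r][r]
    return c


def predict_path(ts, points, t_future):
    """Fit each Cartesian axis independently and evaluate it at t_future."""
    prediction = []
    for axis in range(3):
        c0, c1, c2 = fit_quadratic(ts, [p[axis] for p in points])
        prediction.append(c0 + c1 * t_future + c2 * t_future ** 2)
    return tuple(prediction)


ts = [0.02 * k for k in range(11)]                      # 0.2 s of samples
pts = [(1 + 2 * t, 0.5 * t, 1 + 3 * t - 4.905 * t * t) for t in ts]
print(predict_path(ts, pts, 0.5))                       # close to (2.0, 0.25, 1.27375)
```

A quadratic per axis is exact for a drag-free ballistic toss; objects such as paper airplanes break this assumption, which is why the text turns to more general estimators.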

In addition to the basic least-squares techniques for path prediction, they are experimentally studying nonlinear estimation algorithms that give "long-term" real-time prediction of the path of moving objects, with the goal of robust acquisition. The algorithms are based on the stable on-line construction of approximation networks composed of state-space basis functions localized in both space and spatial frequency. As an initial step, they have studied the network's performance in predicting the path of light objects thrown in the air.

Figure 7.

A picture of the FEGs mounted on a ceiling rafter.

Further applications may include motion prediction of objects rolling, bouncing, or breaking up on rough terrain.

Recent successful results with this network have been obtained in catching sponge balls and even paper airplanes (see Figure 8).


Figure 8.
Catching a ball (left) and catching a paper airplane (right).

The main references for this research are the M.S. thesis by Hong [Hong, 95] and the paper [Hong & Slotine, 95]. The abstract of this M.S. thesis is reproduced below:

“Robot Hand/Eye Coordination and Active Vision are both fields which have enjoyed much attention. A variety of research has been completed which examines the use of vision to direct robot manipulation. Specifically, previous research at MIT has examined the task of combining vision and manipulation applied to the task of tracking and catching tossed balls in controlled environments. Building upon the foundations of this past research, this thesis presents work which incorporates a new active vision system which requires a minimally controlled environment and implements new methods for object tracking, robot/camera calibration, and new catching

The system which is used here is composed of a seven degree of freedom cable driven arm and a ceiling mounted active vision system. The active vision system is composed of two color CCD cameras each mounted on two degree of freedom actuators. The vision processing is done using simple blob detection to locate color-keyed objects.

The goal of this research is to develop the control methods and additional algorithms required to complete successful catching of lightly tossed objects. This thesis addresses each of the required elements. Techniques for locating and robustly tracking objects using visual information are presented. Methods of cross calibration between the vision system and the manipulator are discussed. A recursive least squares algorithm for model-based path prediction of the tossed object is presented. And finally, methods for determination of safe catch points and new polynomial path generation techniques for the robot manipulator are discussed.

Experimental results for the application of the above algorithms to the task of catching free-flying spherical balls are presented. The system was tested on underhand tosses from random locations approximately 1.5-2.5 meters distant from the arm. The average time of travel from leaving the hand of the tosser to successful catching is approximately 0.5 seconds. The best performance results were found to be 70-80% success over similar tosses.”

Finally, to clarify the limitations of this project: the colour, shape and trajectory of the target are known “a priori”. Current research examines the catching of objects with different aerodynamic characteristics. Objects of interest are lightweight foam balls, paper airplanes (see figure 8, on the right), and other items with non-parabolic trajectories. Moreover, the current model-based least-squares prediction methods for the tossed object are being replaced by wavelet-network-based prediction methods [Cannon & Slotine, 95], [Watanabe & Slotine, 95].




obot with
ctions [web
4], [Hauck, 99].


The aim is to look for human-robot analogies in the catching problem, in order to gain knowledge that explains normal and disturbed grasping movements in humans and, at the same time, to derive new strategies for technical developments, through cooperation between information technology scientists and neurologists.

Note the similar setups used in the two cases, the human and the robot (see figure 9).

Figure 9.

Human and Robot performances.


The Biological Side.

Objective: to examine healthy test persons in experimental setups in order to determine:

What visual information is required to predict the path of the moving object to be grasped (the target)?

In which phases of the grasping movement new visual information is used to control the arm and hand?
For this purpose, a system for moving the target, developed in a preceding funding period, will be used. In cooperation with other colleagues, the project team also plans to examine patients with parieto-occipital lesions to determine the influence of visual disturbances on catching.


The Artificial Side.

In this case, the robot system consists of a 7-joint arm with a parallel-jaw gripper and a stereo pair on a pan-tilt platform (see figure 9, on the right).

From the technical side, the long-term goal is to develop a robotic hand-eye system which more closely approximates the robustness and flexibility of human abilities. A biologically motivated model of movement control, developed during the first period of funding, will now be expanded with a module for predicting the movements of the target to be grasped, in order to allow grasping movements toward moving objects. The development of this module will depend on the insights gained during the above-mentioned experimental studies. The robotic hand-eye system will be further optimized by developing procedures that allow the autonomous adaptation of internal models. Hence, the main tasks necessary to achieve this catching objective could be:

Recognize objects by their silhouettes, estimate their position, either from model information or via stereo triangulation, and teach the robot to fixate objects.

Develop a biologically motivated motion control strategy for vision-based reach-grasp movements.

Track moving objects robustly with the camera head using active contours and stereo image pair information.

Teach the robot to learn the visuomotor transformations for hand-eye coordination.

Reconstruct the 3D movement from what will probably be only sparse visual information, in order to determine suitable catching points in space and time.

Carry out the catch in two stages: reaching and grasping the moving object.




The flowchart of events during the complete robotic catching process is shown below. This proposal has been designed to meet the key considerations discussed before (sections 1 to 3).

[Flowchart boxes: Select Target; Initial Arm Actions; Target Path Prediction; Catch Point; Close the Gripper; Fix Catch Time; Trajectory Matching; Return to “Home” Position.]


To clarify each of the main steps shown in the above flowchart, we review them below using their attached numbers.

Select Target (1)

The selection of the target depends on the human-robot interface implemented.

For instance, in an autonomous system, the target can be identified by its movement within the robot's visual field, detected with “optic flow” or similar techniques.
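Full optic-flow estimation is beyond a short example, but the simplest of the "similar" techniques, frame differencing, already shows how movement can single out a target. The sketch below is our own illustration, assuming greyscale frames given as equally sized nested lists and a hypothetical change threshold of 25 grey levels; it returns the centroid of the changed pixels as a crude image location of the moving target.

```python
def detect_motion(prev_frame, curr_frame, threshold=25):
    """Centroid (row, col) of pixels whose grey level changed by more than
    `threshold` between two frames, or None if nothing changed.

    Frames are equally sized 2D lists of grey levels (0-255).
    """
    row_sum = col_sum = count = 0
    for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            if abs(q - p) > threshold:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)


prev = [[0] * 5 for _ in range(5)]
curr = [row[:] for row in prev]
curr[1][2] = curr[3][2] = 255          # a small bright object appears
print(detect_motion(prev, curr))       # (2.0, 2.0)
print(detect_motion(prev, prev))       # None
```

Unlike optic flow, differencing gives no motion direction and also fires when the cameras themselves move, so on an active vision head it would only be usable while the gimbals are still.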

In general, it is necessary to detect the target's location, normally by triangulation using stereo vision information, together with its colour, shape and movement.
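Triangulation with an active stereo head can be sketched as follows. This is our own illustration under stated assumptions: the camera centres are known from calibration, each gimbal keeps the target centred in its image so that the viewing ray follows directly from the pan and tilt angles (pan about the vertical z axis, tilt as elevation), and, because two measured rays rarely intersect exactly, the estimate is the midpoint of their closest approach, one common choice among several. The function names are hypothetical.

```python
import math


def ray_from_pan_tilt(pan, tilt):
    """Unit viewing direction of a camera aimed at the given pan and tilt (radians)."""
    return (math.cos(tilt) * math.cos(pan),
            math.cos(tilt) * math.sin(pan),
            math.sin(tilt))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom          # parameter of closest point on ray 1
    t = (a * e - b * d) / denom          # parameter of closest point on ray 2
    p1 = [p + s * u for p, u in zip(c1, d1)]
    p2 = [q + t * v for q, v in zip(c2, d2)]
    return tuple((x + y) / 2 for x, y in zip(p1, p2))


tilt = math.atan2(0.5, math.hypot(1.0, 1.0))
left = ray_from_pan_tilt(math.atan2(1.0, 1.0), tilt)     # camera at the origin
right = ray_from_pan_tilt(math.atan2(1.0, -1.0), tilt)   # camera 2 m along x
print(triangulate((0.0, 0.0, 0.0), left, (2.0, 0.0, 0.0), right))  # approx. (1.0, 1.0, 0.5)
```

The parallel-ray check matters in practice: with a short baseline and a distant target the denominator becomes small and the estimate degrades, which is the same baseline effect discussed for the FEGs.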

Initial Arm Actions (2)

Initially, it may be necessary to move the arm to a specific location. For example, in the problem of “catching a ball”, the arm needs to avoid “spatial backtracking”, so it starts by moving away from the ball (towards a position a good distance from the ball rather than towards the ball), giving the arm more time to accelerate.

Note that in other situations the arm does not need to move away from the target, as in the usual MinERVA configuration for catching an object on a table.

Path Prediction Techniques (3)

After the initial location data are processed, an algorithm for object path prediction is executed (e.g. [Kimura et al., 92]). Normally this kind of algorithm is an iterative process.

Note that in general there are many possibilities: the trajectory may or may not be known “a priori” (e.g. it is parabolic in the case of a flying target). For “non-parabolic” and “non-planar” trajectories, very general techniques must be used, such as the neural networks employed by [Cannon & Slotine, 95].
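Because new positions arrive every frame, a recursive least-squares (RLS) update refines the fit without recomputing it from scratch; the thesis abstract quoted earlier mentions a recursive least-squares algorithm, but the code below is only our own sketch of the generic technique. It assumes a quadratic model per axis, a large initial covariance as a weak prior, and an optional forgetting factor λ ≤ 1 for slowly varying trajectories; the class name is hypothetical.

```python
class RecursiveQuadraticFit:
    """Recursive least squares for x(t) = c0 + c1*t + c2*t**2 on one axis."""

    def __init__(self, initial_cov=1e8, forgetting=1.0):
        self.theta = [0.0, 0.0, 0.0]          # current coefficient estimate
        # Large initial covariance = weak prior on the coefficients.
        self.P = [[initial_cov if i == j else 0.0 for j in range(3)]
                  for i in range(3)]
        self.lam = forgetting                 # 1.0 means no forgetting

    def update(self, t, x):
        """Incorporate one new measurement x taken at time t."""
        phi = [1.0, t, t * t]
        p_phi = [sum(self.P[i][j] * phi[j] for j in range(3)) for i in range(3)]
        denom = self.lam + sum(phi[i] * p_phi[i] for i in range(3))
        gain = [v / denom for v in p_phi]
        error = x - sum(phi[i] * self.theta[i] for i in range(3))
        self.theta = [th + g * error for th, g in zip(self.theta, gain)]
        # P <- (P - K phi^T P) / lambda, using phi^T P = (P phi)^T for symmetric P.
        self.P = [[(self.P[i][j] - gain[i] * p_phi[j]) / self.lam
                   for j in range(3)] for i in range(3)]

    def predict(self, t):
        c0, c1, c2 = self.theta
        return c0 + c1 * t + c2 * t * t


fit_z = RecursiveQuadraticFit()
for k in range(11):                          # one new sample every 0.1 s
    t = 0.1 * k
    fit_z.update(t, 1 + 3 * t - 4.905 * t * t)
print(fit_z.predict(1.5))                    # close to 1 + 4.5 - 4.905 * 2.25 = -5.53625
```

Each update costs a fixed number of operations, which suits a servo loop in which the prediction (and therefore the catch point) is re-evaluated at every frame.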

Catch Point Determination (4 and 5)

Within the servo loop, a satisfactory catch time/point must be determined using the iterative fitting information.

* Safety constraints. The first stage is to determine whether the object is catchable and, if so, where catching should be attempted. This can be determined by checking points along the target path against a series of workspace and deceleration constraints.

The path of the object will be defined parametrically in time using the appropriate trajectory fitting method, so the catch point determination process is actually a catch time determination process. The catch time is varied systematically, with checks based upon the corresponding catch point coordinates. This process is repeated until a satisfactory catch time is determined.

* Catch point determination process. It begins by selecting an initial prospective catch time. The coordinates corresponding to this catch time are obtained using the path prediction mechanism and checked against the workspace and deceleration constraints. If the constraints are not satisfied, a new prospective catch time is selected and the process begins again. This process is repeated until a satisfactory catch time/point is determined or the target is deemed uncatchable.
Observation: this whole process must be completed in a very short time, because of the very restrictive limits imposed on the intersection between the target trajectory and the trajectory actually reachable by the arm.

* Catch point update and timing of grasp. The calculation of the catch time/point must be repeated each time the predicted trajectory constants change. This ensures that the catch point stays within the safety region of the workspace as the trajectory constants change. As a result, the catch point and time vary as the target progresses. However, once the command to close the hand has been given, the catch time must become fixed, so that when the hand is fully closed, the hand and the target coincide. Although the catch time is fixed, the catch point is allowed to vary as the trajectory constants vary. If the resulting catch point moves out of the safe workspace after the catch time has been fixed, then the target is uncatchable and the arm returns to its “home” position.
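The catch-time search described above can be sketched as a simple forward scan. This is only an illustration under our own assumptions: a spherical workspace, a stopping-distance check standing in for the deceleration constraints, a fixed time step, and a `predict(t)` function supplied by whichever path-prediction method is in use. All names and parameters are hypothetical.

```python
import math


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _inside(q, center, radius):
    return _norm([qi - ci for qi, ci in zip(q, center)]) <= radius


def find_catch_time(predict, t_start, t_end, dt, center, radius, max_decel):
    """Scan prospective catch times; return (t_catch, point) or None.

    predict(t) -> (x, y, z) is the current path prediction. Illustrative
    constraints: the catch point must lie inside a spherical workspace, and
    so must the point where the hand would stop if it braked at max_decel
    along the object's direction of motion.
    """
    steps = int(round((t_end - t_start) / dt))
    for k in range(steps + 1):
        t = t_start + k * dt              # avoids accumulating rounding error
        p = predict(t)
        h = 1e-4                          # central-difference step for velocity
        v = [(a - b) / (2 * h) for a, b in zip(predict(t + h), predict(t - h))]
        speed = _norm(v)
        if speed > 0.0:
            stop_dist = speed ** 2 / (2 * max_decel)
            stop = [pi + vi / speed * stop_dist for pi, vi in zip(p, v)]
        else:
            stop = p
        if _inside(p, center, radius) and _inside(stop, center, radius):
            return t, p
    return None  # no satisfactory catch time: target deemed uncatchable
```

Re-running the search whenever the predicted trajectory constants change gives the catch point update described above; once the close command has been issued, the catch time would instead be kept fixed and only the corresponding point re-checked against the workspace.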

Catching Path Generation Possibilities (6)

Different methods for generating the desired arm path can be applied (e.g. [Kimura, 92]).

The arm can be given paths either in Cartesian space or in joint space, with associated advantages and disadvantages to each. In both representations, different path generation schemes could be used.
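One common scheme, in the spirit of the polynomial path generation mentioned in the quoted thesis abstract, is a quintic polynomial per coordinate (joint or Cartesian) that starts from the arm's current state and reaches the catch point at the catch time with the object's predicted velocity and acceleration, so that trajectory matching can begin smoothly. The sketch below uses the standard quintic boundary-value formulas; it is our own illustration, not the method of [Kimura, 92] or of the MIT group, and the function names are hypothetical.

```python
def quintic_coeffs(p0, v0, a0, pf, vf, af, T):
    """Coefficients c0..c5 of p(tau) = sum(c_k * tau**k), 0 <= tau <= T,
    matching position, velocity and acceleration at both ends."""
    h = pf - p0
    c3 = (20 * h - (8 * vf + 12 * v0) * T - (3 * a0 - af) * T ** 2) / (2 * T ** 3)
    c4 = (-30 * h + (14 * vf + 16 * v0) * T + (3 * a0 - 2 * af) * T ** 2) / (2 * T ** 4)
    c5 = (12 * h - 6 * (vf + v0) * T + (af - a0) * T ** 2) / (2 * T ** 5)
    return [p0, v0, a0 / 2, c3, c4, c5]


def evaluate(coeffs, tau):
    """Return (position, velocity, acceleration) at time tau."""
    pos = sum(c * tau ** k for k, c in enumerate(coeffs))
    vel = sum(k * c * tau ** (k - 1) for k, c in enumerate(coeffs) if k >= 1)
    acc = sum(k * (k - 1) * c * tau ** (k - 2) for k, c in enumerate(coeffs) if k >= 2)
    return pos, vel, acc


# One coordinate: start at rest at 0.2 m, arrive at 0.8 m after 0.4 s
# moving at -1.5 m/s, i.e. with the object's predicted velocity there.
c = quintic_coeffs(p0=0.2, v0=0.0, a0=0.0, pf=0.8, vf=-1.5, af=0.0, T=0.4)
print(evaluate(c, 0.4))   # approximately (0.8, -1.5, 0.0)
```

In joint space this would be applied per joint after inverse kinematics of the catch pose; in Cartesian space per axis, with inverse kinematics along the way. The quintic form is chosen here only because it is the lowest-order polynomial that can match all six boundary conditions.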

Hand Orientation and Timing of Closure (7, 8, 9, and 10)

A very fast method needs to be implemented so that suitable grasping points can be determined and reached in time; for instance, an adaptation of the method in [Sanz et al., 98].

Moreover, during catching, the latency between the moment the close command is issued and the actual closure of the fingers must be considered. Because of this latency, the command to close the gripper must be given before the arm has reached the catch point. Therefore, once the command to grasp (i.e. close the gripper) is given, the motion must be completed. For this reason, once the “close the gripper” command has been issued, the catch time is fixed to ensure that, when the fingers close, the hand is at the correct catching location.
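This timing rule amounts to issuing the close command one closure latency before the predicted catch time and freezing the catch time from then on. A minimal sketch, assuming the latency is known and constant (in practice it would have to be measured for the specific gripper); the class name is hypothetical.

```python
class CatchTimer:
    """Decides when to send the 'close the gripper' command.

    Assumes a known, constant closure latency (seconds). Before the command,
    the catch time follows the latest estimate; once the command has been
    issued, the catch time is frozen.
    """

    def __init__(self, closure_latency):
        self.closure_latency = closure_latency
        self.t_catch = None
        self.fixed = False

    def update(self, t_now, t_catch_estimate):
        """Return True only at the first cycle at which the command must be sent."""
        if self.fixed:
            return False  # catch time already fixed; ignore new estimates
        self.t_catch = t_catch_estimate
        if t_now >= self.t_catch - self.closure_latency:
            self.fixed = True
            return True
        return False


timer = CatchTimer(closure_latency=0.125)
print(timer.update(0.25, 0.5))                       # False: too early to close
print(timer.update(0.375, 0.5))                      # True: send the close command now
print(timer.update(0.4375, 0.625), timer.t_catch)    # False 0.5 (catch time stays fixed)
```

After the command, new trajectory estimates no longer move the catch time; they would only move the catch point, which is then checked against the safe workspace as described earlier.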

Path Matching and Post Catching Deceleration (11, and 12)

Once the catch point has been reached, to increase the probability of successful catching, the arm
matches trajectories with the target for a short period of time before deceleration

(e.g. in the “flying ball”, for 0.05 sec after the catch time, the desired arm path is coincident with
the predicted ball path).

After matching paths, care must be taken to reduce the possible jarring experienced by the object and, for heavy objects, to introduce gradually the change of mass at the end of the arm. Therefore, the arm must decelerate gracefully along the prior path of the object.
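The matching window and the subsequent deceleration can be sketched as a setpoint function for times after the catch. This is an illustrative simplification. The arm follows the predicted ball path for 0.05 s, the flying-ball value above. It then brakes with an assumed constant deceleration along the straight line tangent to the ball's path at the end of the window, rather than along the full curved prior path. The function name and the `decel` value are hypothetical.

```python
import math


def post_catch_setpoint(t, t_catch, ball_pos, ball_vel, match_window=0.05, decel=5.0):
    """Desired arm position for t >= t_catch (illustrative simplification).

    ball_pos(t), ball_vel(t): predicted ball position and velocity (callables
    returning tuples). match_window: path-matching interval in s (0.05 s as in
    the flying-ball example). decel: hypothetical constant deceleration in m/s^2.
    """
    t_end = t_catch + match_window
    if t <= t_end:
        # Path matching: the desired arm path coincides with the predicted ball path.
        return ball_pos(t)
    # Deceleration: brake along the ball's direction of travel at t_end
    # (a straight-line approximation of the prior path).
    p0, v0 = ball_pos(t_end), ball_vel(t_end)
    speed = math.hypot(*v0)
    if speed == 0.0:
        return tuple(p0)
    direction = tuple(v / speed for v in v0)
    tau = min(t - t_end, speed / decel)  # stop once the speed has reached zero
    s = speed * tau - 0.5 * decel * tau * tau  # distance covered while braking
    return tuple(p + u * s for p, u in zip(p0, direction))
```

A smaller `decel` spreads the stop over a longer distance. This is one simple way to reduce jarring, at the cost of a larger post-catch workspace.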




Regarding the actual collaboration between both labs, the IRCS (Institute for Real-Time Computer Systems) and the IRL (Robotic Intelligence Lab), work is currently under way. The primary objective is the implementation of a complete 2D robot catching system. The tracking strategies and the prediction mechanism needed to intercept the target are being implemented at IRCS, while everything needed to grasp the target in a stable manner is under construction at IRL. The last stage will be the integration and validation of all the pieces developed. The next objective will be to attack the 3D robot catching problem, and for this long-term objective the present document aims to be a useful starting point. Some specific contributions in this direction are in progress in both labs. For instance, a project on visual data fusion of colour, shape, and movement, aimed at improving the robustness of the tracking capabilities of the robotic system, has now been concluded at IRCS. In addition, new algorithms for 3D grasping that use visual servoing techniques to improve the global response of the grasping control execution are in progress at IRL.

Regarding the catching proposal addressed in this document, some considerations about the key aspects necessary for a robust implementation are in order:

The vision/arm calibration methods. Different possibilities exist, ranging from direct mapping to more conventional approaches.

Techniques to improve the visual tracking capabilities of the vision system. Experimenting with different cameras and vision processing hardware could be necessary, as could testing different algorithms for improving object tracking.

Study in depth the catching path generation possibilities. The arm can be given paths either in Cartesian space or in joint space, with associated advantages and disadvantages to each.

In general, it will be necessary to achieve a good balance between the design requirements
(hardware and software) and the available technology to solve this very complex problem.

Finally, it is also worth noting that many open questions persist regarding human catching from a neurological perspective. Nevertheless, the continuous stream of experiments and new theoretical proposals in this field can be considered a very convenient source of inspiration for the artificial counterpart, that is to say, the robotic catching problem.



[Cannon & Slotine, 95]

M. Cannon and J.J.E. Slotine. Space-Frequency Localized Basis Function Networks for Nonlinear System Estimation and Control. Neurocomputing, 9(3), 1995.

[Hauck, 99]

A. Hauck, M. Sorg, T. Schenk, and G. Färber. What can be Learned from Human Reach Movements for the Design of Robotic Hand-Eye Systems? In Proc. IEEE Int. Conf. on Robotics and Automation (ICRA'99), pp. 2521-2526, May 1999.

[Hong, 95]

W. Hong. Robotic Catching and Manipulation Using Active Vision. M.S. Thesis, Department of Mechanical Engineering, MIT, September 1995.

[Hong & Slotine, 95]

W. Hong and J.J.E. Slotine. Experiments in Hand-Eye Coordination Using Active Vision. In Proc. Fourth International Symposium on Experimental Robotics (ISER'95), Stanford, California, June 30-July 2, 1995.

[Kimura et al., 92]

H. Kimura, N. Mukai and J.J.E. Slotine. Adaptive Visual Tracking and Gaussian Network Algorithms for Robotic Catching. DSC-Vol. 43, Advances in Robust and Nonlinear Control Systems, Winter Annual Meeting of the ASME, Anaheim, CA, pp. 67-74, November 1992.

[Milner & Goodale, 98]

A. David Milner and Melvyn A. Goodale. The Visual Brain in Action (Oxford Psychology Series, No. 27). Oxford University Press, 1998.

[Paillard & Beaubaton, 78]

J. Paillard and D. Beaubaton. De la coordination visuo-motrice à l'organisation de la saisie manuelle. In H. Hécaen and M. Jeannerod (Eds.), Du contrôle de la motricité à l'organisation du geste, pp. 225-260. Paris: Masson, 1978.

[sanz et al., 98]

P.J. Sanz, A.P. del Pobil, J.M. Iñesta, and G. Recatalá. Vision-Guided Grasping of Unknown Objects for Service Robots. In Proc. IEEE Intl. Conf. on Robotics and Automation, pp. 3018-3025, Leuven, Belgium, 1998.

[Ungerleider & Mishkin, 82]

L.G. Ungerleider and M. Mishkin. Two cortical visual systems. In D.J. Ingle, M.A. Goodale and R.J.W. Mansfield (Eds.), Analysis of Visual Behavior, pp. 549-586. Cambridge, MA: MIT Press, 1982.

[Watanabe & Slotine, 95]

I. Watanabe and J.J.E. Slotine. Stable Real-Time Prediction of the Trajectories of Light Objects in Air Using Wavelet Networks. MIT-NSL 100195, 1995.


Tutis Vilis (McGill University), Departments of Physiology, Ophthalmology, Medical Biophysics and Psychology. Web course available at:


Vision and Touch Guided Manipulation Group, at the MIT Artificial Intelligence Lab & Nonlinear Systems Lab. Web site available at:



Jacob L. Driesen. Web course available at: http://www.driesen.com/index.h


MinERVA project on the web. http://www.lpr.ei.tum.de/research/rovi/minerva.html


A. “The Basic Eye Movements”

The information presented here is taken from the web page by Tutis Vilis [web 1, 2000].

There are only five types of eye movements. Each serves a unique function and has properties particularly suited to that function.


Saccades

If an image appears to the side, eye movements called saccades rotate both eyes so that the image now falls on the fovea.


Vergence

If we look (i.e. direct the foveae) from an object at one distance to an object at another distance, vergence eye movements are generated: convergence when looking from far to near, and divergence when looking from near to far.


Pursuit

When an object that we are looking at moves, its image is kept still on the retina by means of a pursuit eye movement (e.g. tracking a ball or your moving finger).

Vestibulo-Ocular Reflex (VOR)

If we rotate our head, an eye movement very similar to pursuit is elicited, whose function is also to keep the image still on the retina. However, although the movement looks similar, it is generated by a different neural circuit, the VOR. The VOR does not need a visual stimulus; it works in the dark. Indeed, if you rotate your head with your eyes closed, your eyes will still move.

Optokinetic Reflex (OKR)

The VOR does not work well for slow, prolonged movements. In this case vision, through the OKR, assists the VOR. The OKR is activated when the image of the world slips across a large portion of the retina, and it produces a sense of self-motion (e.g. when you are sitting in a car stopped at a light and the car beside you starts to move, you sometimes feel as if you are moving).

In summary, the functions and characteristics of the five eye movement types are:

Old Systems: their purpose is to keep the image of the background still on the entire retina when the head moves (e.g. reading signs while walking).


VOR

Fast, has only 3 synapses.

Decays, however, during prolonged rotation (the cupula adapts).

Calibrated by the cerebellar flocculus.


OKR

Helps the VOR during prolonged rotations.

Slow to get going, but that's all right because the VOR is fast.

Visual input projects via the nucleus of the optic tract and the accessory optic system to the vestibular nuclei.

Tested with a large full-field visual stimulus (e.g. an optokinetic drum), which elicits the sensation of being rotated.

New Systems: their purpose is to bring the image of a selected object onto the fovea and keep it there.


Saccades

Very fast and accurate. You have no voluntary control over their speed.


Pursuit

Used to keep your fovea on a moving target.

Slow, needs visual feedback, requires many synapses.


Vergence

Points both foveae at a near target.

Prevents diplopia (double vision).

Disconjugate (the eyes rotate in opposite directions).


The Basic Brainstem Circuit for the VOR when the head turns to the right.

The direct path of the VOR is a short tri-synaptic reflex consisting of the vestibular nucleus (vn), the motoneurons (in the VI and III nuclei), and the muscles (lateral rectus, lr, and medial rectus, mr).

The direct path, by itself, is not enough. Why?

During the head rotation there is a phasic response in the vestibular afferents (which sense how fast you rotate), and this response rotates the eyes to the left.

After the head rotation this phasic activity stops.

This presents a problem: the eyes would drift back to center, because the muscles need a tonic input to keep the eyes rotated.

Basic principles of generating horizontal saccades.

Saccades are used to point the fovea quickly from one object of interest to another.

The command for a saccade begins in a structure called the Paramedian Pontine Reticular Formation (PPRF):

1. Burst neurons in the PPRF generate a phasic movement command, which is proportional to eye velocity.

2. Tonic neurons in the nucleus prepositus hypoglossi (PPH) convert the phasic command into a tonic command; this acts like an integrator, converting velocity to position.

3. Motoneurons (MNs) combine the phasic and tonic commands: the muscles contract, quickly rotating the eyes (phasic component), and then hold them there (tonic component) against the elastic restoring forces.
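The three steps above correspond to the classical pulse-step description of saccade generation. The following Euler simulation is a sketch under stated assumptions. It uses a first-order eye plant with a 0.2 s time constant and a 10 degree saccade at a constant 400 deg/s. The motoneuron input equals the integrated (tonic) signal plus the plant time constant times the burst (phasic) signal. Setting `use_integrator=False` removes the tonic step and reproduces the problem noted for the VOR direct path: the eye moves during the pulse and then drifts back toward center. The function and its parameters are hypothetical illustrations, not a physiological model from the cited course.

```python
def simulate_saccade(amplitude_deg=10.0, peak_velocity=400.0, plant_tc=0.2,
                     use_integrator=True, t_end=0.5, dt=1e-4):
    """Euler simulation of a horizontal saccade driven by a pulse-step command.

    Assumed eye plant (first order): plant_tc * dE/dt = -E + M.
    Burst neurons (PPRF): phasic pulse B = peak_velocity during the saccade.
    Tonic neurons (PPH): N = integral of B, i.e. velocity converted to position.
    Motoneurons: M = N + plant_tc * B (tonic step plus phasic pulse).
    Returns a list of (time, eye_position_deg) samples.
    """
    duration = amplitude_deg / peak_velocity
    eye = 0.0     # eye position E (deg)
    tonic = 0.0   # integrator output N (deg)
    trace = []
    for i in range(int(round(t_end / dt))):
        t = i * dt
        burst = peak_velocity if t < duration else 0.0
        motor = (tonic if use_integrator else 0.0) + plant_tc * burst
        eye += dt * (motor - eye) / plant_tc
        tonic += burst * dt
        trace.append((t + dt, eye))
    return trace


if __name__ == "__main__":
    for flag in (True, False):
        final = simulate_saccade(use_integrator=flag)[-1][1]
        print(f"integrator={flag}: eye position after 0.5 s = {final:.2f} deg")
```

With these assumed parameters, the run with the integrator should hold the eye close to the 10 degree target. The run without it should peak at roughly 9 degrees and then relax to below 1 degree by the end of the simulation.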

What are vergence eye movements?

The function of vergence eye movements is to point the foveae of both eyes at a near object.

This means that the two eyes rotate in opposite directions (disconjugate): the right eye to the left and the left eye to the right when converging on a near object.

All other eye movements are conjugate.
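For a target straight ahead, the geometry of convergence can be worked out directly. Each eye rotates inward by atan((IPD/2)/d), so the two rotations have equal size and opposite sign. The 6.5 cm interpupillary distance default and the sign convention (positive = rightward) are assumptions for illustration. The same computation would apply to a stereo camera head with a known baseline.

```python
import math


def vergence_angles(distance_m, ipd_m=0.065):
    """Horizontal rotations (deg, positive = rightward) of the left and right eye
    needed to fixate a target straight ahead at distance_m.

    ipd_m is an assumed interpupillary distance (or stereo-camera baseline).
    """
    inward = math.degrees(math.atan((ipd_m / 2) / distance_m))
    # Disconjugate: the left eye turns right and the right eye turns left.
    return +inward, -inward


if __name__ == "__main__":
    for d in (2.0, 0.5, 0.25):
        left, right = vergence_angles(d)
        print(f"d = {d} m: left {left:+.2f} deg, right {right:+.2f} deg, "
              f"vergence {left - right:.2f} deg")
```

The required rotation grows quickly as the target approaches, which is why vergence matters mainly for near objects.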