Visually guided grasping robot MIRA
Abstract.
A mobile robot which is capable of identifying and grasping a fruit on a table,
taking into account the fruit's position.
The behaviour is guided by verbal instruction.
The demo focuses on the practical use of visual recognition and navigation
in a continuous world, a task which is easy for humans
yet traditionally difficult for machine intelligence.
Core components for docking: neural network.
The figure shows the neural network for the visually guided docking manoeuvre.
Blue connections represent trained weights: the light blue ones were used only
during training, while only the dark blue connections are used during performance.
The dark rectangles show neural activations (green if active) on the
corresponding layers.
Core component 1: lower visual system.
The bottom-up recognition weights W
and their feedback counterpart
were trained using a sparse and topographic Helmholtz machine framework
[Weber, C. (2001) Self-Organization of Orientation
Maps, Lateral Connections, and Dynamic Receptive Fields in the Primary Visual
Cortex
(PS|PDF)].
As a result of training on real-world images they have become feature
detectors: many neurons in the what area detect localised edges in
the image, and some are colour selective
(W as GIF).
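For illustration, a minimal Python sketch of wake-sleep style training for such a Helmholtz machine is given below. The layer sizes, the learning rate and the fixed sparse prior used in the sleep phase are assumptions made for the sketch, and the topographic and lateral terms of the actual framework are omitted.

    import numpy as np

    rng = np.random.default_rng(0)

    n_input  = 24 * 16   # image patch size (assumed to match the 24x16 grid)
    n_hidden = 200       # number of "what" feature units (illustrative)
    lr       = 0.01      # learning rate (illustrative)

    W = rng.normal(0, 0.1, (n_hidden, n_input))  # bottom-up recognition weights
    G = rng.normal(0, 0.1, (n_input, n_hidden))  # top-down generative (feedback) weights

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def wake_sleep_step(image):
        """One wake-sleep update on a single image vector with values in [0, 1]."""
        global W, G
        # Wake phase: recognise the image with W, then adapt the generative
        # weights G so that the top-down reconstruction matches the image.
        h = (sigmoid(W @ image) > rng.random(n_hidden)).astype(float)
        recon = sigmoid(G @ h)
        G += lr * np.outer(image - recon, h)
        # Sleep phase: "dream" an image from a sparse random hidden code, then
        # adapt the recognition weights W to recover that code from the dream.
        h_dream = (rng.random(n_hidden) < 0.05).astype(float)  # fixed sparse prior (assumed)
        x_dream = (sigmoid(G @ h_dream) > rng.random(n_input)).astype(float)
        h_rec = sigmoid(W @ x_dream)
        W += lr * np.outer(h_dream - h_rec, x_dream)

    # Repeated calls on natural-image patches turn the rows of W into localised
    # edge and colour detectors, as described for the "what" area above.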
Core component 2: what - where association of target object.
The lateral weights V within and between the what
and the where areas were trained as a continuous associator network
[Weber, C. and Wermter, S. (2003)
Object Localization using Laterally Connected "What" and "Where" Associator
Networks
(PS|PDF)].
They associate the what neural activations (which contain information about
the target fruit and the background) with the where location.
During training, the where location of the fruit was given as a Gaussian
activation hill centred at the position corresponding to the fruit's position
within the image
(the image and the where area have the same size, 24x16 units).
During performance, the where activations are initially unknown and initialised
to zero, but through pattern completion via the V
weights the (hopefully) correct location emerges on the where area
as a Gaussian hill of activation.
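The Python sketch below illustrates the Gaussian training target and the performance-time pattern completion. The area sizes match the 24x16 grid described above, but the clipped linear relaxation, the hill width and the number of iterations are assumptions made for the sketch.

    import numpy as np

    n_what  = 200        # number of "what" feature units (illustrative)
    n_where = 24 * 16    # "where" area, same 24x16 size as the image

    def gaussian_hill(cx, cy, sigma=1.5, w=24, h=16):
        """Training target on the where area: a Gaussian activation hill
        centred at the fruit's location (cx, cy) within the image."""
        x, y = np.meshgrid(np.arange(w), np.arange(h))
        return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)).ravel()

    def complete_where(V, what_act, n_iter=20):
        """Pattern completion during performance: the where activations start
        at zero and are relaxed via the trained lateral weights V (here only
        the rows that drive the where area from the combined [what, where] input)."""
        where_act = np.zeros(n_where)
        for _ in range(n_iter):
            net = V @ np.concatenate([what_act, where_act])
            where_act = np.clip(net, 0.0, 1.0)   # simple bounded update (assumed)
        return where_act.reshape(16, 24)         # the peak marks the perceived fruit location

During training, V would be adapted so that the hill returned by gaussian_hill is a stable pattern given the corresponding what activations; during performance, complete_where recovers it from the what activations alone.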
Core component 3: reinforcement-trained motor strategy.
The reinforcement-trained weights R drive the robot
until the fruit target is perceived right in front of its grippers
[Weber, C. and Wermter, S. and Zochios, A. (2003)
Robot Docking with Neural Vision and Reinforcement
(PS|PDF)].
The current state of the robot is fully given by the visually perceived
location of the target on the where area
and by the robot's angle w.r.t. the table from which it has to grasp
the fruit
(note that it must arrive perpendicular to the table so that it does not
hit it with its sides).
Both inputs are first expanded onto a state space in which one neuron
(together with its immediate neighbours) represents the current state.
The weights to the critic allow every state to be evaluated
(a state is valued more highly the sooner the target can be reached from it).
The weights R to the robot motors
(forward, backward, left_turn, right_turn)
drive the robot to states which have a better value,
i.e. closer to the goal
(mpeg
video).
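The performance-time readout of this stage can be sketched in Python as below, assuming the critic weights and the motor weights R have already been learned. The conjunctive (location x angle) state coding, the angle discretisation and the greedy action readout are assumptions made for the sketch, based on the description above.

    import numpy as np

    ACTIONS = ["forward", "backward", "left_turn", "right_turn"]
    W_GRID  = (16, 24)   # where-area size (rows, columns)
    N_ANGLE = 9          # discretisation of the robot angle w.r.t. the table (assumed)

    def state_code(where_act, angle_bin, sigma=1.0):
        """Expand (perceived target location, robot angle) into a state-space
        vector in which the neuron for the current state and its immediate
        neighbours are active."""
        y0, x0 = np.unravel_index(np.argmax(where_act), W_GRID)
        yy, xx = np.meshgrid(np.arange(W_GRID[0]), np.arange(W_GRID[1]), indexing="ij")
        loc = np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * sigma ** 2)).ravel()
        ang = np.zeros(N_ANGLE)
        ang[angle_bin] = 1.0
        return np.outer(loc, ang).ravel()   # conjunctive (location x angle) code

    def state_value(critic_w, state):
        """Critic output: higher for states from which the target is reached sooner."""
        return float(critic_w @ state)

    def select_action(R, state):
        """The weights R map the state onto the four motor units; the most
        active unit drives the robot towards states with a better value."""
        return ACTIONS[int(np.argmax(R @ state))]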
Embedding behaviours into a demo: writing policies with Miro.
The following image shows a primitive
Miro
policy in which the "action patterns" (boxes in the image)
are activated one after the other in a series.
Each action pattern can contain several behaviours (grey).
The visually guided docking described above is the "NNsimBehaviour"
inside the "Docking" action pattern.
Behaviours can be parameterised and thus reused in different variations.
The sequence of events is:
SpeechRecog:
do transition OffSpeechRecognition
if any of the words "GET" or "ORANGE" has been recognised.
GotoTable:
move forward until the infrared table sensors sense a table,
and then do transition DoDockingTransition.
Docking:
by now the orange target is hopefully within the camera's field of view.
Run the neurally defined behaviour trained as described above.
Do transition OffNNsim if the orange is perceived at
front middle (defined on the where area) for 5 consecutive iterations.
Grasping:
close gripper, lift gripper,
do transition LeaveTableTransition.
LeaveTable:
go backward a few centimetres,
do transition OffStraightLimit.
Turn:
turn 180 degrees, do transition OffStraightLimit.
Forward:
go forward half a meter,
do transition OffStraightLimit.
SpeechRec2:
do transition OffSpeechRecognition
if any of the words "OPEN" or "HAND" has been recognised.
Grasping:
open gripper (to release the orange),
do transition LeaveTableTransition.
Empty: the end.
In addition, we added two more action patterns: in
SpeechRec3 the robot recognises either of the words
"THANK" and "YOU" and then makes a transition to
SpeechGeneration, in which it says
"YOU ARE WELCOME. THAT WAS EASY FOR A ROBOT LIKE ME".
Running the demo and implementation.
Four xterms (GIF)
are opened to run the demo.
Three of them start services on the robot: the video service for the camera,
the speech service for
speech recognition (sphinx) and
production (festival),
and finally the robotBase service for everything else: motors, range and table
sensors, gripper, etc.
The fourth xterm starts the BehaviourEngine which loads the Policy.xml file.
All behaviours are compiled into this program.
The directory structure (GIF)
contains the following:
the Policy.xml file;
the Engine and Factory directories, which contain the executable
BehaviourEngine file
and a BehavioursFactory that collects the implemented behaviours;
one directory for every behaviour;
and a directory "SphinxTGplusMB" for the SphinxSpeech
speech service.
The other services (video and robotBase) reside in the Miro directory.
The demo with audience.
The demo won us
(Cornelius Weber,
Mark Elshaw,
Alex Zochios,
Chris Rowan,
all members of the
HIS centre
led by
Stefan Wermter)
the
MI-prize
competition at AI-2003
in Cambridge.
See a JPEG,
or more,
from the stage, or a post-prize-winning
video
(7.3 MB).
Acknowledgements.
This work was made possible through the MirrorBot
project.