Natural cognitive systems, such as humans, profit from combining the input of the different sensory systems not only because each modality provides information about different aspects of the world but also because the different senses can jointly encode particular aspects of events, e.g. the location or meaning of an event. However, the gains of cross-modal integration come at a cost: since each modality uses highly specific representations, information needs to be transferred into a code that allows the different senses to interact. Corresponding problems arise in human communication when information about one topic is expressed using combinations of different formats such as written or spoken language and graphics.
In this lecture, we will focus on models and methods suitable to realize processes and representations for cross-modal interactions in artificial cognitive systems, i.e. computational systems. After introducing the core phenomena of cross-modal interaction, we exemplify the mono-modal basis of cross-modal interaction and the current development of informatics-oriented research in this field with four topics:
- Cross-modal information fusion for a range of non-sensory, i.e. categorical, data in the area of speech and language processing, where visual stimuli have to be merged with the available acoustic evidence. Among the language-related information sources, lip reading certainly provides one of the major contributions of additional evidence, but more recently eyebrow movement and its relationship to suprasegmental features of human speech have attracted considerable attention as well (a minimal fusion sketch follows the list below).
- The interaction of representational modalities, such as language and maps, in their interdependence with sensory modalities, in particular vision, auditory perception and haptics. The computational analysis of multi-modal documents or dialogues is a prerequisite for advanced intelligent information systems as well as for human-computer interaction, in particular human-robot interaction. Furthermore, such computational devices can be used in systems giving assistance to people who are blind, visually impaired or deaf.
- Multimodal memory plays an important role for the next generation of mobile robots and service robots. By using grounded memories of robot actions, based on real-world visual, auditory and tactile data collected by the robot, instead of relying solely on a sensorimotor controller, the robot's memory can be enriched, and thus the robustness of both the representations and the retrieval process of autonomous agents will increase.
- Neural architectures for multiple modalities. The brain plays the central role in all animal and human behaviour; the integration of various kinds of sensory information with cognitive processing in neural architectures is therefore particularly relevant. Examples of computational neural architectures are described, from spiking neural networks to supervised and self-organizing artificial neural networks based on midbrain and cortical brain areas. The focus will be on the auditory and visual modalities, illustrated by some examples of robotic behaviour (a small self-organizing-map sketch follows the list below).
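
As a concrete illustration of the first topic, the following sketch shows one simple way visual (lip-reading) and acoustic evidence could be merged at the decision level. The class labels, probability values and weighting scheme are illustrative assumptions, not material from the lecture.

```python
# Minimal sketch of late (decision-level) audio-visual fusion for a speech
# classification task. The classes, scores and weights below are hypothetical
# placeholders; a real system would use trained acoustic and lip-reading models.
import numpy as np

CLASSES = ["yes", "no", "maybe"]  # hypothetical word classes

def fuse_log_likelihoods(audio_ll, visual_ll, audio_weight=0.7):
    """Combine per-class log-likelihoods from the acoustic and the visual
    (lip-reading) stream, weighted by the assumed reliability of the audio."""
    fused = audio_weight * audio_ll + (1.0 - audio_weight) * visual_ll
    return CLASSES[int(np.argmax(fused))]

# Example scores: the audio is ambiguous (noisy), lip reading favours "no".
audio_ll = np.log(np.array([0.40, 0.35, 0.25]))
visual_ll = np.log(np.array([0.10, 0.80, 0.10]))

# Lowering the audio weight (e.g. under acoustic noise) lets the visual
# evidence decide the outcome.
print(fuse_log_likelihoods(audio_ll, visual_ll, audio_weight=0.4))  # -> "no"
```

The design choice here, weighting modalities by their assumed reliability, mirrors the general point of the first topic: how much each sense contributes should depend on how informative it currently is.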
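For the fourth topic, the sketch below indicates how a self-organizing map, one of the architecture families mentioned above, could integrate auditory and visual features by mapping their concatenation onto a two-dimensional grid of neurons. All dimensions, learning rates and the random input data are illustrative assumptions.

```python
# Minimal sketch of a self-organizing map (SOM) integrating auditory and
# visual features via early fusion (concatenation). Parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
AUDIO_DIM, VISUAL_DIM = 8, 12
GRID = (6, 6)                                          # 6x6 map of neurons
weights = rng.random(GRID + (AUDIO_DIM + VISUAL_DIM,))

def train_step(audio_feat, visual_feat, lr=0.1, radius=1.5):
    """One SOM update: find the best-matching unit for the fused input and
    pull it and its grid neighbours toward that input."""
    x = np.concatenate([audio_feat, visual_feat])      # early fusion
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), GRID)     # best-matching unit
    rows, cols = np.indices(GRID)
    grid_dist = np.sqrt((rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2)
    h = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))  # neighbourhood function
    weights[...] = weights + lr * h[..., None] * (x - weights)
    return bmu

# Train on random multimodal samples; on a robot, the feature vectors would
# come from the auditory and visual sensors recorded during behaviour.
for _ in range(100):
    train_step(rng.random(AUDIO_DIM), rng.random(VISUAL_DIM))
```

After training, nearby neurons respond to similar audio-visual inputs, giving a shared, topographically organized code for the two modalities.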