MIN Faculty
Department of Informatics
Knowledge and Language Processing Group

64-631 Lecture: CINACS Lecture Series
Winter semester 2010/11

Organizers
Christopher Habel, Wolfgang Menzel, Stefan Wermter, Jianwei Zhang
Time/Location
Mon 14-16, D-125
Content
Natural cognitive systems, such as humans, benefit from combining the input of their different sensory systems, not only because each modality provides information about different aspects of the world, but also because the different senses can jointly encode particular aspects of events, e.g. the location or meaning of an event. However, the gains of cross-modal integration come at a cost: since each modality uses very specific representations, information needs to be transferred into a code that allows the different senses to interact. Corresponding problems arise in human communication when information about one topic is expressed using combinations of different formats, such as written or spoken language and graphics.
In this lecture, we will focus on models and methods suitable for realizing processes and representations for cross-modal interactions in artificial cognitive systems, i.e. computational systems. After introducing the core phenomena of cross-modal interaction, we will exemplify the mono-modal basis of cross-modal interaction and the current development of informatics-oriented research in this field with four topics:
  • Cross-modal information fusion for a range of non-sensory, i.e. categorical, data in the area of speech and language processing, where visual stimuli have to be merged with the available acoustic evidence. Among the language-related information sources, lip reading certainly provides one of the major contributions of additional evidence, but more recently eyebrow movement and its relationship to suprasegmental features of human speech has attracted considerable attention as well (a schematic fusion example is sketched after this list).
  • The interaction of representational modalities, such as language and maps, and their interdependence with sensory modalities, in particular vision, auditory perception, and haptics. The computational analysis of multi-modal documents or dialogues is a prerequisite for advanced intelligent information systems as well as for human-computer interaction, in particular human-robot interaction. Furthermore, such computational devices can be used in systems providing assistance to impaired people, e.g. blind, visually impaired, or deaf users.
  • Multimodal memory plays an important role for the next generation of mobile robots and service robots. By using grounded memories of robot actions, built from real-world visual, audio, and tactile data collected by the robot, instead of relying solely on a sensorimotor controller, the robot's memory can be enriched, and thus the robustness of both the representations and the retrieval process of autonomous agents will increase.
  • Neural architectures for multiple modalities. The brain plays the central role in all animal and human behaviour. The integration of various kinds of sensory information with cognitive processing in neural architectures is therefore particularly relevant. Examples of computational neural architectures will be described, from spiking neural networks to supervised and self-organizing artificial neural networks based on midbrain and cortical brain areas. The focus will be on auditory and visual modalities, illustrated by some examples of robotic behaviour.
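
To make the fusion problem from the first topic concrete, the following minimal Python sketch combines acoustic and visual posterior distributions over the same set of word candidates with a simple weighted log-linear rule. The function name, label set, and weight value are hypothetical illustrations, not material from the lecture; they only show one common way in which visual evidence can be merged with acoustic evidence.

    import numpy as np

    def late_fusion(p_acoustic, p_visual, alpha=0.7):
        """Hypothetical weighted log-linear fusion of two modality posteriors.

        alpha encodes the assumed reliability of the acoustic stream and
        would typically be lowered under acoustic noise, shifting weight
        to the visual (e.g. lip-reading) evidence.
        """
        p_a = np.asarray(p_acoustic, dtype=float)
        p_v = np.asarray(p_visual, dtype=float)
        # Combine the two distributions in the log domain.
        log_p = alpha * np.log(p_a + 1e-12) + (1.0 - alpha) * np.log(p_v + 1e-12)
        p = np.exp(log_p - log_p.max())
        return p / p.sum()  # renormalize to a proper probability distribution

    # Toy example: posteriors over three syllable candidates ("ba", "da", "ga").
    acoustic = [0.70, 0.20, 0.10]  # the acoustic classifier favours "ba"
    visual   = [0.05, 0.15, 0.80]  # the lip-shape classifier favours "ga"
    print(late_fusion(acoustic, visual, alpha=0.5))

In practice, such stream weights are not fixed but estimated from the momentary quality of each signal, which is one of the questions addressed in audio-visual speech processing.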