MIN Faculty
Department of Informatics
Knowledge and Language Processing Group

64-479 Research Seminar (Oberseminar): Knowledge and Language Processing and Natural Language Systems
Summer Semester 2010

Organizers
Christopher Habel, Carola Eschenbach, Wolfgang Menzel
Time/Location
Tue 16-18, Room F-534
Content
Talks on the plans and results of Bachelor's, Master's, and Diplom theses, ongoing dissertations, and third-party funded projects, as well as other research from the field of knowledge and language processing. Particular attention is given to the interdisciplinary character of this research focus, i.e. the integration of approaches from computer science, linguistics, logic, and psychology is at the forefront of the work.
Dates
13.04.2010 Kris Lohmann and Matthias Kerzel
Generating Verbal Assistance in Tactile-Map Explorations
Tactile maps offer access to spatial-analog information for visually impaired people. In contrast to visual maps, a tactile map has a lower resolution and can only be inspected sequentially, complicating the extraction of spatial relations among distant map entities. Verbal assistance can help to overcome these difficulties by substituting verbal descriptions for textual labels and offering propositional knowledge about spatial relations. Like visual maps, tactile maps are based on visual, spatial-analog representations that need to be reasoned about in order to generate verbal assistance. In this talk, we present an approach towards a verbally assisting virtual-environment tactile map (VAVETaM), realized on a computer system with a haptic force-feedback device. We present our current research on understanding the user's Map Exploratory Procedures (MEPs), exploiting the spatial-analog map to anticipate the user's informational needs, reasoning about optimal assistance by taking the user's assumed prior knowledge into account, and generating appropriate verbal instructions and descriptions to augment the map.
20.04.2010 Jörg Didakowski (Berlin-Brandenburgische Akademie der Wissenschaften)
Finite-State Weighted Constraint Dependency Grammar - Syntactic Dependency Parsing in Linear Time
In the parsing system Weighted Constraint Dependency Grammar (WCDG), syntactic dependency parsing is formulated as a Constraint Optimization Problem (COP) in which well-formedness rules are written as defeasible constraints, enabling partial parsing, structural preferences, and degrees of grammaticality. Many algorithms have been explored to solve such COPs efficiently. Despite substantial improvements, e.g. by using transformation-based techniques, the running time of parsing remains a major problem. In my presentation, Finite-State Machines (FSMs) are proposed to represent and solve such COPs within an extended finite-state approach in order to tackle this drawback. Solving a COP is known to be NP-hard in general. To avoid this worst-case behavior, a problem decomposition technique is worked out in which a COP is decomposed into a tree and solved bottom-up, and in which different potential decompositions of the problem can be handled in parallel. Together with a bounded depth of the decomposition's tree structure, parsing can actually be performed in linear time and space. I will show that solving COPs by means of FSMs is a promising alternative for improving parsing run-time.
27.04.2010 Christine Upadek
Text Similarity Comparison Using the Link Structure of Wikipedia
Methods that compute the semantic similarity of texts can be used for various tasks, including text categorization, paraphrase detection, and suggesting texts similar to a given one.
This talk gives an overview of various methods for computing text similarity. It presents methods from information retrieval as well as knowledge-based methods that use taxonomies from lexical resources such as WordNet or GermaNet. In addition, several methods that use Wikipedia for similarity comparison are presented. To this end, the structure of the German Wikipedia is discussed, and the program "Wikipedia Preprocessor" serves as an example of how the required information can be extracted from the German Wikipedia. Furthermore, a new method that uses articles, categories, and links between Wikipedia pages to compute text similarity is presented, along with the results of its evaluation.
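The core idea of link-based similarity can be illustrated with a minimal sketch: represent each text by the set of Wikipedia pages it links to (or is mapped to), then compare the sets. This is a hedged illustration only; the representation and the `jaccard_similarity` measure shown here are assumptions for exposition, not the actual method evaluated in the thesis.

```python
def jaccard_similarity(links_a, links_b):
    """Similarity of two texts, each represented as a set of linked
    Wikipedia pages: size of the intersection over size of the union."""
    if not links_a and not links_b:
        return 0.0
    return len(links_a & links_b) / len(links_a | links_b)

# Invented example sets of linked article titles:
text1 = {"Hamburg", "Elbe", "Hafen"}
text2 = {"Hamburg", "Elbe", "Alster"}
print(jaccard_similarity(text1, text2))  # → 0.5
```

Set overlap is only one option; the same representation also supports cosine similarity over weighted link vectors.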
04.05.2010 Stefan Zimmermann
Knowledge Management with Wikis - Optimization Processes
Knowledge management has been on everyone's lips in recent years. A new structural shift is moving the corporate world towards an information and knowledge society. The creation of knowledge and its dissemination, use, and reuse are gaining ever more weight, since they have become indispensable for maintaining competitiveness. The question arises how companies approach this new aspect of knowledge management and integrate it into their existing business processes. What can a company do to make use of its employees' existing knowledge? What motivates employees to share their knowledge? How can knowledge be used optimally, through dissemination and reuse? How does a company manage its knowledge and make its existence tangible?
This Diplom thesis was written in cooperation with a medium-sized software company. The thesis analyzes how the company answers the questions above. The reader is given an insight into the possibilities that exist in this regard and into how the company handles knowledge. To this end, individual elements of knowledge management were analyzed in order to find out where the company can still improve. The social-networking principle of wikis constitutes a decisive part of the company's knowledge management process. The thesis examines to what extent this method can be applied in knowledge management and whether there is room for improvement. For this purpose, concrete measures are presented and compared.
10.05.2010, 17 c.t., B-201 - Special session as part of the departmental colloquium
Maite Taboada
A lexicon-based approach to sentiment analysis
Sentiment analysis is the automatic extraction of information about opinion and subjectivity from text and speech. In this talk, I describe our current research in sentiment analysis. The Semantic Orientation CALculator (SO-CAL) uses dictionaries of words annotated with their semantic orientation (polarity and strength), and incorporates intensification and negation. I describe the process of dictionary creation, and our use of Mechanical Turk to check dictionaries for consistency and reliability. SO-CAL is applied to the polarity classification task, the process of assigning a positive or negative label to a text that captures the text's opinion towards its main subject matter. I show that SO-CAL's performance is consistent across domains and in completely unseen data. I also describe current research on using discourse information to improve performance.
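The lexicon-based calculation described above can be sketched in a few lines. The word lists, weights, and the fixed negation shift below are invented for illustration; the real SO-CAL dictionaries are much larger, hand-curated, and cover more phenomena than this toy scorer.

```python
# Toy lexicon-based scorer in the spirit of a semantic orientation
# calculator. All entries and weights are invented for illustration.
LEXICON = {"good": 3, "excellent": 5, "bad": -3, "awful": -5}
INTENSIFIERS = {"very": 0.5, "slightly": -0.5}  # relative strength shift
NEGATORS = {"not", "never"}

def semantic_orientation(tokens):
    """Sum the orientation of lexicon words, applying intensification
    from the preceding token and shift-based negation from a small
    left-hand window (rather than simply flipping the sign)."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = float(LEXICON[tok])
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            value *= 1.0 + INTENSIFIERS[tokens[i - 1]]
        # Shift the score toward the opposite polarity by a fixed amount.
        if any(w in NEGATORS for w in tokens[max(0, i - 3):i]):
            value += -4.0 if value > 0 else 4.0
        score += value
    return score

print(semantic_orientation("not very good".split()))  # → 0.5
```

Shift-based negation keeps "not excellent" mildly positive instead of strongly negative, which sign flipping would wrongly produce.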
18.05.2010, F-334 - Joint talk as part of the TAMS, WSV, and NatS research seminars
Prof. Xiaoyan Zhu, State Key Lab of Intelligent Technology and Systems, Tsinghua University Beijing, China
Q & A Based Internet Information Acquisition
The Internet, providing the largest databases and encyclopedias of the information society, has outweighed all other media as a source of information. Therefore, ways of accessing the Internet to acquire appropriate information have become more and more important. As a promising information acquisition technique, question answering is a very active topic in information retrieval and natural language processing research. This talk introduces recent progress on a question answering system and its role in Internet information acquisition, addresses the main topics in this area, and focuses on two main issues: text mining and summarization.
Professor Xiaoyan Zhu received her bachelor's degree from the University of Science and Technology Beijing in 1982, her master's degree from Kobe University in 1987, and her Ph.D. from the Nagoya Institute of Technology, Japan, in 1990. She has been teaching at Tsinghua University since 1993. She is Deputy Head of the State Key Lab of Intelligent Technology and Systems, Director of the Tsinghua-HP Joint Research Center, and Director of the Tsinghua-Waterloo Joint Research Center at Tsinghua University. She has held an International Research Chair of the IDRC, Canada, since 2009. She was Deputy Head of the Department of Computer Science and Technology, Tsinghua University, from 2004 to 2007. Her research interests include intelligent information processing, machine learning, natural language processing, question answering systems, and bioinformatics. She has authored more than 100 peer-reviewed articles at leading international conferences (SIGKDD, ICDM, PAKDD, CIKM, ACL, APBC) and in journals (Int. J. Medical Informatics, Bioinformatics, BMC Bioinformatics, Genome Biology, and IEEE Trans. on SMC).
01.06.2010 Isabelle Streicher
Semantic processing of local and directional verbal modifiers in route instructions
The verbal inventory of route instructions mainly consists of position and motion verbs, which take local and directional arguments, respectively. The syntactically driven, compositional combination of the verbs' and the arguments' meanings is unambiguous, because it is lexically fixed.
Besides obligatory arguments, a verb may additionally combine with optional modifiers. In contrast to semantic verb-argument combination, semantic verb-modifier combination cannot be fixed lexically and is therefore less straightforward to achieve. In formal semantics, a standard way of handling verbal modifiers is Davidson's approach of situation modification.
Since Davidson's approach fails in numerous cases of locally and directionally modified verb phrases in route instructions, in my master's thesis I developed an alternative, systematic, domain-specific treatment of the problem. In this talk, I will present my results and put them up for discussion.
15.06.2010 Niels Beuck
Anticipatory Incremental Parsing in Multi-Modal Context - Partial Analyses and Local Ambiguities
In human-robot and human-computer interaction, natural language processing systems are confronted with real-time dialog situated in extra-linguistic context. These situations pose very different challenges than the classical processing of digital text, as in text mining. On the one hand, there is no need for very high throughput, as utterances do not need to be processed much faster than they are produced by the human interlocutor; on the other hand, processing needs to start before the utterance has been fully produced, to prevent unnecessary pauses. Furthermore, connections to the context and feedback opportunities such as eye movement need to be evaluated at each point during processing, to provide a fluid and natural dialog experience for the human. To meet these requirements, an incremental dialog system needs to be able to provide partial analyses of partial language input to other modules, such as visual processing and action planning, and also to integrate input from those modules at each point in time.
In this talk I will present the current state of my dissertation project. The goal of my project is to design an incremental natural language processing system that generates meaningful partial analyses and provides an interface for interaction with other processing modules. I will give an overview of the challenges in incremental NLP, in particular different strategies for dealing with local ambiguities and for providing and evaluating partial analyses.
22.06.2010 Jan Christian Krause
Using Thematic Grids to Document Web Service Operations
Web Services are frequently used for system integration in business contexts, e.g. workflow automation. Their documentation is therefore required to be precise and complete. This talk discusses state-of-the-art approaches to documenting Web Services and provides examples of the negative impact that a lack of construction guidelines has on the documentation of what a service does. A verb-focused approach to documentation construction is presented, based on the linguistic concept of thematic roles and grids. It is complemented by an empirical study showing that the prerequisites of the approach are satisfied. Finally, a concept for a future application of the described approach in the area of Web Service orchestration is presented, whose development is the goal of my dissertation project.
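To make the verb-focused idea concrete, the following sketch fills a documentation template from a verb's thematic grid and checks that every role is filled. The grid for "transfer", the role names, and the sentence template are hypothetical illustrations of the general concept, not the specific method presented in the talk.

```python
# Hypothetical sketch: verb-centred documentation via thematic grids.
# The grid and template below are invented for illustration.
THEMATIC_GRIDS = {
    "transfer": ("Agent", "Theme", "Goal"),
}

TEMPLATES = {
    "transfer": "The {Agent} transfers the {Theme} to the {Goal}.",
}

def document_operation(verb, roles):
    """Verify that all roles of the verb's grid are filled, then render
    the documentation sentence; missing roles signal incompleteness."""
    grid = THEMATIC_GRIDS[verb]
    missing = [role for role in grid if role not in roles]
    if missing:
        raise ValueError(f"documentation incomplete, missing roles: {missing}")
    return TEMPLATES[verb].format(**roles)

print(document_operation(
    "transfer",
    {"Agent": "payment service", "Theme": "amount", "Goal": "payee account"},
))  # → The payment service transfers the amount to the payee account.
```

The completeness check is the point: a thematic grid turns "is this operation documented?" into the checkable question "is every role of the operation's verb filled?".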
29.06.2010 Kris Lohmann and Matthias Kerzel
Verbal Assistance in Tactile-Map Explorations
Tactile maps are a means of communicating spatial knowledge, providing access to spatial representations of knowledge for visually impaired people. However, compared to visual maps, tactile maps have some major drawbacks concerning the integration of information, due to the need for sequential exploration. Verbal descriptions providing abstract propositional knowledge have an advantageous effect on tactile map reading. They can be used to communicate knowledge that on a visual map is usually realized in the form of textual labels. Further, verbal assistance can facilitate the acquisition of global spatial knowledge, such as the spatial relations of streets, and can support the tactile-map user during exploration, for example by giving information about landmarks next to a street. We present an approach towards a verbally assisting virtual-environment tactile map (VAVETaM), which provides a multimodal map, computing situated verbal assistance by categorizing the user's exploration movements into semantic categories called MEPs. Three types of verbal assistance are discussed. VAVETaM is realized using a computer system and the PHANToM® desktop haptic force-feedback device, which allows haptic exploration of 3D-graphics-like haptic scenarios.
02.07.2010, 14 c.t., F-334 - Special Friday session
Torsten Hahmann
Contact algebras for verification and design of mereotopologies
Building and verifying mereotopological theories, which capture mereological (parthood) and topological (connection) relations between regions of space, plays a central role in Qualitative Spatial Reasoning. We treat a large class of common mereotopologies (those with unique closures) as contact algebras defined over bounded lattices equipped with a unary operation of complementation. This algebraic framework extends previous work in two ways: (1) it includes non-distributive contact algebras, in particular the so-called Stonian p-ortholattices, and (2) it allows establishing necessary conditions for what constitutes a contact algebra that admits a spatial interpretation. Moreover, different ontological choices are directly related to algebraic properties. It turns out that spatially representable contact algebras fix many of these properties, while only few real choices remain. Thus, the framework allows us to extract three weakest, possibly spatially representable, mereotopologies.
In the second part of the talk, we give an equational axiomatization of the (non-distributive) Stonian p-ortholattices and show that the equational theory - and thus the original theory - exhibits unintended models. Approaches to extending the axiomatization in order to eliminate unintended models are presented. Other benefits of the equational axiomatization are significant speed-ups for some model construction and theorem proving tasks using standard theorem provers. In the broader scope, our work demonstrates how mathematical representations of ontologies can help us fully understand, verify, and design ontologies.
06.07.2010, 16:15, F-334 - Special session: dissertation defense talk
Patrick McCrae
A Computational Model for the Influence of Cross-Modal Context upon Syntactic Parsing
Ambiguity is an inherent property of natural language. Despite the high frequency and diversity with which ambiguity occurs in unrestricted natural language, most ambiguities in inter-human communication pass unnoticed, mainly because human cognition automatically and unconsciously works to resolve ambiguity. A central contribution to this automatic and unconscious disambiguation is the integration of non-linguistic information from cognitively readily available sources such as world knowledge, discourse context and visual scene context.
In this talk I present a cognitively motivated computational model for the cross-modal influence of visual scene context upon natural language understanding. In line with Jackendoff's Conceptual Semantics, I argue for a model that employs semantic mediation to establish cross-modal referential links between words in the linguistic input and asserted entities in a representation of visual scene context (context model). The proposed framework assigns cross-modal referential links on the basis of conceptual compatibility between the concepts activated in the linguistic modality and the concepts instantiated in the context model. The implementation of the model centres around WCDG2, a symbolic weighted-constraint dependency parser for German. Situation-invariant semantic knowledge, including semantic lexical knowledge and world knowledge, is encoded in an OWL ontology (T-Box). The situation-specific visual scene information in the context models is represented in terms of concept instantiations from the ontology joined by thematic relations (A-Box). A predictor component computes acceptability scores for the assignment of semantic dependencies in the linguistic input given a representation of visual scene context.
In addition to the model motivation and specification, I report experiments that demonstrate the effectiveness of the framework. The experimental findings show that the model successfully integrates visual context information to effect syntactic disambiguation in notoriously hard-to-parse cases such as PP attachment, subject-object ambiguity of German plural nouns and genitive-dative ambiguity of German feminine nouns.
13.07.2010 Ogeigha Koroyin
The Instruction of Artificial Agents via Controlled Language
Human agents give and process instructions in natural language relatively effortlessly, but often have difficulties - especially if they are non-specialists - using formal languages. On the other hand, although natural languages are very expressive and highly flexible, they exhibit very high degrees of ambiguity and complexity, which severely restrict automatic reasoning. Therefore, formulating route instructions to artificial agents, for example the Geometric Agent, in a natural language is indeed intuitive for non-expert users, but it is a major stumbling block for automatic processing.
In order to support a notation that is intuitive for users and simultaneously automatically processable for computers, controlled languages have been designed for diverse domains. In my thesis, I design a Controlled Language (CERI: Controlled English for Route Instructions) that may be used to express route instructions for the Geometric Agent.