A SPOKEN SWEDISH E-MAIL INTERFACE

Björn Gambäck, Maria Cheadle, Preben Hansen, Fredrik Olsson, Magnus Sahlgren
SICS, Swedish Institute of Computer Science AB; Box 1263, S-164 29 Kista, Sweden
dumas@sics.se

INTRODUCTION

This paper describes the Swedish involvement in the EU project DUMAS (Dynamic Universal Mobility for Adaptive Speech Interfaces), which aims at developing multilingual speech-based applications and, more specifically, at investigating adaptive multilingual interaction techniques that handle both spoken and text input and provide coordinated linguistic responses to the user. The project has a clear focus on Northern Europe, with two of the eight partners coming from Sweden and four from Finland; the languages we aim at treating are English, Swedish and Finnish. We will construct an agent-based generic framework for multilingual speech applications, supporting adaptivity to both the individual user and the particular domain. Applications based on the general architecture will benefit from fault-tolerant semantic analysis, which, combined with the dialogue management routines, will handle user interaction in a very robust manner. As a first such application, we are building a mobile phone-based e-mail interface that will deal with multilingual issues in several forms and environments, and whose functionality can be adapted to different users, situations and tasks. Such a system produces only speech output to the user (spoken responses and read e-mails), but receives two types of input: user speech and textual e-mail messages. It must be able to distinguish between languages, both in e-mails and in user utterances. The contents of a user's inbox must be continuously analysed in order to enable advanced search functions.

DATA COLLECTION

In order to collect data that resembles as closely as possible what a dialogue-based speech e-mail interface will actually be required to handle, we have performed Wizard-of-Oz experiments using a multi-session group scenario (Kanto et al. 2003), specially designed for observing long-term user developments. We collected 6 hours of recordings consisting of 63 spoken telephone interactions in which a group of participants used a simulated speech-based e-mail application. The recordings were transcribed into Annotation Graph (Bird and Lieberman 1999) format and parsed. The dialogues were tagged with speech act tags from a domain-specific dialogue act taxonomy. The data collected will be used to guide the development of the user and dialogue modelling components and, more specifically at SICS, to provide information on expected vocabulary and language use for the construction of the language understanding components.

LANGUAGE UNDERSTANDING

To process the type of language found in e-mails and speech we need a robust methodology for syntactic parsing, sense annotation and semantic analysis. The fault-tolerant functional dependency grammar parsers from Connexor Oy (Tapanainen 1999) provide morphological and dependency syntactic analysis. The parser output will be enriched with lexical sense information. For Swedish, the main part of the semantic lexical database will rely on the Random Indexing vector-space methodology described below. For user commands we will in addition adopt a strategy similar to Buitelaar's (1998) underspecified approach to semantic tagging. We will then employ two different methods of semantic interpretation: one working directly on the parser output, and another which robustly tries to match the syntactic structure to semantic templates. When including semantic information directly in the dependency grammar structures, we will rely on the (underspecified) lexical semantic information combined with functional application rules that define how the information is passed along the branching structure. The semantic templates are, on the other hand, akin to the slots found in McCord's (1990) Slot Grammar. The slot information of a structure declares which other structures may or must fill its slots in order for it to be valid. A slot structure can be partially or fully instantiated, and it can be filled with representations from one or more statements to incrementally build the meaning of a statement.
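The incremental filling of slot structures described above can be illustrated with a minimal sketch. It is neither McCord's Slot Grammar nor the DUMAS implementation; the class names, the "read_mail" template and its "sender" and "date" slots are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Slot:
    """One slot of a semantic template; 'required' marks must-fill slots."""
    name: str
    required: bool = True
    value: Any = None


@dataclass
class Template:
    """A semantic template that can be partially or fully instantiated."""
    name: str
    slots: Dict[str, Slot] = field(default_factory=dict)

    def fill(self, **fillers: Any) -> "Template":
        """Add fillers from one statement; unknown slot names are rejected."""
        for slot_name, value in fillers.items():
            if slot_name not in self.slots:
                raise KeyError(f"{self.name} has no slot {slot_name!r}")
            self.slots[slot_name].value = value
        return self

    def missing(self) -> List[str]:
        """Names of required slots that are still empty."""
        return [s.name for s in self.slots.values()
                if s.required and s.value is None]

    def is_complete(self) -> bool:
        """A template is valid once every required slot is filled."""
        return not self.missing()


def make_read_template() -> Template:
    """Hypothetical template for a 'read mail' command (assumed slots)."""
    return Template("read_mail", {
        "sender": Slot("sender", required=True),
        "date": Slot("date", required=False),
    })
```

In this sketch a first statement might fill only the optional "date" slot, leaving the template partially instantiated; a follow-up statement supplying "sender" completes it, at which point is_complete() returns True.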

CONTENT-BASED ADAPTIVITY

One of the adaptive techniques used in the DUMAS project is a vector-space methodology called Random Indexing (Kanerva et al. 2000), which uses distributed representations to accumulate context vectors for words based on co-occurrence statistics. The context vectors will be used in the project for several different applications. One application is automatic lexicon extraction, which exploits the fact that the context vectors can be used to calculate similarity between words. Another application is content-based dialogue act tagging, which is done by matching utterance vectors to a set of learned dialogue act class vectors, in the style of Jokinen et al. (2001). A third application is content-based user modelling, which we achieve by creating profile vectors (for the user's e-mail folders, interests and linguistic behaviour) that are used in classification tasks, such as suggesting which e-mails might be of particular interest to the user, and in which folders they might be stored.
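As a rough illustration of the idea, the following sketch accumulates context vectors from sparse random index vectors and compares them by cosine similarity. It is a simplified reading of Random Indexing, not the project's implementation: the dimensionality, the number of non-zero elements, the window size and all function names are assumptions made for the example.

```python
import math
import random
from collections import defaultdict

DIM = 300      # vector dimensionality (assumed value)
NONZERO = 6    # number of +1/-1 elements per index vector (assumed value)
WINDOW = 2     # context window on each side of a word (assumed value)


def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Sparse random index vector as {position: +1 or -1}, fixed per word."""
    rng = random.Random(word)  # seeding with the word makes it repeatable
    positions = rng.sample(range(dim), nonzero)
    return {p: (1 if i % 2 == 0 else -1) for i, p in enumerate(positions)}


def train(sentences, dim=DIM, window=WINDOW):
    """Add the index vectors of neighbouring words to each context vector."""
    context = defaultdict(lambda: [0] * dim)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    for pos, sign in index_vector(tokens[j], dim).items():
                        context[word][pos] += sign
    return context


def text_vector(tokens, context, dim=DIM):
    """Sum of known context vectors, e.g. for an utterance or a folder."""
    total = [0] * dim
    for token in tokens:
        if token in context:
            for k, x in enumerate(context[token]):
                total[k] += x
    return total


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Words that occur in the same contexts receive similar context vectors, which is the property used for lexicon extraction. Under the same assumptions, text_vector can stand in for an utterance or profile vector that is compared against dialogue act class vectors or folder profiles with cosine.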

INFORMATION REFINEMENT

The term information refinement refers to the process in which text is handled with the aim of accessing the pieces of content that are relevant from a certain perspective (Olsson 2002). In the e-mail application, AthosMail, information refinement techniques such as information extraction, information retrieval, and automatic text summarisation will be used to facilitate advanced searching within users' mailboxes. Information extraction will be used to recognise named entities in each e-mail message, i.e., names of persons, organizations and locations, which will be assigned to the e-mail as keywords. A text summarisation module will be implemented and used to generate a short summary of each message. Furthermore, an information retrieval system currently under development at SICS will allow for full-text search of each mailbox.

Bibliography

Bird, S. and Lieberman, M.: 1999, Annotation graphs as a framework for multidimensional linguistic data analysis, Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, ACL, Madrid, Spain. Workshop "Towards Standards and Tools for Discourse Tagging".

Buitelaar, P.: 1998, CoreLex: Systematic Polysemy and Underspecification, PhD Thesis, Brandeis University, Dept. of Computer Science, Waltham, Massachusetts.

Jokinen, K., Hurtig, T., Hynnä, K., Kanto, K., Kaipainen, M. and Kerminen, A.: 2001, Self-organizing dialogue management, Proceedings of the 2nd Workshop on Neural Networks and Natural Language Processing, Natural Language Processing Pacific Rim Symposium, Tokyo, Japan.

Kanerva, P., Kristofersson, J. and Holst, A.: 2000, Random indexing of text samples for latent semantic analysis, Proceedings of the 22nd Annual Meeting of the Cognitive Science Society, Lawrence Erlbaum Associates, Mahwah, New Jersey.

Kanto, K., Cheadle, M., Gambäck, B., Hansen, P., Jokinen, K., Keränen, H. and Rissanen, J.: 2003, Multi-session group scenarios for speech interface design, Proceedings of the 10th International Conference on Human-Computer Interaction, Crete, Greece. (to appear).

McCord, M. C.: 1990, Slot Grammar: A system for simpler construction of practical natural language grammars, Research report, IBM Thomas J. Watson Research Center, Yorktown Heights, New York.

Olsson, F.: 2002, Requirements and design considerations for an open and general architecture for information refinement, PhLic Thesis, Uppsala University, Department of Linguistics, Uppsala, Sweden.

Tapanainen, P.: 1999, Parsing in Two Frameworks: Finite-State and Functional Dependency Grammar, PhD Thesis, University of Helsinki, Dept. of General Linguistics, Helsinki, Finland.