Objectives

The DUMAS (Dynamic Universal Mobility for Adaptive Speech Interfaces) project will concentrate on intelligent and ambient interaction management, with a special emphasis on new adaptive interaction methods. The goal of DUMAS is to provide a robust framework for application development and to develop multilingual conversational applications that integrate natural communication modes with mobile devices and information sources. DUMAS will concentrate on spoken language interaction, taking into account the users' personalised needs and requirements for flexible interaction, and being able to learn from previous interactions to optimise the communication.

Recent advances in human language technology have made spoken-dialogue systems commercially viable for a range of interactive applications. State-of-the-art speech technology is already accurate enough that users can dictate text, issue simple control commands to direct system operations, and hold short conversations with the system to search for information or to complete well-defined tasks such as booking a hotel room. However, current speech-based applications, interaction techniques and application development architectures still lack many features necessary for natural multilingual interaction. The three main areas where current technology falls short are:

  1. Inability to process structured text in different presentation formats and different languages: electronic text processing requires new facilities that can cope with formats such as tables, lists, URLs and email addresses, possibly in several languages, while a speech interface requires intelligent error handling and compensation for possible misunderstandings.

  2. Limited conversational abilities: present-day dialogue systems designed for various information services (train schedule, hotel information) do not satisfactorily cope with requests that are syntactically incomplete and/or semantically creative, or with requests that refer to information that results from previous requests within the same interaction situation ("go to the previous paragraph", "try Brussels instead").

  3. Limited user models: inability to adapt to the user's personalised needs, to take previous interactions with the user into consideration, or to learn the user's profile through repeated interactions.

The main goal of the DUMAS project is to furnish electronic systems with intelligent spoken interaction capabilities. We will not develop basic speech technology such as speech synthesis and recognition, but will focus on dialogue-level problems. In particular, we will investigate adaptive multilingual interaction techniques to handle both spoken and text input and to provide coordinated linguistic responses to the user. Moreover, future communication with electronic systems requires dynamic and adaptive capabilities: systems that can learn through interaction and adapt their behaviour to different users and different situations, as opposed to the current state of affairs, where the user is required to adapt to the system.

For that purpose, we will investigate interaction techniques for multilingual communication, involving contributions at the level of intelligent text parsing and summarization, dialogue management and output presentation. We will construct Athos, a generic and modular framework for multilingual speech-based application development. In particular, Athos will use the highest-quality automatic speech recognition (ASR) and text-to-speech (TTS) components available, together with a state-of-the-art application generator and other up-to-date software components. The architecture will be based on the notion of an interaction agent: a software-agent-type component of the system, designed to handle various tasks in human-computer interaction. The architecture will also be hybrid, utilising components based on different computational paradigms, such as symbolic computation, statistical methods and neural networks.

Based on the generic framework and various interaction techniques, we will construct AthosMail, an e-mail application that will deal with multilingual issues in several forms and environments, and whose functionality can be adapted to different users, situations and tasks. To ensure that the implemented methods are general and have wide coverage, the applicability of the Athos framework will be investigated in other applications, such as speech-based document retrieval, speech user interfaces for SMS messages, text-television and radio stations, and applications for disabled people.

Below we give an overview of the research problems that are important in constructing the Athos architecture.

Framework

Athos is heavily based on agents, which are used in a novel way. Instead of a few monolithic agents, there are many specialized agents for different situations, as well as alternative agents for the same situation. In this way it is possible to build a generic architecture that supports maximal adaptability to different user needs, languages, communication strategies, and input and output customs. The modular architecture uses shared knowledge and a distributed component structure, which together make the system highly adaptive. Interesting research questions in this area focus on the coordination of different system components and on information management. Robustness and scalability to a wide range of applications are also issues to be considered.
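The idea of keeping alternative agents for the same situation and choosing among them from shared knowledge can be sketched roughly as follows. This is only an illustration under our own assumptions: the `Agent` class, the `select_agent` function, the scoring scheme and the state keys are hypothetical and are not taken from the Athos design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Shared knowledge visible to every agent (hypothetical keys).
State = Dict[str, str]


@dataclass
class Agent:
    name: str
    score: Callable[[State], float]  # how well the agent fits the current state
    act: Callable[[State], str]      # what the agent does when selected


def select_agent(agents: List[Agent], state: State) -> Agent:
    """Pick the best-fitting agent; agents scoring 0 are not applicable."""
    candidates = [a for a in agents if a.score(state) > 0]
    if not candidates:
        raise LookupError("no agent can handle this state")
    return max(candidates, key=lambda a: a.score(state))


# Two alternative agents for the same situation (reading a message aloud).
agents = [
    Agent("read_mail_en",
          lambda s: 1.0 if s.get("language") == "en" else 0.1,
          lambda s: "Reading the message in English."),
    Agent("read_mail_fi",
          lambda s: 1.0 if s.get("language") == "fi" else 0.1,
          lambda s: "Luetaan viesti suomeksi."),
]

state = {"task": "read_mail", "language": "fi"}
chosen = select_agent(agents, state)
print(chosen.name, "->", chosen.act(state))
```

Because each agent only declares how suitable it is for a given state, new specialized agents can be added without changing the selection logic.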

Text parsing

The main part of our work will be done in the domain of electronic mail. Since e-mail messages can contain any mixture of different languages in unpredictable combinations, this offers a rich and challenging domain for investigating various aspects of multilingual interaction and information processing. Especially in the many multilingual countries of Europe, e-mail messages often contain several languages. We need robust text-parsing methods to analyse messages into a form that is suitable for multilingual speech communication. Current text-parsing methods are targeted mainly at well-formed text such as technical documents. To analyse (often syntactically ill-formed) e-mail messages efficiently, Athos will utilize fast and fault-tolerant functional dependency parsing methods combined with template-based underspecified semantic interpretation.
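Dependency parsing itself is beyond a short example, but one plausible preprocessing step is to mark structured tokens such as e-mail addresses and URLs so that they can be treated separately from ordinary words. The sketch below is a simplification of our own: the patterns, labels and the `tag_tokens` function are hypothetical, and the regular expressions cover only common cases.

```python
import re
from typing import List, Tuple

# Deliberately simple patterns; real messages need broader coverage.
TOKEN_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("URL", re.compile(r"https?://\S+")),
]


def tag_tokens(text: str) -> List[Tuple[str, str]]:
    """Split on whitespace and label each token as EMAIL, URL or WORD."""
    tagged = []
    for token in text.split():
        # Remove punctuation that merely surrounds the token.
        stripped = token.strip(".,;:!?()\"'")
        if not stripped:
            continue
        label = "WORD"
        for name, pattern in TOKEN_PATTERNS:
            if pattern.fullmatch(stripped):
                label = name
                break
        tagged.append((label, stripped))
    return tagged


for label, token in tag_tokens("Write to dumas@sics.se, or see http://www.sics.se/dumas."):
    print(label, token)
```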

Presentation techniques

To present written messages using synthesised speech we need to develop innovative presentation methods. Most of the previous research has focused on what information should be presented to the user, but equally important is the question of how this information should be presented. Athos uses concepts called presentation agents to implement novel presentation techniques that make system output more intelligible and pleasant for the user. We will investigate how complex structural elements such as tables and URLs should be presented to the user as well as the use of prosody in order to produce natural sounding output for speech applications.
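As a rough illustration of what presenting structured elements in speech could involve, the sketch below spells out an address symbol by symbol and reads a table row by row with its column headers. The function names, the symbol vocabulary and the choice to drop the URL scheme are our own assumptions, not the presentation techniques developed in the project.

```python
import re
from typing import List, Sequence

# Spoken names for symbols that a synthesiser would otherwise skip or misread.
SYMBOL_WORDS = {".": "dot", "@": "at", "/": "slash",
                "-": "dash", "_": "underscore", ":": "colon"}


def speakable(token: str) -> str:
    """Spell out an e-mail address or URL as a sequence of speakable words."""
    token = re.sub(r"^https?://", "", token)  # assumption: the scheme is not read aloud
    parts = re.split(r"([.@/\-_:])", token)   # the capture group keeps the symbols
    return " ".join(SYMBOL_WORDS.get(p, p) for p in parts if p)


def speak_table(headers: Sequence[str], rows: Sequence[Sequence[str]]) -> List[str]:
    """Turn each table row into one phrase that repeats the column headers."""
    return [", ".join(f"{h} {v}" for h, v in zip(headers, row)) for row in rows]


print(speakable("dumas@sics.se"))
print(speakable("http://www.sics.se/dumas"))
print(speak_table(["From", "Subject"], [["Anna", "Meeting"]]))
```

Repeating the headers in every row costs time but spares the listener from having to remember the column order; whether that trade-off is right is exactly the kind of question that needs user evaluation.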

Input handling techniques

E-mail is a challenging domain for speech input handling, since interaction with multilingual messages often requires handling multilingual input even within a single user utterance, which may contain names and addresses in different languages. Another important research issue with speech input is error management. Current error handling methods are not robust enough, and for multilingual input we need to develop new, more robust methods for speech-based communication. Athos uses input agents together with dialogue and user information to construct software components that can handle speech input using high-level interaction techniques and control devices such as the speech recognizer in a timely manner.

Dialogue management techniques

Different languages and cultures have very different communication strategies, and these differences show clearly in spoken interaction. In order to support multilingual spoken language communication, we will implement a dialogue model that is able to support different communication strategies. For example, mixed-initiative and system-initiative dialogue handling strategies have been found to have different kinds of benefits and drawbacks. The dialogue manager must also be able to trace and coordinate the different topics of a conversation, and to select information that is appropriate to the user in the situation at hand.

User modelling

In order to adapt to different users, Athos must be able to learn new domains and user characteristics automatically. Also, for the system to scale up, its knowledge must be based on dynamic interaction with the user rather than on hand-coded domain and user models. Reinforcement learning techniques and high-dimensional distributed computing algorithms will be explored to dynamically create, update and maintain user abstractions. Information provided by the user model will be used in many parts of the system, e.g. in input disambiguation, dialogue strategy selection and output presentation.
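One simple way to picture learning a dialogue strategy from interaction is an epsilon-greedy value estimate per strategy, updated from a reward signal after each dialogue. This is a minimal sketch under our own assumptions: the `StrategyLearner` class, the reward values and the choice of epsilon-greedy selection are illustrative and are not the learning method specified for Athos.

```python
import random
from typing import Dict, List, Optional


class StrategyLearner:
    """Keeps a running average reward per dialogue strategy for one user."""

    def __init__(self, strategies: List[str], epsilon: float = 0.1,
                 seed: Optional[int] = None) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value: Dict[str, float] = {s: 0.0 for s in strategies}
        self.count: Dict[str, int] = {s: 0 for s in strategies}

    def choose(self) -> str:
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.value))
        return max(self.value, key=self.value.get)

    def update(self, strategy: str, reward: float) -> None:
        # Incremental mean: new = old + (reward - old) / n
        self.count[strategy] += 1
        n = self.count[strategy]
        self.value[strategy] += (reward - self.value[strategy]) / n


learner = StrategyLearner(["system-initiative", "mixed-initiative"], epsilon=0.0)
learner.update("mixed-initiative", 1.0)   # hypothetical reward: task completed quickly
learner.update("system-initiative", 0.2)  # hypothetical reward: many corrections needed
print(learner.choose())
```

The reward here is a placeholder; in practice it would have to be derived from observable signals such as task completion or the number of error corrections.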

New methods and advanced techniques

DUMAS will investigate new frontiers in machine learning by exploring how techniques such as artificial neural networks, genetic algorithms and reinforcement learning can be used for robust language processing and for building dynamic updates via interaction. For instance, neural nets can be used in user classification, a task where human classification knowledge is not easily expressed in terms of logical rules. Reinforcement learning techniques also have great potential for learning specific interaction strategies. High-dimensional distributed computing algorithms enable adaptive and robust computing and can be applied, for example, to semantic analysis or user classification. The suitability of these non-standard techniques for enhancing interaction management will be evaluated and compared with existing technology. Special emphasis will be put on efficient representation languages between different system modules. The central notion that combines the goal of flexible interaction with the goal of studying non-symbolic representation is a system that learns.

We aim at developing methods that are applicable to other domains too. Thus we will investigate the application of Athos to other services which will help users to have more equal access to a wide range of information services in the mobile multilingual society. We also aim at implementing some of them in the Athos framework to advance further innovative multilingual applications and services.

Finally, the Athos system will be evaluated with real users. Application developers need new evaluation techniques that account not only for system accuracy, but also for the system's interactive abilities and ease of use.


Contact us: dumas@sics.se


Last update: Friday, 21-Sep-2001 15:58:07 CEST