Annika Wærn
Notes from an introduction seminar, March 1997.
Links to more material on intelligent interfaces are being collected at "http://www.sics.se/~annika/ii_links.html".
The area of Intelligent Interfaces is one of the most heterogeneous research subjects dealing with computers that exist. In this area, people from vastly different disciplines and research areas within disciplines meet, debate and collaborate. The term is so wide that people will shrink from it in practice - survey articles have been written about intelligent tutoring, adaptive interfaces, explanations or multimodal dialogue, but no survey article tries to address the whole area of intelligent interfaces. Even though all of these areas can claim to develop intelligent interfaces, none of them address this aspect specifically.
If the work in this area is so widespread and diverse, one may ask whether there is any reason at all to give it a specific name. Wouldn't it be better to avoid the notion of intelligent interfaces altogether, and continue to investigate these areas in parallel, with their different focuses?
I believe that there is an added value in the notion of intelligent interfaces, in that it captures a set of problems and ideas that are shared between all these more specific research areas. The term provides a common frame of reference for a large group of research directions, and it also defines a set of research issues that are worth pursuing in their own right, without being artificially constrained by an application area or a specific technical solution. The purpose of this paper is to scope this research area and highlight its specific research issues.
To understand the notion of intelligent interfaces, we can start with a discussion of what cannot be seen as a definition of intelligent interfaces.
Firstly, we can note two things: An "intelligent system" does not necessarily have an intelligent interface, and neither is a well-designed interface necessarily intelligent.
Why is an "intelligent system" not an intelligent interface? The reason is that the intelligence of an "intelligent system" does not necessarily manifest itself in a user interface. The term "intelligent system" is as difficult to define as the term "intelligent interface", but we can consider the more limited field of knowledge-based systems, which definitely constitute a kind of intelligent system. Knowledge-based systems are constructed to reason about and act upon a vast source of expertise in some limited field of application. The system may take its input from any source, such as human users or automatic sensors, and the output may equally well be actions in an automatic control loop or advice to a human user. The first generation of expert systems was characterised by a very mechanical and system-controlled dialogue [Berry and D. E. Broadbent 1986]. Developing intelligent interfaces to knowledge-based systems is by no means an easy task, and can be considered a research area of its own. A specific issue here is the construction of explanations that motivate the system's advice to the user [Southwick 1989].
Next, we must understand why a "good" interface cannot automatically be considered intelligent. There exist today several approaches to the development of easy-to-use and effective interfaces, documented by guidelines or interface standards [Smith and Mosier 1986, Nielsen 1989]. Why, then, will we not call these interfaces intelligent? The answer is far from straightforward, but we can note that such standards and guidelines often impose arbitrary restrictions on the behaviour of the interface. The reason for these restrictions is mainly that the interface should be easy to learn. In design guidelines for spoken menus, it may be stated that a menu should not be more than three items long, lest users forget some of the options. In the interface standard for the Macintosh, there is a specific set of menus that always should be included in an application, and some of these menus consist of standard menu entries. It also contains some conventions, such as shadowing of disabled options. There is nothing wrong with such guidelines or conventions, but they do not always lead to optimal behaviour. Some users may be able to listen to very long menus, in particular users well acquainted with the application. The standard entries of the menu bar can become very awkward for novel applications, such as virtual reality environments.
The main issue in human-machine interaction is to obtain a "collaboration situation" between a human user and a computer system. The system must be attuned to the user, and the user to the system. "Good" conventions and guidelines shift the entire burden of adaptation to the user, and the design restrictions that they impose are geared towards easing this task for the user.
Both "intelligent systems" and "good interfaces" are thus too broad as definitions: they encompass systems that we do not want to consider as intelligent interfaces. But there are also two possible definitions suggested in the literature that I view as being too narrow: systems that mimic human dialogue, and adaptive interfaces.
Most researchers would agree that a system that can maintain a human dialogue would be considered intelligent (remember the Turing test?). The problem is that there are a lot of interfaces that we would consider intelligent, that do not look "human" in any sense at all. An example is the PUSH interface [Höök E.A. 1996, Höök 1997], which presents hypertext in a manner that is adapted to the user's current task. The system is controlled mainly through direct manipulation, but the output consists of a text where certain pieces of the text are "hidden" from view, to give a comprehensive overview of the pieces of text that are most relevant to the user in his or her current task. This very passive form of user adaptation does not in any way mimic human behaviour, but is constructed to be a natural extension of the hypertext view of information.
The view of intelligent interfaces as mimicking human behaviour is not only too restrictive, it may even be considered harmful to the research field as such. The problem is that it may put emphasis on characteristics of human communication, that may be peripheral and of little use in computer communication. A specific issue is the usage of natural language in human - machine interaction. An inherent quality of human language is that it is ambiguous. Words and sentences mean different things in different situations, and the same sentence may convey messages at several different levels of interpretation. This may be effective in human - human communication, but it requires a level of interpretation and initiative from both partners, that users may not want or expect from a computer. Here, many of the principles of standard interface guidelines apply: a computer interface should be transparent and predictable, to allow users to understand it and learn to use it. Computers and humans are also good at different things, and a human - computer dialogue can be designed to make the most out of the different capabilities. User interfaces can for example use the capacities of computers to store vast information, to help the user maintain a memory of previous interactions, and to present information in multiple modalities to enhance the presentation.
Finally, an intelligent interface does not necessarily maintain a model of the user and adapt to this model [Wahlster and Kobsa 1988]. This is a possible definition of intelligent interfaces that I will avoid, because it imposes an unnatural technical constraint. Consider for example the case when we aim to develop an intelligent interface, but discover during design that it suffices to maintain several input and output mechanisms in parallel. For example, some users may prefer to input queries through a query language and some through point-and-click. We can then construct an interface which always allows both input modes, or one which only maintains one input mode at a time, and lets the user choose which one. In the second case, the system maintains a very simple model of the user, consisting of his or her preferred input mode. Obviously, one would either like to call both interfaces intelligent, or both unintelligent, but there is no reason why one would be intelligent and the other not.
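The two designs can be put side by side in a minimal sketch. All names here (ParallelInterface, ModalInterface, UserModel and the two toy parsers) are hypothetical and introduced only for illustration. The point is that the second design differs from the first by a single stored preference, which is hardly a reason to call one intelligent and the other not.

```python
from dataclasses import dataclass


def parse_query_language(text: str) -> dict:
    """Toy query language (an assumption): 'field = value' gives one query term."""
    field, _, value = text.partition("=")
    return {field.strip(): value.strip()}


def parse_click(selection: tuple) -> dict:
    """Toy point-and-click input (an assumption): a (field, value) pair."""
    field, value = selection
    return {field: value}


class ParallelInterface:
    """First design: both input modes are always available; no user model."""

    def handle(self, event):
        if isinstance(event, str):
            return parse_query_language(event)
        return parse_click(event)


@dataclass
class UserModel:
    """The 'very simple model of the user': one preferred input mode."""
    preferred_mode: str = "click"  # either "click" or "query"


class ModalInterface:
    """Second design: one input mode at a time, chosen by the user."""

    def __init__(self, model: UserModel):
        self.model = model

    def set_mode(self, mode: str) -> None:
        if mode not in ("click", "query"):
            raise ValueError(f"unknown input mode: {mode}")
        self.model.preferred_mode = mode

    def handle(self, event):
        if self.model.preferred_mode == "query":
            if not isinstance(event, str):
                raise TypeError("query mode expects a query string")
            return parse_query_language(event)
        if not isinstance(event, tuple):
            raise TypeError("click mode expects a (field, value) selection")
        return parse_click(event)
```

Both interfaces produce the same query from the same user intent; only the second one stores anything about the user.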
Typically, we require of an intelligent interface that it should employ some kind of intelligent technique. What, exactly, counts as an intelligent technique will vary over time, but the following list is a fairly complete list of the kinds of techniques that today are being employed in intelligent interfaces:
But providing such a list of technologies does not capture the essential feature of the intelligent interface research area: an intelligent interface must utilise technology to make an improvement: the resulting interface should be better than any other solution, not just different and technically more advanced.
One way to understand the research area better is to compare it to the research goals outlined by Russel and Wefald in their definition of intelligent agents [Russel and Wefald 1991]. Firstly, Russel and Wefald define an ideal rational agent as an agent that always does the right thing. Obviously, the ideal rational agent does not exist - even if we could define an algorithm for always computing the ideal response, it would take infinite computational power to produce the ideal response before it becomes obsolete. So Russel and Wefald define an intelligent agent as an agent that has some limitations in its reasoning power, but that always does the right thing within these limitations. The limitations of an agent are essentially given by its architecture, so that certain results take a very long time to produce, and may for this reason become suboptimal in a changing world.
Following Russel and Wefald, we could define intelligent interfaces the same way: an ideal interface is simply an interface that always gives the absolutely optimal response, and an intelligent interface is one that has limited capabilities, but gives the optimal response within these limitations. But for interfaces, the limitations are not restricted to the internal architecture of the system, but lie foremost in its abilities to interact. For example, an optimal speech interface is something entirely different from an optimal VR interface.
Russel and Wefald's definition of rational agency allows a very nice split into two research issues: the issue of what is the "right" action, and the issue of how to value a degradation in result against a time delay. This division is not really possible when we consider interfaces. The reason is that there is no clear 'degradation curve': given a certain set of restrictions in reasoning power or available interaction modalities, the optimal interface behaviour may be completely different from one under other constraints. The research area of intelligent interfaces comprises two research issues that are dual and complementary: we must seek to create an optimal design of an interface given a particular model of the limitations in reasoning power and interaction modalities, and conversely, the quest for a novel and better interface design may require an extension of the reasoning power and presentation means of an interface. We can define the intelligent interfaces research area based on this double aim:
The research area of Intelligent Interfaces combines design principles and technology advancements for effective human-computer interaction, and research on intelligent interfaces aims to extend the boundaries of both.
If we use this as our definition of the research area of intelligent interfaces, we find a number of characteristic features of a research project in this area.
So how do people actually go about doing research on intelligent interfaces? As mentioned in the beginning, the research area is too large to be addressed in a single, ambitious project or even in a research programme. Instead, researchers will typically focus on developing intelligent interfaces for a particular application or application area. The rest of this paper is devoted to an extremely brief run-through of the state of the art in intelligent interfaces. I will first sketch the main application areas for intelligent interfaces, and then go through some techniques and design principles for intelligent interfaces.
There exist a lot of applications that work well without adding any kind of system "intelligence" - applications where the computer is a mere tool, for a user that is well aware and capable of performing a specific task. We can compare this usage of the computer with the usage of a hammer: the hammer need not be intelligent, it suffices that it can be used by a user who can handle a hammer. Tools, typically, can be used in several ways, even for things that the original inventor did not think about; this requires a flexible and robust design but not any intelligence or adaptivity built into the tool.
The main application areas for intelligent interfaces are thus such where the knowledge about how to solve a task partially resides with the computer system. Since the user does not know exactly what should be done, he or she cannot manipulate the computer as a tool, but must ask the system to do something for him or her. This request may be incomplete, vague or even incorrect given the user's real needs.
Some typical application areas that can be characterised this way are intelligent tutoring, intelligent help and information filtering.
Intelligent Tutoring. A "tutor" is a program that aims to give a personalised "education" to a user in a specific domain of knowledge [Shute and Psotka 1994]. The tutor program may need to infer the user's understanding of the domain through analysing the user's performance on test problems. The advice can be given actively, by intervening and suggesting alternative courses of action, or passively, by answering the user's explicit queries. In both cases, the answers can be tailored to what the system perceives as the user's needs and misunderstandings. Passive tutoring is often done in the style of "critiquing", where the user first suggests a full solution and the system then judges this solution, points out errors and suggests alternative solutions.
Intelligent Help. A "help" system aids a user in performing a specific task [Breuker 1990]. Help is very similar to tutoring, but the main objective for a help system is to get something done, and not to make the user learn something. Another difference is that many tutoring systems will lay out specific tasks for the user to do, in order to diagnose his or her misconceptions. A help system must act upon whatever information it can gather from the user's own choice of interactions with the system. A help system can either give help about the functionality of a computer program, or about some computer-independent task (repairing a car, for example). As with tutoring, help can be active or passive.
Information filtering. In open information sources such as the Internet, it is comparatively "cheap" to distribute information to a very large group of recipients. For recipients, this means that they are flooded with large masses of information, and find it hard to extract the information that is really relevant or interesting to them. Users need help in selecting the information that is relevant to them, but the problem is that they do not know what is out there. Information filtering techniques, and information retrieval in general, aim to find structure in the available information that can be used to aid users in navigating the information space and selecting the information that is relevant to themselves. The task is called "filtering" when the information space is rapidly changing. Information filtering tools may rely on text or image processing, but may also log the reading patterns of groups of users, to determine what kind of users are interested in a certain piece of information.
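Filtering on the logged reading patterns of groups of users can be illustrated with a small sketch. The technique shown here, weighting other readers by the Jaccard overlap of their reading history, is my own choice for illustration and is not taken from any particular system; the function names and the layout of the log (user name mapped to a set of item identifiers) are likewise assumptions.

```python
from collections import defaultdict


def jaccard(a: set, b: set) -> float:
    """Overlap between two reading histories, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(reading_log: dict, user: str, top_n: int = 3) -> list:
    """Rank items the user has not read by how similar their readers are.

    reading_log maps each user name to the set of item ids that user read.
    """
    seen = reading_log.get(user, set())
    scores = defaultdict(float)
    for other, items in reading_log.items():
        if other == user:
            continue
        weight = jaccard(seen, items)
        if weight == 0.0:
            continue  # readers with nothing in common carry no signal
        for item in items - seen:
            scores[item] += weight
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


log = {
    "ann": {"a1", "a2"},
    "bo": {"a1", "a2", "a3"},
    "cy": {"a2", "a4"},
    "di": {"a9"},
}
suggested = recommend(log, "ann")  # items read by users who read like ann
```

Note that the user never has to describe what he or she is looking for: the selection is driven entirely by what similar readers chose, which addresses the problem that users do not know what is out there.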
Most computer tools and techniques that are used in intelligent interfaces stem from the artificial intelligence field. There are two main areas that come into play: user modelling and natural language dialogue.
The term "User Modelling" is used in two different meanings. In software design methodologies, it is sometimes used to denote the analysis of the prospective users of a computer system to be developed. In the research area of intelligent interfaces, it is used to denote a model of the user that the system maintains, and adapts its behaviour to. This is sometimes also called 'system user modelling'. In some literature on user modelling for intelligent interfaces, it is also required that the model is explicit, so that it can be easily inspected and modified [Wahlster and Kobsa 1988]. In this view, a bunch of switches in the program that determine what certain inputs or outputs will look like does not constitute a user model. This is a somewhat awkward distinction, since it may be possible to maintain a very explicit model of the user during development, one that decides which switches are needed and what effects they should have, but that does not motivate an explicit user model in the actual program. In the seminar, we will assume that any program that adapts its behaviour to some characteristics of the user maintains a user model.
A program that maintains a user model may be adaptable or self-adaptive. An adaptable program lets the user select how the system should adapt, and a self-adaptive program adapts autonomously, by deducing the user's needs from his or her interactions with the system. The distinction can be made more fine-grained; Malinowski et al [Malinowski E.A. 1992] distinguish between several levels of adaptivity, depending on who takes the initiative to the adaptation, who proposes the adaptation, who decides upon it, and who carries it out. For example, a system may detect that a user would do better with a slightly different format of menus. It then takes the initiative, and suggests that this modification of the interaction style is done. The user can accept the modification, or reject it, and if the user accepts it, the system moves over to the new menu style. In this example, the initiative and the suggestion both come from the system, and the system also performs the adaptation. The control still resides with the user, since the user can accept or reject the proposed adaptation.
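The four questions (who takes the initiative, who proposes, who decides, who carries out) can be written down as a small data structure. This is a sketch of the idea only; the names Actor, AdaptationPolicy and apply_adaptation are hypothetical, and the label "mixed" for the intermediate levels is my own shorthand rather than terminology from Malinowski et al.

```python
from dataclasses import dataclass
from enum import Enum


class Actor(Enum):
    USER = "user"
    SYSTEM = "system"


@dataclass(frozen=True)
class AdaptationPolicy:
    """Who is responsible for each of the four stages of an adaptation."""
    initiative: Actor
    proposal: Actor
    decision: Actor
    execution: Actor

    def describe(self) -> str:
        stages = (self.initiative, self.proposal, self.decision, self.execution)
        if all(actor is Actor.USER for actor in stages):
            return "adaptable"
        if all(actor is Actor.SYSTEM for actor in stages):
            return "self-adaptive"
        return "mixed"  # illustrative label for every level in between


def apply_adaptation(policy, current_style, proposed_style, user_accepts):
    """Return the menu style in force after one proposed adaptation."""
    if policy.decision is Actor.USER and not user_accepts:
        return current_style  # the user keeps control by rejecting
    return proposed_style


# The menu example from the text: the system takes the initiative,
# proposes and executes the change, but the user decides.
MENU_EXAMPLE = AdaptationPolicy(
    initiative=Actor.SYSTEM,
    proposal=Actor.SYSTEM,
    decision=Actor.USER,
    execution=Actor.SYSTEM,
)
```

Keeping the decision stage as a separate field makes it explicit where control resides, independently of who does the rest of the work.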
Research on natural language dialogue is directly inspired by the thought of getting a computer to carry out a human-like dialogue. Since people are able to interact with each other in natural language, it should be natural and easy to interact with a computer in the same manner. The research has many facets, ranging from the literal interpretation of natural language sentences to recognising the focus and topic shifts of natural dialogue [Grosz and Sidner 1986]. Here we also find such research on text processing that is necessary to enable advanced information filtering.
General language capacities for computers have been a vision since the sixties. Unfortunately, while many other AI visions of the sixties have come true, true natural language remains a vision. In the meantime, several other effective means of interacting with computers have become a reality, such as direct manipulation and restricted speech. An important area of research for intelligent interfaces is to integrate several ways of interaction in a multimodal human - computer interaction [Bretan 1995]. In this view of interaction, each combination of language and media constitutes a "mode". Language can for example be communicated through speech or text - these constitute two different modalities. Similarly, selection by clicking on an icon or through a menu choice constitute different modalities. Different modalities are good at different things [Bretan 1995] - speech is useful for input when your hands are used for something else (placed on a steering wheel, for example), and language is a useful input means when one needs to refer to something that is not currently visible, and cannot be pointed at. The task for multimodal dialogue management is to integrate and combine several modalities for interaction into a seamless human - computer dialogue. In multimodal interaction, natural language is a central and important ingredient, but it is not a target goal of its own, and deficiencies in language understanding can be compensated by the ability to interact using other modalities such as direct manipulation.
The roots for research on interface design for intelligent interfaces lie mainly in cognitive psychology -- the theory of human thought. Intelligent interfaces are intended to be adapted to the user's way of thinking, and to some extent understand how the user thinks.
These are very ambitious goals. Some of the early models of human cognition in interacting with computer interfaces aimed to be analytical in this sense. GOMS [Kieras 1988] for example, can be used to estimate the cognitive load on users in routine interface tasks. But these models can be used only for an analysis at a rather low level of detail, and provide little insight into what is an appropriate design of an interface. The alternative has become to apply methods and principles that have been developed for traditional interfaces, but extend and modify them to be applicable to the new functionalities and interaction principles found in intelligent interfaces [Ereback and Höök 1994]. The prevailing development strategy is that of iterative and user-oriented design, where the interface is repeatedly tested with users to refine the design, and see which adaptations work and which do not work.
One such interaction principle is the principle of transparency and control. In general, an interface should allow the user to inspect the functionality of a system, to be able to control and correct it, if it goes off target and produces an unwanted result [du Boulay E.A. 1981]. If an interface is self-adaptive, the same applies. A user must be able to inspect why a certain adaptation was generated, and correct the behaviour if the result was not what the user wanted [Höök E.A. 1996]. Inspection is also important to allow the user to trust a system: an expert system must for example be able to produce an explanation of why it suggested a certain action. Otherwise, a user may ignore the system's advice because he or she mistrusts its competence.
We previously noted that intelligent interfaces may provide both active and passive adaptations to the user's needs. These may require different interaction metaphors to be understood by the user. If the prevailing interaction metaphor is that of direct manipulation, the system must behave rather passively, and let the user maintain the initiative and control of the interaction. The intelligence of the system may show only in the set of options that the system suggests, for example. An example of a very passive intelligent interface is the adaptive prompts suggested in [Kühme EA 1993]. Their system presents a completely standard direct manipulation interface, but in addition to the normal interface, the system maintains a small menu of the three or four most useful "next actions". This menu is continuously updated, and may provide shortcuts to the user's next action. On the other hand, if the system is to take a lot of initiative, make active suggestions or interpret the user's queries or commands differently in different situations, this can be conveyed through an "interface agent". In this situation, the user will perceive the interface agent as a conversation partner, rather than a useful tool. This mode of interaction has sometimes been called "indirect management" [Maes 1994].
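A continuously updated menu of likely next actions can be sketched in a few lines. How the cited system actually ranks actions is not described here, so the ranking below, which simply counts which action has most often followed the previous one, is an assumption chosen for brevity; the class name AdaptivePrompt and its methods are hypothetical.

```python
from collections import Counter, defaultdict


class AdaptivePrompt:
    """Keeps a short menu of the actions that most often followed the last one."""

    def __init__(self, size: int = 4):
        self.size = size                       # three or four entries in the text
        self.followers = defaultdict(Counter)  # action -> counts of next actions
        self.last_action = None

    def record(self, action: str) -> None:
        """Log one user action and update the transition counts."""
        if self.last_action is not None:
            self.followers[self.last_action][action] += 1
        self.last_action = action

    def suggestions(self) -> list:
        """Current menu: the most frequent followers of the last action."""
        if self.last_action is None:
            return []
        counts = self.followers.get(self.last_action)
        if not counts:
            return []
        # Most frequent first; ties broken alphabetically for a stable menu.
        ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
        return [action for action, _ in ranked[: self.size]]


prompt = AdaptivePrompt(size=3)
for action in ["open", "edit", "save", "open", "edit", "print", "open"]:
    prompt.record(action)
menu_after_open = prompt.suggestions()
```

The menu only offers shortcuts next to the normal interface; the user remains free to ignore it, which is what makes this form of adaptation passive.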
The definition of intelligent interfaces is as ambiguous as the definition of Artificial Intelligence. However, it is possible both to scope the area of research for Intelligent Interfaces, and to find good reasons to pursue this research area. The research area is inherently cross-disciplinary: we must both strive to make technological advancements in interface generation, and develop novel interaction principles aimed to perfect the human - computer dialogue.