Most information technology so far has been built to further the needs of information producers. The cost of producing, publishing, and distributing information has fallen steadily since the invention of clay tablets and cuneiform script; the attendant consumer cost of sifting and assessing incoming information streams has not been addressed to the same degree. It is crucially important that information be well distributed to the right recipients. People need relevant and trustworthy news; they need information about the surrounding world, to keep them abreast of whatever developments in it affect their lives and to make them aware of how to change the world; they must be able to participate in public discourse; they must be provided with means to entertain themselves when they need distraction and to inform themselves when they need knowledge; they must feel empowered to make the right choices and wise decisions in matters both mundane and existential.
Without support for timely access to relevant information, people risk making the wrong decisions and will feel frustrated, alienated, side-tracked, and inconsequential. In the end, this will become a threat to participatory democracy, to transparent, understandable, and just legal processes, to publicly available educational systems, to the broad acceptance and use of technology, and to a viable information industry working to fulfil the needs of all citizens. The research track for the study of information access works with questions of how information can be found and adapted to the specific needs of individuals or groups. Information access is about providing tools and methods for people to find the information they need reliably and simply.
This research track is the theoretical backbone of the application-oriented research and studies in language technology, information-seeking behavior, and collaborative retrieval systems conducted at SICS during the past fifteen years.
The study of relevance ties together all our proposed and current research projects in information access. Relevance - the momentary quality of a text that makes it valuable enough to read - is a function of task, text characteristics, user preferences and background, situation and social context, tool, time, and untold other factors.
Our belief is that text analysis beyond term frequency calculation is the key to better systems. We build tools to analyze texts using cutting-edge linguistic methodology. However, we believe that studying and modeling text characteristics with algorithms, however sophisticated, without anchoring knowledge of text in the needs of the user is certain to lead to prototypes with little or no real potential for application. Central to our projects are the perspectives and needs of the user community we aim to serve.
Of equal importance is how pieces of information (texts or others) are accessed, used, and rated by people. Collaborative methods and techniques make use of user opinions and usage patterns as well as analyses of information carriers such as textual documents. Users of a system should be able to benefit from the aggregated experience of other users of the same system. Collaborative filtering, information retrieval and machine learning are essential techniques to achieve this.
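As an illustration of how one user can benefit from the aggregated experience of others, the following is a minimal sketch of user-based collaborative filtering. The rating data, the user and document names, and the choice of cosine similarity are assumptions made for the example; they are not taken from any system described here.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"doc1": 5, "doc2": 3, "doc3": 4},
    "bob": {"doc1": 4, "doc2": 2, "doc3": 5, "doc4": 4},
    "cia": {"doc1": 1, "doc2": 5, "doc4": 2},
}

def cosine(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for the item.

    Returns None when no other user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = cosine(ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[item]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None

# Predict how "ann" might rate a document she has not yet seen.
print(predict("ann", "doc4", ratings))
```

In this toy data the prediction for "ann" lies closer to the rating given by "bob" than to that of "cia", because her earlier ratings resemble his more closely.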
Our approach ties together text analysis with text context, text domain, and text usage, and information retrieval with human information seeking behavior and readership.
Our hypothesis is that if we deconstruct the concept of relevance, and make information access systems model the above-mentioned factors, we will be able to better satisfy the needs of a professional information seeker. The goal of the track is to find and model the factors relevant to the notion of relevance in information access and refinement.
Testing the validity of our various approaches can be done systematically on several levels; information access research has a well-established tradition of quantifiable evaluation.
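Two of the most widely used measures in that evaluation tradition are precision and recall. The sketch below computes both for a single query; the document identifiers and the relevance judgments are hypothetical and serve only to illustrate the calculation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical system output and relevance judgments for one query.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d2", "d4", "d7"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found.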
The following main application areas motivate our research efforts in the near future.
Information access systems of today view texts as single entities, topics as variations in word frequency, information access sessions as brief one-shot affairs, and users as all alike. If these assumptions are contested, we find that access systems could well be tailored to user or user group, task, and situation; could take their time in producing a result; and could analyze text in smaller entities. Variants along these lines are conceivable; only a few have been tested.
The track will include projects that address the questions of how to interact with information access systems that analyze text; what levels of analysis are or could be interesting for what task and what user population; what to find in text; and how to specify what to extract.
Managing large document collections is a many-faceted task: endorsement, authority, quality, selection, and currency of the information items are some of the central aspects a collection manager must address. Maintenance and upkeep of a document collection is cumbersome to the point that many projects to digitize collections stumble and fail on the issue: before the entry or conversion of the collection is complete, the management of the catalog has overwhelmed the management of the collection. The most interesting tasks for the purposes of the present program involve modeling the content of the collection in some way; this must include coping with foreseen and unforeseen change and evolution, and with questions of conceptual drift and topical change.
The track will include projects that pay attention to domain-dependent and temporal factors in knowledge modeling.
Many people know more than one language to some extent. Most tools for information access seem to assume readers only work with one language at a time. Enabling information access tools to work with more than one language, and especially allowing users to access material in one or several languages after specifying an information need in another, will mean more than adding a dictionary to the search engine. Building real cross- and multi-lingual information access systems will involve knowledge-intensive development of some sort of translation tools, providing access to lexical resources of varying types, and, crucially, careful design and realization of the interactive interface.
The track will address questions of interacting with multi-lingual document collections, and will focus attention on the processing of Swedish text.
An interesting issue for recommender systems is for what tasks they offer a suitable solution. What are the important features of good recommender applications and what are the demands on the data needed to fuel them? Investigating the contexts in which recommendations based on others' opinions can be of use is one part of defining what the domains of application for collaborative and recommender systems are. Just as important are aspects of the data available. Is user data available, e.g. in usage logs? Can "ratings" of items be collected directly, or must they be derived, and in that case, from what?
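To make the question of derived ratings concrete, the sketch below shows one conceivable way of turning a usage log into implicit ratings. The log format, the use of reading time as evidence of interest, and the 30-second threshold are all assumptions made for the example; which signals are in fact suitable is precisely the open question raised above.

```python
from collections import defaultdict

# Hypothetical usage log: (user, item, seconds spent reading)
log = [
    ("ann", "doc1", 120),
    ("ann", "doc2", 5),
    ("bob", "doc1", 40),
    ("ann", "doc1", 60),
]

def implicit_ratings(log, min_seconds=30):
    """Derive binary ratings from total reading time per (user, item).

    A pair is rated 1 if the accumulated reading time reaches
    min_seconds (an assumed threshold), otherwise 0.
    """
    totals = defaultdict(float)
    for user, item, seconds in log:
        totals[(user, item)] += seconds
    return {pair: 1 if total >= min_seconds else 0
            for pair, total in totals.items()}

print(implicit_ratings(log))
```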
The track will develop a model for recommender system application design, based on a previously developed tool.
The results from our research projects will feed into the above-mentioned application areas, as research results, as prototype products, and as information passed to stakeholder organizations.