This document sets out a revised plan for the SVENSK project. Rather than build a single monolithic language processing system which attempts to meet the diverse needs of academia and industry, the project will develop a toolbox of language processing components, primarily aimed at teaching and research. Since each component has a standardized interface, users can either work with the supplied development system, as may be appropriate for academic research on particular aspects of language use, or select and combine supplied components for integration into a user application.
The goal of the SVENSK project is to develop a multi-purpose language processing system for Swedish based, where possible, on existing components. The generality of the system arises from its adherence to general principles of language engineering, combined with tools which facilitate adaptation to specific domains and applications. Each component will be integrated into a language engineering development platform which, together with the definition of standard interfaces, will ensure interoperability of major components. The system will make it possible to incorporate linguistic resources (such as lexica, tagged corpora, etc.) provided by Swedish universities and research institutes, so that SVENSK can be seen as a computational linguist's toolkit for the Swedish language. The components can be tested for robustness and accuracy on test suites from a number of specific application domains. Documentation will describe how users can adapt the system to their own domains and applications (such as dialogue systems and machine translation).
Currently, there are few easily available development platforms for language engineering. We have considered two platforms, ALEP and GATE, and selected the GATE platform (see "Comparative Evaluation of ALEP and GATE" for more details). The GATE language engineering platform was developed at the University of Sheffield and funded by the U.K. Engineering and Physical Sciences Research Council (EPSRC). GATE provides a communication and control infrastructure for linking together sets of Language Engineering (LE) software. It is based on the Tcl/Tk communication toolkit and is best regarded as a shell: it comes without predefined LE modules and instead emphasizes plug-and-play module interchangeability.
Each component integrated into GATE has a standard I/O interface, which conforms to the annotation model of the TIPSTER architecture. GATE compatibility therefore amounts to TIPSTER compatibility. This allows developers (and users) to easily add components to the platform, and then link them together to form an application. Alternatively, two components with the same interfaces and functionality can be defined for the platform, and then evaluated in the same application. This allows students to experiment with different approaches to a linguistic problem, and experimenters to use the component most appropriate to their purposes and performance criteria (such as speed, robustness, etc.).
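The plug-and-play idea can be illustrated with a minimal sketch. It is not GATE's or TIPSTER's actual API: the names `Annotation`, `Document`, `run_pipeline` and the three example components are hypothetical, and the only assumption taken from the description above is that components share one interface and communicate through typed annotations over spans of text. Python is used purely for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Annotation:
    """Hypothetical stand-off annotation: a typed span over the text."""
    type: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Document:
    """The text plus every annotation added to it so far."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# Assumed shared interface: a component reads a document,
# adds or updates annotations, and returns the document.
Component = Callable[[Document], Document]


def whitespace_tokenizer(doc: Document) -> Document:
    """Tokenizer A: split on whitespace only."""
    for match in re.finditer(r"\S+", doc.text):
        doc.annotations.append(Annotation("token", match.start(), match.end()))
    return doc


def regex_tokenizer(doc: Document) -> Document:
    """Tokenizer B: separate punctuation from words."""
    for match in re.finditer(r"\w+|[^\w\s]", doc.text):
        doc.annotations.append(Annotation("token", match.start(), match.end()))
    return doc


def capitalization_tagger(doc: Document) -> Document:
    """Downstream component that relies only on 'token' annotations."""
    for ann in doc.annotations:
        if ann.type == "token":
            ann.attributes["capitalized"] = str(doc.text[ann.start].isupper())
    return doc


def run_pipeline(text: str, components: List[Component]) -> Document:
    """Link components into an application by running them in order."""
    doc = Document(text)
    for component in components:
        doc = component(doc)
    return doc


def token_strings(doc: Document) -> List[str]:
    """Recover the token strings from their stand-off spans."""
    return [doc.text[a.start:a.end] for a in doc.annotations if a.type == "token"]


if __name__ == "__main__":
    text = "Hej, världen!"
    # Two interchangeable tokenizers evaluated in the same application.
    for tokenizer in (whitespace_tokenizer, regex_tokenizer):
        doc = run_pipeline(text, [tokenizer, capitalization_tagger])
        print(tokenizer.__name__, token_strings(doc))
```

Because both tokenizers produce the same kind of annotation, the downstream tagger runs unchanged with either one, which is what makes a side-by-side comparison within one application possible.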
The SVENSK project is guided by a reference group composed of 2 academic and 2 industrial members. The purpose of the group is to provide a way of organizing exchange of information between SVENSK users and developers. This supplements more direct channels available to users such as email, WWW sites, presentations, site visits, etc. Members of the group are:
- Barbro Atlestam, NUTEK
- Anna Sågvall Hein, Dept. of Linguistics, Uppsala University
- Robin Cooper, Dept. of Linguistics, Göteborg University
- Scott McGlashan, PipeBeach AB
- Mats Wirén, Telia Research AB
- Mohammad Sanamrad, IBM Svenska AB
Updated at 10:58:45 on 991020.