This project, funded by the Swedish Research Council, will investigate the confluence of formal linguistic description and distributional models of linguistic behaviour. It will do so by studying the distributional behaviour of linguistic items such as words, entities, and constructions on various levels, and by relating that behaviour to the topical character of the text in question and its intended use.

The project will focus on written text. In addition to newsprint and other traditional sources for text analysis experiments, it will use informal written material such as internet blogs, e-mail messages, and other forms of new text that are less edited and somewhat less bound by the conventions of standard language.

This project will provide:

  • better distributional modeling of entities and of the relations entities enter into in text,
  • a discovery procedure for relationships motivated by construction grammar, and
  • a computer implementation of such a discovery procedure.

If successful, the project will validate some of the basic tenets of construction grammar as a theory and provide a path to applying such models to future language understanding tasks. The procedure and its implementation will be evaluated in three ways: by intrinsic, model-based evaluation; by comparison with human linguistic performance; and by extrinsic, task-based evaluation. A third, implicit contribution of the project will be to make explicit the difference between these three kinds of evaluation, a distinction that is often blurred in language technology projects.

Jussi Karlgren
Magnus Sahlgren
Gunnar Eriksson
Anders Holst
Oscar Täckström

