Word Space Model

The Word Space Model research theme investigates the representation of textual content and linguistic meaning using a vector space model populated by distributional data. The SICS implementation of the word space model utilizes Random Indexing, a dimensionality-reduction technique based on the Sparse Distributed Memory model, both originally developed by Pentti Kanerva (1988, 2000).
 

  Sparse Distributed Memory model

 
 
Everything everywhere.
 

 Random Indexing

 
The standard reference is: Kanerva, Pentti, Kristoferson, Jan & Holst, Anders (2000): Random Indexing of Text Samples for Latent Semantic Analysis. In Gleitman, L.R. and Joshi, A.K. (Eds.): Proceedings of the 22nd Annual Conference of the Cognitive Science Society, p. 1036. Mahwah, New Jersey: Erlbaum, 2000.

An introductory text can be accessed at http://www.sics.se/~mange/papers/RI_intro.pdf.
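
As a rough illustration of the idea, the following minimal sketch builds word vectors by Random Indexing over a sliding word window. It is not the SICS implementation: the function names (`index_vector`, `random_indexing`, `cosine`) and all parameter values (`dim`, `nonzeros`, `window`, `seed`) are assumptions chosen for the example, and no weighting or frequency thresholding is applied. Each word gets a sparse random index vector with a few +1 and -1 entries, and each word's context vector is the sum of the index vectors of the words that occur near it.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: a few random +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    positions = rng.sample(range(dim), nonzeros)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1  # alternate signs to balance +1 and -1
    return vec


def random_indexing(sentences, dim=300, nonzeros=6, window=2, seed=0):
    """Return a context vector for every word in a list of tokenized sentences.

    All default values are illustrative assumptions, not recommended settings.
    """
    rng = random.Random(seed)
    index = {}                                # word -> fixed random index vector
    context = defaultdict(lambda: [0] * dim)  # word -> accumulated context vector
    for tokens in sentences:
        for word in tokens:
            if word not in index:
                index[word] = index_vector(dim, nonzeros, rng)
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # Add the neighbour's index vector to this word's context vector.
                ctx = context[word]
                for k, value in enumerate(index[tokens[j]]):
                    if value:
                        ctx[k] += value
    return dict(context)


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Words that occur in similar contexts end up with similar context vectors, so `cosine(vectors["cat"], vectors["dog"])` can serve as a crude relatedness score. The dimensionality is fixed in advance and does not grow with the vocabulary, which is the sense in which the technique reduces dimensionality.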
 
 

Current Research Questions

 

  • What requirements should we pose on a linguistic representation of meaning? (Karlgren 2005, Sahlgren 2006)
  • What features should we extract from the textual data? Can we assess the usefulness of the features somehow? Are there features from standard natural language processing - morphology, syntax, dependencies of various sorts - that can be put to use? (Sahlgren 2006)
  • How can we investigate and probe the character of the knowledge representation? Can we evaluate the knowledge representation somehow? (Sahlgren 2006, Sahlgren and Karlgren 2005)
  • How can we build models for representing clauses, sentences, and texts rather than words only? Is there a model for compositionality? (Sahlgren and Cöster 2004)
  • What applications beyond vanilla information retrieval can we foresee with a more informed knowledge representation?

 

References

  • Kanerva, Pentti (1988): Sparse Distributed Memory. MIT Press.
  • Kanerva, Pentti, Kristoferson, Jan & Holst, Anders (2000): Random Indexing of Text Samples for Latent Semantic Analysis. In Gleitman, L.R. and Joshi, A.K. (Eds.): Proceedings of the 22nd Annual Conference of the Cognitive Science Society, p. 1036. Mahwah, New Jersey: Erlbaum, 2000.
  • Karlgren, Jussi and Magnus Sahlgren (2001): From Words to Understanding. In Uesaka, Y., Kanerva, P. & Asoh, H. (Eds.): Foundations of Real-World Intelligence, pp. 294-308, Stanford: CSLI Publications.
  • Sahlgren, Magnus and Rickard Cöster (2004): Using Bag-of-Concepts to Improve the Performance of Support Vector Machines in Text Categorization. In Proceedings of the 20th International Conference on Computational Linguistics, COLING 2004, August 23-27, Geneva, Switzerland, pp.487-493.
  • Sahlgren, Magnus (2005): An Introduction to Random Indexing. In Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, TKE 2005, August 16, Copenhagen, Denmark.
  • Sahlgren, Magnus and Jussi Karlgren (2005): Automatic Bilingual Lexicon Acquisition Using Random Indexing of Parallel Corpora. Natural Language Engineering, Special Issue on Parallel Texts, 11(3), September 2005.
  • Sahlgren, Magnus and Jussi Karlgren (2005): Counting Lumps in Word Space: Density as a Measure of Corpus Homogeneity. In Proceedings of the twelfth edition of the Symposium on String Processing and Information Retrieval, SPIRE 2005, November 2-4, Buenos Aires, Argentina, pp.151-154.
  • Karlgren, Jussi (2005): Meaningful models for information access systems. In: Inquiries into Words, Constraints and Contexts: Festschrift in the Honour of Kimmo Koskenniemi on his 60th Birthday. CSLI Studies in Computational Linguistics. CSLI Publications, Stanford, California, pp. 241-248.
  • Sahlgren, Magnus (2006): Towards pertinent evaluation methodologies for word-space models. In Proceedings of the fifth international conference on Language Resources and Evaluation, LREC 2006, May 24-26, Genoa, Italy.
  • Sahlgren, Magnus (2006): The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Ph.D. dissertation, Department of Linguistics, Stockholm University.
  • Karlgren, Jussi, Anders Holst and Magnus Sahlgren. Filaments of meaning in word space. Forthcoming.

 

(Co-)Organized workshops

SCAR 2007: http://www.sics.se/~mange/scar2007/

COSMO 2007: http://clic.cimec.unitn.it/~marco.baroni/beyond_words/