Natural Language Processing

Björn Gambäck     <gamback@sics.se>
Gunnar Eriksson     <gunnar@ling.su.se>
Athanassia Fourla     <afourla@hotmail.com>

  • Visiting hour
    by appointment
  • Course material
    available for copying in the department's course material bookshelf.
    1. Books
    2. Articles
    3. Other material
  • Lecture schedule
  • Examination
    1. Written account of the five lab assignments.
      Submit them by Wednesday 17.3 (NB!).
    2. A written exam on Tuesday 9.3, 10:00 (NB!).
      (Exam feedback: Thursday 11.3, 14:00.)
    3. The course grade is assigned as follows:
      exam 55%, assignment 1: 5%, assignments 2-5: 10% each.
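    The weights add up to 100% (55 + 5 + 4 × 10). A minimal sketch of the arithmetic, assuming (this page does not say so) that each part is marked on a 0-100 scale; the part names and the example scores are hypothetical:

```python
# Weights from the grading rule above. Marking each part on a 0-100 scale is an
# assumption; the course page does not say how individual parts are scored.
WEIGHTS = {
    "exam": 0.55,
    "assignment 1": 0.05,
    "assignment 2": 0.10,
    "assignment 3": 0.10,
    "assignment 4": 0.10,
    "assignment 5": 0.10,
}


def course_score(scores: dict[str, float]) -> float:
    """Weighted course score from per-part scores (each assumed to be 0-100)."""
    # Sanity check: the six weights must add up to 100%.
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(weight * scores[part] for part, weight in WEIGHTS.items())


# Hypothetical scores for one student.
example = {"exam": 80, "assignment 1": 100, "assignment 2": 90,
           "assignment 3": 70, "assignment 4": 85, "assignment 5": 75}
print(round(course_score(example), 2))  # 81.0
```

    The exam alone contributes 44 of those 81 points, which is why it dominates the final grade.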

  • Course material

    Books

  • Daniel Jurafsky and James H. Martin (J&M)
    Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
    Prentice-Hall, 2000.
  • Reading Instructions:
    Ch. 1; Ch. 2; Ch. 3; Ch. 4, pp. 91-112, 120-133; Ch. 6; Ch. 7.1-7; Ch. 8; Ch. 9; Ch. 10; Ch. 11.1-3; Ch. 12.4-5; Ch. 14, pp. 501-527; Ch. 15.1-2, 4-5; Ch. 16; Ch. 17; Ch. 18.1-3; Ch. 19; Ch. 20.1-2; Ch. 21
  • Articles

  • Douglas Arnold et al. 1994:
    Machine Translation: An Introductory Guide, Ch. 3, 4, 6, 8, 9.
  • Björn Gambäck 1999:
    Human Language Technology: The Babel Fish
  • John Kimball 1973:
    "Seven Principles of Surface Structure Parsing in Natural Languages", Cognition 2(1), pp. 15-47.
  • Yorick Wilks & Roberta Catizone 2000:
    "Human-Computer Conversation", Encyclopedia of Microcomputers, Dekker, New York.
  • Victor Zue & James Glass 2000:
    "Conversational Interfaces: Advances and Challenges", Proceedings of the IEEE 88(8), pp. 1166-1180.
  • Other material

  • Nikolaj Lindberg
    egrep for Linguists
  • Jurafsky and Martin
    Links to NLP resources
  • Amharic Links

  • Lecture Schedule

    Session                                       Date          Time
    Lecture 1: Introduction, Language             Fri Jan 16    2:00 pm
    Lecture 2: Language structure, NLP            Mon Jan 19    3:00 pm
    Computer lab 1 (1)                            Fri Jan 23    2:00 pm
    Lecture 3: Phonology and Morphology           Mon Jan 26    3:00 pm
    Computer lab 1 (2)                            Tue Jan 27    8:30 am
    Lecture 4: Computational Morphology           Tue Jan 27    3:00 pm
    Computer lab 1 (3)                            Wed Jan 28    8:30 am
    Computer lab 2 (1)                            Fri Jan 30    2:00 pm
    Lecture 5: Words, their neighbors, and POS    Mon Feb 2     3:00 pm
    Computer lab 2 (2)                            Tue Feb 3     8:30 am
    Lecture 6: POS Tagging                        Tue Feb 3     3:00 pm
    Computer lab 2 (3)                            Wed Feb 4     8:30 am
    Computer lab 3 (1)                            Fri Feb 6     2:00 pm
    Lecture 7: The meaning of words               Mon Feb 9     3:00 pm
    Computer lab 3 (2)                            Tue Feb 10    8:30 am
    Computer lab 3 (3)                            Tue Feb 10    3:00 pm
    Computer lab 3 (4)                            Wed Feb 11    8:30 am
    Lecture 8: NLP systems, Overview              Fri Feb 13    2:00 pm
    Lecture 9: Machine Translation I              Mon Feb 16    3:00 pm
    Lecture 10: Machine Translation II            Tue Feb 17    3:00 pm
    Lecture 11: Basic Grammars                    Thu Feb 19    2:00 pm
    Lecture 12: Parsing                           Fri Feb 20    2:00 pm
    Lecture 13: Features and Unification          Mon Feb 23    3:00 pm
    Computer lab 4 (1)                            Sat Feb 28    3:00 pm
    Computer lab 4 (2)                            Mon Mar 1     10:00 am
    Lecture 14: Semantics                         Mon Mar 1     3:00 pm
    Computer lab 4 (3)                            Tue Mar 2     10:00 am
    Lecture 15: Pragmatics                        Tue Mar 2     3:00 pm
    Computer lab 5 (1)                            Wed Mar 3     10:00 am
    Computer lab 5 (2)                            Thu Mar 4     10:00 am
    Lecture 16: Speech                            Thu Mar 4     3:00 pm
    Written Exam                                  Tue Mar 9     10:00 am
    Computer lab 5 (3)                            Wed Mar 10    10:00 am
    Computer lab 5 (4)                            Thu Mar 11    10:00 am
    Exam Feedback                                 Thu Mar 11    2:00 pm

    Lectures

    Lecture 1: Introduction

    Introduction; Administrative details; Language and languages; Relations between languages; Written and spoken languages

    Other relevant material:
      Ethnologue: Languages of the World
      Ethnologue: Languages in Ethiopia
      Omniglot: Writing systems



    Lecture 2: Language structure and Natural Language Processing

    Human language and other types of communication; The structure of language; What is NLP?

    To read:
      J&M Ch. 1



    Lecture 3: Phonology and Morphology

    Phonemes; Syllables; Writing systems; Transliteration

    Regular expressions; Finite state automata; Morphology

    To read:
      J&M Ch. 4, pp. 91-112
      J&M Ch. 2, pp. 19-53
      J&M Ch. 3, pp. 57-65
    Other relevant material:
      Any description of Amharic morphology
      J&M links to NLP resources
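    Lecture 3 pairs regular expressions with finite-state automata. As an illustration (not taken from the lecture slides), here is a table-driven deterministic FSA for the "sheep language" baa+! used in J&M Ch. 2, checked against the equivalent Python regular expression. The state numbering is our own choice:

```python
import re

# Deterministic FSA for baa+!  (b, then two or more a's, then !).
# States are numbered 0-4 (our own choice); 4 is the only accepting state.
# Any (state, symbol) pair missing from the table means "reject".
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop: any number of further a's
    (3, "!"): 4,
}
ACCEPTING = {4}


def fsa_accepts(tape: str) -> bool:
    """Table-driven recognition: read one symbol at a time and follow the table."""
    state = 0
    for symbol in tape:
        next_state = TRANSITIONS.get((state, symbol))
        if next_state is None:
            return False
        state = next_state
    return state in ACCEPTING


# The same language as a regular expression; fullmatch anchors it at both ends.
SHEEP_RE = re.compile(r"baa+!")

for s in ["baa!", "baaaaa!", "ba!", "baa", "abaa!"]:
    print(s, fsa_accepts(s), SHEEP_RE.fullmatch(s) is not None)
```

    Both recognizers accept exactly the same strings; the regular expression is simply a more compact notation for the automaton.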



    Lecture 4: Computational Morphology

    Morphology; Regular languages; Finite state transducers; Two-level morphology; Stemming; Morphological parsing

    To read:
      J&M Ch. 2, pp. 33-53
      J&M Ch. 3, pp. 65-88
      J&M Ch. 13, pp. 477-488
    Other relevant material:
      J&M links to NLP resources
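    Stemming is the simplest item on this list. A minimal sketch of Step 1a of Porter's stemmer, which handles plural -s endings; the remaining steps and the measure conditions they depend on are left out, so this is not a complete stemmer:

```python
# Step 1a of the Porter stemmer: rules are tried in order and the first
# matching suffix is rewritten (the list is ordered longest suffix first).
STEP_1A_RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # ponies   -> poni
    ("ss", "ss"),    # caress   -> caress (blocks the bare -s rule)
    ("s", ""),       # cats     -> cat
]


def porter_step_1a(word: str) -> str:
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "ties", "caress", "cats", "dog"]:
    print(w, "->", porter_step_1a(w))
```

    Stems such as "poni" are not words; a stemmer only needs related forms to map to the same string, unlike a morphological parser, which must return a real lemma plus its features.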



    Lecture 5: Words, their neighbors, and their category

    Words; Tokenization; Counting words; Types and Tokens; Syntax; Phrases; Collocations; N-grams; Sparse data; Part-of-speech category and syntactic function; Open and closed classes; Lexical and grammatical words; Tagsets; Ambiguity

    To read:
      J&M Ch. 6
      J&M Ch. 8, pp. 287-298
    Other relevant material:
      J&M links to NLP resources
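    Counting types and tokens and estimating bigram probabilities takes only a few lines. A sketch with a deliberately naive tokenizer (our assumption, not the one used in the labs) and a made-up two-sentence corpus. The zero probability for an unseen bigram illustrates the sparse-data problem:

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lower-case, then take runs of word
    # characters, or single punctuation marks, as tokens.
    return re.findall(r"\w+|[^\w\s]", text.lower())


text = "The cat sat on the mat. The dog sat on the cat."
tokens = tokenize(text)
types = set(tokens)
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))


def bigram_prob(w1: str, w2: str) -> float:
    """Maximum-likelihood estimate P(w2 | w1) = C(w1 w2) / C(w1)."""
    if unigram_counts[w1] == 0:
        return 0.0
    return bigram_counts[(w1, w2)] / unigram_counts[w1]


print(len(tokens), "tokens,", len(types), "types")  # 14 tokens, 7 types
print(bigram_prob("the", "cat"))   # 0.5
print(bigram_prob("the", "bird"))  # 0.0: an unseen bigram gets zero probability
```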



    Lecture 6: POS Tagging

    Methods for part-of-speech tagging; Unknown words; Evaluation; Precision - Recall

    To read:
      J&M Ch. 8, pp. 298-319
      J&M p. 578
    Other relevant material:
      J&M links to NLP resources
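    The evaluation measures in this lecture can be computed directly. A minimal sketch with hypothetical gold-standard and tagger output (Penn Treebank-style tags). Tagging accuracy counts matching tokens, while precision and recall are computed here for a single tag:

```python
def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of tokens whose predicted tag equals the gold-standard tag."""
    assert len(predicted) == len(gold)
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall and balanced F-measure for a set of proposed items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical tagger output for "the dog barked at the cat".
gold = ["DT", "NN", "VBD", "IN", "DT", "NN"]
pred = ["DT", "NN", "VBN", "IN", "DT", "VB"]
print(accuracy(pred, gold))

# Precision and recall for one tag: the token positions tagged NN.
nn_gold = {i for i, t in enumerate(gold) if t == "NN"}
nn_pred = {i for i, t in enumerate(pred) if t == "NN"}
print(precision_recall_f1(nn_pred, nn_gold))
```

    Here the tagger never proposes a wrong NN (precision 1.0) but misses one of the two nouns (recall 0.5), a distinction that overall accuracy hides.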



    Lecture 7: The meaning of words

    Lexemes; Relations between lexemes: homonymy, polysemy, synonymy, hyponymy; Thematic roles; Word sense disambiguation; Words and information retrieval

    To read:
      J&M Ch. 16
      J&M Ch. 17
    Other relevant material:
      J&M links to NLP resources
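    Word sense disambiguation can be sketched with the simplified Lesk algorithm, one dictionary-based method among several and not necessarily the one presented in the lecture. The two glosses for "bank" and the stopword list are made up for illustration:

```python
# Simplified Lesk: choose the sense whose dictionary gloss shares the most
# content words with the context. The glosses below are hypothetical; a real
# system would take them from a machine-readable dictionary.
STOPWORDS = {"a", "an", "and", "as", "at", "he", "of", "on", "she", "such",
             "that", "the", "to"}

SENSES_OF_BANK = {
    "bank/finance": "a financial institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a body of water such as a river",
}


def content_words(text: str) -> set[str]:
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(context: str, senses: dict[str, str]) -> str:
    context_words = content_words(context)
    # max() keeps the first sense listed when overlaps tie.
    return max(senses, key=lambda s: len(context_words & content_words(senses[s])))


print(simplified_lesk("he sat on the bank of the river watching the water", SENSES_OF_BANK))
print(simplified_lesk("she went to the bank to deposit some money", SENSES_OF_BANK))
```

    Without stemming, "deposit" and "deposits" do not count as an overlap; normalizing words with a stemmer first would let them match.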



    Lecture 8: NLP Systems, Overview

    Examination; Schedule for the second half of the course; The components of an NLP system: what do we need to build language processing systems?

    To read:
      J&M Ch. 1
      Gambäck 1999: Human Language Technology: The Babel Fish



    Lecture 9: Machine Translation I

    Machine Translation theory

    To read:
      J&M Ch. 21
      Arnold et al. "MT: An Introductory Guide"
    Other relevant material:
      Systran commercial application
      Various translation tools
      Eurodicautom



    Lecture 10: Machine Translation II

    Machine Translation systems, MT users, applications, evaluation

    To read:
      J&M Ch. 21
      Arnold et al. "MT: An Introductory Guide"
    Other relevant material:
      English-Amharic parallel texts
      Parallel texts - EU languages



    Lecture 11: Basic Grammars

    Context-free grammars; Definite Clause Grammars; Dependency Grammar.

    To read:
      J&M Ch. 9
    Other relevant material:
      Pasi Tapanainen & Timo Järvinen. 1997. "A Non-Projective Dependency Parser", Proc. 5th Conf. on Applied NLP, Washington, DC.
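    A context-free grammar can be written down directly as data. The sketch below uses a hypothetical toy grammar and recognizes sentences top-down, left to right, with backtracking, which is also how Prolog executes a Definite Clause Grammar. It is a plain Python recognizer, not a DCG:

```python
# Toy grammar (hypothetical): each nonterminal maps to its alternative
# right-hand sides. Any symbol that is not a key is a terminal (a word).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Name"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "Name": [["kim"]],
    "V": [["sees"], ["sleeps"]],
}


def expand(symbols: list[str], words: list[str], i: int):
    """Yield every j such that the symbol sequence can derive words[i:j]."""
    if not symbols:
        yield i
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Nonterminal: try each rule in turn, backtracking on failure.
        for rhs in GRAMMAR[first]:
            yield from expand(rhs + rest, words, i)
    elif i < len(words) and words[i] == first:
        # Terminal: it must match the next input word.
        yield from expand(rest, words, i + 1)


def recognize(sentence: str) -> bool:
    words = sentence.split()
    return any(j == len(words) for j in expand(["S"], words, 0))


for s in ["the dog sees kim", "kim sleeps", "dog the sees"]:
    print(s, recognize(s))
```

    Like a DCG, this strategy never terminates on a left-recursive rule such as NP -> NP PP; the toy grammar avoids such rules.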



    Lecture 12: Parsing

    Parsing with context-free grammars; top-down parsing; bottom-up parsing; Well-formed substring tables; head-first parsing; charts; LR parsing; Human language processing.

    To read:
      J&M Ch. 10, 12.5
      Kimball. "Seven Principles of Surface Structure Parsing in Natural Languages"
    Other relevant material:
      Martin Kay. 1989. "Head-Driven Parsing", Proc. 1st Int. Workshop on Parsing Technologies. Pittsburgh.
      Alfred Aho & Jeffrey Ullman. 1972. Ch. 4.2: Tabular Parsing Methods. The Theory of Parsing, Translation, and Compiling. Prentice-Hall.
      Masaru Tomita. 1986. Ch. 2: Informal Description of the Algorithm. Efficient Parsing for Natural Language. Kluwer Academic Publishers.
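    Well-formed substring tables and charts share one idea: store every constituent found for each span of the input so that it is never rebuilt. A minimal sketch of one classic tabular method, the Cocke-Younger-Kasami (CKY) recognizer, over a hypothetical toy grammar in Chomsky normal form:

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical).
# BINARY maps a pair of daughters (B, C) to every A with a rule A -> B C.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICON = {"the": {"Det"}, "a": {"Det"}, "dog": {"N"}, "cat": {"N"}, "sees": {"V"}}


def cky_recognize(words: list[str], start: str = "S") -> bool:
    n = len(words)
    # table[i][j] holds every nonterminal that can derive words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(word, ()))
    for length in range(2, n + 1):          # shorter spans first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):       # every split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]


print(cky_recognize("the dog sees a cat".split()))  # True
print(cky_recognize("the dog sees".split()))        # False
```

    Filling spans in order of length guarantees that both halves of every split are already complete, which is what makes the table "well-formed".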



    Lecture 13: Features and Unification

    Feature structures, subcategorization, unification.

    To read:
      J&M Ch. 11.1-3, 12.4
    Other relevant material:
      SWI-Prolog
      Patrick Blackburn, Johan Bos & Kristina Striegnitz. 2001. Learn Prolog Now! Saarbrücken, Germany.
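    Unification of feature structures can be sketched with nested dictionaries. This simplified version, written for illustration, handles atomic values and embedded structures but not reentrancy (shared values), which J&M's treatment requires. The feature names are modelled on J&M's agreement examples:

```python
def unify(f1, f2):
    """Unify two feature structures: strings are atomic values, dicts map
    features to values. Returns the unified structure, or None on failure.
    Reentrant (shared) values are not modelled in this simplified sketch."""
    if f1 == {}:          # the empty structure [] unifies with anything
        return f2
    if f2 == {}:
        return f1
    if isinstance(f1, dict) and isinstance(f2, dict):
        result = dict(f1)
        for feature, value in f2.items():
            if feature in result:
                sub = unify(result[feature], value)
                if sub is None:
                    return None   # a clash anywhere makes the whole unification fail
                result[feature] = sub
            else:
                result[feature] = value
        return result
    # Two atoms (or an atom and a structure) unify only if they are identical.
    return f1 if f1 == f2 else None


third_sg_np = {"CAT": "NP", "AGREEMENT": {"NUMBER": "sg", "PERSON": "3"}}
verb_wants = {"AGREEMENT": {"NUMBER": "sg"}}
print(unify(third_sg_np, verb_wants))
print(unify(third_sg_np, {"AGREEMENT": {"NUMBER": "pl"}}))  # None: number clash
```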



    Lecture 14: Semantics

    Semantic models, representing meaning. Syntax-driven semantic analysis, quantifiers, compositionality.

    To read:
      J&M Ch. 14, pp. 501-527
      J&M Ch. 15.1-2, 15.4-5
    Other relevant material:
      Patrick Blackburn & Johan Bos. 1999. Representation and Inference for Natural Language: A First Course in Computational Semantics. Saarbrücken, Germany.
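    Syntax-driven, compositional semantics can be illustrated by treating word meanings as functions and evaluating them against a small model. In this sketch (the model is hypothetical, not from the course material) a determiner takes the noun's meaning and then the verb phrase's meaning, so the meaning of [S [NP Det N] VP] is Det(N)(VP):

```python
# A tiny model (hypothetical): a domain of individuals and predicate extensions.
DOMAIN = {"fido", "rex", "tom"}
EXTENSIONS = {
    "dog": {"fido", "rex"},
    "cat": {"tom"},
    "barks": {"fido", "rex"},
    "sleeps": {"fido", "tom"},
}


def predicate(word: str):
    """Meaning of a noun or intransitive verb: a function from individuals to truth values."""
    return lambda x: x in EXTENSIONS[word]


def every(restrictor):
    # "every N VP" is true iff all individuals satisfying N also satisfy VP.
    return lambda scope: all(scope(x) for x in DOMAIN if restrictor(x))


def some(restrictor):
    # "some N VP" is true iff at least one individual satisfies both N and VP.
    return lambda scope: any(scope(x) for x in DOMAIN if restrictor(x))


def sentence(det, noun: str, verb: str) -> bool:
    # Composition mirrors the syntax: NP = Det(N), then S = NP(VP).
    noun_phrase = det(predicate(noun))
    return noun_phrase(predicate(verb))


print(sentence(every, "dog", "barks"))   # True
print(sentence(every, "dog", "sleeps"))  # False: rex does not sleep
print(sentence(some, "dog", "sleeps"))   # True: fido does
```

    Treating the quantified noun phrase as a function over VP meanings is what lets "every" and "some" compose with the same syntax rule.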



    Lecture 15: Pragmatics

    Discourse processing, reference resolution, conversational agents, dialogue.

    To read:
      J&M Ch. 18.1-3, 19
      Wilks & Catizone. "Human-Computer Conversation"
    Other relevant material:
      Johan Bos. 2001. DORIS: Discourse Oriented Representation and Inference System



    Lecture 16: Speech

    Automatic speech recognition, speech generation, text-to-speech synthesis. Spoken dialogue systems, speech-to-speech machine translation.

    To read:
      J&M Ch. 4.6-7, 7.1-7, (20.5)
      J&M Ch. 19, 20.1-2
      Zue & Glass. "Conversational Interfaces: Advances and Challenges"
    Other relevant material:
      J&M links to NLP resources



    Exam and Exam Feedback

  • Written Exam Tue Mar 9, 10:00 am
  • Exam Feedback (slides) Thu Mar 11, 2:00 pm

  • Responsible: Björn Gambäck <gamback@sics.se>
    Last changed 050516 at 11:16:57.