
This web page contains the solution to an enhanced version of the sixth assignment of the second part (transformation-based learning) of the machine learning course taught at the Graduate School of Language Technology, fall 2003. The authors of this page are Fredrik Olsson and Magnus Sahlgren.

1 Using all Maptask data

Running the mu-tbl system with the Maptask templates provided in the course pack on the full Maptask corpus yields the results shown in the table below. In total, there are 27084 utterances in the Maptask data. For the first run, we split the data into two pieces: one for training (approximately 9/10 of the full corpus) and one for testing (the remaining 1/10). The learned rule sequence is available here.

Training data set: 9/10 of the full Maptask data, 24542 utterances
F-score: 66.2%
Number of rules: 456
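
The 9/10 versus 1/10 split described above can be sketched as follows. This is a minimal illustration in Python, not the procedure the authors actually used: the function name split_corpus is hypothetical, and the sketch assumes a contiguous cut that keeps the utterances in their original order, which the page does not state. An exact 9/10 cut of 27084 utterances gives 24375 training utterances, slightly fewer than the 24542 reported, which fits the description of the split as approximate.

```python
def split_corpus(utterances, numerator=9, denominator=10):
    """Split a list of utterances into a training and a test part.

    Hypothetical helper: the cut is contiguous (no shuffling), so that
    neighbouring utterances stay together. Integer arithmetic avoids
    floating-point rounding at the cut point.
    """
    cut = len(utterances) * numerator // denominator
    return utterances[:cut], utterances[cut:]


# Stand-in data: integers in place of real Maptask utterances.
corpus = list(range(27084))
train, test = split_corpus(corpus)
print(len(train), len(test))
```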

2 Using Finnish dialogue data

The first run on the Finnish data (the Interact corpus of spoken Finnish, recorded at the Helsinki regional transport center; see Jokinen et al. (2002)) used the original set of dact templates. The data was converted to a mu-tbl-readable format under the assumption that the most common dact tag was the same for the Finnish data set as for the English one, that is, "acknowledge". The results are shown in the table below.

Data set: 9/10 of the Finnish data, 2073 utterances
F-score: 63.3%
Number of rules: 73

However, the most common dact tag in the Finnish data, which serves as the default tag assigned to each utterance, is "statement" (23% of the utterances are correctly tagged using this tag). Once the conversion procedure reflected this, running mu-tbl on the Finnish data with the original set of dact templates gave the following results:

Data set: 9/10 of the Finnish data, 2073 utterances
F-score: 65.7%
Number of rules: 49
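
The choice of default tag can be checked by counting tag frequencies before conversion. The sketch below, in Python, is an assumption about how such a check might look and is not part of mu-tbl: the function name most_common_tag is hypothetical, and the toy tag list is made up for illustration and is not taken from the Interact corpus.

```python
from collections import Counter


def most_common_tag(tags):
    """Return the most frequent tag and the share of items it covers.

    The share is the accuracy obtained by assigning that tag to every
    utterance, i.e. the baseline a default tag provides.
    """
    if not tags:
        raise ValueError("tag list is empty")
    tag, count = Counter(tags).most_common(1)[0]
    return tag, count / len(tags)


# Toy example with invented tags, for illustration only.
toy_tags = ["statement", "acknowledge", "statement", "question"]
tag, share = most_common_tag(toy_tags)
print(tag, share)
```

Run on the real tag column of a corpus, a check like this would show whether the assumed default tag (here "acknowledge" for English) is also the most frequent one in the new data.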

2.1 Using lemmatized Finnish dialogue data

Using the same set of templates and a lemmatized version of the Finnish data yields the results shown in the table below. Lemmatization was done using the Finnish functional dependency parser from Connexor Oy.

Data set: 9/10 of the Finnish data, 2086 utterances, lemmatized
F-score: 60.3%
Number of rules: 55

2.2 Using lemmatized and part-of-speech tagged Finnish dialogue data

Using the same set of templates and a lemmatized and part-of-speech tagged version of the Finnish data yields the results shown in the table below. Lemmatization and POS tagging were done using the Finnish functional dependency parser from Connexor Oy.

Data set: 9/10 of the Finnish data, 1958 utterances, lemmatized and POS-tagged
F-score: 62.3%
Number of rules: 50