This web page contains the solution to an enhanced version of the sixth assignment of the second part (transformation-based learning) of the course on machine learning taught at the graduate school of language technology, fall 2003. The authors of this page are Fredrik Olsson and Magnus Sahlgren.
Running the mu-tbl system on the full Maptask corpus, using the maptask templates provided in the course pack, yields the results shown in the table below. In total, there are 27084 utterances in the Maptask data. For the first run, we split the data into two pieces, one for training (containing approximately 9/10 of the full corpus) and one for testing (containing the remaining 1/10 of the corpus). The learned rule sequence is available here.
| training data set | 9/10 of the full Maptask data, 24542 utterances. |
| f-score | 66.2% |
| no. of rules | 456 |
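The 9/10 versus 1/10 split described above can be illustrated with a minimal sketch. It assumes the corpus is simply a list of utterances cut at a fixed position; the function name `split_corpus` and the placeholder utterances are hypothetical and not part of mu-tbl. The actual split used for the run reported above evidently differed slightly, since the training set there has 24542 utterances.

```python
def split_corpus(utterances, train_tenths=9):
    """Split a list of utterances into a training part and a test part.

    Assumption: a plain positional cut, with the first `train_tenths`/10
    of the list used for training and the rest held out for testing.
    """
    cut = len(utterances) * train_tenths // 10
    return utterances[:cut], utterances[cut:]


# Placeholder corpus with the same number of utterances as the Maptask data.
corpus = [f"utterance-{i}" for i in range(27084)]
train, test = split_corpus(corpus)
print(len(train), len(test))
```

A positional cut is the simplest choice; shuffling first, or splitting on dialogue boundaries, would be equally plausible alternatives.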
The first run on the Finnish data (the Interact corpus of spoken Finnish, recorded at the Helsinki regional transport center; see Jokinen et al. (2002)) used the original set of dact templates. The data was converted to a mu-tbl-readable format under the assumption that the most common dact tag was the same for the Finnish data set as for the English one, that is, "acknowledge". The results are shown in the table below.
| data set | 9/10 of the Finnish data, 2073 utterances. |
| f-score | 63.3% |
| no. of rules | 73 |
However, the most common dact tag for the Finnish data, i.e., the default tag that should be assigned to each utterance before rule learning, is "statement" (23% of the utterances are correctly tagged using this tag). Once that was reflected in the conversion procedure, running mu-tbl on the Finnish data with the original set of dact templates gave the following results:
| data set | 9/10 of the Finnish data, 2073 utterances. |
| f-score | 65.7% |
| no. of rules | 49 |
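Choosing the default tag amounts to finding the most frequent dact tag in the data and checking how many utterances it labels correctly on its own. The following is a minimal sketch of that check; the function names and the toy tag list are hypothetical illustrations, not the conversion procedure actually used and not the Finnish data.

```python
from collections import Counter


def most_common_tag(tags):
    """Return the tag that occurs most often in the list of gold tags."""
    return Counter(tags).most_common(1)[0][0]


def baseline_accuracy(tags, default_tag):
    """Fraction of utterances tagged correctly if every one gets default_tag."""
    return sum(tag == default_tag for tag in tags) / len(tags)


# Toy gold-standard tags, invented for illustration only.
toy_tags = ["statement", "acknowledge", "statement",
            "question", "statement", "acknowledge"]

default = most_common_tag(toy_tags)
accuracy = baseline_accuracy(toy_tags, default)
print(default, accuracy)
```

Run on the real Finnish tags, a check of this kind would correspond to the 23% figure quoted above for "statement".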
Using the same set of templates on a lemmatized version of the Finnish data yields the results shown in the table below. Lemmatization was done using the Finnish functional dependency parser from Connexor Oy.
| data set | 9/10 of the Finnish data, 2086 utterances, lemmatized. |
| f-score | 60.3% |
| no. of rules | 55 |
Using the same set of templates on a lemmatized and part-of-speech tagged version of the Finnish data yields the results shown in the table below. Lemmatization and part-of-speech tagging were done using the Finnish functional dependency parser from Connexor Oy.
| data set | 9/10 of the Finnish data, 1958 utterances, lemmatized and part-of-speech tagged. |
| f-score | 62.3% |
| no. of rules | 50 |