This web page contains the solution to the third assignment of the second part (transformation based learning) of the course on machine learning taught at the graduate school of language technology, fall 2003. The authors of this page are Fredrik Olsson and Magnus Sahlgren.
Running the word sense disambiguator according to the instructions given in the task (using both training sets available), the following results were obtained:
| data set | interest_small | interest |
| f-score | 78.8% | 81.6% |
| number of rules | 19 | 47 |
As can be seen in the table above, there's a big difference between the number of rules learned on the two training sets, and there's also a difference in performance. A (perhaps naive) assumption would be: the more rules, the merrier. To obtain more rules and see whether this assumption holds, the score threshold (which was initially set to 3) was lowered to 1, and the learning process was run again. The results are shown in the table below.
| data set | interest |
| f-score | 87.6% |
| number of rules | 274 |
Lowering the score threshold resulted in a large increase in the number of rules learned from the training data, and these rules improved the performance of the word sense disambiguator by six percentage points on the interest data set. The listing of the rules is available here and a dump of the output produced by the learner while learning is available here. From the latter, it can be seen that most of the rules learned are perfect in the sense that their accuracy is 1. The combination of a large number of rules, many of which are perfect, a relatively small training set (some 2000 instances), and templates that do not allow the learner to abstract away from the lexical level of the data suggests that the rule sequence may be over-fitted to the data at hand.
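The way the score threshold controls the number of rules can be illustrated with a minimal sketch of a greedy transformation based learner. This is not the learner used for the assignment: the function names, the single lexical template ("if word w occurs in the context and the current sense is A, change it to B") and the toy data are all assumptions made for illustration. A rule's score is taken to be the number of instances it corrects minus the number it breaks, and learning stops when the best remaining rule scores below the threshold.

```python
from collections import Counter

# Hypothetical toy data: (context words, gold sense of "interest").
TOY_DATA = [
    ({"rate", "bank"}, "money"),
    ({"rate", "loan"}, "money"),
    ({"pay", "loan"}, "money"),
    ({"rate", "cut"}, "money"),
    ({"great", "show"}, "attention"),
    ({"great", "public"}, "attention"),
    ({"great", "little"}, "attention"),
    ({"share", "company"}, "stake"),
]


def rule_score(rule, instances, current):
    """Corrections minus newly introduced errors for one rule."""
    word, old, new = rule
    score = 0
    for (context, gold), label in zip(instances, current):
        if label == old and word in context:
            if new == gold:
                score += 1   # the rule fixes this instance
            elif old == gold:
                score -= 1   # the rule breaks a correct instance
    return score


def learn_rules(instances, threshold):
    """Greedy loop: add the best rule until its score drops below threshold."""
    assert threshold >= 1, "a positive threshold guarantees termination"
    # Initial state: every instance gets the most frequent sense.
    baseline = Counter(gold for _, gold in instances).most_common(1)[0][0]
    current = [baseline] * len(instances)
    rules = []
    while True:
        # Instantiate the lexical template on every currently wrong instance.
        candidates = sorted({
            (word, label, gold)
            for (context, gold), label in zip(instances, current)
            if label != gold
            for word in context
        })
        if not candidates:
            break
        best = max(candidates, key=lambda r: rule_score(r, instances, current))
        if rule_score(best, instances, current) < threshold:
            break
        word, old, new = best
        current = [
            new if (label == old and word in context) else label
            for (context, _), label in zip(instances, current)
        ]
        rules.append(best)
    return rules


print(len(learn_rules(TOY_DATA, threshold=3)))  # 1
print(len(learn_rules(TOY_DATA, threshold=1)))  # 2
```

With the threshold at 3, only the rule triggered by "great" is learned, since it corrects three instances. With the threshold at 1, the learner also accepts a rule that corrects a single instance. Such a rule is perfect on the training data but rests on one example, which is the kind of rule that invites over-fitting.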
Several things may lead to domain and genre dependence. First of all, the data from which the rules are learned probably does not contain good examples of all meanings of the word "interest". Second, since the templates are purely lexical, the language use "shines through" in the learning process: the genre and domain specifics represented by the training data reflect the use of language in the Wall Street Journal (WSJ), which is indeed a specialised language. Since the rules are tied to the vocabulary used and to the available examples of the different meanings of "interest", the rule sequence is probably not a good candidate for application to another domain or genre.