This web page contains the solution to an enhanced version of the fifth assignment of the second part (transformation-based learning) of the course on machine learning taught at the Graduate School of Language Technology, fall 2003. The authors of this page are Fredrik Olsson and Magnus Sahlgren.

Training a constraint grammar tagger

The tagger was trained and tested according to the instructions in the assignment. The output from the training procedure is available here, and the results are shown in the table below.

data set       wsj_cg_test (before applying rules)   wsj_cg_test (after applying rules)
recall         100%                                  99,9%
precision      68,7%                                 84,3%
f-score        81,5%                                 91,4%
no. of rules   -                                     54
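
The f-score row can be checked against the precision and recall rows. The following is a minimal sketch, assuming that the f-score reported here is the balanced F-measure (the harmonic mean of precision and recall), which the page does not state explicitly. The function name `f_score` is introduced for illustration only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the table, written as fractions
# (the page writes decimals with a comma, e.g. 68,7%).
before = f_score(precision=0.687, recall=1.000)
after = f_score(precision=0.843, recall=0.999)

print(f"before applying rules: {before:.3f}")
print(f"after applying rules:  {after:.3f}")
```

Because the inputs are themselves rounded to one decimal, the recomputed values can differ from the table in the last digit; they agree with the reported 81,5% and 91,4% to within that rounding.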

The fundamental difference between tagging à la Brill and tagging with constraint grammars is that the former assigns exactly one tag to each word (a tag which may subsequently be altered by transformation rules), while the latter assigns a set of possible tags to each word (a set which may then be reduced, but not entirely emptied). When reducing the set of possible tags assigned to a word, the goal is to have as few tags left as possible once the reduction process has finished, that is, to remove ambiguities. Lowering the accuracy threshold allows the learner to learn rules that are not entirely correct, and an incorrect rule is more likely than a correct one to remove the one tag that we want as the sole member of the set in the end.
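
The contrast between the two tagging styles can be sketched in a few lines of Python. This is an illustration only, not the learner used in the assignment: the tag names, the helper `apply_removal_rule`, and the context-free form of the rules are all assumptions made to keep the example short.

```python
def apply_removal_rule(tags: set[str], tag_to_remove: str) -> set[str]:
    """Remove one tag from a word's candidate set, but never empty the set."""
    if tag_to_remove in tags and len(tags) > 1:
        return tags - {tag_to_remove}
    return tags


# Brill style: each word carries exactly one tag, and a rule replaces it.
brill_tag = "NN"
brill_tag = "VB"  # a transformation rule rewrites NN to VB

# Constraint grammar style: each word carries a set of candidate tags,
# and rules can only shrink that set.
candidates = {"NN", "VB", "JJ"}
candidates = apply_removal_rule(candidates, "JJ")  # leaves NN and VB
candidates = apply_removal_rule(candidates, "NN")  # leaves VB only
candidates = apply_removal_rule(candidates, "VB")  # blocked: last tag stays

print(brill_tag, sorted(candidates))
```

If the second removal rule had been wrong (that is, if NN were the correct tag), the word would end up unambiguous but incorrectly tagged, which is the risk described above.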

Changing the accuracy threshold from 1.0 to 0.9 yields the results shown in the table below. Precision increases from 84,3% to 88,9% (and, in this case, the f-score from 91,4% to 93,7%) at the expense of recall, which drops from 99,9% to 99,1%.

data set       wsj_cg_test (before applying rules)   wsj_cg_test (after applying rules)
recall         100%                                  99,1%
precision      68,7%                                 88,9%
f-score        81,5%                                 93,7%
no. of rules   -                                     54
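
The trade-off between precision and recall can be illustrated with a toy example. The sketch below assumes the usual definitions for ambiguous tagger output, which the page does not spell out: recall is the share of words whose candidate set still contains the correct tag, and precision is the share of all remaining candidate tags that are correct. The four-word data set and the function name `evaluate` are hypothetical.

```python
def evaluate(tagged: list[set[str]], gold: list[str]) -> tuple[float, float]:
    """Return (precision, recall) for ambiguous tagger output."""
    # A word counts as correct if its candidate set contains the gold tag.
    correct = sum(1 for tags, g in zip(tagged, gold) if g in tags)
    total_tags = sum(len(tags) for tags in tagged)
    return correct / total_tags, correct / len(gold)


gold = ["DT", "NN", "VB", "NN"]

# Before applying rules: every correct tag is present, along with wrong ones.
before = [{"DT"}, {"NN", "VB"}, {"VB", "NN"}, {"NN", "JJ", "VB"}]

# Rules that are always right remove only wrong tags.
strict = [{"DT"}, {"NN"}, {"VB", "NN"}, {"NN", "VB"}]

# Less accurate rules remove more tags, including one correct tag.
lenient = [{"DT"}, {"NN"}, {"VB"}, {"VB"}]

for name, output in [("before", before), ("strict", strict), ("lenient", lenient)]:
    precision, recall = evaluate(output, gold)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, recall stays at its maximum as long as only wrong tags are removed, while the more aggressive rule set gains precision and loses recall, which mirrors the direction of the change between the two tables.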