This web page contains the solution to the second part of the assignment of the fourth part (inductive logic programming) of the course on machine learning taught at the graduate school of language technology, fall 2003. The authors of this page are Fredrik Olsson and Magnus Sahlgren.
The task was to make Aleph induce a reasonable theory regarding the past tense of English verbs. The keyword here is reasonable: since the task definition pointed out that Aleph isn't the best choice for this kind of problem, the solution provided on this web page is what we arrived at after spending a reasonable amount of time trying to solve the problem.
The past tense information about the verbs is represented as pairs of Prolog lists, of which the first list is the present tense, and the second list is the past tense of a given verb, e.g.
past([a, d, v, e, r, t, i, s, e],[a, d, v, e, r, t, i, s, e, d]).
past([w, i, t, h, d, r, a, w],[w, i, t, h, d, r, e, w]).
past([w, e, t],[w, e, t]).
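As a small illustration of this representation, the sketch below builds such a past/2 fact from two atoms using the standard atom_chars/2 built-in. The helper name verb_fact/3 is hypothetical and is not part of the original data files; it only shows how a verb maps to the list encoding.

```prolog
% verb_fact(+Present, +Past, -Fact)
% Hypothetical helper: turns two atoms into a past/2 term whose
% arguments are lists of single-character atoms.
verb_fact(Present, Past, past(PresentChars, PastChars)) :-
    atom_chars(Present, PresentChars),
    atom_chars(Past, PastChars).
```

For instance, the query verb_fact(wet, wet, F) binds F to past([w,e,t],[w,e,t]), which matches the third example above.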
Our idea was to have Aleph induce a predicate past/2 that
maps its first argument, i.e., the present tense, to its second
argument, i.e., the past tense, while making use of the regularities that
are present in the 1393 training examples. The background knowledge
consisted of a single predicate:
split([X, Y|Z], [X], [Y|Z]).
split([X|Y], [X|W], Z) :- split(Y, W, Z).
This predicate is, in essence, a modified append/3 saying that "a
list can be split into two non-empty sublists" (the original
append/3 allows for empty sublists as well, and such
behaviour would not be of use in the present task). By giving Aleph
the ability to split lists, we hoped that it would exploit this in
order to find and generalise over some of the regularities present in
the training data. The mode and determination declarations used in the
background knowledge were the following (the complete file is available
here):
:- mode(*, past(+wrdl, -wrdl)).
:- mode(*, split(+wrdl, -wrdl, -wrdl)).
:- mode(*, split(-wrdl, +wrdl, +wrdl)).
:- determination(past/2, split/3).
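The two modes declared for split/3 correspond to two ways of calling it: taking a word apart, and putting two parts together. The following is a minimal sketch of both uses. The wrapper names all_splits/2 and join/3 are hypothetical and introduced only for illustration; split/3 itself is copied from the background knowledge.

```prolog
% split/3 as given in the background knowledge.
split([X, Y|Z], [X], [Y|Z]).
split([X|Y], [X|W], Z) :- split(Y, W, Z).

% Mode split(+wrdl, -wrdl, -wrdl): collect every way of cutting
% Word into a non-empty prefix and a non-empty suffix.
all_splits(Word, Splits) :-
    findall(Prefix-Suffix, split(Word, Prefix, Suffix), Splits).

% Mode split(-wrdl, +wrdl, +wrdl): build Word from two
% non-empty parts, i.e. use split/3 "backwards" as a concatenation.
join(Prefix, Suffix, Word) :-
    split(Word, Prefix, Suffix).
```

The query all_splits([w,e,t], S) yields S = [[w]-[e,t], [w,e]-[t]], so the empty prefix and empty suffix that append/3 would allow are indeed excluded. The query join([w,i,t,h], [d,r,e,w], W) yields W = [w,i,t,h,d,r,e,w].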
However, it turned out that running Aleph with these settings produced little generalisation over the input data: there were 1393 instances in the training data, and as many as 1321 rules were generated. The entire rule sequence (theory) is available here, while the rules that generalised over more than one input example were these:
[Rule 100] [Pos cover = 20 Neg cover = 0]
past(A,A).
[Rule 120] [Pos cover = 29 Neg cover = 0]
past(A,B) :- split(A,C,D), split(A,E,F), split(B,C,F).
[Rule 288] [Pos cover = 41 Neg cover = 0]
past(A,B) :- split(A,C,D), split(B,A,C).
[Rule 398] [Pos cover = 4 Neg cover = 0]
past(A,B) :- split(A,C,D), split(B,A,D).
[Rule 557] [Pos cover = 2 Neg cover = 0]
past(A,B) :- split(A,B,C).
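To see what these rules actually express, it can help to run them on a few verbs. The sketch below restates the four rules that use split/3 under hypothetical names (rule_120/2 and so on) so that each can be queried separately. The example verbs in the comments are illustrations chosen here and are not claimed to come from the training data.

```prolog
% split/3 as given in the background knowledge.
split([X, Y|Z], [X], [Y|Z]).
split([X|Y], [X|W], Z) :- split(Y, W, Z).

% Rule 120: the past tense is a prefix of the present tense followed
% by a suffix of the present tense (e.g. feed -> fed).
rule_120(A, B) :- split(A, C, _D), split(A, _E, F), split(B, C, F).

% Rule 288: the past tense is the present tense followed by one of
% its own prefixes (e.g. dance -> dance + d).
rule_288(A, B) :- split(A, C, _D), split(B, A, C).

% Rule 398: the past tense is the present tense followed by one of
% its own suffixes (e.g. need -> need + ed).
rule_398(A, B) :- split(A, _C, D), split(B, A, D).

% Rule 557: the past tense is a proper prefix of the present tense
% (e.g. bite -> bit).
rule_557(A, B) :- split(A, B, _C).
```

Under these definitions, rule_288([d,a,n,c,e], [d,a,n,c,e,d]) succeeds only because "dance" happens to begin with the letter it needs to add, whereas rule_288([w,a,l,k], [w,a,l,k,e,d]) fails. This suggests that the rules exploit coincidences in the spelling of individual verbs rather than capturing the regular "-ed" suffix, which is consistent with the low coverage reported above.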
It should be noted that the solution presented here is inspired by the paper Learning the Past Tense of English Verbs Using Inductive Logic Programming by Raymond Mooney and Mary Elaine Califf.