Words that do not appear in the lexicon, so-called unknown words, present a problem for part-of-speech tagging. A guesser is the module in a part-of-speech tagger that deals with this problem by trying to guess the part of speech of unknown words. In this exercise, we will see that it is straightforward to train a guesser in the µ-TBL system.
Brill dealt with the problem in (Brill 1994), and what we are going to experiment with here is strongly influenced by his approach. The idea is simply to let rules inspect the prefixes and suffixes of words and take a guess based on them, a guess that may later be overridden by rules further down in the sequence.
We will use a subset of the templates that Brill used, converted into the µ-TBL system's template formalism.
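To make the affix idea concrete, here is a minimal Python sketch of an ordered rule sequence in which later rules may override earlier guesses. It is neither the µ-TBL template formalism nor Brill's actual rule set: the rule list, the function name guess_tag and the default tag are hypothetical, the rules are written by hand rather than learned from training data, and the Penn Treebank-style tag names serve only as labels.

```python
# Illustration only: hand-written rules standing in for rules that the
# learner would normally induce from training data.
# Each rule is (affix kind, affix, required current tag or None, new tag).
RULES = [
    ("suffix", "s", "NN", "NNS"),   # plural-looking ending
    ("suffix", "ed", "NN", "VBD"),  # past-tense-looking ending
    ("suffix", "ly", None, "RB"),   # adverb-looking ending, any current tag
    ("prefix", "un", "VBD", "JJ"),  # overrides the earlier "ed" guess
]


def guess_tag(word, rules=RULES, default="NN"):
    """Guess a tag for an unknown word by applying the rules in order."""
    tag = default  # assumed initial guess for every unknown word
    for kind, affix, required, new_tag in rules:
        if len(word) <= len(affix):
            continue  # the affix must leave a non-empty stem
        if kind == "suffix":
            matches = word.endswith(affix)
        else:
            matches = word.startswith(affix)
        if matches and (required is None or tag == required):
            tag = new_tag  # a later rule may change this again
    return tag


if __name__ == "__main__":
    for w in ["tables", "walked", "unexplained", "quickly", "blorf"]:
        print(w, guess_tag(w))
```

Note how "unexplained" is first guessed to be a past-tense verb because of its suffix, and how that guess is then overridden by the prefix rule further down in the sequence.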
Train and test, and see what happens! From the OS prompt, run:
> ./mutbl -fexamples/guessing.script
Inspect the script to find out where the templates, the training data and the test data are located.
In your report, I would like you to consider the following questions: