In this example we repeat the experiments performed by Ramshaw and Marcus on NP chunking (read the paper!). The idea is to view chunking as a tagging problem, and to encode the chunk structure as tags attached to each word.
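The sketch below shows the tag encoding in Python. It assumes the I/O/B scheme that Ramshaw and Marcus describe: I for a word inside a base NP, O for a word outside, and B for the first word of an NP that immediately follows another NP. The function names and the (start, end) span representation are hypothetical. The tag set in the µ-TBL example files may differ, so check the data before relying on this.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) word indices, end exclusive.
Span = Tuple[int, int]

def encode_iob1(n_words: int, chunks: List[Span]) -> List[str]:
    """Encode non-overlapping NP chunks as one tag per word.

    I = inside a chunk, O = outside any chunk,
    B = first word of a chunk that directly follows another chunk.
    """
    tags = ["O"] * n_words
    prev_end = None
    for start, end in sorted(chunks):
        for i in range(start, end):
            tags[i] = "I"
        if prev_end == start:  # adjacent chunks: mark the boundary with B
            tags[start] = "B"
        prev_end = end
    return tags

def decode_iob1(tags: List[str]) -> List[Span]:
    """Recover chunk spans from the I/O/B tags."""
    chunks: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "O":
            if start is not None:  # an open chunk ends here
                chunks.append((start, i))
                start = None
        elif tag == "B" or start is None:
            if start is not None:  # B closes the previous chunk
                chunks.append((start, i))
            start = i
        # tag == "I" with an open chunk: the chunk simply continues
    if start is not None:
        chunks.append((start, len(tags)))
    return chunks

if __name__ == "__main__":
    # "[He] gave [the girl] [a book]"
    tags = encode_iob1(6, [(0, 1), (2, 4), (4, 6)])
    print(tags)               # ['I', 'O', 'I', 'I', 'B', 'I']
    print(decode_iob1(tags))  # [(0, 1), (2, 4), (4, 6)]
```

Because the B tag is only needed where two chunks touch, the encoding stays lossless while most chunk words receive the same tag, I.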
We use the same training and test corpora as Ramshaw and Marcus. Although I have successfully trained on 150,000 words, we will use only 20,000 words here. The test data consists of 10,000 words taken from the same corpus.
We will use the same 100 templates as Ramshaw and Marcus, but we must first convert them into the template formalism of the µ-TBL system.
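To make the role of templates concrete, here is a minimal Python sketch of a greedy transformation-based learning loop with two hypothetical templates, `prev_tag` and `cur_word`. It is not µ-TBL's implementation and does not use its template formalism. The names, the simultaneous application of each rule, and the `min_score` cutoff are assumptions chosen for illustration. Each template is instantiated at every position where the current tag is wrong, and each resulting candidate rule is scored by its net effect on training accuracy.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical templates: each maps (words, tags, i) to a condition key,
# or None when it cannot apply at position i. Illustrative stand-ins only,
# NOT the µ-TBL template syntax.
Template = Callable[[List[str], List[str], int], Optional[tuple]]

def prev_tag(words: List[str], tags: List[str], i: int) -> Optional[tuple]:
    return ("prev_tag", tags[i - 1]) if i > 0 else None

def cur_word(words: List[str], tags: List[str], i: int) -> Optional[tuple]:
    return ("word", words[i])

# A rule: change from_tag to to_tag wherever the template yields key.
Rule = Tuple[str, str, Template, tuple]

def apply_rule(rule: Rule, words: List[str], tags: List[str]) -> List[str]:
    frm, to, template, key = rule
    # Conditions are checked against the old tags (simultaneous application).
    return [to if t == frm and template(words, tags, i) == key else t
            for i, t in enumerate(tags)]

def n_correct(tags: List[str], gold: List[str]) -> int:
    return sum(t == g for t, g in zip(tags, gold))

def learn(words, tags, gold, templates, min_score=1, max_rules=10):
    """Greedily pick the candidate rule with the largest net error
    reduction on the training data, apply it, and repeat."""
    tags, rules = list(tags), []
    for _ in range(max_rules):
        # Instantiate every template at every error site (dict keeps order).
        candidates = {}
        for i, (t, g) in enumerate(zip(tags, gold)):
            if t != g:
                for template in templates:
                    key = template(words, tags, i)
                    if key is not None:
                        candidates[(t, g, template, key)] = None
        if not candidates:
            break
        base = n_correct(tags, gold)
        best = max(candidates,
                   key=lambda r: n_correct(apply_rule(r, words, tags), gold))
        if n_correct(apply_rule(best, words, tags), gold) - base < min_score:
            break
        tags = apply_rule(best, words, tags)
        rules.append(best)
    return rules, tags

if __name__ == "__main__":
    words = ["the", "dog", "saw", "a", "cat"]
    gold = ["I", "I", "O", "I", "I"]
    rules, final = learn(words, ["I"] * 5, gold, [prev_tag, cur_word])
    for frm, to, template, key in rules:
        print(f"{frm} -> {to} if {key}")
```

In this toy run, the `prev_tag` candidate would retag every word after the first and lower accuracy, so the learner prefers the narrower `cur_word` rule.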
Train and test, and see what happens! From the OS prompt, run:
> ./mutbl -fexamples/np_chunking.script
Inspect the script to find out where the templates and the training and test data are located.
In your report, quote the results you get. Then insert the command 'print_used_templates' (see the manual) into the script file directly after the 'learn_rule_seq' command. Create a new template file containing only this subset of templates, and train again with it. Do you notice any difference in learning speed? Explain!
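Ramshaw and Marcus reported chunk-level precision and recall. If your run reports only per-tag accuracy, a small helper like the one below can be used to compute chunk scores for comparison. It is a minimal sketch that assumes both gold and predicted chunks are available as (start, end) word spans; the function name and span format are hypothetical and not part of µ-TBL.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (start, end) word indices, end exclusive.
Span = Tuple[int, int]

def chunk_prf(gold: Iterable[Span],
              predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Chunk-level precision, recall and F1. A predicted chunk counts as
    correct only if both of its boundaries match a gold chunk exactly."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

if __name__ == "__main__":
    gold = [(0, 1), (2, 4), (4, 6)]
    predicted = [(0, 1), (2, 6)]  # the two adjacent NPs merged into one
    print(chunk_prf(gold, predicted))
```

Exact-match scoring is strict: merging two adjacent NPs, as in the example, costs one correct chunk even though every word may still be tagged as inside an NP.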