Training a Dialogue Act Tagger

General Idea

This example, based on Lager & Zinovjeva (1999), shows how the µ-TBL system can be used to train a dialogue act tagger. We adopt the approach of Samuel et al. (1998) and take our training and test data from the Map Task corpus (Carletta et al. 1997).

Each utterance is represented as a list of words and is identified by a number. The same number links the utterance to its speaker and its dialogue act (refer to the paper for more details). Here is an example:

speaker(1494,g).
u(1494,['Right']).
da(1494,acknowledge).
da(acknowledge,ready,1494).


speaker(1495,g).
u(1495,['So',the,start,is,at,the,top,'left-hand',side,of,the,page]).
da(1495,acknowledge).
da(acknowledge,explain,1495).


speaker(1496,f).
u(1496,['Uh-huh']).
da(1496,acknowledge).
da(acknowledge,acknowledge,1496).


speaker(1497,g).
u(1497,['Do',you,have,cliffs,',',just,to,...,to,the,right]).
da(1497,acknowledge).
da(acknowledge,query_yn,1497).
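
To make the layout of these facts concrete, here is a minimal Python sketch that mirrors the four utterances above as plain records. It is not part of µ-TBL. The class name Utterance and its field names are made up for illustration. The sketch also assumes that the first argument of the three-place da fact is the tag currently assigned and the second is the correct tag; the text above does not state this, so treat it as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    uid: int       # utterance number shared by speaker/2, u/2 and da facts
    speaker: str   # 'g' or 'f', as in speaker/2
    words: list    # word list, as in u/2
    current: str   # first argument of da/3 (assumed: tag currently assigned)
    correct: str   # second argument of da/3 (assumed: correct tag)


# The four example utterances, transcribed from the facts above.
utterances = [
    Utterance(1494, "g", ["Right"], "acknowledge", "ready"),
    Utterance(1495, "g",
              ["So", "the", "start", "is", "at", "the", "top",
               "left-hand", "side", "of", "the", "page"],
              "acknowledge", "explain"),
    Utterance(1496, "f", ["Uh-huh"], "acknowledge", "acknowledge"),
    Utterance(1497, "g",
              ["Do", "you", "have", "cliffs", ",", "just", "to", "...",
               "to", "the", "right"],
              "acknowledge", "query_yn"),
]


def mislabeled(utts):
    """Return the ids of utterances whose current tag differs from the correct one."""
    return [u.uid for u in utts if u.current != u.correct]


print(mislabeled(utterances))
```

Under the stated assumption, every utterance starts out tagged acknowledge, and only utterance 1496 already carries its correct tag.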

We will work with a small set of templates expressed in the µ-TBL system's template formalism. The idea here is to make the conditions for changing the tag of an utterance sensitive to the actual words and word combinations used in the utterance, the length of the utterance, the previous dialogue act(s), the speaker’s role in the dialogue (giver or follower), and whether the speaker has changed since the previous utterance.
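
The kinds of conditions listed above can be sketched in Python as follows. This is only an illustration of the idea, not the µ-TBL template formalism. The function names conditions and apply_rule and the rule in the usage example are hypothetical; the rule was written by hand and is not one that the system is known to learn.

```python
def conditions(utts, i):
    """Collect the properties of utterance i that a rule condition may test."""
    _, speaker, words, _ = utts[i]
    prev = utts[i - 1] if i > 0 else None
    return {
        "words": set(words),                    # individual words
        "bigrams": set(zip(words, words[1:])),  # adjacent word pairs
        "length": len(words),                   # utterance length
        "prev_tag": prev[3] if prev else None,  # previous dialogue act
        "speaker": speaker,                     # 'g' or 'f'
        "speaker_changed": prev is not None and prev[1] != speaker,
    }


def apply_rule(utts, old, new, test):
    """Change tag `old` to `new` wherever `test` holds.

    All conditions are evaluated on the input state, so a change made at
    one position cannot trigger the same rule at the next position.
    """
    hits = [i for i in range(len(utts))
            if utts[i][3] == old and test(conditions(utts, i))]
    result = list(utts)
    for i in hits:
        uid, speaker, words, _ = result[i]
        result[i] = (uid, speaker, words, new)
    return result


# (id, speaker, words, current tag) for the four example utterances.
dialogue = [
    (1494, "g", ["Right"], "acknowledge"),
    (1495, "g", ["So", "the", "start", "is", "at", "the", "top",
                 "left-hand", "side", "of", "the", "page"], "acknowledge"),
    (1496, "f", ["Uh-huh"], "acknowledge"),
    (1497, "g", ["Do", "you", "have", "cliffs", ",", "just", "to", "...",
                 "to", "the", "right"], "acknowledge"),
]

# Hypothetical rule: acknowledge -> query_yn if the utterance contains "Do you".
retagged = apply_rule(dialogue, "acknowledge", "query_yn",
                      lambda c: ("Do", "you") in c["bigrams"])
print([tag for _, _, _, tag in retagged])
# prints ['acknowledge', 'acknowledge', 'acknowledge', 'query_yn']
```

A template would leave the concrete values (here the tags and the word pair) open, and training would pick the instantiations that correct the most errors.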

Train and test, and see what happens! From the OS prompt, run:

> ./mutbl -f examples/dact_tagging.script

Inspect the script for information about where templates and training and test data are located.

For your report, it is enough to quote the performance figures that you get. Dialogue act tagging is, of course, a much more difficult task than part-of-speech tagging, and this is reflected in the comparatively low accuracy that we get. However, we can improve on this by using more training data, by expanding the set of rule templates, and by employing various kinds of background knowledge. Experiment with this if you want. If you feel more training data would help, check out the file 'maptask_train.pl' in the data directory. You may also want to disable some of the templates in the template file in order to see how that affects the performance.
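
If you want to check a reported figure by hand, accuracy is simply the share of utterances whose assigned tag matches the correct one. The Python sketch below is independent of µ-TBL, and the function names are made up for illustration. The numbers it produces come from the four-utterance excerpt shown earlier, with a hand-made second tagging, so they are toy values and not results from the system.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def confusions(predicted, gold):
    """Count (gold, predicted) pairs for the positions that are wrong."""
    return Counter((g, p) for p, g in zip(predicted, gold) if p != g)


# Correct tags of the four example utterances (assumed reading of da/3).
gold = ["ready", "explain", "acknowledge", "query_yn"]
# Initial state: every utterance tagged acknowledge.
initial = ["acknowledge"] * 4
# Hypothetical state after one hand-written rule fixed the last utterance.
after_rule = ["acknowledge", "acknowledge", "acknowledge", "query_yn"]

print(accuracy(initial, gold))     # prints 0.25
print(accuracy(after_rule, gold))  # prints 0.5
print(confusions(after_rule, gold).most_common())
```

Looking at the remaining confusions, rather than at accuracy alone, can suggest which templates to add or disable.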