In a Hidden Markov Model (HMM) there are states, transitions between states, and symbols emitted by the states. There are two kinds of probabilities associated with a HMM: transition probabilities, i.e. the probability of a transition from one state to another, and output probabilities, i.e. the probability of a certain state emitting a certain symbol. We will illustrate here with a simple HMM in which the states represent parts-of-speech, and the symbols emitted by the states are words. The assumption is that a word depends probabilistically on just its own part-of-speech (i.e. its tag) which in turn depends on the part-of-speech of the preceding word (or on the start state in case there is no preceding word).

An HMM is a generative device. It generates sequences of words. Thus, to tag a given sequence of words, we need a program that computes the sequence of state transitions that has the highest probability of being the one that generated the sequence of words. Let’s assume that the output probabilities are represented as clauses of the form outprob(Word,Tag,Probability), e.g. as:
outprob(a,det,0.300). outprob(can,aux,0.010). outprob(can,v,0.005). outprob(can,n,0.001). outprob(he,pron,0.070).
Furthermore, let’s assume that the transitional probabilities are represented
as clauses of the form transprob(Tag1,Tag2,Probability), e.g. as:
transprob(start,det,0.30). transprob(v,det,0.36). transprob(start,aux,0.20). transprob(v,aux,0.01). transprob(start,v,0.10). transprob(v,v,0.01). transprob(start,n,0.10). transprob(v,n,0.26). transprob(start,pron,0.30). transprob(v,pron,0.36). transprob(det,det,0.20). transprob(n,det,0.01). transprob(det,aux,0.01). transprob(n,aux,0.25). transprob(det,v,0.01). transprob(n,v,0.39). transprob(det,n,0.77). transprob(n,n,0.34). transprob(det,pron,0.01). transprob(n,pron,0.01). transprob(aux,det,0.18). transprob(pron,det,0.01). transprob(aux,aux,0.10). transprob(pron,aux,0.45). transprob(aux,v,0.50). transprob(pron,v,0.52). transprob(aux,n,0.01). transprob(pron,n,0.01). transprob(aux,pron,0.21). transprob(pron,pron,0.01).
A commonly used algorithm for finding the most probable sequence of state transitions in an HMM given a sequence of observed outputs is the Viterbi algorithm (Viterbi 1967). In Prolog, all we need is:
most_probable_hmm_path(Words,Path) :-
probable_paths(Words,[1-[start]],PPaths),
keymax(PPaths,_P-Path1),
reverse(Path1,[start|Path]).
probable_paths([],PPaths,PPaths).
probable_paths([Word|Words],PPaths0,PPaths) :-
findall(PPath,
(outprob(Word,Tag2,PL),
findall(P2-[Tag2,Tag1|Tags],
(member(P1-[Tag1|Tags],PPaths0),
transprob(Tag1,Tag2,PT),
P2 is PL*PT*P1),
AllPaths),
keymax(AllPaths,PPath)),
PPaths1),
probable_paths(Words,PPaths1,PPaths).
Given our sample HMM, the most probable sequence of tags corresponding to “he can can a can” is computed as follows:
| ?- most_probable_hmm_path([he,can,can,a,can],Path). Path = [pron,aux,v,det,n]
By adding a tokenizer, procedures for opening and closing files, and procedures for writing to a file, I have built a small but useful standalone automatic part of speech tagger system (less than sixty lines of Prolog all together). Initial testing shows that it is able to tag 10,000 words in 15 seconds (i.e. around 670 words per second) on a SPARCstation 10, running SICStus Prolog 3.0 (compiled code). Despite (or perhaps due to) its small size it is quite useful.