Part 1: Group Exercise
We are interested in building a language model over a language with three words: A, B, and C. Our training corpus is:
AAABACBABBBCCACBCC

First, train a unigram language model using maximum likelihood estimation. What are the probabilities? Leave your answers in the form of fractions.
P(A) =
P(B) =
P(C) =
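
To check hand-computed answers, the unigram estimate can be sketched in a few lines of Python. This is only a minimal sketch: it assumes each character of the corpus string is one word, and the function name unigram_mle is hypothetical, not part of the exercise.

```python
from collections import Counter
from fractions import Fraction


def unigram_mle(corpus):
    """Maximum likelihood unigram estimate: count(w) / total number of words."""
    counts = Counter(corpus)          # assumption: one character == one word
    total = len(corpus)
    return {word: Fraction(count, total) for word, count in counts.items()}


training_corpus = "AAABACBABBBCCACBCC"
unigram_probs = unigram_mle(training_corpus)

for word in sorted(unigram_probs):
    print(f"P({word}) = {unigram_probs[word]}")
```

Fraction reduces automatically, so the printed values may be simplified versions of the raw count ratios.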

Next train a bigram language model using maximum likelihood estimation. Fill in the probabilities below. Leave your answers in the form of a fraction.
P(AA) =
P(AB) =
P(AC) =
P(BA) =
P(BB) =
P(BC) =
P(CA) =
P(CB) =
P(CC) =
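
The bigram estimates can be checked the same way. The sketch below makes two assumptions the exercise does not state: that P(XY) denotes the conditional probability of Y given the previous word X, and that no start or end symbols are added to the corpus. The name bigram_mle is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def bigram_mle(corpus):
    """MLE bigram estimate: count(prev, nxt) / count(prev as a context)."""
    pair_counts = Counter(zip(corpus, corpus[1:]))
    # The last word has no successor, so it is not counted as a context
    # (assumption: no end-of-sequence symbol).
    context_counts = Counter(corpus[:-1])
    return {
        (prev, nxt): Fraction(count, context_counts[prev])
        for (prev, nxt), count in pair_counts.items()
    }


training_corpus = "AAABACBABBBCCACBCC"
bigram_probs = bigram_mle(training_corpus)

for (prev, nxt) in sorted(bigram_probs):
    print(f"P({prev}{nxt}) = {bigram_probs[(prev, nxt)]}")
```

Bigrams that never occur in the training corpus are simply absent from the dictionary, which corresponds to an MLE probability of zero.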

Now evaluate your language models on the following test corpus:
ABACABB
What is the perplexity of the unigram language model evaluated on this corpus?
What is the perplexity of the bigram language model evaluated on this corpus?
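
Perplexity can be computed from the list of probabilities the model assigns to each scored word. The sketch below assumes the usual definition, the exponential of the average negative log probability per scored word. How the first word of the test corpus is scored under the bigram model (skipped, or scored some other way) is a convention the exercise does not specify, so the function only takes the per-word probabilities as input. The name perplexity and the example values are hypothetical.

```python
import math


def perplexity(token_probs):
    """exp(-(1/N) * sum(log p_i)) over the N scored words."""
    if not token_probs:
        raise ValueError("need at least one probability")
    if any(p == 0 for p in token_probs):
        # A single zero-probability word makes the perplexity infinite.
        return math.inf
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


# Illustrative values only, not the answer to the exercise:
# a model that gives every word probability 1/4 has perplexity 4.
example = perplexity([0.25, 0.25, 0.25, 0.25])
print(example)
```

Working in log space avoids underflow when many small probabilities are multiplied, which matters little for seven words but is the habit worth keeping for longer corpora.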

Now repeat everything above using add-1 (Laplace) smoothing.
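
For reference, add-1 smoothing adds one to every count and adds the vocabulary size V to every denominator. The sketch below assumes V = 3 (the words A, B, C), reads P(XY) as the probability of Y given the previous word X, and adds no start or end symbols. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def add_one_unigram(corpus, vocab):
    """(count(w) + 1) / (N + V) for every word in the vocabulary."""
    counts = Counter(corpus)
    denom = len(corpus) + len(vocab)
    return {w: Fraction(counts[w] + 1, denom) for w in vocab}


def add_one_bigram(corpus, vocab):
    """(count(prev, nxt) + 1) / (count(prev) + V) for every word pair."""
    pair_counts = Counter(zip(corpus, corpus[1:]))
    context_counts = Counter(corpus[:-1])  # assumption: no end symbol
    v = len(vocab)
    return {
        (prev, nxt): Fraction(pair_counts[(prev, nxt)] + 1,
                              context_counts[prev] + v)
        for prev in vocab
        for nxt in vocab
    }


training_corpus = "AAABACBABBBCCACBCC"
vocab = "ABC"
smoothed_unigram = add_one_unigram(training_corpus, vocab)
smoothed_bigram = add_one_bigram(training_corpus, vocab)
```

Unlike the unsmoothed estimates, every word and every word pair now receives a nonzero probability, so the perplexity on a test corpus is always finite.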