CS 124 / LING 180 From Languages to Information, Dan Jurafsky, Winter 2014
Week 2: Group Exercises on Language Modeling, Tuesday Jan 21, 2014

  1. Part 1: Group Exercise

    We are interested in building a language model over a language with three words: A, B, C. Our training corpus is

    For the purposes of this problem let's assume we do use an end token. (In class we tried an experiment without using an end token to explore what happened to the probabilities, but for the purposes of studying for the final let's return to the standard correct approach of using an end token.)

    1. First train a unigram language model using maximum likelihood estimation. What are the probabilities? (Just leave in the form of a fraction)?
      P(A) =

      P(B) =

      P(C) =

    2. Next train a bigram language model using maximum likelihood estimation. Fill in the probabilities below. Leave your answers in the form of a fraction.
      P(A|A) =

      P(A|B) =

      P(A|C) =

      P(B|A) =

      P(B|B) =

      P(B|C) =

      P(C|A) =

      P(C|B) =

      P(C|C) =

    3. Now evaluate your language models on the corpus
      What is the perplexity of the unigram language model evaluated on this corpus?

      What is the perplexity of the bigram language model evaluated on this corpus?

    4. Now repeat everything above for add-1 smoothing.

Part 2: Challenge Problems
  1. Suppose you build an interpolated trigram language model, with three weights lambda1 for unigrams, lambda2 for bigrams, and lambda3 for trigrams. Normally we set these lambdas on a held-out set. Suppose instead we set them on the training data. This will cause the lambdas to take on very unusual values. What will these lambdas look like? Why?

  2. Show that if we estimate two bigram language models using unsmoothed relative frequencies (MLE), one from a text corpus and the second from the same corpus in reverse order, the models will assign the same probability to new sentences (when applied in forward and backward order respectively). Hint: First write out the entire equation for sentence probabilities in terms of counts.