CS221

Research in Machine Translation (MT) began in the early 50s, in an attempt to translate Russian into English during the height of the Cold War. Early MT mostly relied on rule-based systems that mapped Russian words and phrases to their English counterparts. These systems produced highly literal translations, with no deeper understanding of the meaning and context of a phrase. A great example is the sentence below, which was translated into Russian and then back into English using an MT system:

"The spirit is willing, but the flesh is weak."

"The vodka is good, but the meat is rotten."

After these early attempts, research in MT was relatively dormant from the late 60s until the early 90s. With the birth of the internet and, subsequently, the massive availability of information, MT saw a revival. In particular, a statistical approach to MT was put forward and heavily adopted in the mid 90s. This approach worked much better than its rule-based counterpart from the 50s, and many of the statistical MT techniques that began in this time are still applied today in state-of-the-art MT systems such as Google Translate.

One of the main insights behind this progress is the application of Bayes' Theorem to break the problem down into two smaller, easier subproblems. Suppose we have a French sentence f that we would like to translate into an English sentence e. From a probabilistic perspective, we would like to find the sentence e that has maximal probability given the French sentence f. Using Bayes' rule, we can write the problem as

$$\hat{e} = \arg\max_e \, p(e \mid f) = \arg\max_e \, \frac{p(f \mid e)\, p(e)}{p(f)} = \arg\max_e \, p(f \mid e)\, p(e).$$

In the last equality we use the fact that the probability of f does not depend on e, and thus does not affect which e is optimal.
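To make the role of p(f) concrete, here is a minimal sketch with a handful of candidate English sentences for one fixed French sentence. All of the probabilities, and the names `translation_model` and `language_model`, are invented for illustration. It checks that ranking candidates by the full posterior and ranking them by p(f|e) p(e) pick the same sentence.

```python
# Toy numbers, invented purely for illustration.
# p(f | e) for one fixed French sentence f and three candidate English sentences e.
translation_model = {
    "the house is small": 0.20,
    "small the is house": 0.20,  # same words, so the toy translation model cannot tell them apart
    "the home is little": 0.10,
}
# p(e): how plausible each candidate is as English.
language_model = {
    "the house is small": 0.010,
    "small the is house": 0.00001,
    "the home is little": 0.004,
}


def joint(e):
    """Unnormalized score p(f | e) * p(e)."""
    return translation_model[e] * language_model[e]


# p(f), restricted to this toy candidate set: the same constant for every e.
p_f = sum(joint(e) for e in translation_model)


def posterior(e):
    """p(e | f) by Bayes' rule."""
    return joint(e) / p_f


best_by_posterior = max(translation_model, key=posterior)
best_by_joint = max(translation_model, key=joint)
```

Both `best_by_posterior` and `best_by_joint` come out as "the house is small", since dividing every score by the same positive constant cannot change the ranking.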

The two terms on the right, p(f|e) and p(e), are the translation model and the language model respectively. Modelling these two distributions is much easier than attempting to model the posterior distribution over English sentences directly. One way of viewing a simple translation model is as a way of finding the most probable 'bag of words' in English that corresponds to the French sentence f. The language model can then be applied to turn this 'bag of words' into a syntactically sensible sentence. Another great reason to use this breakdown is that we don't need any French-to-English translated text to train the language model. All we need is huge amounts of English writing, and for that we have an essentially unlimited supply: the web! A great intro to statistical MT, and translation models in particular, can be found in Kevin Knight's famous tutorial.
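As a rough illustration of the language model's job, the sketch below trains a bigram model with add-one smoothing on a four-sentence English corpus and uses it to pick the most probable ordering of a 'bag of words'. The corpus, the choice of bigrams, and the smoothing are all assumptions made for this example, not the method of any particular MT system. Note that only English text is needed to build it.

```python
import itertools
import math
from collections import Counter

# A tiny, made-up monolingual corpus; a real model would use vastly more text.
corpus = [
    "the house is small",
    "the house is big",
    "the car is small",
    "a house is a home",
]

context_counts = Counter()  # how often each token appears as the left side of a bigram
bigram_counts = Counter()   # how often each (previous, current) pair appears
for line in corpus:
    tokens = ["<s>"] + line.split() + ["</s>"]
    context_counts.update(tokens[:-1])
    bigram_counts.update(zip(tokens, tokens[1:]))

# Tokens that can be predicted: every corpus word plus the end-of-sentence marker.
vocab_size = len({w for line in corpus for w in line.split()} | {"</s>"})


def log_prob(words):
    """Log-probability of a word sequence under the add-one smoothed bigram model."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


# The 'bag of words' a simple translation model might hand over, in arbitrary order.
bag = ["small", "is", "house", "the"]
best_order = max(itertools.permutations(bag), key=log_prob)
best_sentence = " ".join(best_order)  # "the house is small"
```

Enumerating every permutation only works for very short sentences, since the number of orderings grows factorially with length.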

Once we've successfully learned translation and language models, the last step is to actually search through the space of English sentences to find the most likely one. This problem can be tackled, at least approximately, using a greedy discrete state-space search algorithm.
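One way to picture such a search is the hill-climbing sketch below. It starts from a word-for-word gloss and keeps applying whichever single local change (swapping two adjacent words, or switching one word to an alternative translation) most improves log p(f|e) + log p(e), stopping when nothing helps. The tables, the probabilities, the `FLOOR` value, and the set of moves are all hypothetical choices for this example; real decoders are considerably more elaborate, and a greedy search like this one can stop at a local optimum.

```python
import math

# Hypothetical toy models; every number below is invented for illustration.
# translation_table[f_word][e_word] = p(f_word | e_word)
translation_table = {
    "la": {"the": 0.9},
    "maison": {"house": 0.7, "home": 0.3},
    "bleue": {"blue": 0.9},
}
# Bigram language model p(current | previous); unseen pairs fall back to FLOOR.
bigram_probs = {
    ("<s>", "the"): 0.5,
    ("the", "blue"): 0.2,
    ("the", "house"): 0.3,
    ("the", "home"): 0.1,
    ("blue", "house"): 0.3,
    ("blue", "home"): 0.05,
    ("house", "blue"): 0.01,
    ("house", "</s>"): 0.4,
    ("home", "</s>"): 0.3,
    ("blue", "</s>"): 0.05,
}
FLOOR = 1e-4


def score(state):
    """log p(f|e) + log p(e) for a list of (french_word, english_word) pairs."""
    tokens = ["<s>"] + [e for _, e in state] + ["</s>"]
    tm = sum(math.log(translation_table[f][e]) for f, e in state)
    lm = sum(math.log(bigram_probs.get(pair, FLOOR)) for pair in zip(tokens, tokens[1:]))
    return tm + lm


def neighbors(state):
    """All states reachable by one local change."""
    # Move 1: swap two adjacent positions.
    for i in range(len(state) - 1):
        swapped = list(state)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        yield swapped
    # Move 2: replace one English word with another translation of the same French word.
    for i, (f, e) in enumerate(state):
        for alt in translation_table[f]:
            if alt != e:
                yield state[:i] + [(f, alt)] + state[i + 1:]


def greedy_decode(french_words):
    """Hill-climb from a word-for-word gloss until no single move improves the score."""
    state = [(f, max(translation_table[f], key=translation_table[f].get)) for f in french_words]
    current = score(state)
    while True:
        best = max(neighbors(state), key=score, default=None)
        if best is None or score(best) <= current:
            return [e for _, e in state]
        state, current = best, score(best)


translation = greedy_decode(["la", "maison", "bleue"])  # ["the", "blue", "house"]
```

Here the initial gloss "the house blue" is improved by a single swap into "the blue house", after which no neighbouring state scores higher and the search stops.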

A great way to learn more about statistical MT, and Natural Language Processing (NLP) more generally, is to take both CS 124 and CS 224N here at Stanford. Dan Jurafsky and Chris Manning, two heavyweights in the world of NLP, are the course instructors.

Beyond that, many open-source, near state-of-the-art MT systems exist that you can dive into to learn more about the nuts and bolts. Moses in particular is a relatively easy-to-use and high-performing system. If you make it through Knight's tutorial above, then try your hand at building a translation model and plugging it into the Moses system to see how it performs!