Ling187/287: Grammar Engineering

Homework Assignment for Week 4

Due: Friday, February 20 (by noon)
Submit assignments electronically to all three teachers (ron.kaplan "at" microsoft.com, tracy.king "at" microsoft.com, mforst "at" parc.com)


Turn in: 1. the final grammars you end up with (eng-week4-amb.lfg and eng-week4.lfg)
Please name your grammars with name-eng-week4-amb.lfg and name-eng-week4.lfg

Exercises on:
PART 1: Ambiguity
PART 2: OT marks for filtering of analyses
PART 3: OT marks for robustness
PART 4: FRAGMENTS for robustness

There are two grammars this week: eng-week4-amb.lfg (for PART 1) and eng-week4.lfg (for all other parts).

If you put a file called xlerc in the directory with your grammar and in xlerc you put:

  create-parser eng-week4-amb.lfg (for PART 1)

or

  create-parser eng-week4.lfg (for all other parts)

then whenever you start xle in that directory, it will automatically load the respective grammar. This will save a lot of time when making and testing changes.


PART 1: Ambiguity

eng-week4-amb.lfg has been poorly engineered and contains a number of spurious ambiguities. The task is to remove them.

EXERCISE 1 -- Rules

Parse the sentences:

  he saw her.

  he saw the telescope.

These sentences both get two parses; they should only get one. Note that the lower right xle window has two blank boxes; this is usually, but not always, a sign that something is wrong since it indicates two identical f-structures. Alter the grammar so that they each get only one parse. Make sure that you can still parse:

  they ate.

  he gave her a telescope.

EXERCISE 2 -- Templates

Parse the sentences:

  he eats.

  he sees her.

  he gives her the telescope.

These sentences all get two parses; they should only get one. Alter the grammar so that they only get one parse. [Hint: try parsing some sentences with plural subjects and see how they behave.]

Count nouns in English either need to be plural or to have a specifier; this grammar was meant to encode that generalization. This was done by having count nouns like girl and banana call COUNT-NOUN-SG and COUNT-NOUN-PL.

Parse:

  the girls ate.

  she ate the bananas.

These both get two parses; they should only get one. Compare them to sentences with plurals without determiners (girls ate.) and with singulars with determiners (the girl ate.). Alter the grammar so that they only get one parse and you can still parse:

   girls ate.

   the girl ate.

But you should not be able to parse:

   girl ate.

PART 2: OT Marks for Filtering of Analyses

EXERCISE 1: Dispreference Mark

In some contexts what seem to be strictly transitive verbs can appear without an object (often in "recipe" contexts). The task is to allow transitive verbs to have a null object and then disprefer this analysis with an OT mark.

Alter the V-TRANS template to have a second disjunct which provides a null object. If you parse:

   they devour.

you should get an f-structure that looks roughly like:

   [ PRED 'devour'
     SUBJ [ PRED 'they' ]
     OBJ  [ PRED 'pro'
            NTYPE null ]
     TENSE present ]

The trouble with this is that verbs that can be both transitive and intransitive, like eat, will now get two analyses: one where the verb is intransitive and one where it is transitive but with a null object. To see this, parse:

  they eat.

Add an OT mark ObjDrop to the V-TRANS template disjunct that you just added and make it a dispreference mark in the OPTIMALITYORDER. Note that there is a template OT-MARK that you can use for this to avoid dealing with the o:: notation. The call would look like:

  @(OT-MARK ObjDrop)

If you now parse:

  they eat.

you should get 1+1 readings. The analysis you see should be the intransitive version of eat. If you click on the "u" in the lower right window, you should be able to see the other analysis with the null object.
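
As a rough illustration of the pieces involved, the sketch below shows a common way of defining an OT-MARK template and where the call could sit in a template disjunction. The body of V-TRANS shown here is an assumption made for the example; the template in eng-week4.lfg may be written differently (for instance, it may go through the PASSIVE template), so adapt the idea rather than copying the text.

   "sketch only; template bodies are assumed, not copied from eng-week4.lfg"
   OT-MARK(_mark) = _mark $ o::*.

   V-TRANS(P) = { (^ PRED)='P<(^ SUBJ)(^ OBJ)>'    "ordinary transitive"
                | (^ PRED)='P<(^ SUBJ)(^ OBJ)>'    "null object, dispreferred"
                  (^ OBJ PRED)='pro'
                  (^ OBJ NTYPE)=null
                  @(OT-MARK ObjDrop) }.

With an overt object, the second disjunct fails because the object already has its own PRED; without one, the first disjunct fails completeness, so only the marked disjunct survives.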

EXERCISE 2: Removing part of a rule

This grammar allows for N-N compounds like tractor trailers. It also allows adjectives to modify the first N of an N-N compound. Try parsing:

  NP: the orange box
  NP: the good orange box

The first will have two analyses: one where orange is an adjective and one where it is a noun. The second will have four analyses (NN, AA, and two AN ones with different scopes).

The task is to remove the possibility of having an adjective modify the first N in an N-N compound. Do not do this by commenting out that part of the rule. Instead, add an OT mark RemoveAinNmod to the appropriate part of the grammar and make the mark a NOGOOD mark (i.e. to the left of NOGOOD in OPTIMALITYORDER).

Now when you parse:

  NP: the good orange box

you should get only three readings.
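
For orientation, an OPTIMALITYORDER line for the marks introduced so far might look like the sketch below. The actual contents of the CONFIG in eng-week4.lfg are not reproduced here, so treat the list as an assumption. Marks to the left of NOGOOD switch off the part of the grammar they are attached to; unprefixed marks to the right of it are dispreference marks.

   "CONFIG fragment (sketch; other marks in your grammar are omitted)"
   OPTIMALITYORDER RemoveAinNmod NOGOOD ObjDrop.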

EXERCISE 3: Passive by OBL

This will probably be the hardest part of this assignment. First you need to get the passive to work properly (if you cannot get it to work properly, you can still do the OT part below).

Important: Go to the PASSIVE template and comment in the part that is commented out. Otherwise, nothing will passivize.

Try parsing:

  they were pushed.

This gets four analyses. One is correctly the passive with a PASSIVE feature and a passive subcat frame. The others involve intransitive subcat frames or don't have the PASSIVE feature. You need to block the intransitive reading and make sure there is a PASSIVE feature. The basic idea is that the auxiliary should only show up when there is a participle (passive or progressive) and conversely the participles should only show up when there is an auxiliary. One way to do this is by using a combination of defining and constraining equations for PASSIVE and for a new feature VFORM which has the values pass and prog (parse they were eating bananas. to see it for the progressive). Note that the tense of a passive depends on the tense of the auxiliary.
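
The difference between the two kinds of equation can be sketched as follows. Where each line belongs (participle entry, auxiliary entry, PASSIVE template) is deliberately left open, and the value + for PASSIVE is an assumption about this grammar.

   (^ VFORM)=pass      "defining: adds VFORM pass to the f-structure"
   (^ VFORM)=c pass    "constraining: succeeds only if VFORM pass is defined elsewhere"
   (^ PASSIVE)=c +     "constraining: succeeds only if PASSIVE + is defined elsewhere"

A constraining equation never adds information on its own, which is what lets one element (say, the auxiliary) demand the presence of another (say, the participle).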

Now try parsing with a by phrase. (If you did not manage to constrain the passive, you may have more parses than reported here; however, you can still constrain the overall number of optimal parses as described here.)

  they were pushed by them.

There should be two readings: one with the by phrase being an ADJUNCT and one with it being an OBL.

The task is to prefer the OBL reading. Insert an OT mark ByObl in the PASSIVE template. Make it a preference mark by putting a + in front of it in the OPTIMALITYORDER in the CONFIG (warning: do not put a + in front of it in the template). When you parse:

  they were pushed by them.

you should now get 1+1 parses, with the displayed parse being the one with the OBL. Once again, you can click on the "u" in the lower right window to see the other parse (you might have to click through the f-structures in the lower left window to actually display it).
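
A sketch of where the + goes, assuming the mark is added through the OT-MARK template. Which part of the PASSIVE template licenses the OBL, and how ByObl is ordered relative to the other marks, depends on your grammar; the list below is only an example.

   "in the PASSIVE template, with the equations for the OBL by phrase: no +"
   @(OT-MARK ByObl)

   "in the CONFIG: the + makes ByObl a preference mark"
   OPTIMALITYORDER RemoveAinNmod NOGOOD ObjDrop +ByObl.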

PART 3: OT Marks for Robustness

Use the grammar you created for Part 1 and the input grammar for Part 2. You will just be making some additional changes.

To make the grammar more "robust", you can make it so that you can parse mismatched subject-verb agreement. First, to make sure the grammar is behaving as you expect, parse:

  they devour bananas.
  he devours bananas.

  they devours bananas.
  he devour bananas.

The first two should parse. The second two should get a c-structure but not an f-structure.

Put an OT mark BadVAgr into the relevant templates so that the last two sentences get a parse. In the OPTIMALITYORDER, put an * in front of it (warning: do not put an * in front of it in the templates). When you parse:

  they devours bananas.
  he devour bananas.

they should now get *1 solutions. Also, the lower left window should say "UNGRAMMATICAL" in it. The first two sentences should still get 1 parse, and no ungrammatical warning.
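
One possible shape for a relaxed agreement check is sketched below. The template name V-3SG and the features PERS and NUM are hypothetical; use whatever templates and features eng-week4.lfg actually has for verb agreement.

   "hypothetical template; names and features are assumptions"
   V-3SG = { (^ SUBJ PERS)=3
             (^ SUBJ NUM)=sg
           | ~[ (^ SUBJ PERS)=3 (^ SUBJ NUM)=sg ]
             @(OT-MARK BadVAgr) }.

   "CONFIG (other marks omitted): the * makes BadVAgr an ungrammatical mark"
   OPTIMALITYORDER *BadVAgr.

The negated second disjunct applies only when agreement really fails, so well-formed sentences do not pick up an extra marked analysis.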

PART 4: Fragments

The goal of this exercise is to add a fragment grammar to the current grammar to improve robustness. If you look at the CONFIG, you will see that the REPARSECAT has already been defined:

    REPARSECAT FRAGMENTS.

In addition, there is already a lexical entry for -token:

  -token TOKEN * (^ TOKEN)= %stem; ONLY.

although this will need to be augmented slightly.

Modify the current FRAGMENTS rule (which just goes to FALSE) to cover categories such as NP and VP. Use at least four categories that you think will be useful. Note that in order to parse a VP as a FRAGMENT, you will need to provide a null subject for the verb in the VP disjunct. The basic form of a FRAGMENTS rule is:

   FRAGMENTS -->
      { XP: (^ FIRST)=!
      | YP: (^ FIRST)=!
      | TOKEN: (^ FIRST)=! }
      (FRAGMENTS: (^ REST)=!).

where TOKEN is a specially defined category to match things that do not fit in the XP or YP possibilities. Try parsing some things with your new grammar such as:

   the the girl laughed.
   the girl ! laughed.
   ? [ thesk-Tehjsk .

NOTE: To parse the last one, you have to surround the string with {} instead of "". (XLE is a bit picky about what square brackets mean and if you have one in initial position it gets confused.) For example:

   parse {? [ thesk-Tehjsk .}

You should be getting a lot of parses. This is because nothing constrains the FRAGMENTS rule to build as few chunks as possible and to avoid tokens unless necessary.

Add OT marks to the FRAGMENTS rule and to the -token entry to constrain your rule. Make the OT marks ungrammatical ones by prefixing them with an *; this way you will be able to tell quickly if you have triggered the fragment grammar. For the "sentence":

  the the girl laughed.

your grammar should get *1 parse (plus a lot of suboptimals). For other things, you may still be getting a lot of parses, but they should be fewer than you were getting before the OT marks were added.
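
A minimal sketch of a marked fragment grammar, under stated assumptions: the mark names Fragment and Token are hypothetical, only two phrasal categories are shown (the exercise asks for at least four), and the relative order of the two marks is a choice you may want to experiment with.

   "sketch only; mark names are hypothetical"
   FRAGMENTS -->
      { NP: (^ FIRST)=!
            @(OT-MARK Fragment)
      | VP: (^ FIRST)=!
            (! SUBJ PRED)='pro'        "null subject for the bare VP"
            @(OT-MARK Fragment)
      | TOKEN: (^ FIRST)=!
            @(OT-MARK Fragment) }
      (FRAGMENTS: (^ REST)=!).

   -token TOKEN * (^ TOKEN)=%stem @(OT-MARK Token); ONLY.

   "CONFIG (other marks omitted)"
   OPTIMALITYORDER *Token *Fragment.

Because every chunk carries one Fragment mark, analyses with fewer chunks collect fewer marks and are preferred; the separate Token mark penalizes falling back on TOKEN when a real category would do.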

Turn in: The new versions of both grammars. Please name your grammars with name-eng-week4-amb.lfg and name-eng-week4.lfg.


If you have any questions, you can send us email (ron.kaplan "at" microsoft.com, tracy.king "at" microsoft.com, mforst "at" parc.com), call us (Ron: 650-245-6865, Tracy: 415-848-7276, Martin: 650-812-4788), or set up office hours with us.