Ling187/287 Assignment 2

Ling187/287: Grammar Engineering

Homework Assignment for Week 2

Due: Friday, January 30 (by noon)
Submit assignments electronically to all three teachers (ron.kaplan "at" microsoft.com, tracy.king "at" microsoft.com, mforst "at" parc.com)

Turn in:	1. the final grammar you end up with (eng-week2.lfg for PARTS 1-3 and eng-week2-mltsec.lfg for PART 4);
	please name it lastname-eng-week2.lfg
	2. your revised testsuite with parse statistics (eng-week2-test.lfg.new);
	please name it lastname-eng-week2-test.lfg.new
	3. a rough estimate of how long this took so that we can adjust future assignments as needed

Exercises on:
PART 1:	templates
PART 2:	testsuites
PART 3:	feature declarations
PART 4:	multiple lexicon and grammar sections
	Part 4 is completely separate from the other parts;
	if you get stuck on the other parts, try this for a change.

Start from the grammar eng-week2.lfg

Do not use punctuation or capital letters; in later grammars we will add these in.

If you put a file called xlerc in the directory with your grammar and in xlerc you put:

  create-parser eng-week2.lfg

then whenever you start xle in that directory, it will automatically load eng-week2.lfg. This will save a lot of time when making and testing changes.

PART 1: Templates

If you look at the lexicon in your current grammar, you will see that a lot of material is repeated. This can lead to mistakes and makes it difficult to maintain grammars because if you change an analysis you have to make the change in many places. To capture regularities, XLE/LFG has a formal device called a template.

EXERCISE 1 - Lexicon templates

Use the existing templates as models to redo the lexicon using templates (see the entry for orange and the templates it calls). Your resulting lexicon should have no ^ in it, although some lexical entries may call more than one template. You can use the following template names; feel free to add additional templates to group these or to have these templates call other more basic ones:

ASPECT: aspect feature
NOUN-PL: plural noun
NOUN-SG: singular noun
NUM: num feature
PRED: assigns basic pred without arguments
PREPOSITION: preposition
SPEC: spec feature
TENSE: tense feature
V-3SG: verb with third singular subject (-s verb)
V-DITRANS: ditransitive verb (give)
V-INTRANS: intransitive verb
V-NOT-3SG: verb with non-third singular subject (bare verb)
V-OBL: verb with an oblique argument
V-TRANS: transitive verb
VPROG: progressive verb (sleeping)
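
As a rough illustration of how template calls replace repeated annotations, a lexical entry and the templates behind it might look like the sketch below. The word banana and the bodies of PRED and NUM are assumptions made for illustration; the templates already defined in eng-week2.lfg may be written differently.

```
   "lexicon section: the entry contains no ^, only a template call"
   banana  N * @(NOUN-SG banana).

   "templates section"
   PRED(_P) = "assigns basic pred without arguments"
          (^ PRED)='_P'.

   NUM(_N) = "num feature"
          (^ NUM)=_N.

   NOUN-SG(_P) = "template for singular nouns"
          @(PRED _P)
          @(NUM sg).
```

With this layout, a change to how number is represented only has to be made in NUM, not in every noun entry.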

To see how a lexical entry expands, on the xle command line try:

   print-lex-entry orange

When you make a change to the templates, you must restart XLE for it to take effect. Templates are like grammar rules in this respect. XLE should warn you if you forgot to restart.

EXERCISE 2 - Grammar templates

Templates can be called from the grammar rules. Look at the templates:

   UP-OBJ = "annotation to assign object function"
	  
	  @(UP-GF OBJ).

   UP-GF(_GF) = "generic annotation to assign a grammatical function"
	  
	  (^ _GF)=!.

In the VP rule, replace:

   (^ OBJ)=!

with:

   @UP-OBJ

Restart the grammar and parse:

   the monkey devoured a banana

To see how your new rule expands, on the xle command line try:

   print-rule VP

Create similar templates and calls for SUBJ, OBJ2, and OBL.

Also create templates and calls for CASE and for the ! $ (^ ADJUNCT) annotation.
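
By analogy with UP-OBJ, a subject template and its call could be sketched as follows. The name UP-SUBJ and the shape of the S rule are assumptions; the S rule in eng-week2.lfg may contain more material than is shown here.

```
   UP-SUBJ = "annotation to assign subject function"
          @(UP-GF SUBJ).

   "in the S rule, the call replaces (^ SUBJ)=!"
   S --> NP: @UP-SUBJ;
         VP.
```

Running print-rule S after restarting should show the call expanded back to the plain annotation.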

Turn in: Submit the final version of your grammar including the additions for PART 3 (that is, the changes for this part and PART 3 can be included in the same grammar file).

PART 2: Testsuites

As the grammar expands, it is very easy to make changes that affect sentences in ways you did not expect. To help detect this problem, you can create a testsuite.

Look at the basic testsuite eng-week2-test.lfg (the emacs library works best if you name your testsuite with a .lfg suffix). # is used to introduce comments. Each item to be parsed is on its own line, surrounded by blank lines. The default parse category is defined by ROOTCAT in the grammar; here it is S. If you want to parse another category, put the category name and a colon before the item (e.g., NP: a monkey).

In xle, try:

   parse-testfile eng-week2-test.lfg

This will produce several files:

*.new: the testfile but with the newest stats added
*.stats: a file with just the stats
*.errors: a file with any differences between this test run and the last; you should not have any at this point

Add sentences to the testfile that will cover all the basic grammar rules. For example:

with an object
with an object and second object
with a PP adjunct
etc.

Add some NPs to test out the NP rules. For example:

with different determiners
without a determiner
with a PP adjunct
etc.
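
A few testsuite items in the format described above might look like the sketch below. The sentences are only suggestions and assume that every word is in your lexicon (monkeys in particular may not be); substitute words your grammar actually has.

```
# transitive verb with an object

the monkey devoured a banana

# NP with a determiner, parsed with a non-default category

NP: a monkey

# NP without a determiner

NP: monkeys
```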

Parse your new testfile. Make sure that all the items parse and get the correct number of parses (usually 1, but there may be some legitimate ambiguities which will result in 2 or more parses).

Turn in: Submit the .new version of your new testsuite.

PART 3: Feature Declarations

As with changes to the rules and templates, if you change anything in the CONFIG or the feature declaration, you must restart XLE.

EXERCISE 1 -- Create a Feature Declaration

You need to create a feature declaration for eng-week2.lfg. (If you got stuck on PART 1 or 2, you can create a new version of the grammar for this part.) Do this in the following steps:

In the CONFIG, add a line:
```
     FEATURES (DEMO ENGLISH).
```
Make sure to include the ending period.
Create a feature section just after the ---- that ends the CONFIG
(all sections start with the name and version number and end with four hyphens):
```
  DEMO ENGLISH FEATURES (1.0)


  ----
```
Start XLE. At this point it will load and allow you to parse things you could parse before.
As soon as you start adding features, it will not load the grammar until you have them all correct. XLE only checks the grammar, not the lexicon. So, features that are triggered only by the lexicon will not cause problems until you try to parse them; the next step will tell you how to systematically trigger these feature warnings.
As a first step, list each feature in the feature section, ending it with just a period. For example, after you add NUM it should look like:
```
  DEMO ENGLISH FEATURES (1.0)

  NUM.
  ----
```
You can either examine the grammar to figure out the list of features or you can read them off of the XLE warnings.
Remember to restart XLE after any changes to the feature declaration.
Note: You do not need to add features that are listed as GOVERNABLERELATIONS or SEMANTICFUNCTIONS. You also do not need to list PRED (this is a system declared feature).
If you add these in as a record keeping device, that is fine, but XLE does not require it.
To make sure you got all the features in the lexicon, on the XLE command line type:
```
  regenerate "a girl laughed"
```
Keep adding features until XLE has no more warnings and returns (the number of CPU seconds will depend on your machine):
```
  A girl laughed

  regeneration took 0.2 CPU seconds.
```
Once you get the grammar back to where you can load it and parse, fill in the values of the features. For example, after you add NUM it should look like (only with a lot of other features that just end in a period):
```
  DEMO ENGLISH FEATURES (1.0)

  NUM: -> $ { sg pl }.
  ----
```
There will be two basic formats. Features with atomic values will look like the NUM example above with the basic format:
```
  FEAT: -> $ { val1 val2 val3 }.
```
Some features may only have a single value; you still need to include the {}.
Features that take f-structures as values will have the basic format:
```
  FEAT: -> << [ FEAT1  FEAT2  FEAT3 ].
```
Every feature should have its values listed.
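For instance, a declaration with a few atomic-valued features filled in might look like the sketch below. The values pres and prog appear in the f-structures later in this assignment; past is an assumption, so check the values your lexicon actually uses. ASPECT shows a feature with a single value, which still needs the braces.

```
  DEMO ENGLISH FEATURES (1.0)

  NUM: -> $ { sg pl }.
  TENSE: -> $ { pres past }.
  ASPECT: -> $ { prog }.
  ----
```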
Once again, keep going until when you do:
```
  regenerate "a girl laughed"
```
you get back:
```
  A girl laughed

  regeneration took 0.2 CPU seconds.
```

EXERCISE 2 - Adding New Features

Adding ADJUNCT-TYPE

For every ADJUNCT, we want to know what type of adjunct it is. Modify the ADJUNCT template so that it takes one parameter. This parameter should be the value of a new feature ADJUNCT-TYPE. So, when called in the VP, the call to ADJUNCT should be:

  @(ADJUNCT VP)

And when called in the NP, the call should be:

  @(ADJUNCT NP)

And when called in S, the call should be:

  @(ADJUNCT S)

The f-structure for an ADJUNCT (PP or ADV) should now look roughly like the following for yesterday in yesterday the girl laughed:

  [ PRED    'laugh<(^ SUBJ)>'
    SUBJ    [ ... ]
    ADJUNCT { [ PRED 'yesterday'
                ADJUNCT-TYPE S ] } ]

Make sure to update the feature table as well as the template.
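
One way the parameterized template and its feature table entry could look is sketched below. This assumes your PART 1 template for the ! $ (^ ADJUNCT) annotation is named ADJUNCT; the parameter name _TYPE is hypothetical.

```
   ADJUNCT(_TYPE) = "adjunct annotation that also records where the adjunct attached"
          ! $ (^ ADJUNCT)
          (! ADJUNCT-TYPE)=_TYPE.

   "feature declaration entry"
   ADJUNCT-TYPE: -> $ { S VP NP }.
```

The second annotation uses ! rather than ^ because ADJUNCT-TYPE belongs inside the adjunct's own f-structure, as in the f-structure shown above.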

Adding TNS-ASP

Currently there are two features TENSE and ASPECT which can occur in the f-structures of verbs. Create a new feature TNS-ASP which takes the current TENSE and ASPECT features as its values. So, the f-structure of the girl is devouring a banana should look roughly like:

  [ PRED   'devour<(^SUBJ)(^OBJ)>'
    SUBJ   [...]
    OBJ    [...]
    TNS-ASP  [ TENSE pres
               ASPECT prog ] ]

In addition to modifying the templates, you will need to create a new entry in the feature declaration for TNS-ASP (note that you should not need to modify the TENSE and ASPECT declarations).
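
A minimal sketch of the change, assuming your TENSE and ASPECT templates each take the feature value as a single parameter (the parameter names _T and _A are hypothetical):

```
   TENSE(_T) = "tense feature, now embedded under TNS-ASP"
          (^ TNS-ASP TENSE)=_T.

   ASPECT(_A) = "aspect feature, now embedded under TNS-ASP"
          (^ TNS-ASP ASPECT)=_A.

   "feature declaration entry"
   TNS-ASP: -> << [ TENSE ASPECT ].
```

Because only the path to the features changes, the lexical entries that call these templates should not need to be edited.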

Turn in: Your new grammar with the feature declaration and the new ADJUNCT-TYPE and TNS-ASP features.

PART 4: Multiple Sections

For this part, you should use the grammar eng-week2-mltsec.lfg. This grammar has two additional lexicon and rule sections in it:

  DEMO-PLUS ENGLISH RULES (1.0)

  ----

occurs just after the DEMO ENGLISH RULES and:

  DEMO-PLUS ENGLISH LEXICON (1.0)

  ----

occurs just after the DEMO ENGLISH LEXICON section at the very end of the file.

As a first step, you need to modify the RULES and LEXENTRIES listings in the CONFIG so that the DEMO-PLUS sections are more highly ranked than the DEMO ones that are already there. If you don't remember how to do this, either look in the XLE documentation or at the slides for week 2.
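
As a sketch only, the two CONFIG lines might end up looking like this. It assumes that sections listed later take priority over earlier ones; confirm the ordering convention in the XLE documentation or the week 2 slides before relying on it.

```
   RULES (DEMO ENGLISH) (DEMO-PLUS ENGLISH).
   LEXENTRIES (DEMO ENGLISH) (DEMO-PLUS ENGLISH).
```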

EXERCISE 1: Lexicons

Add an entry for orange in the DEMO-PLUS lexicon so that orange is now both a noun and an adjective (see the entry for purple for a sample adjective entry).

The DEMO rules define an extremely simple AP (adjective phrase) rule. Modify the DEMO NP rule to allow you to parse things like:

  a orange monkey devoured a purple banana
  a purple orange monkey laughed
  the girl devoured a orange orange

Add entries for ate and eats in the DEMO-PLUS lexicon so that they are both transitive (V-TRANS) and intransitive (V-INTRANS). You should now be able to parse:

  the girl ate

  the girl ate a banana
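
A DEMO-PLUS entry for ate along these lines is one possibility. It is a sketch that assumes the verb templates take the predicate name as their only parameter, that TENSE takes the tense value, and that an entry in the higher-ranked lexicon replaces the DEMO entry for the same word; adjust it to match your own templates.

```
   ate  V * { @(V-TRANS eat)
            | @(V-INTRANS eat) }
            @(TENSE past).
```

The braces and vertical bar express a disjunction, so each parse picks exactly one of the two subcategorization frames.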

Now add a new noun of your choice to the DEMO-PLUS lexicon and make sure you can parse it.

EXERCISE 2: Rules

The AP rule simply goes to A. In the DEMO-PLUS rules, create a new AP rule that allows very to optionally appear in front of the A.
(Hint: make very some new c-structure category such as AMOD and then make a lexical entry for it that looks similar to that of adverbs like today only with this new c-structure category.)
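
Following the hint, the rule and entry could be sketched roughly as below. The category name AMOD comes from the hint; treating very as a member of the adjective's ADJUNCT set and giving it a bare PRED are assumptions, so model your entry on the one for today in your own grammar.

```
   "DEMO-PLUS rules section"
   AP --> (AMOD: ! $ (^ ADJUNCT))
          A.

   "DEMO-PLUS lexicon section"
   very  AMOD * @(PRED very).
```

The parentheses around AMOD make it optional, so plain adjectives such as purple should still parse as before.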

Make sure you can parse:

  a very purple monkey laughed

In the DEMO-PLUS lexicon add an entry for one other adjectival modifier similar to very; in a comment in the entry list a sentence your grammar can parse that uses this word.

Write a rule in your DEMO-PLUS rules that says:

  VPaux --> FALSE.

Figure out what effect this has on your grammar. In XLE, you can use:

  print-rule VP

  print-rule VPaux

to see what the rules expand to.

In a comment after the new VPaux rule, state what the basic effect was and list one sentence whose behaviour has changed with the addition of this rule.

Comments in the grammar are anything between "" (these have to be the straight up and down, non-directional quotes). There are some sample comments in the templates. Comments can be many lines long. An example from eng-week3-mltsec.lfg:

   NOUN-SG(_P) = "template for singular nouns"
	  @(PRED _P)
	  @(NUM sg).

Turn in: The new version of your grammar with the new lexicon and rule sections and the comment about the VPaux rule in it.

If you have any questions, you can send us email (ron.kaplan "at" microsoft.com, tracy.king "at" microsoft.com, mforst "at" parc.com), call us (Ron: 650-245-6865; Tracy: 415-848-7276; Martin: 650-812-4788), or set up office hours with us.