The study of natural language semantics cannot make real headway by treating meaning solely in the philosophico-mathematical abstract: it must make reference to how we as speakers of a particular language use language on a day-to-day basis. Any theory of natural language meaning must take seriously language use in context. Accordingly, my work combines insights from mathematical logic and psychology to develop computationally and psychologically viable models of natural language inference (NLI). Over the past five years, I have woven together a research program centered around non-traditional proof-theoretic semantics, computational semantics and pragmatics, and more traditional approaches to meaning, themselves augmented with advances in model, order, and measure theory. The various aspects of my program provide formally precise and quantitative ways of looking at the same question: how is successful communication between (at least) two speakers possible? By being both formally and quantitatively precise, the semantic analyses I develop are transparent, make explicit, testable predictions, and are reusable and extensible by other researchers.
With Christopher Potts and Sven Lauer, I developed the Card Corpus, a highly structured corpus of 744 task-oriented dialogues collected with the goal of informing models of pragmatics and discourse. The corpus distribution includes Python and R code for working with the corpus as well as a slide show documenting its properties and reporting on some pilot studies. Version 2 can be found here. Some papers associated with this project are listed below:
I developed the House Proceedings Corpus (HPC), a highly structured corpus of complete congressional House proceedings that contains over 2,700 transcripts, tagged for part-of-speech (POS) using the Stanford POS tagger. The HPC comprises individual .json files, which guards against data corruption and makes the corpus easy to import into MongoDB. The HPC has 181,648,994 tokens with a vocabulary of 314,031 words. The corpus itself is available upon request, and a Python wrapper for working with the corpus, as well as other tools, is available here.
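Since the corpus is just a directory of per-transcript JSON files, a bulk import is straightforward. A minimal sketch (the directory layout and field names here are hypothetical, not the HPC's actual schema):

```python
import json
from pathlib import Path

def iter_transcripts(corpus_dir):
    """Yield each transcript as a dict, one .json file at a time.

    Loading file-by-file means a single corrupted file is skipped
    rather than taking down the whole import.
    """
    for path in sorted(Path(corpus_dir).glob("*.json")):
        try:
            with open(path, encoding="utf-8") as f:
                yield json.load(f)
        except json.JSONDecodeError:
            print(f"skipping corrupted file: {path}")

# To bulk-load into MongoDB (requires pymongo and a running server;
# database and collection names below are illustrative):
#   from pymongo import MongoClient
#   coll = MongoClient().hpc.transcripts
#   coll.insert_many(iter_transcripts("hpc/"))
```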
I became interested in the Finnish case system in 2010, when I took a class offered by Paul Kiparsky, Lauri Karttunen and Arto Anttila. In Finnish, the object of a transitive verb is case-marked in one of two ways: with the accusative or the partitive. Which case is assigned is a function of the lexical semantic properties of the verb, its object, and the way in which their meanings are put together. The interesting question is: which lexical semantic properties trigger which case assignment? I argue that Finnish states (in the sense of Vendler (1967)) that lexically encode the property of existential commitment assign accusative case to their objects, whereas those that lack it assign partitive case. This observation is consistent with Barbara Partee's observation regarding the relation between intensionality and the genitive of negation in Russian.
I have re-conceived of OT in algebraic, or equivalently, propositional terms. Given an arbitrary constraint set and a candidate, i.e., an input/output pair, the set of all grammars over that constraint set that make the candidate optimal can be determined. It is therefore natural to think of a candidate in terms of the set of grammars that make it optimal. In this way, notions like 'candidate entailment' can be defined naturally, allowing the OT-theorist to reason about datasets in strictly algebraic or logical terms.
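The idea can be sketched concretely for the classical-OT case with a toy tableau (constraint names and violation counts below are purely illustrative, not drawn from any actual analysis):

```python
from itertools import permutations

CONSTRAINTS = ("C1", "C2", "C3")  # illustrative constraint names

# Toy tableau: two competing candidates for one input, with violation
# counts for each constraint.
violations = {
    "cand_a": {"C1": 1, "C2": 0, "C3": 0},
    "cand_b": {"C1": 0, "C2": 1, "C3": 1},
}

def optimal(candidate, ranking):
    """A candidate is optimal under a total ranking iff no competitor
    beats it in the lexicographic comparison of violation profiles."""
    profile = lambda c: tuple(violations[c][k] for k in ranking)
    return all(profile(candidate) <= profile(c) for c in violations)

def grammars_for(candidate):
    """The set of all total rankings (grammars) making candidate optimal."""
    return {r for r in permutations(CONSTRAINTS) if optimal(candidate, r)}

def entails(a, b):
    """'Candidate entailment': a entails b iff every grammar that makes
    a optimal also makes b optimal."""
    return grammars_for(a) <= grammars_for(b)
```

With a candidate identified with its set of grammars, entailment is just set inclusion, which is the algebraic reconception in miniature.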
OT Orders is a web-app designed by Cameron Jeffers that will primarily function as an Optimality Theory (OT) constraint-ranking app. Implemented using Flask, it is built on a Python implementation of my solution to the ranking problem in theoretical OT.
With Arto Anttila, I have solved the ranking problem in Partial Order Optimality Theory (PoOT), which can be stated as follows: allowing for free variation, given a finite set of input/output pairs, i.e., a dataset, that a speaker knows to be part of some language, how can one learn the set of all PoOT grammars over some constraint set that are compatible with that dataset?
For an arbitrary dataset, we provide set-theoretic means for constructing the set of all PoOT grammars compatible with that dataset. Specifically, we determine the set of all strict orders of constraints that are compatible with the dataset. As every strict total order is in fact a strict order, our solution is applicable in both PoOT and classical Optimality Theory (COT), showing that the ranking problem in COT is a special instance of a more general one in PoOT. Currently, Arto and I are developing a web-application implementing this solution.
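For the COT special case, the compatible grammars can be found by brute force over total orders, which gives a feel for what the solution computes (the tableaux below are invented for illustration; the actual solution is set-theoretic rather than brute-force, and also covers the strict partial orders of PoOT):

```python
from itertools import permutations

CONSTRAINTS = ("C1", "C2", "C3")  # illustrative constraint names

# Hypothetical dataset: for each input, the observed (winning) output
# and the violation profiles of all competing candidates, indexed by
# position in CONSTRAINTS.
tableaux = {
    "input1": {"winner": "out_a",
               "cands": {"out_a": (1, 0, 0), "out_b": (0, 1, 1)}},
    "input2": {"winner": "out_c",
               "cands": {"out_c": (0, 1, 0), "out_d": (1, 0, 0)}},
}

def makes_optimal(ranking, cands, winner):
    """True iff the winner's violation profile, reordered by the
    ranking, is lexicographically minimal among all candidates."""
    order = [CONSTRAINTS.index(c) for c in ranking]
    profile = lambda cand: tuple(cands[cand][i] for i in order)
    return all(profile(winner) <= profile(c) for c in cands)

# The COT grammars compatible with the whole dataset are the total
# orders that make every observed winner optimal in its tableau.
compatible = {
    r for r in permutations(CONSTRAINTS)
    if all(makes_optimal(r, t["cands"], t["winner"]) for t in tableaux.values())
}
```

For this toy dataset the two tableaux jointly pin down a single total order, illustrating how observed winners successively narrow the grammar space.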
The role of inference in natural language (NL) semantics has often been neglected. Recently, some NL semanticists have moved away from the heavy machinery of Montagovian-style semantics toward a more proof-based approach. This move reflects the belief that derivability plays as central a role in NL semantics as entailment does.
I have begun work in this area by logicizing certain aspects of Bill MacCartney's algorithmic approach to natural logic and proving a completeness theorem for it. Code associated with this project can be found here. Some papers and talks associated with this project are listed below:
More to come.
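To give a taste of the algorithmic side: MacCartney's system composes semantic relations between aligned phrase pairs via a join table. The sketch below covers only three of his seven relations and, following the deterministic approximation, maps compositions outside the partial table to independence; it is an illustration of the idea, not the actual implementation:

```python
# Three of MacCartney's seven natural-logic relations: equivalence,
# forward entailment, and reverse entailment.
EQ, FWD, REV = "≡", "⊏", "⊐"

# Partial join (composition) table; e.g. composing two forward
# entailments yields a forward entailment.
JOIN = {
    (EQ, EQ): EQ, (EQ, FWD): FWD, (EQ, REV): REV,
    (FWD, EQ): FWD, (FWD, FWD): FWD,
    (REV, EQ): REV, (REV, REV): REV,
}

def join(r1, r2):
    # Compositions absent from the partial table are uninformative and
    # are approximated here as independence ('#').
    return JOIN.get((r1, r2), "#")
```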