25 February 2003

Probabilistic and Distributional Modeling of Linguistic Knowledge: Perspectives from Comprehension, Production, and Learning

Dan Jurafsky

University of Colorado, Boulder

Human language processing is fundamentally probabilistic in nature. Thus, while it is critical for linguists to model deep, rich structural knowledge at many linguistic levels, it is equally critical to understand how human language users apply this knowledge probabilistically.

This talk summarizes a number of results from our lab on the role of probabilistic and statistical knowledge in human language processing, covering the comprehension, learning, and production of phonological, morphological, lexical, and syntactic structure. In comprehension, we show that humans compute the probability of an interpretation in order to resolve lexical, syntactic, and semantic ambiguities, and I discuss recent extensions of these results to aphasics. In production, we show that speakers compute the probability of words to help determine the surface form the words should take, and that this probability is reflected in both prosodic structure and segmental form. I'll also offer some functional reasons, based on speaker-hearer interaction, for why probability might matter in production. In learning, I'll talk about how rich linguistic prior structure can be viewed as a 'learning bias' and hence combined with empirical, distributional learning to attack the problem of learning phonological and morphological structure. This talk describes joint work with all sorts of really smart people.