My research takes the position that listeners are exquisitely sensitive to patterns of covariation in speech, and that knowing these subtle patterns is part of what it means to truly know a language.

a cartoon of Kevin by Markus Nee
When we speak we communicate not only our ideas, but also our identities...

Which is great, because if people sound the way we expect them to, we're able to understand them better (McGowan, submitted; 2012 LSA presentation)...

proportion of accurate transcriptions by visual stimulus (Asian woman, simple silhouette, Caucasian woman) and listener experience level

even if our expectations are based on stereotypes rather than authentic experience (McGowan, in press).

proportion of "is authentic" ratings for authentic and imitated Chinese-accented English
Sumner et al. 2014, proposed dual route approach to speech perception
In Sumner, Kim, King, and McGowan (2014) we propose a model (above) of how the linguistic and the social aspects of speech interact to support perception. We propose that listeners process both phonetically cued social information and phonetically cued linguistic information prior to word recognition and that these dual routes can interact.
So does all this knowledge and sensitivity only apply to social variation?

First, some quick background on how sounds like [p], [t], and [k] differ from sounds like [b], [d], and [g] at the beginning of English words like pit and bit. What word is this native speaker of American English saying?

spectrograms of [pɪt] and [bɪt]
The image to the left is a spectrogram (frequency analysis over time) of the word pit. Hear the puff of air at the beginning? It is highlighted in blue in the spectrogram. Both pit and bit start with the lips completely closed. One of the main differences between them is the duration of the puff of air; this duration is called VOT (voice onset time).

[pʰɪt]

[bɪt]

At least in American English, that puff of air is so important that cutting it out of pit (that first sound you played) results in a word that sounds a lot like bit, though probably with a funny [b], and that funniness is every bit as interesting and important as the change from [p] to [b]!
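
For readers who like to see the measurement spelled out, here is a minimal Python sketch of the idea above. It treats VOT as the time from the release of the lip closure to the start of voicing, and it mimics "cutting out" the puff of air by shortening that interval. Everything named here (`StopToken`, `vot_ms`, `trim_aspiration`, `rough_category`), the example times, and the 30 ms boundary are assumptions made for illustration; none of them come from the studies described on this page, and real listeners rely on many more cues than this single number (hence the funny [b]).

```python
from dataclasses import dataclass


@dataclass
class StopToken:
    """Hypothetical hand-annotated landmarks for one word-initial stop (ms)."""
    word: str
    release_ms: float        # moment the lips open
    voicing_onset_ms: float  # moment vocal fold vibration begins


def vot_ms(token: StopToken) -> float:
    """Voice onset time: interval from release to the start of voicing."""
    return token.voicing_onset_ms - token.release_ms


def trim_aspiration(token: StopToken, keep_ms: float) -> StopToken:
    """Simulate cutting out the puff of air, keeping at most keep_ms of it."""
    kept = min(keep_ms, vot_ms(token))
    return StopToken(token.word, token.release_ms, token.release_ms + kept)


def rough_category(vot: float, boundary_ms: float = 30.0) -> str:
    """Crude one-cue guess; the 30 ms boundary is an illustrative assumption."""
    return "p-like" if vot >= boundary_ms else "b-like"


if __name__ == "__main__":
    # Invented example times, not measurements from the recordings on this page.
    pit = StopToken("pit", release_ms=100.0, voicing_onset_ms=165.0)
    trimmed = trim_aspiration(pit, keep_ms=10.0)
    for label, tok in [("original pit", pit), ("trimmed pit", trimmed)]:
        print(label, vot_ms(tok), rough_category(vot_ms(tok)))
```

With these made-up numbers, the trimmed token falls on the "b-like" side of the assumed boundary, which is the single-cue version of the pit-to-bit effect described above.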

image of ear from wikipedia
My research suggests that when we listen to speech we are phenomenally sensitive to covarying patterns of phonetic detail. One such covarying pattern is the tendency for VOT to be shorter in a fast speech style than in slower speech...

In fact, removing most of the VOT from [p], [t], and [k] words makes them less useful to listeners (shortest green bar) in slow (Citation) speech, but if the rest of the word is spoken quickly the short VOT sounds fine (Fast speech, on the right) (abstract).

proportion accurate transcriptions by visual stimulus and listener experience level
Okay, but slower, more careful speech must be easier to understand than faster, more casual speech?

Well... no! When we tested how well careful and casual speech styles activate meanings for listeners in a sentence like "Elephants are big animals", we found that casual speech was actually more helpful than careful speech (CUNY 2014 presentation).

We also found this with eye tracking...

When hearing sentences with predictable final words, listeners were able to look at the intended picture more quickly with casual speech than with careful speech. This and other evidence supports the hypotheses of Lindblom (1990) and Sumner (2013) that casual speech is processed using more world and contextual knowledge than careful speech.

Another covarying feature is the way vowels before nasal consonants in English tend to be nasalized. Listeners can use this as soon as it becomes available, not only a large distinction like bend/bed...

but also a much more subtle distinction like the difference in nasalization between these two sound files. Can you hear a difference?

spectrograms of 'bend' with early and late onset of nasalization
This first recording has late nasalization starting 100 milliseconds after the [b]. This second recording has early nasalization starting 33 milliseconds after the [b].

In an eye tracking task we found that listeners can use nasalization as soon as it is present. Looks to the heavily nasalized word were, on average, 60 ms faster, which matches the average difference between early and late nasalization in the recordings (Beddor, McGowan, Boland, Coetzee, and Brasher, 2013).

a drawing of a human brain, I'm not sure whose

Whether the information is social, contextual, articulatory, or idiosyncratic, we humans have an astonishing ability to attend to it, remember it, and activate it during perception. This ability, my research suggests, is not irrelevant to linguistic competence or even peripheral to it; it is fundamentally and centrally part of what it means to know and speak a human language.

Thank you for reading! If you have any questions, please contact me via e-mail or Twitter.
And many, many thanks to my friend Markus Nee for turning me into this cartoon.
