Simulating the Natural Selection of Language                                          Mike LeBeau

A Review of Nowak et al.'s Evolution of Universal Grammar                     SSP 205

                                                                                                                        Spring 2005

 

 

            For a long time, a significant criticism of Chomsky's views on Universal Grammar has been that, while claiming the existence of an innate 'language acquisition device' (Chomsky 1980), he makes no attempt to explain how such a device could have come to exist in humans over an evolutionary time span.  Whether or not it is the responsibility of Chomsky and others promoting this view of Universal Grammar to address the evolutionary perspective is a matter of opinion, and some may argue that "[generative syntacticians and behaviorists] are merely interested in different aspects of language: Chomsky in structure, we behaviorists in function" (Palmer 2000). Nonetheless, the question of whether Universal Grammar is plausible in a natural selection context is an interesting and important one that, until recently, had not been seriously addressed.  The work of Nowak et al. attempts to provide the answer that Universal Grammar is indeed possible, perhaps even likely, according to simulation models of natural selection.  They claim to provide "a mathematical framework for the evolutionary dynamics of grammar learning" (Nowak et al. 2001), which is essentially an "existence proof" (Axelrod 2003) that, under some specific set of circumstances (that is, a simulation with a particular set of rules as its starting point), it is possible to see the development of something like Universal Grammar.  The remainder of this review first briefly explains Universal Grammar, and then the framework of Nowak et al.'s simulation and its results.  Along the way, we will question some of the underlying assumptions of both Universal Grammar and the Nowak paper.

            The classic justification for the idea of Universal Grammar is that "children acquire their mental grammar spontaneously and without formal training" without obtaining enough information to "uniquely determine the underlying grammatical principles" (Nowak et al. 2001).  This is the classic "poverty of stimulus" concern (Chomsky 1980).  Chomsky's proposal in response to this concern is that humans must possess some innate predisposition for learning the structure of language. Since all languages share some fundamental traits, or rather, the structure of language tends to have many commonalities across the world, Chomsky argues that an innate language acquisition device must be present in the human brain which guides the acquisition process, setting 'switches' in the brain for the various syntactic rules of the acquired language.  This was a powerful claim that had, and continues to have, a large impact on linguistics and cognitive science, but it does make a number of large assumptions which may or may not be well founded.

            The assumption mentioned above that children acquire a grammar spontaneously and without training has been questioned in many contexts (e.g., Huttenlocher 1998).  It does not seem by any means certain that this is the case.  One could imagine that children in fact do hear enough language input to accurately learn a language in the time that they do, by making use of their general cognitive abilities and intelligence.  Highly detailed studies are needed to quantify how much data is 'enough' to learn a language effectively without any additional help, and even this notion of 'enough' is very subjective.  Chomsky might claim that there are fundamental aspects of language which are not learned from any input sentences, such as the ability to form sentences using recursive applications of a grammar rule, in which case such long-term studies of child language acquisition may not be relevant. But nonetheless, Chomsky's (and Nowak et al.'s) language acquisition claim makes large assumptions which should not be ignored.

            Another large assumption inherent in Chomsky's claims is that humans can make binary grammaticality judgments of every sentence they process.  That is to say, most, if not all, theories of syntax assume that every possible sentence of a language is either strictly grammatical or ungrammatical. It seems clear, however, that this is not the case.  Many constructions in language lie somewhere on the boundary between grammatical and ungrammatical, and traditional theories of syntax do not seem to take this into account.  Modern attempts at natural language processing have perhaps most obviously illuminated this fact through the difficulty of processing language with rule-based systems that assume strict grammaticality.  Statistical methods have generally been more successful at capturing a wide range of natural language than rule-based attempts precisely because they do not make this assumption.  The assumption may be a helpful and justified simplification in order to make generalizations which hold true across much of language, but it is important to point out the issues with making it.  Rule-based accounts do not ever seem to have the potential to describe language with complete accuracy and coverage.

            Having outlined some of the questions and concerns surrounding Universal Grammar in general, let us move on to discuss how Nowak et al.'s paper attempts to support the idea of Universal Grammar and what aspects of the argument seem either suspicious or difficult to accept.  First we will take a look at the mathematical framework put forth, and then we will discuss the implications and results of this framework, and potential concerns about the assumptions and simplifications made in the model.

            Nowak claims that Universal Grammar can be broken down into two components: "(i) a mechanism to generate a search space for all candidate mental grammars and (ii) a learning procedure that specifies how to evaluate the sample sentences" (Nowak et al. 2001).  Using this notion, he considers a situation in which a universal grammar generates a search space of some number n of different grammars, G_1, ..., G_n, and these grammars all have some compatibility with one another (that is, a matrix A exists which defines the probability of a speaker using one grammar and being understood by a listener who uses another grammar), some payoff for successful communication (that is, a function F defines the reward for an individual using one grammar successfully communicating with an individual using another grammar), some frequency of individuals (x_i) who use each grammar, and some probability (from the matrix Q) that a parent speaking one grammar will pass this grammar on to his child.  These notions are used to construct a population dynamics equation which, in effect, is the simulation.  One can choose different values for the various elements of the equation and see how the results change.  A fundamental finding in the paper is the existence of a "coherence threshold", a value for learning accuracy (q) which, for each instance of the simulation, must be met or exceeded in order to induce "the emergence of grammatical communication", that is, the dominance of one grammar over the others such that the population is generally able to communicate with one another.
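The population dynamics just described can be sketched numerically. The following is a minimal sketch, not Nowak et al.'s actual code: it assumes a uniform off-diagonal compatibility value and a single learning-accuracy parameter q (the diagonal of Q), both simplifications of the paper's general setup, and iterates a replicator-mutator style update.

```python
import numpy as np

def simulate(n=10, q=0.99, a_off=0.5, steps=20_000, dt=0.01, seed=0):
    """Euler-step sketch of the language dynamics:
    x_j' = sum_i x_i f_i Q_ij - phi * x_j."""
    rng = np.random.default_rng(seed)
    # Compatibility matrix A: a grammar always parses itself (a_ii = 1);
    # uniform cross-grammar compatibility a_off is an assumption.
    A = np.full((n, n), a_off)
    np.fill_diagonal(A, 1.0)
    F = (A + A.T) / 2.0                       # symmetric communicative payoff
    # Learning matrix Q: a child acquires the parent's grammar with
    # accuracy q, otherwise one of the other n-1 grammars uniformly.
    Q = np.full((n, n), (1.0 - q) / (n - 1))
    np.fill_diagonal(Q, q)
    x = rng.dirichlet(np.ones(n))             # random initial frequencies
    for _ in range(steps):
        f = F @ x                             # fitness of each grammar
        phi = f @ x                           # average fitness
        x = x + dt * ((x * f) @ Q - phi * x)  # replicator-mutator step
        x = np.clip(x, 0.0, None)
        x /= x.sum()                          # guard against numerical drift
    return x

coherent = simulate(q=0.99)    # learning accuracy above the threshold
incoherent = simulate(q=0.55)  # learning accuracy below it
```

With high learning accuracy one grammar comes to dominate the population, while with low accuracy the frequencies stay near the uniform mix, mirroring the coherence-threshold behavior the paper reports.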

            This mathematical setup is somewhat compelling due to its simplicity, but it could be argued that the framework makes too many simplifications to be a good representation of natural selection and its effect on the emergence of language.  First, the article simply assumes the existence of Universal Grammar without question, even going so far as to state its existence as fact in the introduction.  One might question why the authors are so concerned with demonstrating the evolutionary feasibility of Universal Grammar if they are already so certain that it exists.  It is more likely that the authors are assuming this for the sake of simplicity, and are simply attempting to show a way that, assuming it exists, UG could have developed.  The authors, however, might have done well to be somewhat more modest in their claim, especially if their purpose is to persuade the UG-skeptical.

            Additionally, the assumption that a Universal Grammar's function is to generate a search space containing a finite number of specific grammars is highly suspect.  In real-world terms, this would mean containing an absurdly large number of different grammars in the human brain, many of which are unused.  This may seem more plausible if one considers a compact representation of the grammars in the brain, much like a word lattice, specifying grammars only with respect to how they differ from each other, but this is nonetheless a very difficult claim to accept.  Something like Chomsky's idea of 'setting switches' in the brain which correspond to various grammar rules (Chomsky 1980) seems like a more likely form of Universal Grammar, but it is unclear how such a theory would be evaluated mathematically.  Even the notion that grammars can be individuated and that two speakers of, say, American English, have the "same grammar" (Nowak et al. 2001) is somewhat difficult to accept.  What constitutes a different grammar?  If one person speaks grammar G_i, which is otherwise identical to grammar G_j but has some idiom or other idiosyncrasy that another person speaking G_j does not have, would we have to say that the two grammars are different, but just happen to have a very high a_ij value?  If so, the size of the grammar space would be enormous indeed.

            The paper then turns to actual calculations of the language coherence threshold in terms of the number of grammars n that the universal grammar produces.  Nowak claims that two types of learning procedures, that of the "memoryless learner" and that of the "batch learner", respectively define the upper and lower bounds for the grammar search space.  The memoryless learner starts with a random 'hypothesis grammar', and uses it until a speaker generates a sentence incompatible with this grammar, at which point the memoryless learner simply chooses another random hypothesis grammar.  For the memoryless learner, this continues until he finds the right grammar, at which point every sentence generated by speakers will be compatible.  The batch learner stores every utterance in a mental 'database', and forms his grammar hypothesis according to the grammar that fits with all the utterances he has heard.  This method very rapidly decreases the size of the search space but has significant cognitive requirements.  Nowak claims that the real technique that humans use must be somewhere between these two extremes.
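The two learning procedures can be contrasted with a small simulation. This is a toy model rather than Nowak et al.'s derivation: it assumes every teacher sentence is compatible with any wrong candidate grammar with a fixed probability `compat` (an illustrative parameter), and counts how many sample sentences each learner consumes before settling on the teacher's grammar.

```python
import random

def memoryless_samples(n, teacher, compat, rng):
    """Samples until the memoryless learner's hypothesis first equals the
    teacher's grammar (after that, every sentence stays compatible)."""
    h, t = rng.randrange(n), 0
    while h != teacher:
        t += 1
        if rng.random() > compat:   # a sentence the wrong hypothesis rejects
            h = rng.randrange(n)    # discard everything, guess again
    return t

def batch_samples(n, teacher, compat, rng):
    """Samples until the batch learner's stored evidence rules out
    every grammar but the teacher's."""
    candidates, t = set(range(n)), 0
    while len(candidates) > 1:
        t += 1
        # Each sentence eliminates a wrong grammar with prob. 1 - compat.
        candidates = {g for g in candidates
                      if g == teacher or rng.random() < compat}
    return t

rng = random.Random(1)
n, compat, trials = 50, 0.5, 200
avg_mem = sum(memoryless_samples(n, 0, compat, rng)
              for _ in range(trials)) / trials
avg_batch = sum(batch_samples(n, 0, compat, rng)
                for _ in range(trials)) / trials
```

The batch learner needs far fewer sentences (roughly logarithmic in n) but must remember all of them, which is exactly the trade-off between search speed and cognitive demand that the paper's two extremes illustrate.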

            This claim seems reasonable, but appears to be without any real basis.  Intuitively, one can understand why the batch and memoryless methods seem to define the ends of a spectrum, but Nowak makes no attempt to justify this claim scientifically.  However, the main purpose of the claim was just roughly to establish the ends of the spectrum, and to make the larger claim that the real human learning technique probably lies somewhere in between these two extremes in terms of cognitive demand and size of search space; this claim seems fairly reasonable since it is not particularly bold.  Chomsky's 'switch-setting' concept, mentioned earlier, could likely be seen as falling somewhere in between these two extremes, for example.

            Near the end of the paper, the authors touch upon the concept of different grammars having different fitness levels, and thus one being preferred over another, and of competition between different universal grammars, which allows for novel grammatical concepts to be introduced over time.  Nowak expects "search spaces [in universal grammars] to be as large as possible but still below the coherence threshold" (Nowak et al. 2001).  Finally, the authors discuss conditions under which recursive, rule-based grammars could succeed over simple, list-based grammars, and determine that, past a certain threshold in the number of different "sentence types", a recursive system would likely be preferred by natural selection, in order to improve efficiency and reduce cognitive demand.
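A back-of-the-envelope illustration of that threshold, under an assumed storage-cost model (mine, not Nowak et al.'s calculation): suppose a list-based grammar pays one storage unit per sentence type, while a recursive grammar pays a fixed cost for a lexicon plus a handful of combination rules and then licenses arbitrarily many sentence types for free. The recursive system wins as soon as the number of needed sentence types exceeds that fixed cost.

```python
def list_storage(num_types):
    """A list-based grammar stores each sentence type separately."""
    return num_types

def recursive_storage(vocab, num_rules=10):
    """A recursive rule set has a fixed cost regardless of how many
    sentence types it licenses (toy assumption: one entry per lexical
    item plus a handful of combination rules)."""
    return vocab + num_rules

vocab = 100
threshold = next(n for n in range(1, 10**6)
                 if recursive_storage(vocab) < list_storage(n))
# With these toy numbers, the recursive system becomes cheaper once
# more than 110 sentence types are needed.
```

The specific numbers are arbitrary, but the qualitative point matches the paper's conclusion: below some complexity threshold a list suffices, and above it recursion pays for itself.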

            The argument about recursive, rule-based grammars being fitter in a natural selection context seems fairly solid, but does presume that a certain number of sentence types would be required.  It is not clear whether this answers the question of "how recursion helped in the hunt for mastodons" (Pinker 2003), because of this very issue of requiring a certain level of complexity before recursion becomes advantageous.  It seems reasonable to think that such complexity is highly useful even to a primitive human society, though, in that "it makes a big difference whether a far-off region is reached by taking the trail that is in front of the large tree or the trail that the large tree is in front of" (Pinker 2003).  For this reason, the recursive rule-based grammar argument is perhaps the most compelling (and, yes, perhaps also the most modest) in the entire paper.  So although this argument may answer the "mastodons" question, a general concern remains that, because of the simplicity of the model, the results may not go very far in characterizing real language in adequate detail.

            Nowak et al.'s paper attempts to address the concern that Chomsky's arguments for the existence of a Universal Grammar are purely logical arguments with no basis in reality.  In some ways, however, one could argue the very same about Nowak's argument.  Nowak presents "existence proofs" which show that it is, in some model, possible to see the emergence of coherent communication via language.  What he fails to demonstrate is how Universal Grammar itself might come to be in the first place; rather, he assumes the existence of a Universal Grammar, and then discusses ways in which it could result in grammatical communication.  Furthermore, one might argue that these existence proofs come no closer to having a basis in reality than Chomsky's original arguments, because they are purely simulations built on a constructed premise.  In many ways, the authors present a compelling argument for the possible development of language, given a Universal Grammar, especially certain aspects of language such as its recursive, rule-based structure.  Questions, however, remain about how adequately such a simple simulation can capture the reality of language, and whether the assumptions made by the simulation are perhaps too many, too large, and too simplifying.
References

Axelrod, R. (2003). Advancing the art of simulation in the social sciences. Japanese Journal for Management Information Systems.

 

Chomsky, N. (1980). Rules and representations. New York: Columbia University Press.

 

Huttenlocher, J. (1998). Language input and language growth. Preventive Medicine, 27(2), 195-199.

 

Nowak, M. A., Komarova, N., & Niyogi, P. (2001). Evolution of universal grammar. Science, 291, 114-118.

 

Palmer, D. C. (2000). Chomsky's nativism reconsidered. The Analysis of Verbal Behavior, 17, 51-56.

 

Pinker, S. (2003). Language as an adaptation to the cognitive niche. In Christiansen, M.H. and Kirby, S. (Eds.), Language Evolution. New York: Oxford University Press.