Volume 9: pp. 99-126

Where Apes and Songbirds Are Left Behind: A Comparative Assessment of the Requisites for Speech

Erin N. Colbert-White
University of Puget Sound

Michael C. Corballis
University of Auckland

Dorothy M. Fragaszy
University of Georgia


A handful of mammalian and avian species can imitate speech (i.e., sounds perceived by humans as those comprising the human communication system of language). Of those species, even fewer are capable of using speech to communicate. While there has been no empirical comparison of nonhuman speech users, parrots are presumed to be the most prolific. In this review, we identify several anatomical, neurological, and sociobiological features shared by parrots and humans that could account for why parrots might emerge as the most advanced nonhuman speech users. Apes and temperate oscine songbirds, due to their phylogenetic similarity to humans and parrots, respectively, are also included in the comparison. We argue that while all four taxa share hemispheric asymmetry of communication areas and basic sociality, humans and parrots share three additional features that are not completely present in apes and songbirds. Specifically, apes, unlike songbirds, parrots, and humans, are not considered vocal learners and do not have sufficient respiratory control to support a speech stream, while parrots, humans, and apes, but not songbirds, demonstrate complex affiliative social behavior. Along with the above anatomical, neurological, and sociobiological traits, parrots’ affiliative long-term relationships, similar to those of humans, may help explain both groups’ ability to produce and use a wide variety of sounds. Thus, this paper extends parrot–human cognitive comparisons by introducing another similarity, that of complex affiliative relationships, as a possible explanation for why parrots can produce and use speech to communicate.

Keywords: speech; parrots; primates; language; songbirds

Author Note: Erin N. Colbert-White, Department of Psychology, University of Puget Sound, Tacoma, WA 98416; Michael C. Corballis, School of Psychology, University of Auckland, Auckland 1010, New Zealand; Dorothy M. Fragaszy, Department of Psychology, University of Georgia, Athens, GA 30602.

We acknowledge Patrick Murray, Gary Baker, and Marina Popkov for their help in the preparation of this manuscript.

Correspondence concerning this article should be addressed to Erin N. Colbert-White, Department of Psychology, University of Puget Sound, 1500 N. Warner St. #1046, Tacoma, WA 98416. E-mail: ecolbertwhite@pugetsound.edu

Speech is the vocalized form of language, whereby identifiable units of sound (phonemes) are combined to form more complex sounds with referential meaning (morphemes, words), which are in turn combined to form syntactic structures that can serve as descriptions about the world (phrases, sentences). Language can also be represented in other forms, including writing and sign language. Linguists generally agree that fully syntactic language, whether spoken, signed, or written, is unique to humans. Nevertheless, some nonhumans, notably parrots, are capable of producing identifiable renderings of spoken words, and even simple phrases, and using them communicatively. To this limited extent, at least, they may be said to be capable of speech, and in this review we use the term speech to cover referential speech-like vocal communication without any implication of syntactic structure.

Phylogenetically, humans and birds diverged some 300 million years ago (Burish, Kueh, & Wang, 2004). In contrast, hominids and chimpanzees (Pan troglodytes) diverged only 6 million years ago (Zollikofer et al., 2005). Although a common vocalization mechanism cannot be assumed, these dates suggest that chimpanzees, not birds, would be the most likely candidates for the articulate production and use of speech. However, there is no indication that apes can articulate any sounds approximating words, precluding them from communicating with speech. Parrots, on the other hand, are among the most skilled of all nonhuman speech producers (e.g., Fitch, 2000b; see Pepperberg, 1999 for review).

Some mammalian species can mimic speech or speech-like sounds with varying levels of precision (e.g., harbor seals, Phoca vitulina, Ralls, Fiorelli, & Gish, 1985; one beluga whale, Delphinapterus leucas, Ridgway, Carder, Jeffries, & Todd, 2012; one Indian elephant, Elephas maximus indicus, Stoeger et al., 2012; see Janik & Slater, 1997 for others). Among birds, speech sound mimics include tuis (Prosthemadera novaeseelandiae, Whangarei Native Bird Recovery Centre, n.d.), corvid songbirds (e.g., Pica nuttalli, Noack, 1902), and sturnid songbirds (e.g., Sturnus vulgaris, West, Stroud, & King, 1983). Unlike almost every other nonhuman species, however, parrots can use speech that is identifiable as such to the human ear for communicative purposes. Lab- and home-reared studies have demonstrated the sophistication with which parrots are able to use speech, including referential comments about object properties and numbers, spontaneous recombination of syllables to produce new, arguably context-appropriate, words (e.g., Pepperberg, 1987, 1999, 2006, 2007), and predictable use of words across varying social contexts (e.g., Colbert-White, Covington, & Fragaszy, 2011). This curious similarity between humans’ and parrots’ speech ability is the subject of this discussion. Since we include the communicative function as part of our definition of speech, we exclude mere mimicry. Figure 1 provides a sample of natural vocal abilities, ranging from complex nonspeech communication systems, to speech mimicry, to the use of speech as a medium for language.

Figure 1. Spectrums illustrating differences in speech use and species-typical repertoires for a variety of animals. Exemplar species included in the figure were selected due to their frequent appearance in the literature, and do not necessarily generalize to all species within a given taxonomic order.


To date, great emphasis has been placed on comparing humans to extant apes to understand the speech faculty. Empirical work comparing human and ape vocal tract anatomy (e.g., Duchin, 1990; Kay, Cartmill, & Balow, 1998; Lieberman, Crelin, & Klatt, 1972) and neurobiology (e.g., Gannon, Holloway, Broadfield, & Braun, 1998; Sherwood, Broadfield, Holloway, Gannon, & Hof, 2003) has raised more questions than it has answered, as some point out (e.g., Lieberman & McCarthy, 2007). Comparisons between humans and songbirds, on the other hand, identify similarities in both neurobiology (e.g., Doupe & Kuhl, 1999; Kuhl, 2003; Teramitsu, Kudo, London, Geschwind, & White, 2004) and vocalization acquisition patterns (e.g., Doupe & Kuhl, 1999; Marler, 1970b), among others. Doupe and Kuhl (1999) and Jarvis (2004) provide extensive reviews of this literature. Yet, though parrots are highly adept at producing and using speech sounds to communicate, neurobiological and anatomical evidence comparing humans with parrots is limited relative to the many papers comparing humans with songbirds. Furthermore, comparisons between humans and parrots are frequently made using budgerigars (Melopsittacus undulatus, e.g., Jarvis & Mello, 2000; Tu & Dooling, 2012) rather than African Grey parrots (Psittacus erithacus erithacus), which are renowned for speech use.

Figure 2. Diagram visualizing speech requisites shared by humans, apes, songbirds, and parrots, as well as those that are only shared by some of the animal groups. “Hemispheric asymmetry for communication” denotes asymmetrical size or volume of structures related to communication in either the left or right hemisphere. The term “basic sociality” refers to species that have frequent interaction with conspecifics, individual recognition of conspecifics, and extensive parental care; we define “complex sociality” as all features of basic sociality with the addition of the presence of discrete repertoire elements for affiliative nonsexual social interaction with conspecifics, social correlates of intelligence, and hierarchical relationships among group members. The figure reflects patterns for exemplar ape, songbird, and parrot species, but deviations and exceptions do exist. “H” = Feature possessed by humans; “P” = Feature possessed by parrots; “S” = Feature possessed by songbirds; “A” = Feature possessed by apes.

We posit here that there is no one unique characteristic that makes a species capable of speech. We argue that our definition of speech as production and use instead requires a constellation of anatomical, neurological, and sociobiological features, many of which are possessed by species that can neither produce nor use speech (see Table 1). This constellation view is shared by others in the field (e.g., Fitch, 2000b; Wind, 1983). We begin by briefly outlining two now-debunked features previously considered to be necessary for the production of speech by humans. Next, we assess four groups—humans, parrots, apes (predominantly chimpanzees), and passerine songbirds—on features relevant to the speech faculty: basic sociality (i.e., frequent interaction with conspecifics, individual recognition of conspecifics, and extensive parental care), hemispheric asymmetry with a bias for communication areas, vocal learning, finely tuned respiratory control, and complex affiliative social behavior among conspecifics (i.e., discrete repertoire elements for affiliative nonsexual social interaction with conspecifics, social correlates of intelligence, and hierarchical relationships among group members).1 As shown in Figure 2, while songbirds and apes do share some of the features above, humans and parrots as a group possess all of the features. We hypothesize that these specific features are crucial to the speech faculty, and their presence may also relate to the many similarities identified between humans’ and parrots’ cognitive abilities (e.g., Pepperberg, 1999). Thus, by further researching parrots’ wild communication systems from a sociobiological perspective, we may uncover a new parallel to human language.

Features Irrelevant to Production of Speech

Before a species can learn to use speech to communicate, it must first be able to articulate the sounds. The speech faculty has been of interest to anatomists, linguists, neuroscientists, anthropologists, and psychologists alike, resulting in a variety of theories of the requisites of speech in humans. For example, Wind (1983) provided a detailed review of over 100 morphological, physiological, and behavioral features associated with the articulation of words. Davidson’s (2003) shorter list of necessary features included a shortened soft palate, a loss of epiglottic–soft palate lock-up, a narrow supralaryngeal vocal tract (SVT), an oropharyngeal tongue, and an anterior foramen magnum. Both Wind (1983) and Davidson (2003) created their respective lists by comparing modern humans with apes and hominid fossils to determine how and at which point modern humans were able to produce the necessary range of sounds for speech. A descended larynx and a 1:1 SVT ratio are two features discussed in great detail in the literature. However, as neither of these features is present in parrots (or devices like voice recorders, for that matter), they cannot be considered necessary for the articulation of speech. These features are nevertheless detailed here both to show how nonhumans fit into the speech faculty debate, and to introduce additional evidence pertaining to characteristics that are strongly associated with, but not necessary for, speech.

Table 1. Relevant Speech Production Characteristics Across Animal Groups

Descended Larynx

At birth, the human larynx is similar in location to that of other mammals (Lieberman, 1984). Beginning around three months of age, the larynx gradually descends and the laryngeal musculature develops until about age 6 (Greene & Mathieson, 1989; Lieberman, McCarthy, Hiiemae, & Palmer, 2001; Sasaki, Levine, Laitman, & Crelin, 1977). Fully articulated speech sounds are not achieved until after the second year, when the larynx and associated structures are fully developed (Laitman, Heimbuch, & Crelin, 1978).

In the late 1960s, researchers reconstructed a Neanderthal vocal tract to investigate its vocal production capabilities. According to Lieberman and Crelin (1971), the Neanderthal larynx was similar in location to that of a human infant or nonhuman primate. Given human infants’ inability to produce speech sounds, Lieberman and Crelin theorized that a descended larynx was one of the uniquely human features required for word production. Since that time, the accuracy of Lieberman and colleagues’ anatomical reconstructions has been criticized (e.g., Boë, Heim, Honda, & Maeda, 2002), calling into question the validity of their conclusions regarding the importance of a descended larynx. It is important to note, however, that such criticisms have been refuted by others (e.g., de Boer & Fitch’s 2010 response to Boë et al., 2002).

Today, a majority of researchers agree that possessing a descended larynx is not necessary for speech production (e.g., Fitch, 2000c)—indeed, speech is even possible following laryngectomy (Luchsinger & Arnold, 1965). Research during the late 19th century with preserved nonhuman animal specimens, along with work such as that of Lieberman and Crelin (1971), concluded that humans were the only animals with a descended larynx. Thirty years after Lieberman and Crelin, Fitch’s (2000a) X-ray studies demonstrated that there are species (e.g., dogs, Canis familiaris; goats, Capra hircus; pigs, Sus scrofa; and cotton-top tamarins, Saguinus oedipus) with larynxes that descend during loud vocalizations, some to a position similar to that of humans. Further, in red deer (Cervus elaphus; Fitch & Reby, 2001) and fallow deer (Dama dama; McElligott, Birrer, & Vannoni, 2006), males’ post-pubescent larynx is permanently descended.

As none of the above nonhuman species is capable of producing speech sounds, a descended larynx must be neither uniquely human nor necessary for speech production (Hauser, Chomsky, & Fitch, 2002). However, while human and chimpanzee neonates have similarly high-positioned larynxes at birth, it is only after the human larynx descends that infants can produce the full repertoire of speech sounds in the vocal code (Nishimura, Mikami, Suzuki, & Matsuzawa, 2003). Thus, though a descended larynx is not necessary or sufficient on its own, Nishimura et al.’s work demonstrates that it is an important pre-adaptation in the evolution of speech in humans. Others support such a conclusion (e.g., Fitch, 2000b; Hauser et al., 2002; Pulleyblank, 2008).

Finally, parrot speech debunks the theory that a descended larynx is necessary for speech articulation. Unlike mammals and reptiles that use a larynx, birds use a syrinx to vocalize. The two structures are morphologically and functionally distinct. In particular, the position of the syrinx is much lower in the vocal tract, sitting at the fork of the bronchi. This location allows birds to produce two sounds simultaneously (Catchpole & Slater, 2008; Nottebohm, 1971)2. In addition to having the necessary anatomy to produce sounds that are perceived by humans as speech, African Grey parrots articulate many phonemes by employing the same anatomical structures (e.g., tongue, glottis) as humans (see Pepperberg, 2010 for review). Such proficiency in articulation occurs without a larynx—descended or not.

1:1 SVT Ratio

In a similar vein, some have speculated that speech requires the horizontal component of the vocal tract [SVTH, posterior oropharyngeal wall to the lips] to be equal in length to the vertical component [SVTV, vocal folds to the velum] (i.e., a 1:1 SVT ratio, Lieberman, 1984). As our human ancestors evolved, features such as the loss of large teeth resulted in face shortening. While the impetus for change in these features is unknown, the modifications contributed to the speech faculty, for example, by repositioning and enhancing the mobility of the tongue (Aiello & Dunbar, 1993; Lieberman et al., 1972) and lips (Liska, 1993) in the supralaryngeal pharyngeal cavity.

Like most nonhuman species, chimpanzees are incapable of speech production and possess an SVT ratio that is greater than 1:1 (Lieberman & McCarthy, 2007; Nishimura et al., 2003). Further, human infants are also unable to produce the range of sounds necessary for fully articulated speech until their larynxes descend to achieve a 1:1 ratio (Nishimura et al., 2003). The conclusions drawn regarding a 1:1 SVT ratio and the speech faculty have been supported for decades (e.g., Duchin, 1990; Lieberman & McCarthy, 2007).

While the 1:1 SVT ratio may have facilitated speech production in humans, Corballis (1991) points out that using morphology as a guide to determine a speaker’s vocal abilities can often be misleading and not generalizable. His example of the speech-mimicking mynah bird (Gracula religiosa, f. Sturnidae) provides another instance of birds throwing a proverbial monkey wrench in the list of characteristics said to be necessary for speech. Just by outward appearance, birds that mimic speech do not possess a flattened face, an oropharyngeal tongue, or most other face morphology features relevant to speech in humans. Additionally, because the syrinx sits so low in the trachea, a 1:1 SVT ratio is impossible (Catchpole & Slater, 2008; Nottebohm, 1971). Such findings could implicate a 1:1 SVT ratio as necessary for speech in humans only, but longitudinal MRI research with Japanese macaques (Macaca fuscata) concluded that the ratio and position of vocal tract anatomy in humans was probably not driven by speech requirements (Nishimura, Oishi, Suzuki, Matsuda, & Takahashi, 2008). Instead, Nishimura et al. argue the 1:1 SVT ratio may have arisen secondary to other factors.

Relevant Features Shared by All Four Groups

The four taxa under investigation in this review are distinct, and yet share two important features we argue to be relevant to speech: basic sociality and hemispheric asymmetry biased for vocalizations. While humans, parrots, apes, and songbirds are by no means the only taxa possessing these features, they are highlighted here as the most relevant characteristics linked to speech that are common to all four groups.

Basic Sociality

The first feature common to all four groups, basic sociality, is defined here as frequent interaction with conspecifics (primates, e.g., Dunbar, 1988; parrots, e.g., Seibert, 2006; songbirds, e.g., Robinson, Fernald, & Clayton, 2008); individual recognition of conspecifics (primates, e.g., Tomasello & Call, 1997; parrots, e.g., Farabaugh & Dooling, 1996; songbirds, e.g., Stoddard, 1996); and engagement in extensive parental care of young (primates, e.g., Zeveloff & Boyce, 1982; parrots, e.g., Bucher, 1983; songbirds, e.g., O’Connor, 1984). For humans, speech acquisition requires frequent social interaction, demonstrating the essential connection between the two (Kuhl, 2007). Social interaction is also important for parrots learning to produce and use speech (Pepperberg, 1992), as well as for parrots and songbirds learning species-specific vocalizations (Marler, 1970a; Nottebohm, 1972; Pepperberg, 1999).

Features of sociality have been used frequently as predictors of social species’ vocal repertoire complexity (e.g., Aiello & Dunbar, 1993; Marler, 1977; Marler & Mitani, 1988; Philips & Austad, 1990). One feature of sociality, group size, is considered by some to be a driving factor in why our human ancestors developed such a complex vocal communication system (e.g., Aiello & Dunbar, 1993; Dunbar, 2003). According to Aiello and Dunbar’s (1993) hypothesis, as the size of our human ancestors’ groups increased, maintaining social cohesion became difficult. In this scenario, speech served the function of “social grooming” from a distance when there were not enough hands or time to physically groom everyone.

Critics of Aiello and Dunbar’s (1993) hypothesis argued that social grooming served more of a hygienic function than a social bonding one. An earlier study addressed this criticism. Dunbar (1991) correlated grooming time with body weight and group size in 44 species of free-living primates and found that time spent grooming was more closely related to the size of the group in Pongo, Pan, and Gorilla than it was to body weight. Dunbar interpreted the finding as evidence that allogrooming has a primarily social function among great apes, which strengthens the connections among sociality, group size, and possibly the emergence of a more complex vocal code in our ancestors. Even in modern humans, a language’s vocabulary size increases as a function of complexity and industrialization of a society (e.g., Corballis, 1991; Diamond, 1959; Morton & Page, 1992). That is to say, the more individuals there are in the group, the larger the vocal repertoire, presumably because there is more to talk about with more individuals.

Despite the primate literature supporting the pattern of sociality and group size predicting vocal repertoire complexity, Blumstein and Armitage (1997) presented an exception in their comparison of alarm call repertoire size in multiple species of ground-dwelling squirrels that had a variety of social systems. The authors found that social complexity explained some, but not all, of the complexity of the repertoires. In light of this exception, the authors offered other possible predictors that could influence a species’ repertoire size, including facial and vocal tract morphology, physical or biological constraints by the habitat, and specific needs such as developing different escape patterns for different classes of predators.

Along with group size, the quality of social interaction among members of a group may also influence the size and complexity of the vocal repertoire (McCowan, Doyle, & Hanser, 2002; Morton & Page, 1992). Pinker (2003) proposed that language evolved in humans, not for social “grooming” purposes, but as a means to process increasingly complex social information related to who, what, when, where, and why. Though fundamentally Pinker was referring to language and not speech as we define it in this paper (i.e., speech sounds used for communicative purposes), sociality is certainly a common theme. Similarly, among birds, Salwiczek and Wickler (2004) noted that language-like behavior correlated with sociality. Thus, for parrots and songbirds, which meet our earlier definition of basic sociality, a complex vocal repertoire may be closely related to features of higher social cognition such as the ability to address and communicate with conspecifics—further strengthening the link between sociality and the need for a complex vocal communication system.

The vocalizations of wild African Grey parrots in particular are highly complex and heterospecific, containing elements of other species’ vocalizations as well as their own (Cruickshank, Gautier, & Chappuis, 1993). Cruickshank et al.’s 4-minute recording of two wild African Greys contained more than 10 different mimicries representing nine different bird species and one fruit bat. The authors also commented on the complexity of the vocalizations; specifically, some of the mimicked sounds had been rearranged from the original species’ patterns. Similar to primates’ fission–fusion social system, wild parrots form complex social hierarchies based on age and experience with flock-mates (Del Hoyo, Elliott, & Sargatal, 1992), implicating a similar necessity for bonding and processing social information that both Aiello and Dunbar (1993) and Pinker (2003) described among primates.

Hemispheric Asymmetry for Communication

Brain asymmetry was once believed to be a unique feature of the human brain (e.g., Corballis, 1991). Now, comparative neuroanatomists have described brain asymmetry in every vertebrate class (for review, see Ocklenburg & Güntürkün, 2012). In humans, left hemisphere brain asymmetry (LHA) is associated with cerebral specializations related to visuospatial and symbolic reasoning, speech production, and speech recognition (Falk, 1980, 1983; Holloway & de la Coste-Lareymondie, 1982). Some nonhumans also possess LHA related to species-specific vocalizations (e.g., sea lions, Zalophus californianus, Böye, Güntürkün, & Vauclair, 2005; mice, Mus musculus, Geissler & Ehret, 2004; some songbirds, Moorman et al., 2012; and some monkeys, Petersen, Beecher, Zoloth, Moody, & Stebbins, 1978; for primate review, see Ghazanfar & Hauser, 1999).

A number of asymmetries related to speech and language in the human brain also appear in great apes, suggesting that some asymmetries date back at least 6 million years. For example, in humans, the left Sylvian fissure defines many of the language-related areas in the left hemisphere, and this fissure is longer and straighter on the left side than on the right in both humans and apes (Galaburda, LeMay, Kemper, & Geschwind, 1978). Cantalupo and Hopkins (2001) have also identified a structural LHA in Brodmann’s area 44 (i.e., Broca’s area) in great apes. Brodmann’s area 44 in humans has long been considered critically involved in speech and language, although its exact role is still debated (Vargha-Khadem, Gadian, Copp, & Mishkin, 2005). The primate homologue of Brodmann’s area 44 is part of the system involved in the production and perception of grasping movements, and the lateralization of this area in great apes may signal the emergence of a communicative function. That function, though, may have had more to do with gestural than with vocal communication, and it is noteworthy that Broca’s area in humans is activated both by signers when signing and speakers when speaking (Horwitz et al., 2003).

This need not rule out the possibility that the lateralization of Brodmann’s area 44 in great apes was a precursor to speech. Direct stimulation of this area in the chimpanzee produces movements of the tongue and larynx, but no sound (Bailey, von Bonin, & McCulloch, 1950). More recently, Ghazanfar and Rendall (2008) similarly showed that electrical stimulation of the motor cortex produced lip and facial movements and vowel sound production in humans, whereas stimulation of the homologous area in apes and monkeys resulted in tongue, facial, and vocal cord movement but no actual sound. One possibility is that the homologue of Broca’s area in our primate and hominin precursors was initially specialized for communication through visible gestures, with vocalization incorporated in the course of hominin evolution (Cantalupo & Hopkins, 2001; Corballis, 2010; Rizzolatti & Arbib, 1998).

Such a scenario receives some support from the anatomy of vocal production. Vocalization in nonhuman primates depends on the supplementary motor area (SMA) and cingulate cortex along with diencephalic structures, a system that is primarily dedicated to emotional and instinctive vocalization with at best limited control (Jürgens, 2002). A recent study shows, for instance, that chimpanzees can direct food calls to specific individuals, such as those with whom the caller is friendly, implying a degree of intentional control (Schel, Machanda, Townsend, Zuberbühler, & Slocombe, 2013), but the calls themselves are species-specific and largely innately structured. The learning of novel vocal patterns depends on a pathway from the face area of the motor cortex to the nucleus ambiguus, which controls muscles of the larynx (Simonyan & Horwitz, 2011). Among mammals, this pathway appears to be unique to humans, or at least much more profuse in humans than in other mammals. This is further discussed below.

As in apes and humans, the communication areas of songbirds’ and parrots’ brains are functionally lateralized, and hemispheric asymmetry is present with a bias for communication areas (Bottjer & Arnold, 1985; Nottebohm, 1970, 1977). While some songbirds have LHA for communication areas (e.g., Moorman et al., 2012), others, like zebra finches (Poephila guttata), are right-hemisphere biased (e.g., Williams, Crane, Hale, Esposito, & Nottebohm, 1992). Likewise, of the nine parrot species Rogers (1980) tested, all but one were left-foot dominant, where footedness is a measure of cerebral lateralization. Such differences across species suggest that species-wide laterality itself may be important, regardless of the direction.

The avian cerebrum has a nuclear organization rather than a layered one as in mammals, which makes identifying homologies and analogies between avian and mammalian brains difficult. Currently, seven specific vocal production nuclei are recognized in the avian brain: four in the posterior and three in the anterior areas of the brain (Feenders et al., 2008). Early researchers hypothesized that because birds lacked language, a structure comparable to Brodmann’s area 44 was nonexistent. Further, due to substantial differences in the organization of avian and mammalian brains, early neuroanatomists were unable to distinguish clearly a Brodmann’s area 44 homologue or analogue based on estimations of location alone. Since then, two structures, the magnocellular nucleus of the anterior nidopallium (MAN) and the hyperstriatum ventrale, pars caudale (HVC), have been proposed as analogues to Brodmann’s area 44 in songbirds that learn their vocalizations (Bolhuis & Gahr, 2006). Bottjer, Halsema, and Arnold (1984) lesioned the lateral MAN in juvenile and adult zebra finches and found that adult birds’ songs were unaffected while juveniles’ vocalizations were severely abnormal. Thus, the MAN is theorized to be involved with early song development in songbirds, which must learn their vocalizations. Intra- and extra-cellular recordings of the HVC of canaries (Serinus canaria), white-crowned sparrows (Zonotrichia leucophrys), and zebra finches demonstrated that the HVC’s role in song production is related to auditory feedback, which is necessary for normal song development (McCasland & Konishi, 1981).

Unlike songbirds, parrots develop calls (i.e., brief, simple sounds) rather than songs (i.e., long series of individual notes); and different vocalization control pathways are involved (e.g., Feenders et al., 2008). In at least budgerigar parrots, the oval nucleus of the anterior nidopallium (NAO) is presumed analogous to the songbird MAN. Likewise, the lateral neostriatum (NLC) is considered comparable to the songbird HVC (Feenders et al., 2008). The NLC is involved with the production, but not development, of learned vocalizations including speech sounds (Lavenex, 2000). Lavenex’s studies with budgerigars revealed disturbances in the ability to modulate properly the amplitude of vocalizations when the area was lesioned.

The features of vocalization-related structures in parrots and songbirds are also different. Differences lie in (a) how auditory stimuli are received, (b) the mechanisms by which sounds are produced (Striedter, 1994), (c) the nuclei involved in the vocalization pathway (Jarvis & Mello, 2000), and (d) the overall orientation of nuclei in the vocalization pathway (Matsunaga, Kato, & Okanoya, 2008). Currently, it is unknown how the differing features in the vocal production pathways of songbirds and parrots contribute to learning vocalizations, memorizing complex vocalizations, producing vocalizations, and learning to incorporate speech into the vocal repertoire (in the case of parrots).

Speech-Related Features Not Present in Apes

Despite our genetic closeness to apes, similarities are far greater between avian and human communication systems with respect to features of vocal production and vocalization acquisition (for review, see Fitch & Jarvis, 2013; Petkov & Jarvis, 2012). A review of the literature suggests two additional features—vocal learning and heightened respiratory control—create a dividing line between apes and the human–songbird–parrot triad. While the position of the vocal apparatus may contribute to difficulty in producing some speech sounds (e.g., Nishimura et al., 2003), ultimately, speech production is rendered impossible for apes due to the inability to imitate vocalizations readily and to produce a sufficiently long and controlled airstream.

Vocal Learning

Vocal learning species acquire their vocalizations through experiential mechanisms. As Jarvis (2004) points out, vocal learning requires auditory learning (i.e., the ability to create associations with auditory stimuli), but it is distinct from auditory learning. Most nonhuman animals are auditory learners. With respect to speech, this means that although they can be trained to learn the meanings of spoken words (e.g., dogs, Canis familiaris, Kaminski, Call, & Fischer, 2004; apes, see Savage-Rumbaugh, Shanker, & Taylor, 1998, for review), they do not use auditory learning to develop their own species-specific repertoires. Vocal learning nonhuman taxa include hummingbirds, songbirds, and parrots (Nottebohm, 1972); cetaceans (McCowan & Reiss, 1997); some pinnipeds (e.g., Mirounga leonina, Sanvito, Galiberti, & Miller, 2007); bats (e.g., Phyllostomus hastatus, Boughman, 1998); and elephants (e.g., Poole, Tyack, Stoeger-Horwath, & Watwood, 2005). Vocal learners are able to imitate species-atypical sounds, but most build their species-specific repertoires by imitating sounds of their own species. Some vocal learning species, such as the lyrebird, incorporate a large variety of sounds from the environment into their repertoires (e.g., Dalziell & Magrath, 2012). Zann and Dunstan (2008) reported over 20 different species’ vocalizations in their recordings of 10 male lyrebirds. Further, 16% of the vocalizations could not be attributed to any animal species, illustrating lyrebirds’ tendency to incorporate non-animal sounds into their repertoires.

Among primates, only humans are classified as vocal learners. Nonhuman primates do show some modification of vocal output, but this seems to be based largely on modification of innate calls through altering positioning of the mouth or lips rather than through control of the larynx. For instance, chimpanzees can produce novel sounds to attract attention by puckering and vibrating their lips to create a “raspberry” sound (Hopkins, Taglialatela, & Leavens, 2007), and captive orangutans (Pongo pygmaeus) have spontaneously matched human whistles (Lameira et al., 2013). In contrast, in humans there is precise control of voicing itself, allowing for a far wider repertoire of different learned patterns. A likely reason for this is that in humans there is a direct connection from the face area of the motor cortex to the nucleus ambiguus, which controls muscles of the larynx (Simonyan & Horwitz, 2011). Although this connection is generally regarded as unique to humans, there is evidence for a similar, if sparse, pathway in mice, allowing for a degree of learning in their ultrasonic vocalizations (Arriaga & Jarvis, 2013). Petkov and Jarvis (2012) do not rule out the possibility of sparse connections between the nonhuman primate motor cortex and vocal control, but it appears that only humans possess the density of projection for prolific vocal learning. Nevertheless, such evidence for learned vocalizations in nonvocal learning species has led Arriaga and Jarvis (2013) to criticize the vocal learner–nonvocal learner dichotomy and offer a spectrum-based approach to studying vocal learning. According to their Continuum Hypothesis framework of vocal learning, our conclusion that parrots, songbirds, and humans exhibit far more vocal learning than apes would still hold true.

Feenders et al. (2008) noted parallels between the vocal control systems of humans and of birds, such as parrots, songbirds, and hummingbirds, that are vocal learners. In both groups, the systems divide into anterior and posterior components. The posterior component in birds includes the vocal nuclei that produce the call or song; the posterior component in humans includes the region within the face area of the motor cortex that connects with control of the laryngeal muscles, as described earlier. The anterior component in birds controls the sequencing and learning of vocal productions; in humans, it includes Broca’s area, along with the anterior striatum and anterior thalamus, critical to the production of speech. This system is distinct from the systems underlying innate nonhuman song patterns or calls.

These systems in humans and birds are very similar in architecture, and Feenders et al. (2008) propose that they derive from a more general motor system inherited from the common ancestor of birds and mammals. In most mammals and birds, that motor system is dedicated to physical movement of the body, and control over the system is present only in the relatively rare cases of vocal learners. In parrots, songbirds, and hummingbirds, the vocal learning nuclei are adjacent to the nuclei controlling limb and body movements, while in humans, laryngeal control lies within the face area, which in turn is adjacent to the area controlling hand movements. In the evolutionary scenario proposed by Feenders et al., the incorporation of vocal control did not require the emergence of new structures. Following Finlay, Cheung, and Darlington (2005), they suggest that new cortical areas arise from the enlargement of older areas, with part of an enlarged area allocated to a new function. It is further suggested that this might be accomplished through the duplication of a gene, with one copy retained for the original function and the other used for the new function (Ito, Ishikawa, Yoshimoto, & Yamamoto, 2007).

As described earlier, the organization of the vocal control system in parrots is rather different from that in songbirds. As Feenders et al. (2008) put it, the posterior motor pathway, along with the vocal portion, is shifted forward and laterally, although still posterior to the anterior portion. Feenders et al. suggest that if the motor part of the nidopallium moved with the arcopallium forward and laterally, the supralateral nidopallium (SLN) in parrots may be the homologue of dorsolateral nidopallium (DLN) in other birds. They also note that the parrot nidopallium is much larger relative to body size than in songbirds, and suggest that sensory pathways in the posterior nidopallium may also have been expanded, displacing the anterior forward and laterally. While it is only speculation, the answer to why parrots are the most versatile of avian vocal learners, to the point that they can learn to communicate using speech, could be housed within these nuanced anatomical differences.

Fitch (2010) hypothesized that vocal learners have an evolved need to communicate with a more complex repertoire in order to, for example, identify group members, engage in elaborate reproductive rituals or mate attraction, or communicate effectively in highly variable environments. Because apes share these needs, Fitch’s hypothesis should also apply to them, which further muddies the question of why they are not vocal learners. Corballis (2010) and Knight (1998) provide two different possible explanations. Corballis (2010) posits that apes could be more accurately described as “gestural learners” given the variety of discrete information apes can communicate with manual gestures (e.g., sharing food/objects, instigating co-locomotion, stopping a social partner’s action, Cartmill & Byrne, 2010). This fits with the scenario, outlined above, in which vocal control emerged from a preexisting system dedicated to movements of the limbs, including the hands. A further consideration, proposed by Knight (1998), is that innately programmed vocalizations, rather than learned ones, prevent the possibility of vocal deception, or “crying wolf,” thereby keeping vocal signals honest among individuals. Perhaps apes faced stronger selection for honest signaling (mediated through species-typical vocalizations) than for a variable, complex repertoire (mediated through vocal learning). It is important to note that deception has been documented in language-trained apes (e.g., Savage-Rumbaugh & McDonald, 1988), suggesting “crying wolf” is within the realm of apes’ cognitive abilities.

While some have argued that similarities between humans and nonhumans should be easiest to find by looking to our nonhuman primate relatives (e.g., Whitaker, 1976), many agree that songbird and parrot vocalizations are more akin to human speech and language than are the calls of nonhuman primates (e.g., Hauser et al., 2002; Passingham, 1981; for a counterclaim that birdsong is more signal than symbol, see Zlatev, 2002). The foregoing review of primate vocalization shows that nonhuman primates demonstrate only limited evidence of spontaneous vocal learning (e.g., Lameira et al., 2013). After extensive training, even chimpanzees show very limited evidence of vocal learning or vocal imitation (e.g., Hayes & Hayes, 1951). Unlike vocal learners’ vocalizations that are learned from conspecifics and emitted intentionally, nonhuman primates’ vocalizations are largely innate and elicited by emotion (e.g., Corballis, 2003; Hauser et al., 2002; Jarvis, 2004; Robinson, 1967). Exceptions to this include titi monkeys (Callicebus cupreus, Müller & Anzenberger, 2002) and the lesser apes (siamangs and gibbons; f. Hylobatidae, e.g., Geissmann, 1999, 2002), which are known to modify their vocalizations to converge upon pair-specific duet “songs” among bonded individuals. Additionally, chimpanzees can direct vocalizations to specific individuals, implying some degree of intentionality in communication (Schel et al., 2013). Even in exceptions such as these, intentional use of vocalizations does not extend to the majority of the repertoire as it does with vocal learners. As a group, then, nonhuman primates do not require a vocal repertoire “tutor,” despite their characteristically highly social group-living, which we and others (e.g., Fitch, 2000b) would predict should offer substantial reason and opportunity for vocal learning.

To demonstrate the lack of necessity for a tutor in nonhuman primates, Winter, Handley, Ploog, and Schott’s (1973) work examined vocal development in infant squirrel monkeys (Saimiri sciureus) reared with muted mothers in the absence of species-specific vocalizations. Their vocal repertoires were virtually identical to those of normally reared infants. In addition, the auditory-isolated repertoires were no different from normal adults’ repertoires, further illustrating the innate nature of nonhuman primate vocalizations. These results are similar to those of isolation studies with vocal non-learner birds such as chickens (Gallus gallus domesticus) and doves (Streptopelia risoria, Konishi, 1963; Nottebohm & Nottebohm, 1971). We do acknowledge that prenatal exposure to vocalizations can significantly influence the vocalizations of developing young (e.g., Gottlieb, 1963); so some degree of vocal learning inside the egg or womb must always be considered a possibility.

Documented rare cases of extreme child neglect in humans (e.g., Curtiss, 1979) and auditory isolation studies with songbirds (e.g., Marler, 1970b) and budgerigars (e.g., Heaton & Brauth, 1999) confirm the necessity of a tutor for normal species-specific vocalization development. Without a tutor, disturbances arise in the production of species-specific vocalizations. Under normal developmental conditions, auditory stimulation provided by a tutor is hypothesized to serve as a model after which the learner modifies its output (Keller & Hahnloser, 2009; Prather, Peters, Nowicki, & Mooney, 2008). To do this, the brain connects auditory stimuli with the required motor movements necessary to reproduce what was heard.

Research with mammals and birds has confirmed regions in the cerebrum (avian telencephalon) to be responsible for vocalizations in vocal learners (e.g., Jürgens, 1995), while regions important to vocalizations in non-learners are located in the midbrain’s limbic system and medulla (e.g., Robinson, 1967; Wild, 1997). Robinson (1967) stimulated hundreds of neocortical sites in rhesus macaques (Macaca mulatta), and no vocal production was evoked, further confirming that the limbic system and medulla are sufficient for nonhuman primate vocalizations. In vocal learning birds, unique sub-pathways underlie vocalization production. One is a vocal motor pathway responsible for producing learned vocalizations, and the other is a pallial–basal ganglia–thalamic loop which is responsible for modifying and learning vocalizations (Jarvis, 2007). Further, vocal learning birds possess uniquely similar expression of one gene that is unexpressed in non-learners (Matsunaga et al., 2008). Despite well-documented differences in brain anatomy between humans, parrots, and songbirds (e.g., Jarvis, 2004; Paton, Manogue, & Nottebohm, 1981; Striedter, 1994), similarities in vocalization acquisition and production do exist; the most relevant of these to speech is vocal learning.

Heightened Control Over Respiration

While vocal learning stands out in the literature as a clear divider between apes and the human–parrot–songbird triad, many recognize the significant role that heightened control over respiration plays in normal speech production (e.g., Campbell, 1968; Lieberman, 1984; MacLarnon & Hewitt, 1999). Given the substantial, finely controlled respiratory requirements for the production of the speech stream in humans (Ghazanfar & Rendall, 2008; Lieberman, 1984), we posit that this feature should be included as a speech requisite.

In mammals and birds, the lungs provide the necessary subglottal airstream, modulated by the larynx or syrinx, respectively, to create and modify sound (Fitch & Hauser, 1995). In humans, quiet breathing is disrupted in order to produce speech sounds. Most speech vocalizations occur in a spontaneous cycle of long expirations (words) which are punctuated by rapid, silent inspirations (Ghazanfar & Rendall, 2008; MacLarnon & Hewitt, 1999, 2004). Finely tuned control of the respiratory system in response to cognitive factors is required for a speaker to time inspirations in order not to lose his or her breath while vocalizing (MacLarnon & Hewitt, 1999). Early work by Ladefoged (1968) also highlighted the importance of finely controlled breathing for varying emphasis, pitch, and intonation of words.

The breath control necessary for speech production is estimated to have appeared in humans about 600,000 years ago (MacLarnon & Hewitt, 2004). Support for this comes from studying the size of the thoracic vertebral canal in hominid fossils. This canal expanded over time to allow for enhanced innervation of the intercostal and abdominal muscles as more finely tuned speech breathing developed. Several 600,000-year-old Neanderthal specimens had canals similar in size to modern humans (MacLarnon & Hewitt, 1999). Conversely, Homo ergaster, who lived approximately 1.6 million years ago, as well as earlier hominids, had a small thoracic vertebral canal that was comparable in size to extant nonhuman primates (MacLarnon & Hewitt, 2004).

Compared to nonhuman primate vocalizations, speech is extremely taxing on the respiratory system. The rate at which nonhuman primates produce sequences of vocalizations is limited by their tendency to vocalize using a one-sound-per-breath pattern (Ghazanfar & Rendall, 2008; MacLarnon & Hewitt, 2004). On the other hand, an average human speech stream full of different sounds may last as long as 12 seconds (Winkworth, Davis, Adams, & Ellis, 1995). Among nonhuman primates, average vocalization streams are variable in length; however, longer vocalization streams are associated with species that rely upon elaborate vocal apparatus to increase vocalization length. The indri (Indri indri), a prosimian that uses air sacs to increase vocalization length, has a vocalization stream of 5 seconds (Thalmann, Geissmann, Simone, & Mutschler, 1993). The howler monkey (g. Alouatta), which possesses a large air sac beneath the hyoid bone that acts as a resonating chamber, as well as two lateral air sacs, has been documented vocalizing for more than 50 seconds in one breath (Sekulic & Chivers, 1986). The lesser apes, which also have a large air sac (Boer, 2009), may vocalize up to 30 seconds in one “great call” (Haimoff, 1983).

Unlike howler monkeys and indris, great apes (and humans) do not have specialized vocal apparatus, although great apes (but not humans) also have large air sacs (Boer, 2009). Chimpanzees’ longest documented stream is about 1.6 seconds (Clark & Wrangham, 1993; Marler & Tenaza, 1977), and orangutans’ and gorillas’ just over 2 seconds (Hardus et al., 2009; Salmi, Hammerschmidt, & Doran-Sheehy, 2013). These data may come as a surprise considering humans and apes share similarly sized lungs relative to body size (e.g., Stahl, 1967). Given this, some feature pertaining to the control of respiration rather than anatomical properties of the lungs or vocal tract must differentiate apes’ and humans’ vocal-breathing characteristics. The specific role that nonhuman primates’ air sacs play in vocalization is unclear, but Boer (2009) suggests that an air sac would actually reduce the ability to produce speech.
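The durations cited above can be set side by side with a little arithmetic. The following Python sketch uses only the figures reported in the text, entering "more than 50 seconds" as 50 and "just over 2 seconds" as 2 (a rounding assumption made purely for illustration). The names `longest_stream_s` and `ratio_to_human` are hypothetical and do not come from any of the cited studies.

```python
# Longest reported vocal streams in seconds, as cited in the text.
# Assumption: "more than 50" is entered as 50 and "just over 2" as 2.
longest_stream_s = {
    "human": 12.0,
    "indri": 5.0,
    "howler monkey": 50.0,
    "lesser apes": 30.0,
    "chimpanzee": 1.6,
    "orangutan/gorilla": 2.0,
}


def ratio_to_human(species: str) -> float:
    """Return how many times longer the human figure is than the species' figure."""
    return longest_stream_s["human"] / longest_stream_s[species]


# Order the taxa from longest to shortest reported stream.
ranked = sorted(longest_stream_s.items(), key=lambda kv: kv[1], reverse=True)

for species, seconds in ranked:
    print(f"{species:18s} {seconds:5.1f} s   human/species = {ratio_to_human(species):.2f}")
```

On these rounded figures, the human speech stream is roughly six to eight times longer than the great ape maxima, whereas the taxa with specialized vocal apparatus (howler monkeys and lesser apes) exceed the human figure. The comparison is only as good as the underlying reports, which were collected with different methods.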

For birds, the demands of flight have resulted in a highly specialized respiratory system. Pressure differentials created by air passing through the air sacs, bronchi, and lungs contribute to vocalization production (e.g., Elemans, Muller, Larsen, & van Leeuwen, 2009). The tongue, larynx, and other relevant anatomy are reduced in size, and the primary breathing/vocalizing apparatus—the syrinx—sits close to the lungs (Deacon, 1997). Human and songbird (and presumably parrot) vocalizations require controlled coordination of laryngeal and syringeal (respectively), respiratory, and vocal tract muscles (e.g., Suthers, Goller, & Pytte, 1999; Wild, 1997). Despite many differences, songbird respiration during vocalization is similar to that of nonhuman primates in that songbirds respire between almost every song note. This often results in rapid “mini-breaths” between complex trill sounds which can be as quick as 25 notes per second (Calder, 1970; Wild, Goller, & Suthers, 1998). Nevertheless, songbirds such as the winter wren (Troglodytes troglodytes) produce vocal streams as long as 41 seconds (Clark, 1949), far surpassing humans, and approaching the length of nonhuman primate species with specialized vocal apparatus. Clark also noted that the winter wrens’ 41 seconds were comprised of songs, far more difficult to produce than the one-note howler monkey howl. Such a feat is made even more difficult because birds lack a muscular diaphragm, making both inspiration and expiration active processes (e.g., Codd, Boggs, Perry, & Carrier, 2005; for review of avian respiratory morphology, see Codd, 2010). Thus, though songbirds breathe in between notes, the breaths are very small and require a substantial effort on the part of the bird; yet enough air is inspired to sustain lengthy, complex songs. 
An early investigation of budgerigars demonstrated that the air inspired during mini-breaths provides little to no air-intake value; rather, the inspiration goes completely to vocalizing (Tucker, 1968). Long-duration vocalizations require a strong, finely tuned respiratory system that undergoes regular periods of apnea without disrupting or distorting vocal output. Such vocal control is more on par with the physical demands associated with speech breathing than nonhuman primate vocalization breathing.

Finally, vocalization is associated with controlled activation of skeletal muscle system neural pathways in humans, songbirds, and parrots (Deacon, 1997; Paton et al., 1981; Sturdy, Wild, & Mooney, 2003; for review, see Wild, 1997). In contrast, activation in nonhuman primates occurs via visceral muscle system pathways. According to Deacon (1997), recruitment of skeletal rather than visceral muscle systems allows for more finely tuned breathing and therefore a more flexible range of vocalizations that is characteristic of humans and birds, but not apes. Nevertheless, examples of controlled breathing do exist, such as Lameira et al.’s (2013) report of whistling orangutans and Perlman, Patterson, and Cohn’s (2012) description of Koko the gorilla’s fake coughs, nose blowing, and wind instrument playing. Relevant to note, Koko’s “toots” on the instruments were all less than 2 seconds, the length reported for wild gorilla vocalizations. Perlman et al. concluded that the ability to control breathing is not dichotomous, with humans being able and the great apes being unable. Rather, they hypothesized that great apes can demonstrate some degree of controlled breathing provided a motivating and relevant environment (e.g., human models encouraging the behavior).

Complex Sociality: Where Parrots and Songbirds Differ

So far, this review has proposed that speech is associated with four major features: basic sociality, hemispheric asymmetry in communication areas, vocal learning, and heightened respiratory control. The groups possessing all four of these features are humans, songbirds, and parrots. This final section proposes that complex sociality separates humans and parrots from most temperate songbirds (see Figure 2). Similar to defining “basic sociality,” arriving at an appropriate definition of “complex sociality” is difficult. Nevertheless, we define complex sociality in this paper as the presence of discrete repertoire elements for affiliative nonsexual social interaction with conspecifics, social correlates of intelligence, and hierarchical relationships among group members. Others (e.g., Knight, 1998) have highlighted complex forms of sociality as a catalyst for the development of complex communication systems like speech. While there are exceptions to each of the criteria posed, complex sociality as we have defined it is present in humans, parrots, and apes. As discussed earlier, vocal learning and heightened respiratory control (and certain anatomical features) make word-production impossible for apes, thus precluding them from the speech faculty as we have defined it here (i.e., production and communicative use). A detailed summary of social organization, anatomical, and repertoire-related features is provided in Table 1 for all four animal groups.

Heightened Sociality and the Vocal Repertoire

Repertoire complexity and repertoire size are distinctly different. Many have linked sociality to the size of a species’ repertoire by hypothesizing that a larger repertoire affords an individual the ability to vocalize with greater detail about more numerous experiences (e.g., Aiello & Dunbar, 1993; Blumstein & Armitage, 1997; Dunbar, 2003; McCowan et al., 2002; Morton & Page, 1992). Linguists estimate that while the Oxford English Dictionary defines over 600,000 separate words, the average native English-speaking university graduate’s repertoire contains around 20,000 word families (i.e., excluding archaic words, proper names, compound words, abbreviations, alternative spellings, and dialect forms; Goulden, Nation, & Read, 1990; Nation & Waring, 1997). By comparison, great apes, parrots, and a representative songbird, black-capped chickadees (Parus atricapillus) are estimated to have repertoire sizes of under 100 distinct vocal types, where a “type” could be a call or song (e.g., bonobos, Bermejo & Omedes, 1999; mountain gorillas, Fossey, 1972; chimpanzees, Goodall, 1986; lowland gorillas, Harcourt, Stewart, & Hauser, 1993; Salmi et al., 2013; parrots, Bradbury, 2003; black-capped chickadee, Ficken, Ficken, & Witkin, 1978). With the exception of more prolific oscine songbirds like the nightingale (Luscinia megarhynchos), with a repertoire containing over 200 elements due to song syllables (Kipper, Mundry, Sommer, Hultsch, & Todt, 2006), typical ape, songbird, and parrot distinct vocalization repertoires are within the same order of magnitude as that of the European badger (Meles meles), a vocal non-learning social mammal (Wong, Stewart, & MacDonald, 1999), and two orders of magnitude smaller than the repertoire size (synonymous with vocabulary) of humans. 
Yet, despite the vastly differently sized repertoires between humans and the other three groups, as well as the fact that one of the groups does not engage in vocal learning, each group is still classified as social; this suggests repertoire size is neither a perfect predictor of sociality nor related to the ability to produce and use speech.
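The "orders of magnitude" comparison can be checked with a base-10 logarithm of the ratio between repertoire sizes. The Python sketch below uses the figures quoted in the text: about 20,000 word families for humans, an upper bound of 100 vocal types for typical apes, parrots, and songbirds, and 200 elements for the nightingale, entered at the lower bound of "over 200". The function name `orders_of_magnitude` is hypothetical and chosen only for illustration.

```python
import math

# Repertoire sizes as quoted in the text (approximate bounds, not exact counts).
human_word_families = 20_000    # average English-speaking university graduate
typical_nonhuman_types = 100    # "under 100 distinct vocal types" (upper bound)
nightingale_elements = 200      # "over 200 elements" (lower bound)


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Return log10 of the ratio between two repertoire sizes."""
    return math.log10(larger / smaller)


gap_typical = orders_of_magnitude(human_word_families, typical_nonhuman_types)
gap_nightingale = orders_of_magnitude(human_word_families, nightingale_elements)

print(f"human vs. typical nonhuman repertoire: {gap_typical:.2f} orders of magnitude")
print(f"human vs. nightingale repertoire:      {gap_nightingale:.2f} orders of magnitude")
```

Because 100 is an upper bound on the typical nonhuman repertoire, the computed gap of about 2.3 is a minimum. Even the nightingale, at the lower bound of its reported repertoire, remains about two orders of magnitude below the human figure.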

It is important to note that meaningful determination of a vocal repertoire’s complexity or size must make assessments of call morphology together with perceptual determinations of the salience of call features. Relying on sound differences alone results in an incomplete investigation. Such challenges may explain the vast differences in reported repertoire size among nonhumans, and make it difficult to compare repertoire size across taxonomic groups. This is especially true of those avian species for which repertoire complexity is a territory and reproduction arms race among males. In these cases, selection favors males with repertoires consisting of a variety of parsed and novel vocalizations. Attempts to count specific syllables to arrive at species-typical “repertoire size” would be difficult, and possibly uninformative. For example, Tu, Osmanski, and Dooling (2011) reported 116 different elements in a budgerigar’s warble song. The question of whether each distinct element provides discrete information, or if the elements’ organization or repetition provides discrete information, is unknown and beyond current bioacoustic techniques.

Quantifying repertoire size among parrots presents additional concerns as some species have at least three classes of vocalizations: emotion-driven sounds (e.g., agonistic shrieks), intentional sounds (e.g., contact calls), and dialect-based sounds that are unique to individuals and groups for purposes of self- and group-identification (e.g., Berg, Delgado, Cortopassi, Beissinger, & Bradbury, 2012; Salinas-Melgoza & Wright, 2012). Some songbirds also show similar evidence of individual variations and dialects in their vocalizations (e.g., song sparrow, Melospiza melodia, Harris & Lemon, 1972). Decisions regarding repertoire size determination are further complicated by factors such as these, and must be made carefully. To date, we are not aware of any consistently used, appropriate methodology for comparing repertoires across species.

While repertoire complexity and size do not appear to be appropriate for comparison, functionality of the repertoire seems to provide reliable information regarding sociality and may offer a reliable difference between songbirds and parrots and humans. Among temperate songbird species, vocalizations are generally limited to males and to contexts of territory defense and reproduction (Catchpole & Slater, 2008; Kroodsma & Miller, 1996). Male songbirds have a larger syrinx than females, despite similar body size, indicating sexual selection for more robust vocalizations (Riede, Fisher, & Goller, 2010). Further, female zebra finches, for example, have only rudimentary versions of certain song-learning and song-production telencephalon areas and produce only innate vocalization patterns (Nottebohm & Arnold, 1976). This sexual dimorphism is not found in tropical duetting songbirds like chats (f. Muscicapidae) and some species of wrens where females are more vocally active (Brenowitz, Arnold, & Levin, 1985). Taken together, songbird evolution has selected for anatomy and vocal production that facilitates the repertoire’s function—whether it is communicating to conspecifics about resources and mating, or strengthening bonds in mated pairs.

Unlike songbirds in temperate regions, both sexes of parrots use learned vocalizations throughout the year in a variety of contexts that are unrelated to reproduction (Bradbury, 2003). According to Bradbury, many adult parrot calls promote cohesion, affiliation, and information transfer among individuals. These calls include, but are not limited to, a loud contact call for maintaining connection, a soft contact call for coordinating movement in dense vegetation, a pre-flight call to notify group members of an individual’s impending departure, and a paired duet call. In black-capped chickadees, a majority of calls are classified as being involved with reproduction, coordination of group movement, and various agonistic encounters with conspecifics (Ficken et al., 1978). As an exception, the Carolina chickadee’s (Poecile carolinensis) “chick-a-dee” call has been implicated in social cohesion (Freeberg & Harvey, 2008). Likewise, according to Ficken et al. (1978), black-capped chickadees have “broken dee” and “faint fee-bee” calls that attract males to females that are out of sight; however, only the “chick-a-dee call complex” is implicated in pair and flock cohesion, such as recruitment of individuals to mob predators of differing threat levels (Templeton, Greene, & Davis, 2005).

Even the social processes involved with vocalization acquisition differ greatly between songbirds on the one hand and parrots and humans on the other. Similar to human children, wild African Grey parrot juveniles learn vocalizations through affiliative social interaction with parents and flock-mates (e.g., Berg et al., 2012; Nottebohm, 1970). Conversely, many songbirds learn their vocalizations directly and indirectly through hearing aggressive, territorial interactions of their fathers and neighboring conspecifics (e.g., Nuttall’s white-crowned sparrows, Zonotrichia leucophrys nuttalli, Bell, Trail, & Baptista, 1998; European starlings, Bertin, Hausberger, Henry, & Richard-Yris, 2007; zebra finches, Zann, 1990). In zebra finches, the presence of, and interactions with, even male siblings can contribute to features of a male’s song (e.g., Tchernichovski, Lints, Mitra, & Nottebohm, 1999; Tchernichovski & Nottebohm, 1998).

From this and earlier evidence, socializing is much different in songbirds than in humans and parrots with respect to the vocal repertoire. Humans and parrots use their vocalizations to foster strong, positive bonds that last years (i.e., across breeding seasons, in parrots). In most species studied, songbirds typically use their vocalizations to attract and retain mates and to defend territories one breeding season at a time (Catchpole & Slater, 2008; Kroodsma & Miller, 1996). These contrasting overall functions of the vocal repertoire support the argument that speech capabilities may somehow be linked to fundamental differences in the functions of the repertoire and communication (e.g., Brown & Farbaugh, 1997). That is to say, parrots, like humans and great apes, may naturally have “more to say” because of their more diverse social interactions that extend beyond reproduction. This richer level of sociality may make parrots better suited than songbirds to produce human speech sounds and readily adopt the use of them for interspecies communication.

Social Correlates of Intelligence

Intelligence has been linked to many features of sociality, such as interactions not obviously related to survival (Burish et al., 2004), group size in fossil and extant primates (Aiello & Dunbar, 1993; Sawaguchi & Kudo, 1990), and the tendency toward altricial young (which require extensive parental care) in avian species with large adult brains (Portmann, 1946)3. According to the Social Intelligence Hypothesis (see Byrne & Whiten, 1988), human intelligence was enhanced by the numerous roles, interactions, and experiences that came as a result of living socially. Empirical work with nonhumans has supported this hypothesis (e.g., Burish et al., 2004; Reader & Laland, 2002; for counterevidence see Beauchamp & Fernández-Juricic, 2004; for alternative views see Zuberbühler & Janmaat, 2010; Melin, Young, Mosdossy, & Fedigan, in press). Burish et al. (2004) presented a meta-analysis of 154 bird species’ social structures, eating habits, migration habits, flight habits, mating systems, and vocalization qualities. The authors then correlated each of these factors with a telencephalon-to-whole-brain ratio. The results demonstrated that transactional (defined as engaging in at least between-individual social interaction), monogamous, herbivorous species that did not migrate, but did fly, and that were vocal learners had the largest telencephalon ratio. African Grey parrots and many other speech-using psittacids possess all of these features.

In Burish et al.’s (2004) meta-analysis, the 20 largest telencephalon ratios belonged to species of parrots, corvids, woodpeckers, and owls, with parrots never ranking below 33rd on the list of 154. Interestingly, the lowest ranking psittacid was the budgerigar, the species used most frequently in comparative research. The five species with the largest telencephalon ratios were (in order) the blue-and-yellow macaw (Ara ararauna), the red-and-green macaw (Ara chloropterus), the common raven (Corvus corax), the African Grey parrot, and the yellow-crested cockatoo (Cacatua sulphurea). The first true songbird, the Eurasian skylark (Alauda arvensis), ranked 23rd, and the next, the blue tit (Parus caeruleus), was 29th. While these are highly ranked, songbirds were scattered throughout the list, with the European robin (Erithacus rubecula) appearing 121st out of 154. Zebra finches, commonly used in comparative research, appeared 62nd. Though making generalizations from a meta-analysis is difficult, the data are congruent with Byrne and Whiten’s (1988) Social Intelligence Hypothesis in that parrots are both highly social and have relatively large brains.

While many nonsocial species’ lifespans can exceed 70 years (e.g., European pond turtle, Emys orbicularis, Gibbons, 1987; lake sturgeon, Acipenser fulvescens, Thomas & Haas, 2004), there are clear social and cognitive correlates of long lifespans (e.g., Carey & Judge, 2001). According to Carey and Judge, species with long lifespans have more time for intergenerational transfer of information. In addition, a longer lifespan allows for stronger social bonding due to years of exposure and accumulated experiences with group members. This may explain why humans, parrots, and cetaceans (another long-lived, highly social taxon) use signature vocal “tags” to recognize individuals (e.g., Bruck, 2013; Janik & Sayigh, 2013; Quick & Janik, 2012; Saunders, 1983). This characteristic is not prevalent in shorter-lived species, although chickadees and house finches (Carpodacus mexicanus) are two short-lived species with vocal tags (Bradbury, 2003). Carey and Judge’s lifespan data suggest a strong relationship between complex sociality and lifespan (e.g., humans, 100+ years; parrots, 70 years; cetaceans, 40–70 years, with George et al., 1999, estimating 100+ years for bowhead whales, Balaena mysticetus; apes, 60 years). For comparison, exemplar songbird species discussed in this review live less than 10 years (e.g., zebra finch, 5 years, Burley, 1985).

The opportunity for division of labor also may somehow relate to social intelligence in highly social species (Carey & Judge, 2001). Division of labor within a vertebrate “society” requires substantial cooperation and interaction among group members—including information transfer and the cognitive capacity to remember other individuals’ identities and roles within the system. Primate species exhibit various degrees of division of labor (for review, see Galdikas & Teleki, 1981). Division of labor among songbirds (excluding parental care) has yet to be documented; however, sentinel behavior, a transient labor role within a society, is seen in some parrots (Levinson, 1980). Whether or not there is a causal relationship between the speech faculty and the presumed intelligence associated with complex sociality as we have defined it is difficult to determine at this point. What is clear is that there are definite similarities between humans and parrots with respect to these features, and that both groups’ heightened sociality distinguishes them from songbirds.

Concluding Thoughts

Given the inability of apes to speak, the common capacity of parrots and humans to produce and communicate with speech sounds must be an example of parallel evolution, arrived at for similar purposes but via different routes. One possibility for humans, hinted at earlier, is that speech arose from manual and facial gestures, perceived visually rather than auditorily. This idea has a long but intermittent history, dating at least from the writings of Rousseau and Condillac in the 18th century. It was revived by Hewes (1973) and has since found support from a variety of considerations, including the efficiency and linguistic sophistication of sign languages (Armstrong & Wilcox, 2007), the role of the mirror system in primates (Rizzolatti & Sinigaglia, 2008), the strong neurophysiological and behavioral links between hand movements and mouth movements (Gentilucci & Corballis, 2006), and the nature of gestural communication in great apes, both in captivity and in the wild (e.g., Tomasello, 2008).

Indeed, there is some suggestion that speech may have superseded a manual sign language within the past 100,000 years (Corballis, 1991, 2010). Gestures by captive chimpanzees and bonobos (Pollick & de Waal, 2007), as well as wild chimpanzees (Hobaiter & Byrne, 2011a, 2011b), appear to be more diverse and flexibly used than the vocal calls used by the species. As Pollick and de Waal (2007) reported, apes’ gestural repertoires were larger than their repertoires of facial/vocal signals. The authors also noted bonobos’ usage of multimodal communication, whereby combined or serially produced gestures and facial/vocal signals elicited greater responsiveness by the receiver. The development of sign language in deaf infants shows remarkable parallels with that of speech, including manual ‘babbling’ (e.g., Petitto & Marentette, 1991) and similar overall phonological, morphological, and syntactical organization (e.g., Klima & Bellugi, 1979).

There remains the question of why speech would have superseded manual gesture. There are several possible answers. One is that speech frees the hands for other activities, such as carrying objects, and eventually for making and using tools. Speech is also a system of gestures, involving movements of the tongue, lips, velum, and larynx (Studdert-Kennedy, 1998), and moving the gestural system away from the external limbs into the mouth would have been much more efficient in terms of the expenditure of energy. This was perhaps an early example of miniaturization. Speech also holds the advantage at night, or when physical barriers intervene. Even so, people still gesture as they speak, and their gesturing helps convey information (Corballis, 2010).

Nevertheless, not all are convinced by the gestural theory (e.g., Burling, 2005; MacNeilage, 2008), and there may well be alternative explanations for the parallel routes to speech in parrots and humans. Although speech is employed communicatively by both parrots and humans, it serves different functions and has different properties in each. In humans, speech is the dominant medium of language, a complex system involving syntax and the capacity to transmit information about past and planned future events, states of the world, and explanations of how things work, or in Pinker’s words, “who did what to whom, when, where, and why” (Pinker, 2003, p. 27).

From what little is known about wild parrots, their communication appears to have more to do with social bonding than with the exchange of information, although vocalizations are used to coordinate movement and to transmit general information (as in alarm calls). Given our lack of knowledge, we cannot yet say if parrots use calls to transmit information in the ways described above for human speech, though speech-based contact calls have been documented (Colbert-White et al., 2011). In the wild and in captivity, parrots must vocally conform to a group in order to be accepted by that group. Given the importance to parrots of group cohesion and social partners for safety and resource discovery (Bradbury, 2003), parrots have most likely experienced selection for the ability to imitate a vast array of sounds to ensure continuing acceptance in the group (i.e., to be vocal generalists).

Just as human infants are born with the ability to learn any of thousands of languages, parrots also appear to have the ability to learn a vast array of vocalizations. However, while human infants excel at producing a variety of human vocalizations, they, like most species, produce other species’ vocalizations quite poorly. By contrast, parrots readily produce many other species’ vocalizations. This extreme vocal generalist quality, matched with the use of a species-atypical vocal communication system to interact with social partners, is both remarkable and rare within the animal kingdom.

Stereotyped vocalizations are predominant in avian species for which inclusion in a specific group is not crucial to survival (e.g., pigeons and chickens). Temperate songbirds may therefore hold an intermediate position between taxa exhibiting stereotyped vocalizations (vocal specialist) and taxa exhibiting extensive vocal learning (vocal generalist). In some species of songbirds, while the song template is the same across individuals, and there is no requirement of song for inclusion into groups, males that recombine syllables or incorporate vocalizations of other species are considered the most attractive by females (e.g., Catchpole, 1987; Howard, 1974; see Catchpole & Slater, 2008 for review). Thus, the pressure to be somewhat of a vocal generalist in this taxonomic group is apparent.

Just as humans may have transitioned to speech from gestures as a means of overcoming issues associated with night vision and physical barriers, species that are flighted, arboreal, nocturnal, or aquatic also encounter environmental constraints that would make vocal communication systems as complex as speech more appropriate for information transfer (e.g., Janik & Slater, 1997; Jarvis, 2006; Liska, 1993; McCowan et al., 2002). Cetaceans and microchiropteran bats share environmental constraints similar to those of humans and parrots, as well as vocal learning and varying degrees of sociality. Among bats, researchers have identified individual- and group-specific signature contact calls (e.g., Arnold & Wilkinson, 2011; Gillam & Chaverri, 2012) similar to vocal identification systems observed in cetaceans (e.g., Janik & Sayigh, 2013), parrots (e.g., Berg et al., 2012), and humans. Many cetaceans such as bottlenose dolphins (Tursiops truncatus) and humpback whales (Megaptera novaeangliae) are highly social to the level of cultural transmission of behavior (e.g., Allen, Weinrich, Hoppitt, & Rendell, 2013; Rendell & Whitehead, 2001). However, neither bats nor cetaceans are able to produce speech. Ridgway et al. (2012) did describe a beluga whale that spontaneously produced sounds that mimicked human speech rhythms and fundamental frequencies, but the authors acknowledged that the sounds were not articulated speech. Like apes, bats and cetaceans present an interesting conundrum in which some, but not all, features described in this review apply.

Along with increasing communication research on cetaceans and bats to learn more about how they fit into the speech faculty debate, future studies could investigate the vocal generalist versus vocal specialist difference described earlier. In addition to investigations of neuroanatomical differences among highly generalist species like African Grey parrots, intermediate generalists like nightingales, and vocal specialists like chickens, the literature lacks intra-family assessments of speech abilities. African Greys have received much attention and training in speech production and use (e.g., Pepperberg’s adaptation of Todt’s 1975 model/rival shaping paradigm; for review, see Pepperberg, 1999). However, intensive, specialized training protocols have not been developed for other promising parrot species like macaws, or even budgerigars, the parrot species that dominates neuroanatomical research on vocal-auditory pathways. Budgerigars may be used because they are inexpensive and easy to maintain in captivity, but they are not the most appropriate model to understand why parrots and not songbirds can produce speech. In order to understand how African Greys are so skilled at using speech, neuroanatomical work must be done with Greys, not budgerigars. Comparisons of multiple parrot species with a range of speech abilities, including budgerigars, may answer more questions about the speech faculty. Macaws, for example, have not been studied at all in this regard, although they have the largest telencephalon ratios among those studied by Burish et al. (2004).

With the exception of some of Pepperberg’s work, comparisons between humans’ and African Grey parrots’ speech use do not exist in the literature, and few theories have been developed to explain why parrots have been selected to produce the large variety of sounds comprising their heterospecific repertoires, both in the wild and in captivity. Given the similarly large range of sounds human infants can learn to produce in order to communicate, we find parrots to be an interesting opportunity for comparison. Whether the comparison is at the level of social organization, ecological relevance of a complex learned repertoire, some other feature, or a combination of these, the vocal generalist quality of humans and parrots merits further investigation. Because numerous parallels between human speech and language and songbird songs have been drawn, it is our hope that this synthesis of the literature serves as a call to action for collaboration of linguists, animal behaviorists, neuroanatomists, and psychologists to begin to explore humans’ and parrots’ shared vocal generalist quality.


1 Since tuis (Prosthemadera novaeseelandiae; Whangarei Native Bird Recovery Centre, n.d.), corvids (f. Corvidae, Noack, 1902), and sturnids (f. Sturnidae, West et al., 1983) are also passerines that can mimic speech, we have simplified songbird to denote stereotypical temperate oscine songbirds with songs predominating the wild vocal repertoire (e.g., chickadees, finches).

2 Simultaneously producing two or even three distinguishable pitches using a larynx is extremely rare, but possible. Some traditional singing in Central Asia, Southern Siberia, India, and South Africa involves what is called overtone singing. The extraordinary practice requires years of training and greatly strains the vocal apparatus (Pegg, 1992).

3 Humans, songbirds, and parrots also share the feature of altricial young. The relevance of altriciality to arguments made in this review is unknown, but the similarity in this dimension among the three taxonomic groups should not be overlooked.


Aiello, L. C., & Dunbar, R. I. M. (1993). Neocortex size, group size, and the evolution of language. Current Anthropology, 34, 184–193. doi:10.1086/204160

Allen, J., Weinrich, M., Hoppitt, W., & Rendell, L. (2013). Network-based diffusion analysis reveals cultural transmission of lobtail feeding in humpback whales. Science, 340, 485–488. doi:10.1126/science.1231976

Armstrong, D. F., & Wilcox, S. E. (2007). The Gestural Origin of Language. Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780195163483.001.0001

Arnold, B. D., & Wilkinson, G. S. (2011). Individual specific contact calls of pallid bats (Antrozous pallidus) attract conspecifics at roosting sites. Behavioral Ecology and Sociobiology, 65, 1581–1593. doi:10.1007/s00265-011-1168-4

Arriaga, G., & Jarvis, E. D. (2013). Mouse vocal communication system: Are ultrasounds learned or innate? Brain & Language, 124, 96–116. doi:10.1016/j.bandl.2012.10.002

Bailey, P., von Bonin, G., & McCulloch, W. S. (1950). The Isocortex of the Chimpanzee. Urbana-Champaign: University of Illinois Press.

Beauchamp, G., & Fernández-Juricic, E. (2004). Is there a relationship between forebrain size and group size in birds? Evolutionary Ecology Research, 6(6), 833–842.

Bell, D. A., Trail, P. W., & Baptista, L. F. (1998). Song learning and vocal tradition in Nuttall’s white-crowned sparrows. Animal Behaviour, 55, 939–956. doi:10.1006/anbe.1997.0644

Berg, K. S., Delgado, S., Cortopassi, K. A., Beissinger, S. R., & Bradbury, J. W. (2012). Vertical transmission of learned signatures in a wild parrot. Proceedings of the Royal Society B, 279, 585–591. doi:10.1098/rspb.2011.0932

Bermejo, M., & Omedes, A. (1999). Preliminary vocal repertoire and vocal communication of bonobos (Pan paniscus) at Lilunga (Democratic Republic of Congo). Folia Primatologica, 70, 328–357. doi:10.1159/000021717

Bertin, A., Hausberger, M., Henry, L., & Richard-Yris, M.-A. (2007). Adult and peer influences on starling song development. Developmental Psychobiology, 49, 362–374. doi:10.1002/dev.20223

Blumstein, D. T., & Armitage, K. B. (1997). Does sociality drive the evolution of communicative complexity? A comparative test with ground-dwelling sciurid alarm calls. The American Naturalist, 150(2), 179–200. doi:10.1086/286062

Boë, L.-J., Heim, J.-L., Honda, K., & Maeda, S. (2002). The potential Neandertal vowel space was as large as that of modern humans. Journal of Phonetics, 30, 465–484. doi:10.1006/jpho.2002.0170

Boer, B. (2009). Acoustic analysis of air sacs and their effect on vocalization. Journal of the Acoustical Society of America, 126, 3329–3343. doi:10.1121/1.3257544

Bolhuis, J., & Gahr, M. (2006). Neural mechanisms of birdsong memory. Nature Reviews Neuroscience, 7, 347–357. doi:10.1038/nrn1904

Bottjer, S. W., & Arnold, A. P. (1985). Cerebral lateralization in birds. In S. Glick (Ed.), Cerebral Lateralization in Nonhuman Species (pp. 11–38). Orlando, FL: Academic Press.

Bottjer, S. W., Halsema, E. A., & Arnold, A. P. (1984). Forebrain lesions disrupt development but not maintenance of song in passerine birds. Science, 224, 901–903. doi:10.1126/science.6719123

Boughman, J. W. (1998). Vocal learning by greater spear-nosed bats. Proceedings of the Royal Society of London B, 265, 227–233. doi:10.1098/rspb.1998.0286

Böye, M., Güntürkün, O., & Vauclair, J. (2005). Right ear advantage for conspecific calls in adults and subadults, but not infants, California sea lions (Zalophus californianus): hemispheric specialization for communication? European Journal of Neuroscience, 21, 1727–1732. doi:10.1111/j.1460-9568.2005.04005.x

Bradbury, J. W. (2003). Vocal communication of wild parrots. In F. B. M. de Waal & P. L. Tyack (Eds.), Animal Social Complexity: Intelligence, Culture, and Individualized Societies (pp. 293–316). Cambridge, MA: Harvard University Press. doi:10.1121/1.4780035

Brenowitz, E. A., Arnold, A. P., & Levin, R. N. (1985). Neural correlates of female song in tropical duetting birds. Brain Research, 343(1), 104–112. doi:10.1016/0006-8993(85)91163-1

Brown, E. D., & Farabaugh, S. M. (1997). What birds with complex social relationships can tell us about vocal learning: Vocal sharing in avian groups. In P. McGregor (Ed.), Animal Communication Networks (pp. 98–127). Cambridge: Cambridge University Press.

Bruck, J. N. (2013). Decades-long social memory in bottlenose dolphins. Proceedings of the Royal Society B, 280, 1–6. doi:10.1098/rspb.2013.1726

Bucher, T. L. (1983). Parrot eggs, embryos, and nestlings: Patterns and energetics of growth and development. Physiological Zoology, 56(3), 465–483.

Burish, M. J., Kueh, H. Y., & Wang, S. S.-H. (2004). Brain architecture and social complexity in modern and ancient birds. Brain, Behavior and Evolution, 63, 107–124. doi:10.1159/000075674

Burley, N. (1985). Leg-band color and mortality patterns in captive breeding populations of zebra finches. The Auk, 102(3), 647–651.

Burling, R. (2005). The Talking Ape. New York: Oxford University Press.

Byrne, R. W., & Whiten, A. (1988). Machiavellian Intelligence: Social Expertise and the Evolution of Intellect in Monkeys, Apes and Humans. Oxford: Oxford University Press.

Calder, W. A. (1970). Respiration during song in the canary (Serinus canaria). Comparative Biochemistry and Physiology, 32, 251–258. doi:10.1016/0010-406X(70)90938-2

Campbell, E. J. M. (1968). The respiratory muscles. Annals of the New York Academy of Sciences, 155, 135–140. doi:10.1111/j.1749-6632.1968.tb56757.x

Cantalupo, C., & Hopkins, W. D. (2001). Asymmetric Broca’s area in great apes. Nature, 414(6863), 505. doi:10.1038/35107134

Carey, J. R., & Judge, D. S. (2001). Lifespan extension in humans is self-reinforcing: A general theory of longevity. Population and Development Review, 27, 411–436. doi:10.1111/j.1728-4457.2001.00411.x

Cartmill, E. A., & Byrne, R. W. (2010). Semantics of primate gestures: Intentional meanings of orangutan gestures. Animal Cognition, 13, 793–804. doi:10.1007/s10071-010-0328

Catchpole, C. K. (1987). Bird song, sexual selection and female choice. Trends in Ecology and Evolution, 2, 94–97. doi:10.1016/0169-5347(87)90165-0

Catchpole, C. K., & Slater, P. J. B. (2008). Bird Song: Biological Themes and Variations (2nd ed.). Cambridge: Cambridge University Press. doi:10.1017/CBO9780511754791

Clark, A. P., & Wrangham, R. W. (1993). Acoustic analysis of wild chimpanzee pant hoots: Do Kibale forest chimpanzees have an acoustically distinct arrival pant hoot? American Journal of Primatology, 31, 99–109. doi:10.1002/ajp.1350310203

Clark, R. B. (1949). Some statistical information about wren song. British Birds, 42, 337–346.

Codd, J. R. (2010). Uncinate processes in birds: Morphology, physiology and function. Comparative Biochemistry and Physiology, Part A, 156, 303–308. doi:10.1016/j.cbpa.2009.12.005

Codd, J. R., Boggs, D. F., Perry, S. F., & Carrier, D. R. (2005). Activity of three muscles associated with the uncinate processes in the giant Canada goose Branta canadensis maximus. Journal of Experimental Biology, 208, 849–857. doi:10.1242/jeb.01489

Colbert-White, E. N., Covington, M. A., & Fragaszy, D. M. (2011). Social context influences the vocalizations of a home-raised African Grey parrot (Psittacus erithacus erithacus). Journal of Comparative Psychology, 125, 175–184. doi:10.1037/a0022097

Corballis, M. C. (1991). The Lopsided Ape: Evolution of the Generative Mind. New York: Oxford University Press.

Corballis, M. C. (2003). From mouth to hand: Gesture, speech, and the evolution of right-handedness. Behavioral and Brain Sciences, 26, 199–260. doi:10.1017/S0140525X03000062

Corballis, M. C. (2010). The gestural origins of language. WIREs Cognitive Science, 1, 2–7. doi:10.1002/wcs.2

Cruickshank, A. J., Gautier, J.-P., & Chappuis, C. (1993). Vocal mimicry in wild African Grey parrots Psittacus erithacus. Ibis, 135, 293–299. doi:10.1111/j.1474-919X.1993.tb02846.x

Curtiss, S. (1979). Genie: Language and cognition. UCLA Working Papers in Cognitive Linguistics, 1, 15–62.

Dalziell, A. H., & Magrath, R. D. (2012). Fooling the experts: Accurate vocal mimicry in the song of the superb lyrebird, Menura novaehollandiae. Animal Behaviour, 83, 1401–1410. doi:10.1016/j.anbehav.2012.03.009

Davidson, T. M. (2003). The great leap forward: The anatomic basis for the acquisition of speech and obstructive sleep apnea. Sleep Medicine, 4, 185–194. doi:10.1016/S1389-9457(02)00237-X

de Boer, B., & Fitch, W. T. (2010). Computer models of vocal tract evolution: An overview and critique. Adaptive Behavior, 18, 36–47. doi:10.1177/1059712309350972

de Waal, F. B. M. (1988). The communicative repertoire of captive bonobos (Pan paniscus), compared to that of chimpanzees. Behaviour, 106(3/4), 183–251. doi:10.1163/156853988X00269

Deacon, T. W. (1997). The Symbolic Species: The Co-Evolution of Language and the Brain. New York: Norton.

Del Hoyo, J., Elliot, A., & Sargatal, J. (1992). Handbook of the Birds of the World. Barcelona: Lynx Editions.

Diamond, A. S. (1959). The History and Origin of Language. London: Methuen.

Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience, 22, 567–631. doi:10.1146/annurev.neuro.22.1.567

Duchin, L. E. (1990). The evolution of articulate speech: Comparative anatomy of the oral cavity in Pan and Homo. Journal of Human Evolution, 19, 687–697. doi:10.1016/0047-2484(90)90003-T

Dunbar, R. I. M. (1988). Primate Social Systems. Ithaca, NY: Cornell University Press. doi:10.1007/978-1-4684-6694-2

Dunbar, R. I. M. (1991). Functional significance of social grooming in primates. Folia Primatologica, 57, 121–131. doi:10.1159/000156574

Dunbar, R. I. M. (1992). Neocortex size as a constraint on group size in primates. Journal of Human Evolution, 22, 469–493. doi:10.1016/0047-2484(92)90081-J

Dunbar, R. I. M. (2003). The social brain: Mind, language, and society in evolutionary perspective. Annual Review of Anthropology, 32, 163–181. doi:10.1146/annurev.anthro.32.061002.093158

Elemans, C. P. H., Muller, M., Larsen, O. N., & van Leeuwen, J. L. (2009). Amplitude and frequency modulation control of sound production in a mechanical model of the avian syrinx. Journal of Experimental Biology, 212, 1212–1224. doi:10.1242/jeb.026872

Falk, D. (1980). Language, handedness, and primate brains: Did the Australopithecines sign? American Anthropologist, 82(1), 72–78. doi:10.1525/aa.1980.82.1.02a00040

Falk, D. (1983). Cerebral cortices of East African early hominids. Science, 221(4615), 1072–1074. doi:10.1126/science.221.4615.1072

Farabaugh, S. M., & Dooling, R. J. (1996). Acoustic communication in parrots: Laboratory and field studies of budgerigars, Melopsittacus undulatus. In D. E. Kroodsma & E. H. Miller (Eds.), Ecology and Evolution of Acoustic Communication in Birds (pp. 97–117). Ithaca, NY: Cornell University Press.

Feenders, G., Liedvogel, M., Rivas, M., Zapka, M., Horita, H., Hara, E., . . . Jarvis, E. D. (2008). Molecular mapping of movement-associated areas in the avian brain: A motor theory for vocal learning origin. PLoS ONE, 3, e1768. doi:10.1371/journal.pone.0001768

Ficken, M. S., Ficken, R. W., & Witkin, S. R. (1978). Vocal repertoire of the black-capped chickadee. The Auk, 95(1), 34–48. doi:10.2307/4085493

Finlay, B. L., Cheung, D., & Darlington, R. B. (2005). Developmental constraints on or developmental structure in brain evolution? In Y. Munakata & M. Johnson (Eds.), Attention and Performance XXI: Process of Change in Brain and Cognitive Development (pp. 131–162). Oxford: Oxford University Press.

Fitch, W. T. (2000a). Comparative vocal production and the evolution of speech: Reinterpreting the descent of the larynx. In A. Wray (Ed.), The Transition to Language (pp. 21–45). Oxford: Oxford University Press.

Fitch, W. T. (2000b). The evolution of speech: A comparative review. Trends in Cognitive Sciences, 4, 258–267. doi:10.1016/S1364-6613(00)01494-7

Fitch, W. T. (2000c). The phonetic potential of nonhuman vocal tracts: Comparative cineradiographic observations of vocalizing animals. Phonetica, 57, 205–218. doi:10.1159/000028474

Fitch, W. T. (2010). The Evolution of Language. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511817779

Fitch, W. T., & Hauser, M. D. (1995). Vocal production in nonhuman primates: Acoustics, physiology, and functional constraints on “honest” advertisement. American Journal of Primatology, 37, 191–219. doi:10.1002/ajp.1350370303

Fitch, W. T., & Jarvis, E. D. (2013). Birdsong and other animal models for human speech, song, and vocal learning. In M. A. Arbib (Ed.), Language, Music, and the Brain: A Mysterious Relationship (pp. 499–540). Cambridge, MA: MIT Press.

Fitch, W. T., & Reby, D. (2001). The descended larynx is not uniquely human. Proceedings of the Royal Society of London B, 268, 1669–1675. doi:10.1098/rspb.2001.1704

Fossey, D. (1972). Vocalizations of the mountain gorilla (Gorilla gorilla beringei). Animal Behaviour, 20, 36–53. doi:10.1016/S0003-3472(72)80171-4

Freeberg, T. M., & Harvey, E. M. (2008). Group size and social interactions are associated with calling behavior in Carolina chickadees (Poecile carolinensis). Journal of Comparative Psychology, 122, 312–318. doi:10.1037/0735-7036.122.3.312

Galaburda, A. M., LeMay, M., Kemper, T. L., & Geschwind, N. (1978). Right–left asymmetries in the brain. Science, 199, 852–856. doi:10.1126/science.341314

Galdikas, B. M. F., & Teleki, G. (1981). Variations in subsistence activities of female and male pongids: New perspectives on the origins of hominid labor division. Current Anthropology, 22(3), 241–256. doi:10.1086/202662

Gannon, P. J., Holloway, R. L., Broadfield, D. C., & Braun, A. R. (1998). Asymmetry of chimpanzee planum temporale: Humanlike pattern of Wernicke’s brain language area homolog. Science, 279, 220–222. doi:10.1126/science.279.5348.220

Geissler, D. B., & Ehret, G. (2004). Auditory perception vs. recognition: Representation of complex communication sounds in the mouse auditory cortical fields. European Journal of Neuroscience, 19, 1027–1040. doi:10.1111/j.1460-9568.2004.03205.x

Geissmann, T. (1999). Duet songs of the siamang, Hylobates syndactylus: II. Testing the pair-bonding hypothesis during a partner exchange. Behaviour, 136(8), 1005–1039.

Geissmann, T. (2002). Duet-splitting and the evolution of gibbon songs. Biological Reviews, 77, 57–76. doi:10.1017/S1464793101005826

Gentilucci, M., & Corballis, M. C. (2006). From manual gesture to speech: A gradual transition. Neuroscience and Biobehavioral Reviews, 30, 949–960. doi:10.1016/j.neubiorev.2006.02.004

George, J. C., Bada, J., Zeh, J., Scott, L., Brown, S. E., O’Hara, T., & Suydam, R. (1999). Age and growth estimates of bowhead whales (Balaena mysticetus) via aspartic acid racemization. Canadian Journal of Zoology, 77, 571–580. doi:10.1139/z99-015

Geschwind, N., & Levitsky, W. (1968). Human brain: Left–right asymmetries in temporal speech region. Science, 161(3837), 186–187. doi:10.1126/science.161.3837.186

Ghazanfar, A., & Hauser, M. (1999). The neuroethology of primate vocal communication: Substrates for the evolution of speech. Trends in Cognitive Sciences, 3, 377–384. doi:10.1016/S1364-6613(99)01379-0

Ghazanfar, A. A., & Rendall, D. (2008). Evolution of human vocal production. Current Biology, 18(11), R457–R460. doi:10.1016/j.cub.2008.03.030

Gibbons, J. W. (1987). Why do turtles live so long? BioScience, 37(4), 262–269. doi:10.2307/1310589

Gillam, E. H., & Chaverri, G. (2012). Strong individual signatures and weaker group signatures in contact calls of Spix’s disc-winged bat, Thyroptera tricolor. Animal Behaviour, 83, 269–276. doi:10.1016/j.anbehav.2011.11.002

Goodall, J. (1986). The Chimpanzees of Gombe: Patterns of Behavior. Cambridge, MA: Harvard University Press.

Gottlieb, G. (1963). A naturalistic study of imprinting in wood ducklings (Aix sponsa). Journal of Comparative and Physiological Psychology, 56, 86–91. doi:10.1037/h0046285

Goulden, R., Nation, P., & Read, J. (1990). How large can a receptive vocabulary be? Applied Linguistics, 11, 341–363. doi:10.1093/applin/11.4.341

Greene, M. C. L., & Mathieson, L. (1989). The Voice and Its Disorders (5th ed.). London: Whurr Publishers.

Haimoff, E. F. (1983). Occurrence of antiresonance in the song of siamang Hylobates syndactylus. American Journal of Primatology, 5, 249–256. doi:10.1002/ajp.1350050309

Harcourt, A. H., Stewart, K. J., & Hauser, M. (1993). Functions of wild gorilla ‘close’ calls. I. Repertoire, context, and interspecific comparison. Behaviour, 124, 89–122. doi:10.1163/156853993X00524

Hardus, M., Lameira, A., Singleton, I., Morrogh-Bernard, H., Knott, C., Ancrenaz, M., . . . Wich, S. (2009). A description of the orangutan’s vocal and sound repertoire, with a focus on geographic variation. In S. Wich (Ed.), Orangutans: Geographic variation in behavioral ecology and conservation (pp. 49–64). Oxford: Oxford University Press.

Harris, M. A., & Lemon, R. E. (1972). Songs of song sparrows (Melospiza melodia): Individual variation and dialects. Canadian Journal of Zoology, 50, 301–309. doi:10.1139/z72-041

Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569–1579. doi:10.1126/science.298.5598.1569

Hayes, K. J., & Hayes, C. (1951). The intellectual development of a home-raised chimpanzee. Proceedings of the American Philosophical Society, 95(2), 105–109.

Heaton, J. T., & Brauth, S. E. (1999). Effects of deafening on the development of nestling and juvenile vocalizations in budgerigars (Melopsittacus undulatus). Journal of Comparative Psychology, 113, 314–320. doi:10.1037/0735-7036.113.3.314

Hewes, G. W. (1973). Primate communication and the gestural origins of language. Current Anthropology, 14, 5–24. doi:10.1086/204019

Hobaiter, C., & Byrne, R. W. (2011a). The gestural repertoire of the wild chimpanzee. Animal Cognition, 14, 745–767. doi:10.1007/s10071-011-0409-2

Hobaiter, C., & Byrne, R. W. (2011b). Serial gesturing by wild chimpanzees: Its nature and function for communication. Animal Cognition, 14, 827–838. doi:10.1007/s10071-011-0416-3

Holloway, R., & de la Coste-Lareymondie, C. (1982). Brain endocast asymmetry in pongids and hominids: Some preliminary findings on the paleontology of cerebral dominance. American Journal of Physical Anthropology, 58, 101–110. doi:10.1002/ajpa.1330580111

Hopkins, W. D., Taglialatela, J., & Leavens, D. A. (2007). Chimpanzees differentially produce novel vocalizations to capture the attention of a human. Animal Behaviour, 73, 281–286. doi:10.1016/j.anbehav.2006.08.004

Horwitz, B., Amunts, K., Bhattacharyya, R., Patkin, D., Jeffries, K., Zilles, K., & Braun, A. R. (2003). Activation of Broca’s area during the production of spoken and signed language: A combined cytoarchitectonic mapping and PET analysis. Neuropsychologia, 41, 1868–1876. doi:10.1016/S0028-3932(03)00125-8

Howard, R. D. (1974). The influence of sexual selection and interspecific competition on mockingbird song (Mimus polyglottos). Evolution, 28(3), 428–438. doi:10.2307/2407164

Ito, H., Ishikawa, Y., Yoshimoto, M., & Yamamoto, N. (2007). Diversity of brain morphology in teleosts: Brain and ecological niche. Brain, Behavior & Evolution, 69, 76–86. doi:10.1159/000095196

Janik, V. M., & Sayigh, L. S. (2013). Communication in bottlenose dolphins: 50 years of signature whistle research. Journal of Comparative Physiology A, 199, 479–489. doi:10.1007/s00359-013-0817-7

Janik, V. M., & Slater, P. J. B. (1997). Vocal learning in mammals. Advances in the Study of Behavior, 26, 59–99. doi:10.1016/S0065-3454(08)60377-0

Jarvis, E. D. (2004). Learned birdsong and the neurobiology of human language. Annals of the New York Academy of Sciences, 1016, 749–777. doi:10.1196/annals.1298.038

Jarvis, E. D. (2006). Selection for and against vocal learning in birds and mammals. Ornithological Science, 5, 5–14. doi:10.2326/osj.5.5

Jarvis, E. D. (2007). Neural systems for vocal learning in birds and humans: A synopsis. Journal of Ornithology, 148, S35–S44. doi:10.2326/osj.5.5

Jarvis, E. D., & Mello, C. V. (2000). Molecular mapping of brain areas involved in parrot vocal communication. The Journal of Comparative Neurology, 419, 1–31. doi:10.1002/(SICI)1096-9861(20000327)419:13.0.CO;2-M

Jürgens, U. (1995). Neuronal control of vocal production in nonhuman and human primates. In E. Zimmermann, J. D. Newman, & U. Jürgens (Eds.), Current Topics in Primate Vocal Communication (pp. 199–206). New York: Plenum Press.

Jürgens, U. (2002). Neural pathways underlying vocal production. Neuroscience & Biobehavioral Reviews, 26, 235–258. doi:10.1016/S0149-7634(01)00068-9

Kaminski, J., Call, J., & Fischer, J. (2004). Word learning in a domesticated dog: Evidence for “fast mapping.” Science, 304, 1682–1683. doi:10.1126/science.1097859

Kay, R. F., Cartmill, M., & Balow, M. (1998). The hypoglossal canal and the origin of human vocal behavior. Proceedings of the National Academy of Sciences, 95(9), 5417–5419. doi:10.1073/pnas.95.9.5417

Keller, G. B., & Hahnloser, H. R. (2009). Neural processing of auditory feedback during vocal practice in a songbird. Nature, 457, 187–190. doi:10.1038/nature07467

Kipper, S., Mundry, R., Sommer, C., Hultsch, H., & Todt, D. (2006). Song repertoire size is correlated with body measures and arrival date in common nightingales, Luscinia megarhynchos. Animal Behaviour, 71, 211–217. doi:10.1016/j.anbehav.2005.04.011

Klima, E. S., & Bellugi, U. (1979). The Signs of Language. Cambridge, MA: Harvard University Press.

Knight, C. (1998). Ritual/speech coevolution: A solution to the problem of deception. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the Evolution of Language (pp. 68–91). Cambridge: Cambridge University Press.

Konishi, M. (1963). The role of auditory feedback in the vocal behavior of the domestic fowl. Zeitschrift für Tierpsychologie, 2, 349–367. doi:10.1111/j.1439-0310.1963.tb01156.x

Kroodsma, D. E., & Miller, E. H. (1996). Ecology and Evolution of Acoustic Communication in Birds. Ithaca, NY: Cornell University Press.

Kuhl, P. K. (2003). Human speech and birdsong: Communication and the social brain. Proceedings of the National Academy of Sciences, 100, 9645–9646. doi:10.1073/pnas.1733998100

Kuhl, P. K. (2007). Is speech learning ‘gated’ by the social brain? Developmental Science, 10, 110–120. doi:10.1111/j.1467-7687.2007.00572.x

Ladefoged, P. (1968). Linguistic aspects of respiratory phenomena. Annals of the New York Academy of Sciences, 155, 141–151. doi:10.1111/j.1749-6632.1968.tb56758.x

Laitman, J. T., Heimbuch, R. C., & Crelin, E. S. (1978). Developmental change in a basicranial line and its relationship to the upper respiratory system in living primates. American Journal of Anatomy, 152, 467–482. doi:10.1002/aja.1001520403

Lameira, A. R., Hardus, M. E., Kowalsky, B., de Vries, H., Spruijt, B., Sterck, E. H. M., . . . Wich, S. A. (2013). Orangutan (Pongo spp.) whistling and implications for the emergence of an open-ended call repertoire: A replication and extension. Journal of the Acoustical Society of America, 134, 2326–2335. doi:10.1121/1.4817929

Lavenex, P. B. (2000). Lesions in the budgerigar vocal control nucleus NLc affect production, but not memory, of English words and natural vocalizations. The Journal of Comparative Neurology, 421, 437–460. doi:10.1002/(SICI)1096-9861(20000612)421:43.0.CO;2-A

Levinson, S. T. (1980). The social behavior of the White-fronted Amazon (Amazona albifrons). In R. F. Pasquier (Ed.), Conservation of New World Parrots: International Council for Bird Preservation, Technical Publication No. 1 (pp. 403–417). Washington, DC: Smithsonian Institution Press.

Lieberman, P. (1984). The Biology and Evolution of Language. Cambridge, MA: Harvard University Press.

Lieberman, P., & Crelin, E. S. (1971). On the speech of Neanderthal man. Linguistic Inquiry, 2(2), 203–222.

Lieberman, P., Crelin, E. S., & Klatt, D. H. (1972). Phonetic ability and related anatomy of the newborn and adult human, Neanderthal man, and the chimpanzee. American Anthropologist, 74, 287–307. doi:10.1525/aa.1972.74.3.02a00020

Lieberman, P., & McCarthy, R. (2007). Tracking the evolution of language and speech: Comparing vocal tracts to identify speech capabilities. Expedition, 49, 15–20.

Lieberman, D. E., McCarthy, R. C., Hiiemae, K. M., & Palmer, J. B. (2001). Ontogeny of postnatal hyoid and larynx descent in humans. Archives of Oral Biology, 46, 117–128. doi:10.1016/S0003-9969(00)00108-4

Liska, J. (1993). Bee dances, bird songs, monkey calls, and cetacean sonar: Is speech unique? Western Journal of Communication, 57, 1–26. doi:10.1080/10570319309374428

Luchsinger, R., & Arnold, G. E. (1965). Voice, Speech, Language. London: Constable.

MacLarnon, A. M., & Hewitt, G. P. (1999). The evolution of human speech: The role of enhanced breathing control. American Journal of Physical Anthropology, 109, 341–363. doi:10.1002/(SICI)1096-8644(199907)109:33.0.CO;2-2

MacLarnon, A., & Hewitt, G. (2004). Increased breathing control: Another factor in the evolution of human language. Evolutionary Anthropology, 13, 181–197. doi:10.1002/evan.20032

MacNeilage, P. F. (2008). The Origin of Speech. Oxford: Oxford University Press.

Marler, P. (1970a). Birdsong and speech development: Could there be parallels? American Scientist, 58, 669–673.

Marler, P. (1970b). A comparative approach to vocal learning: Song development in white-crowned sparrows. Journal of Comparative and Physiological Psychology, 71, 1–25. doi:10.1037/h0029144

Marler, P. (1977). The evolution of communication. In T. A. Sebeok (Ed.), How Animals Communicate (pp. 45–70). Bloomington: Indiana University Press.

Marler, P., & Mitani, J. (1988). Vocal communication in primates and birds: Parallels and contrasts. In D. Todt, P. Goedeking, & D. Symmes (Eds.), Primate Vocal Communication (pp. 3–14). Berlin: Springer.

Marler, P., & Tenaza, R. (1977). Signaling behavior of apes with special reference to vocalizations. In T. A. Sebeok (Ed.), How Animals Communicate (pp. 965–1033). Bloomington: Indiana University Press.

Matsunaga, E., Kato, M., & Okanoya, K. (2008). Comparative analysis of gene expressions among avian brains: A molecular approach to the evolution of vocal learning. Brain Research Bulletin, 75, 474–479. doi:10.1016/j.brainresbull.2007.10.045

McCasland, J. S., & Konishi, M. (1981). Interaction between auditory and motor activities in an avian song control nucleus. Proceedings of the National Academy of Sciences, 78(12), 7815–7819. doi:10.1073/pnas.78.12.7815

McCowan, B., Doyle, L. R., & Hanser, S. F. (2002). Using information theory to assess the diversity, complexity and development of communicative repertoires. Journal of Comparative Psychology, 116, 166–172. doi:10.1037/0735-7036.116.2.166

McCowan, B., & Reiss, D. (1997). Vocal learning in captive bottlenose dolphins: A comparison with humans and nonhuman animals. In C. T. Snowdon & M. Hausberger (Eds.), Social Influences on Vocal Development (pp. 178–207). Cambridge: Cambridge University Press.

McElligott, A. G., Birrer, M., & Vannoni, E. (2006). Retraction of the mobile descended larynx during groaning enables fallow bucks (Dama dama) to lower their formant frequencies. Journal of Zoology, 270, 340–345. doi:10.1111/j.1469-7998.2006.00144.x

Melin, A., Young, H., Mosdossy, K., & Fedigan, L. (in press). Seasonality, extractive foraging and the evolution of primate sensorimotor intelligence. Journal of Human Evolution.

Moorman, S., Gobes, S. M. H., Kuijpers, M., Kerkhofs, A., Zandbergen, M. A., & Bolhuis, J. J. (2012). Human-like brain hemispheric dominance in birdsong learning. Proceedings of the National Academy of Sciences, 109, 12782–12787. doi:10.1073/pnas.1207207109

Morton, E. S., & Page, J. (1992). Animal Talk: Science and the Voices of Nature. New York: Random House.

Müller, A. E., & Anzenberger, G. (2002). Duetting in the titi monkey Callicebus cupreus: Structure, pair specificity and development of duets. Folia Primatologica, 73, 104–115. doi:10.1159/000064788

Nation, I. S. P., & Waring, R. (1997). Vocabulary size, text coverage, and word lists. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, Acquisition and Pedagogy (pp. 6–19). Cambridge: Cambridge University Press.

Nishimura, T., Mikami, A., Suzuki, J., & Matsuzawa, T. (2003). Descent of the larynx in chimpanzee infants. Proceedings of the National Academy of Sciences, 100, 6930–6933. doi:10.1073/pnas.1231107100

Nishimura, T., Oishi, T., Suzuki, J., Matsuda, K., & Takahashi, T. (2008). Development of the supralaryngeal vocal tract in Japanese macaques: Implications for the evolution of the descent of the larynx. American Journal of Physical Anthropology, 135, 182–194. doi:10.1002/ajpa.20719

Noack, H. R. (1902). Vocal powers of the yellow-billed magpie. The Condor, 4(4), 78–79. doi:10.2307/1361063

Nottebohm, F. (1970). Ontogeny of bird song. Science, 167(3920), 950–956. doi:10.1126/science.167.3920.950

Nottebohm, F. (1971). Neural lateralization of vocal control in a passerine bird. I. Song. Journal of Experimental Zoology, 177, 229–262. doi:10.1002/jez.1401770210

Nottebohm, F. (1972). The origins of vocal learning. American Naturalist, 106(947), 116–140. doi:10.1086/282756

Nottebohm, F. (1977). Asymmetries in neural control of vocalization in the canary. In S. Harnad (Ed.), Lateralization in the Nervous System (pp. 23–44). New York: Academic Press.

Nottebohm, F., & Arnold, A. P. (1976). Sexual dimorphism in vocal control areas of the songbird brain. Science, 194, 211–213. doi:10.1126/science.959852

Nottebohm, F., & Nottebohm, M. E. (1971). Vocalizations and breeding behavior of surgically deafened ring doves (Streptopelia risoria). Animal Behaviour, 19, 313–327. doi:10.1016/S0003-3472(71)80012-X

Ocklenburg, S., & Güntürkün, O. (2012). Hemispheric asymmetries: The comparative view. Frontiers in Psychology, 3, 1–9. doi:10.3389/fpsyg.2012.00005

O’Connor, R. J. (1984). The Growth and Development of Birds. New York: Wiley & Sons.

Passingham, R. E. (1981). Broca’s area and the origins of human vocal skill. Philosophical Transactions of the Royal Society of London Series B, 292, 167–175. doi:10.1098/rstb.1981.0025

Paton, J. A., Manogue, K. R., & Nottebohm, F. (1981). Bilateral organization of the vocal control pathway in the budgerigar, Melopsittacus undulatus. Journal of Neuroscience, 1(11), 1279–1288.

Pegg, C. (1992). Mongolian conceptualizations of overtone singing (xöömii). British Journal of Ethnomusicology, 1, 31–54. doi:10.1080/09681229208567199

Pepperberg, I. M. (1987). Interspecies communication: A tool for assessing conceptual abilities in the African Grey parrot. In G. Greenberg & E. Tobach (Eds.), Language, Cognition, Consciousness: Integrative Levels (pp. 31–56). Hillsdale, NJ: Erlbaum Associates.

Pepperberg, I. M. (1992). A review of the effects of social interaction on vocal learning. Netherlands Journal of Zoology, 43(1–2), 104–124. doi:10.1163/156854293X00241

Pepperberg, I. M. (1999). The Alex Studies. Cambridge, MA: Harvard University Press.

Pepperberg, I. M. (2006). Cognitive and communicative abilities of Grey parrots. Applied Animal Behaviour Science, 100, 77–86. doi:10.1016/j.applanim.2006.04.005

Pepperberg, I. M. (2007). Grey parrots do not always ‘parrot’: The roles of imitation and phonological awareness on the creation of new labels from existing vocalizations. Language Sciences, 29, 1–13. doi:10.1016/j.langsci.2005.12.002

Pepperberg, I. M. (2010). Vocal learning in Grey parrots: A brief review of perception, production, and cross-species comparisons. Brain and Language, 115, 81–91. doi:10.1016/j.bandl.2009.11.002

Perlman, M., Patterson, F. G., & Cohn, R. H. (2012). The human-fostered gorilla Koko shows breath control in play with wind instruments. Biolinguistics, 6(3–4), 433–444.

Petersen, M. R., Beecher, M. D., Zoloth, S. R., Moody, D. B., & Stebbins, W. C. (1978). Neural lateralization of species-specific vocalizations by Japanese macaques (Macaca fuscata). Science, 202, 324–327. doi:10.1126/science.99817

Petitto, L. A., & Marentette, P. F. (1991). Babbling in the manual mode: Evidence for the ontogeny of language. Science, 251, 1493–1496. doi:10.1126/science.2006424

Petkov, C. I., & Jarvis, E. D. (2012). Birds, primates, and spoken language origins: Behavioral phenotypes and neurobiological substrates. Frontiers in Evolutionary Neuroscience, 4, 1–24. doi:10.3389/fnevo.2012.00012

Philips, M., & Austad, S. N. (1990). Animal communication and social evolution. In M. Bekoff & D. Jamieson (Eds.), Interpretation and Explanation in the Study of Animal Behavior. Vol. 1 Interpretation, Intentionality and Communication (pp. 254–268). Boulder, CO: Westview.

Pinker, S. (2003). Language as an adaptation to the cognitive niche. In M. H. Christiansen & S. Kirby (Eds.), Language Evolution (pp. 16–37). Oxford: Oxford University Press.

Pollick, A. S., & de Waal, F. B. M. (2007). Ape gestures and language evolution. Proceedings of the National Academy of Sciences of the United States of America, 104, 8184–8189. doi:10.1073/pnas.0702624104

Poole, J. H., Tyack, P. L., Stoeger-Horwath, A. S., & Watwood, S. (2005). Elephants are capable of vocal learning. Nature, 434, 455–456. doi:10.1038/434455a

Portmann, A. (1946). Études sur la cérébralisation chez les oiseaux: I. Alauda, 14, 2–20.

Prather, J. F., Peters, S., Nowicki, S., & Mooney, R. (2008). Precise auditory-vocal mirroring in neurons for learned vocal communication. Nature, 451, 305–310. doi:10.1038/nature06492

Pulleyblank, E. G. (2008). Language as digital: A new theory of the origin and nature of human speech. Proceedings of the 20th North American Conference on Chinese Linguistics, 1, 1–20.

Quick, N. J., & Janik, V. M. (2012). Bottlenose dolphins exchange signature whistles when meeting at sea. Proceedings of the Royal Society B, 279, 2539–2545. doi:10.1098/rspb.2011.2537

Ralls, K., Fiorelli, P., & Gish, S. (1985). Vocalizations and vocal mimicry in captive harbor seals, Phoca vitulina. Canadian Journal of Zoology, 63, 1050–1056. doi:10.1139/z85-157

Reader, S. M., & Laland, K. N. (2002). Social intelligence, innovation, and enhanced brain size in primates. Proceedings of the National Academy of Sciences of the United States of America, 99, 4436–4441. doi:10.1073/pnas.062041299

Rendell, L., & Whitehead, H. (2001). Culture in whales and dolphins. Behavioral and Brain Sciences, 24, 309–382. doi:10.1017/S0140525X01243969

Ridgway, S., Carder, D., Jeffries, M., & Todd, M. (2012). Spontaneous human speech mimicry by a cetacean. Current Biology, 22, R860–R861. doi:10.1016/j.cub.2012.08.044

Riede, T., Fisher, J. H., & Goller, F. (2010). Sexual dimorphism of the zebra finch syrinx indicates adaptation for high fundamental frequencies in males. PLoS ONE, 5, e11368. doi:10.1371/journal.pone.0011368

Rizzolatti, G., & Arbib, M. (1998). Language within our grasp. Trends in Neurosciences, 21, 188–194. doi:10.1016/S0166-2236(98)01260-0

Rizzolatti, G., & Sinigaglia, C. (2008). Mirrors in the Brain. How We Share Our Actions and Emotions. New York: Oxford University Press.

Robinson, B. W. (1967). Vocalization evoked from the forebrain in Macaca mulatta. Physiology and Behavior, 2, 345–354. doi:10.1016/0031-9384(67)90050-9

Robinson, G. E., Fernald, R. D., & Clayton, D. (2008). Genes and social behavior. Science, 322, 896–900. doi:10.1126/science.1159277

Rogers, L. J. (1980). Lateralisation in the avian brain. Bird Behavior, 2(1), 1–12. doi:10.3727/015613880791573835

Salinas-Melgoza, A., & Wright, T. F. (2012). Evidence for vocal learning and limited dispersal as dual mechanisms for dialect maintenance in a parrot. PLoS ONE, 7, e48667. doi:10.1371/journal.pone.0048667

Salmi, R., Hammerschmidt, K., & Doran-Sheehy, D. M. (2013). Western gorilla vocal repertoire and contextual use of vocalizations. Ethology, 119, 831–847. doi:10.1111/eth.12122

Salwiczek, L. H., & Wickler, W. (2004). Birdsong: An evolutionary parallel to human language. Semiotica, 151, 163–182. doi:0037–1998/04/0151–0163

Sanvito, S., Galiberti, F., & Miller, E. H. (2007). Observational evidences of vocal learning in southern elephant seals: A longitudinal study. Ethology, 113, 137–146. doi:10.1111/j.1439-0310.2006.01306.x

Sasaki, C. T., Levine, P. A., Laitman, J. T., & Crelin, E. S. (1977). Postnatal descent of the epiglottis in man. Archives of Otolaryngology, 103, 169–171. doi:10.1001/archotol.1977.00780200095011

Saunders, D. A. (1983). Vocal repertoire and individual vocal recognition in the short-billed white-tailed black cockatoo, Calyptorhynchus funereus latirostris Carnaby. Australian Wildlife Research, 10, 527–536. doi:10.1071/WR9830527

Savage-Rumbaugh, S., & McDonald, K. (1988). Deception and social manipulation in symbol-using apes. In R. W. Byrne & A. Whiten (Eds.), Machiavellian Intelligence: Social Expertise and the Evolution of Intellect in Monkeys, Apes, and Humans (pp. 224–237). New York: Clarendon Press/Oxford University Press.

Savage-Rumbaugh, S., Shanker, S. G., & Taylor, T. J. (1998). Apes, Language, and the Human Mind. New York: Oxford University Press.

Sawaguchi, T., & Kudo, H. (1990). Neocortical development and social structure in primates. Primates, 31, 283–290. doi:10.1007/BF02380949

Schel, A. M., Machanda, Z., Townsend, S. W., Zuberbühler, K., & Slocombe, K. (2013). Chimpanzee food calls are directed at specific individuals. Animal Behaviour, 86, 955–965. doi:10.1016/j.anbehav.2013.08.013

Schwagmeyer, P. L., Bartlett, T. L., & Schwabl, H. G. (2008). Dynamics of house sparrow biparental care: What contexts trigger partial compensation? Ethology, 114, 459–468. doi:10.1111/j.1439-0310.2008.01480.x

Seibert, L. M. (2006). Social behavior of Psittacine birds. In A. U. Luescher (Ed.), Manual of Parrot Behavior (pp. 43–48). Ames, IA: Blackwell Publishing.

Sekulic, R., & Chivers, D. J. (1986). The significance of call duration in howler monkeys. International Journal of Primatology, 7, 183–190. doi:10.1007/BF02692317

Sherwood, C. C., Broadfield, D. C., Holloway, R. L., Gannon, P. J., & Hof, P. R. (2003). Variability of Broca’s area homologue in African great apes: Implications for language evolution. The Anatomical Record Part A, 271A, 276–285. doi:10.1002/ar.a.10046

Simonyan, K., & Horwitz, B. (2011). Laryngeal motor cortex and control of speech in humans. Neuroscientist, 17, 197–208. doi:10.1177/1073858410386727

Stahl, W. R. (1967). Scaling of respiratory variables in mammals. Journal of Applied Physiology, 22, 453–460.

Stoddard, P. K. (1996). Vocal recognition of neighbors by territorial passerines. In D. E. Kroodsma & E. H. Miller (Eds.), Ecology and Evolution of Acoustic Communication in Birds (pp. 356–374). Ithaca, NY: Cornell University Press.

Stoeger, A. S., Mietchen, D., Oh, S., de Silva, S., Herbst, C. T., Kwon, S., & Fitch, T. (2012). An Asian elephant imitates human speech. Current Biology, 22, 2144–2148. doi:10.1016/j.cub.2012.09.022

Striedter, G. F. (1994). The vocal control pathways in budgerigars differ from those in songbirds. Journal of Comparative Neurology, 343, 35–56. doi:10.1002/cne.903430104

Studdert-Kennedy, M. (1998). The particulate origins of language generativity: From syllable to gesture. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the Evolution of Language (pp. 169–176). Cambridge: Cambridge University Press.

Sturdy, C. B., Wild, J. M., & Mooney, R. (2003). Respiratory and telencephalic modulation of vocal motor neurons in the zebra finch. Journal of Neuroscience, 23(3), 1072–1086.

Suthers, R., Goller, F., & Pytte, C. (1999). The neuromuscular control of birdsong. Philosophical Transactions of the Royal Society B, 354, 927–939. doi:10.1098/rstb.1999.0444

Tchernichovski, O., Lints, T., Mitra, P. P., & Nottebohm, F. (1999). Vocal imitation in zebra finches is inversely related to model abundance. Proceedings of the National Academy of Sciences, 96(22), 12901–12904. doi:10.1073/pnas.96.22.12901

Tchernichovski, O., & Nottebohm, F. (1998). Social inhibition of song imitation among sibling male zebra finches. Proceedings of the National Academy of Sciences, 95(15), 8951–8956. doi:10.1073/pnas.95.15.8951

Templeton, C. N., Greene, E., & Davis, K. (2005). Allometry of alarm calls: Black-capped chickadees encode information about predator size. Science, 308, 1934–1937. doi:10.1126/science.1108841

Teramitsu, I., Kudo, L. C., London, S. E., Geschwind, D. H., & White, S. A. (2004). Parallel FoxP1 and FoxP2 expression in songbird and human brain predicts functional interaction. The Journal of Neuroscience, 24, 3152–3163. doi:10.1523/JNEUROSCI.5589-03.2004

Thalmann, U., Geissmann, T., Simone, A., & Mutschler, T. (1993). The indris of Anjanaharibe-Sud, Northeastern Madagascar. International Journal of Primatology, 14, 357–381. doi:10.1007/BF02192772

Thomas, M. V., & Haas, R. C. (2004). Abundance, age structure, and spatial distribution of lake sturgeon (Acipenser fulvescens) in the St. Clair System. Michigan Department of Natural Resources, Lake St. Clair Fisheries Research Station, Harrison Township, MI. Fisheries Research Report, 2076.

Todt, D. (1975). Social learning of vocal patterns and modes of their applications in Grey parrots. Zeitschrift für Tierpsychologie, 39, 178–188. doi:10.1111/j.1439-0310.1975.tb00907.x

Tomasello, M. (2008). The Origins of Human Communication. Cambridge, MA: MIT Press.

Tomasello, M., & Call, J. (1997). Primate Cognition. Oxford: Oxford University Press.

Tu, H.-W., & Dooling, R. J. (2012). Perception of warble song in budgerigars (Melopsittacus undulatus): Evidence for special processing. Animal Cognition, 15, 1151–1159. doi:10.1007/s10071-012-0539-1

Tu, H.-W., Osmanski, M. S., & Dooling, R. J. (2011). Learned vocalizations in budgerigars (Melopsittacus undulatus): The relationship between contact calls and warble song. Journal of the Acoustical Society of America, 129, 2289–2297. doi:10.1121/1.3557035

Tucker, V. A. (1968). Respiratory exchange and evaporative water loss in the flying budgerigar. Journal of Experimental Biology, 48, 67–87.

Vargha-Khadem, F., Gadian, D. G., Copp, A., & Mishkin, M. (2005). FOXP2 and the neuroanatomy of speech and language. Nature Reviews Neuroscience, 6, 131–138. doi:10.1038/nrn1605

West, M. J., Stroud, A. N., & King, A. P. (1983). Mimicry of the human voice by European starlings: The role of social interaction. The Wilson Bulletin, 95, 635–640.

Whangarei Native Bird Recovery Centre (n.d.). Woof Woof the Talking Tui. Retrieved from http://www.nbr.org.nz/node/7

Whitaker, H. A. (1976). Neurobiology of language. In E. C. Carterette & M. P. Friedman (Eds.), Handbook of Perception, vol. VII: Language and Speech (pp. 121–144). New York: Academic Press.

Wich, S. A., Krützen, M., Lameira, A. R., Nater, A., Arora, N., Bastian, M. L., . . . van Schaik, C. P. (2012). Call cultures in orang-utans? PLoS ONE, 7, e36180. doi:10.1371/journal.pone.0036180

Wild, J. M. (1997). Neural pathways for the control of birdsong production. Journal of Neurobiology, 33(5), 653–670. doi:10.1002/(SICI)1097-4695(19971105)33:5<653::AID-NEU11>3.0.CO;2-A

Wild, J. M., Goller, F., & Suthers, R. A. (1998). Inspiratory muscle activity during birdsong. Journal of Neurobiology, 36, 441–453. doi:10.1002/(SICI)1097-4695(19980905)36:33.0.CO;2-E

Williams, H., Crane, L. A., Hale, T. K., Esposito, M. A., & Nottebohm, F. (1992). Right-side dominance for song control in the zebra finch. Developmental Neurobiology, 23, 1006–1020. doi:10.1002/neu.480230807

Wind, J. (1983). Primate evolution and the emergence of speech. In E. de Grolier (Ed.), Glossogenetics: The Origin and Evolution of Language (pp. 15–35). Paris: Harwood Academic Publishers.

Winkworth, A. L., Davis, P. J., Adams, R. D., & Ellis, E. (1995). Breathing patterns during spontaneous speech. Journal of Speech and Hearing Research, 38, 124–144. doi:10.1044/jshr.3801.124

Winter, P., Handley, P., Ploog, D., & Schott, D. (1973). Ontogeny of squirrel monkey calls under normal conditions and under acoustic isolation. Behaviour, 47, 230–239. doi:10.1163/156853973X00085

Wong, J., Stewart, P. D., & MacDonald, D. W. (1999). Vocal repertoire in the European badger (Meles meles): Structure, context, and function. Journal of Mammalogy, 80(2), 570–588. doi:10.2307/1383302

Zann, R. (1990). Song and call learning in wild zebra finches in south-east Australia. Animal Behaviour, 40, 811–828. doi:10.1016/S0003-3472(05)80982-0

Zann, R., & Dunstan, E. (2008). Mimetic song in superb lyrebirds: Species mimicked and mimetic accuracy in different populations and age classes. Animal Behaviour, 76, 1043–1054. doi:10.1016/j.anbehav.2008.05.021

Zeveloff, S. I., & Boyce, M. S. (1982). Why human neonates are so altricial. The American Naturalist, 120(4), 537–542. doi:10.1086/284010

Zlatev, J. (2002). Mimesis: The “missing link” between signals and symbols in phylogeny and ontogeny? In A. Pajunen (Ed.), Mimesis, Sign and Language Evolution (pp. 93–122). Turku, Finland: Turku University Press.

Zollikofer, C. P. E., Ponce de León, M. S., Lieberman, D. E., Guy, F., Pilbeam, D., Likius, A., . . . Brunet, M. (2005). Virtual cranial reconstruction of Sahelanthropus tchadensis. Nature, 434, 755–759. doi:10.1038/nature03397

Zuberbühler, K., & Janmaat, K. R. L. (2010). Foraging cognition in non-human primates. In M. Platt & A. Ghazanfar (Eds.), Primate Neuroethology (pp. 64–83). Oxford: Oxford University Press.