Insights from Animals to Build Better Artificial Language Learners
Abstract
Recent advances in artificial intelligence have produced large language models that are proficient at processing and generating language. Lessons learned from comparative cognition will be pivotal to understanding the conditions under which different cognitive systems (both natural and artificial) might discover complex linguistic structures. Such lessons will fuel the development of more efficient learning algorithms. More important, they will help to advance a new science of intelligence that takes into account human, animal, and artificial systems.
Keywords: language, guided learning, artificial intelligence, LLMs
Work on comparative cognition is pivotal to advancing our comprehension of the conditions that shape different cognitive systems. For example, studying perceptual and computational abilities across species has framed our current understanding of what language is (e.g., Hauser et al., 2002). The past few years have witnessed the rapid rise of artificial intelligence (AI) technologies. Although neural networks and deep learning algorithms, the stepping stones of current AI systems, have been well known for a long time, recent developments in generative AI (systems that can create sentences, music, or pictures once they have been trained on a relevant data set) have opened the door to many rapid advances in the field. This technology offers an opportunity to broaden comparative research to understand the similarities and differences across human, animal, and artificial forms of cognition. By doing so, work on comparative cognition could play a central role in a new science of intelligence. It could also help in the development of more efficient AI algorithms by unveiling the factors that guide learning across different species.
Processing Complex Grammars
Studies on comparative cognition have explored the conditions under which certain complex linguistic patterns can be learned by different species. Fitch and Hauser (2004) demonstrated that humans can learn both relatively simple grammars defined by adjacent relations between their elements (what the authors called finite-state grammars) and more complex grammars defined by nonadjacent, center-embedded relations between their elements (what the authors called phrase-structure grammars). In their study, Fitch and Hauser provided evidence that, in contrast with humans, cotton-top tamarin monkeys could learn only the simpler finite-state grammar. This finding suggested a difference across cognitive systems in their ability to grasp grammars of increasing complexity. However, the picture that emerged from subsequent studies was more puzzling. With proper training, and using biologically relevant stimuli (their own vocalizations), European starlings could learn both finite-state and phrase-structure grammars (Gentner et al., 2006). Similar findings were reported for Bengalese finches (Abe & Watanabe, 2011) using their spontaneous responses to their own species' vocalizations, and for keas (Stobbe et al., 2012) and crows (Liao et al., 2022) in the visual domain. Comparative work has thus shown possible computational differences across cognitive systems. Importantly, it has also demonstrated the importance of taking into account the conditions that allow such different systems to extract relevant information from the signal.
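The two grammar types in this line of work are commonly formalized as (AB)^n strings (finite state, adjacent A-B dependencies) versus A^nB^n strings (center embedded, requiring a counting memory no finite-state machine has). A minimal sketch of that contrast, assuming this standard formalization of the stimuli:

```python
import re

def fsg_string(n):
    """Finite-state grammar (AB)^n: simple adjacent A-B alternation."""
    return "AB" * n

def psg_string(n):
    """Phrase-structure grammar A^n B^n: nonadjacent, center-embedded dependencies."""
    return "A" * n + "B" * n

def accepted_by_fsg(s):
    # A finite-state machine suffices: (AB)^n is a regular language.
    return re.fullmatch(r"(AB)+", s) is not None

def accepted_by_psg(s):
    # A^n B^n requires matching counts across the string, which is
    # beyond the power of any finite-state machine.
    m = re.fullmatch(r"(A+)(B+)", s)
    return m is not None and len(m.group(1)) == len(m.group(2))
```

The extra equality check in `accepted_by_psg` is exactly the nonadjacent bookkeeping that separates the two grammar classes in these experiments.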
Even when a learner can grasp a target regularity, comparative cognition work has highlighted differences in the learning process across cognitive systems. Jiang and collaborators (2018) showed that macaque monkeys can learn what they termed supra-regular grammars (complex sequences characterized by embedded elements). In their experiment, the monkeys were presented with a touch screen displaying six dots arranged in the shape of a hexagon. Three of the dots were highlighted in a sequence, and the animals were trained to touch the dots either repeating the sequence (e.g., for the sequence of dots 2-4-5, touch the dots 2-4-5) or mirroring it (e.g., for the sequence of dots 2-4-5, touch the dots 5-4-2). In this way, the animals learned to reproduce these sequences.
Once the macaques learned the task, they generalized it to novel spatial arrangements they had never seen before (e.g., instead of a hexagon, the monkeys were presented during the test with a pyramid or a horizontal line). This result provided evidence that the animals could master complex embedded sequences that were thought to be out of reach for nonhuman animals. The experimenters ran the same experiment with 5-year-old human children. Just like the monkeys, the children learned to reproduce the sequences and generalized them to novel spatial arrangements. However, the monkeys needed about 10,000 training trials to reach their best performance, whereas the children needed only about five demonstrations. This fact suggested that, although both macaque monkeys and young children could come to master supra-regular sequences, the learning paths they took to reach the target performance differed strikingly.
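The repeat and mirror rules described above amount to identity and reversal of the highlighted sequence; the mirror rule is the supra-regular one, because producing a sequence and then its reverse creates the same kind of center-embedded (palindrome-like) dependency as A^nB^n grammars. A minimal sketch of the two response rules:

```python
def repeat_response(sequence):
    """Repeat rule: touch the dots in the same order they were highlighted."""
    return list(sequence)

def mirror_response(sequence):
    """Mirror rule: touch the dots in the reverse order, which requires
    holding the whole sequence in memory (a stack-like, supra-regular
    computation) rather than echoing item by item."""
    return list(reversed(sequence))

# The example from the text: highlighted dots 2-4-5.
assert repeat_response([2, 4, 5]) == [2, 4, 5]
assert mirror_response([2, 4, 5]) == [5, 4, 2]
```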
Similarly, Ferrigno et al. (2020) trained U.S. adults and children, adults from an Amazonian tribe (the Tsimane), and macaque monkeys to generate sequences from center-embedded recursive grammars. As in the Jiang et al. (2018) study, the participants were presented with a touch screen and asked to reproduce sequences of figures. The U.S. adults and children, together with the Tsimane adults, not only learned to reproduce the sequences but also generalized them to novel settings. In contrast, the authors observed no indication that the monkeys could master such a task unless they were given much more training.
Such differences can also be observed across other species and tasks. For example, both humans and rodents are efficient at tracking statistical regularities in auditory sequences so as to discriminate frequent items from infrequent items. However, there is no evidence that additional training leads rodents to use such regularities to group the elements of a sequence, as human infants do (Toro et al., 2016). Thus, comparative work has shown how distinct cognitive systems might learn to extract increasingly complex regularities from the signal. But this work also highlights differences across such systems. These differences lead to substantial contrasts in the effort that different cognitive systems require to reach the target performance and in how these systems use the information extracted from the signal.
Guiding Learning
Current large language models (LLMs; AI systems that are trained with linguistic input) have become increasingly proficient at processing and producing language. To do so, these systems are trained with large amounts of text and learn to predict the most likely text combinations in a given situation (i.e., the most likely response to a given prompt). The result is chatbots that respond in a mostly coherent manner to questions posed by the user. This way of learning clearly contrasts with the way human infants acquire language (e.g., Chomsky et al., 2023; Marcus, 2018b). From a very early age, human listeners track statistical regularities in speech (so they learn about which combinations of sounds are more likely to occur in the signal; Saffran et al., 1996). Among other remarkable feats, they narrow down their discrimination abilities to focus on sounds that are relevant for their native language (Werker & Tees, 1984), tune to the particular rhythm in which this language is produced (Nazzi et al., 1998), map acoustic changes to syntactic categories (Gervain & Werker, 2013), and rapidly discover the use of abstract variables that are at the core of grammar (Marcus et al., 1999). In general, infants are highly efficient at discovering the hierarchical organization of language (what has been called dendrophilia; Fitch, 2018). Thus, although LLMs have reached impressive levels of proficiency at responding to prompts from users, the path taken to reach such performance is still very different from that taken by human infants.
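The statistical regularities infants track in the Saffran et al. studies are usually described as transitional probabilities between adjacent syllables: transitions within a word are high, while transitions at word boundaries drop. A toy sketch of that computation (the three-syllable nonsense words and their ordering are illustrative stand-ins, not the actual experimental stream):

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(next | current) for each adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

# Toy continuous stream built from two nonsense "words".
words = [["bi", "da", "ku"], ["pa", "do", "ti"]]
order = [0, 1, 0, 0, 1]
stream = [syllable for w in order for syllable in words[w]]
tp = transitional_probabilities(stream)
```

In this toy stream, the within-word transition bi -> da stays at 1.0, whereas the word-final syllable ku is followed sometimes by pa and sometimes by bi, so its transitional probabilities drop below 1.0. A learner tracking these dips can posit word boundaries without any explicit segmentation cue.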
Can LLMs eventually be programmed to acquire language in a way that resembles how human infants acquire it? The past 20 years of comparative work have highlighted that any cognitive system trying to grasp natural language faces the problem of learning the complex hierarchical patterns present in the linguistic signal. Even more, work with other species suggests that a key issue for learning such complex patterns is that of guiding the learning process toward relevant information. For example, in human infants, a division of labor between consonants and vowels facilitates word learning by giving more weight to consonants (the main carriers of lexical information) during the recognition of words (Nespor et al., 2003). Lacking such a division of labor between consonants and vowels, nonhuman animals, including rats (Bouchon & Toro, 2019) and dogs (Mallikarjun et al., 2021), focus on the more salient, but less informative, vocalic segments of the speech signal. Thus, although nonhuman animals can track acoustic information well enough to recognize familiar phoneme sequences, they do not seem to benefit from the biases that help human infants rapidly learn and recognize words.
In contrast to young human learners, the traditional deep learning algorithms used in current artificial intelligence do not include built-in, biologically inspired biases (e.g., Marcus, 2018a). They rely on an approach focused on tracking the recurrence patterns among the elements in ever-growing training corpora. Besides being highly energy-consuming, such unbiased learning might lead to the discovery of patterns that are not found in natural languages. Mitchell and Bowers (2020) found that recurrent language models can learn number agreement between nouns and verbs. However, the models also produce agreement in ungrammatical sentences that violate rules observed in natural languages, which suggests that the models have not learned the hierarchical syntactic dependencies that characterize human language. Rather, the models are reproducing specific repetitions of word pairs independently of linguistic structure.
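The contrast at issue, surface co-occurrence versus hierarchical structure, can be illustrated with a toy agreement example (the lexicon, sentence, and both heuristics below are invented for illustration and are not Mitchell and Bowers's actual models, which were recurrent networks trained on corpora):

```python
# Hypothetical toy lexicon mapping nouns to their grammatical number.
NUMBER = {"key": "sg", "keys": "pl", "cabinet": "sg", "cabinets": "pl"}

def linear_agreement(sentence):
    """Surface heuristic: agree the verb with the linearly closest noun,
    ignoring structure. This is the kind of shallow word-pair statistic a
    priorless sequence learner can pick up."""
    nouns = [w for w in sentence if w in NUMBER]
    return NUMBER[nouns[-1]]

def hierarchical_agreement(sentence):
    """Structural rule: agree the verb with the subject's head noun.
    In this flat toy representation the head happens to be the first
    noun, standing in for a real syntactic parse."""
    nouns = [w for w in sentence if w in NUMBER]
    return NUMBER[nouns[0]]

# "The key to the cabinets [is/are] ...": grammar demands the singular,
# but a proximity-based learner is pulled toward the plural.
sentence = ["the", "key", "to", "the", "cabinets"]
```

Here `hierarchical_agreement(sentence)` yields "sg" (the grammatical answer, agreeing with "key"), while `linear_agreement(sentence)` yields "pl" (the attraction error, agreeing with "cabinets"), capturing in miniature why success on simple agreement pairs does not demonstrate knowledge of hierarchical syntax.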
Building LLMs that are more efficient at learning the grammar that defines a language, and that correctly process hierarchical relations, might thus require implementing biologically relevant constraints (Marcus, 2018b). The data provided by work on comparative cognition on what is shared and what is unique during language learning may be crucial to identifying the biases that lead to successful learning of complex structures (e.g., Toro, 2016). Identification of such biases is not only important in the domain of language learning but also a necessary step for the development of AIs that interact with complex environments in general. As Versace et al. (2018) argued, biases found in precocial animals, such as chicks, might prove to be fundamental for artificial systems to respond properly in changing environments. Take newborn chicks' preference for biological motion. This early preference predisposes them to orient toward animated objects and facilitates filial imprinting, allowing for rapid, highly adaptive learning. Such learning contrasts, again, with the massive training that current AI systems need to respond properly in a target task. Thus, taking into account how different species learn from their environment informs the type of biases that might need to be built into a given system so it can efficiently learn from its environment.
Combining our current knowledge about natural intelligences and AIs is important at different levels. First, it might help in the development of more efficient learning algorithms. More importantly, this combination might fuel advances in a new science of intelligence that takes into account human, animal, and artificial systems. Reflecting this importance, several universities in Europe and the United States are creating novel programs that combine the cognitive sciences and AI. The study of comparative cognition has a central role to play in these new programs and in technological efforts to advance learning algorithms. Lessons learned about how different species tackle the complex task of discovering how their world is organized are poised to inform the creation of AI systems that might start to resemble what we observe in nature.
References
Abe, K., & Watanabe, D. (2011). Songbirds possess the spontaneous ability to discriminate syntactic rules. Nature Neuroscience, 14, 1067–1074. https://doi.org/10.1038/nn.2869
Bouchon, C., & Toro, J. M. (2019). The origin of the consonant/vowel asymmetry in lexical processing: Long-Evans rats encode vowels better than consonants in words. Animal Cognition, 22, 839–850. https://doi.org/10.1007/s10071-019-01280-3
Chomsky, N., Roberts, I., & Watumull, J. (2023, March 8). The false promise of ChatGPT. The New York Times. https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html
Ferrigno, S., Cheyette, S., Piantadosi, S., & Cantlon, J. (2020). Recursive sequence generation in monkeys, children, U.S. adults, and native Amazonians. Science Advances, 6, Article eaaz1002. https://doi.org/10.1126/sciadv.aaz1002
Fitch, W. (2018). What animals can teach us about human language: The phonological continuity hypothesis. Current Opinion in Behavioral Sciences, 21, 68–75. https://doi.org/10.1016/j.cobeha.2018.01.014
Fitch, W., & Hauser, M. (2004). Computational constraints on syntactic processing in a nonhuman primate. Science, 303, 377–380. https://doi.org/10.1126/science.1089401
Gentner, T., Fenn, K., Margoliash, D., & Nusbaum, H. (2006). Recursive syntactic pattern learning by songbirds. Nature, 440, 1204–1207. https://doi.org/10.1038/nature04675
Gervain, J., & Werker, J. (2013). Prosody cues word order in 7-month-old bilingual infants. Nature Communications, 4, Article 1490. https://doi.org/10.1038/ncomms2430
Hauser, M., Chomsky, N., & Fitch, W. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569–1579. https://doi.org/10.1126/science.298.5598.1569
Jiang, X., Long, T., Cao, W., Li, J., Dehaene, S., & Wang, L. (2018). Production of supra-regular spatial sequences by macaque monkeys. Current Biology, 28, 1851–1859. https://doi.org/10.1016/j.cub.2018.04.047
Liao, D., Brecht, K., Johnston, M., & Nieder, A. (2022). Recursive sequence generation in crows. Science Advances, 8, Article eabq3356. https://doi.org/10.1126/sciadv.abq3356
Mallikarjun, A., Shroads, E., & Newman, R. (2021). The role of linguistic experience in the development of the consonant bias. Animal Cognition, 24, 419–431. https://doi.org/10.1007/s10071-020-01436-6
Marcus, G. (2018a). Deep learning: A critical appraisal. arXiv. https://doi.org/10.48550/arXiv.1801.00631
Marcus, G. (2018b). Innateness, AlphaZero, and artificial intelligence. arXiv. https://doi.org/10.48550/arXiv.1801.05667
Marcus, G., Vijayan, S., Bandi Rao, S., & Vishton, P. (1999). Rule learning by seven-month-old infants. Science, 283, 77–80. https://doi.org/10.1126/science.283.5398.77
Mitchell, J., & Bowers, J. (2020, December). Priorless recurrent networks learn curiously. Proceedings of the 28th International Conference on Computational Linguistics, 5147–5158. https://doi.org/10.18653/v1/2020.coling-main.451
Nazzi, T., Bertoncini, J., & Mehler, J. (1998). Language discrimination by newborns: Toward an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and Performance, 24, 756–766. https://doi.org/10.1037/0096-1523.24.3.756
Nespor, M., Peña, M., & Mehler, J. (2003). On the different roles of vowels and consonants in speech processing and language acquisition. Lingue e Linguaggio, 2, 203–230.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928. https://doi.org/10.1126/science.274.5294.1926
Stobbe, N., Westphal-Fitch, G., Aust, U., & Fitch, W. (2012). Visual artificial grammar learning: Comparative research on humans, kea (Nestor notabilis) and pigeons (Columba livia). Philosophical Transactions of the Royal Society B, 367, 1995–2006. https://doi.org/10.1098/rstb.2012.0096
Toro, J. M. (2016). Something old, something new: Combining mechanisms during language acquisition. Current Directions in Psychological Science, 25, 130–134. https://doi.org/10.1177/0963721416629645
Toro, J. M., Nespor, M., & Gervain, J. (2016). Frequency-based organization of speech sequences in a nonhuman animal. Cognition, 146, 1–7. https://doi.org/10.1016/j.cognition.2015.09.006
Versace, E., Martinho-Truswell, A., Kacelnik, A., & Vallortigara, G. (2018). Priors in animal and artificial intelligence: Where does learning begin? Trends in Cognitive Sciences, 22, 963–965. https://doi.org/10.1016/j.tics.2018.07.005
Werker, J., & Tees, R. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49–63. https://doi.org/10.1016/S0163-6383(84)80022-3