
More Than Words: The Cognitive Chasm Between Humans and Large Language Models

Background

Large language models (LLMs) are a class of language model capable of performing a wide range of Natural Language Processing (NLP) tasks. These models have overcome challenges that earlier NLP architectures, such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs), long struggled with, including “capturing long-range dependencies between words in a sentence” (Hochreiter & Schmidhuber, 1997). These limitations hampered the effectiveness of earlier models in tasks such as machine translation, summarization, and question answering. The Transformer architecture, introduced by Vaswani et al. (2017), is based on the self-attention mechanism and has been highly successful in addressing these shortcomings. Today, a large number of LLMs are in use, including GPT-3 and GPT-4, LaMDA, BERT, PaLM, LLaMA, and Claude. These models are trained on massive datasets and therefore perform remarkably well. For instance, Generative Pre-trained Transformer 2 (GPT-2) is a network with 1.5 billion parameters (Radford et al., 2019) that leverages self-attention to draw information from different positions in the input sequence. Newer-generation systems such as ChatGPT go beyond training on abundant text alone: their training also incorporates supervised fine-tuning on human-written demonstrations and reinforcement learning from human feedback.
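To make the mechanism concrete, the sketch below is a minimal illustration in Python/NumPy of single-head scaled dot-product self-attention, in which every position in a sequence computes a weighted mixture of information from all positions; the function and variable names (self_attention, W_q, W_k, W_v) are ours for illustration and are not taken from any published model.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Minimal scaled dot-product self-attention (single head).

    X              : (seq_len, d_model) matrix of token embeddings
    W_q, W_k, W_v  : learned projection matrices of shape (d_model, d_k)
    Returns a (seq_len, d_k) matrix in which each row mixes
    information from every position in the sequence.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v              # queries, keys, values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise relevance between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V                               # attention-weighted mixture of values

# Toy example: 4 tokens, 8-dimensional embeddings, 4-dimensional head
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)        # (4, 4)
```

Because each position attends to every other position in a single step, long-range dependencies do not have to be carried through a recurrent state, which is precisely the RNN limitation noted above; production-scale Transformers stack many such attention heads and layers.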

Applications of AI-generated natural language have been available for some time, aiding humans with tasks like auto-correcting emails and completing sentences. The latest LLMs can produce texts autonomously in areas such as digital journalism, stock market analysis, and sports reporting. Programs like ChatGPT can generate human-like text and answer questions, while GPT-3 can even write poetry. It was once thought that artificial intelligence could not handle creative writing, since creating language requires knowledge and understanding. The algorithms behind these models, which produce language similar to that of humans, have attracted interest from both the scientific community and the general public. Naturally, questions have arisen about the possibility of such systems eventually replacing humans, and ethics and the possibility of machines becoming conscious have become topics of research interest. Although Chalmers does not refer to these models as “sentient beings” (Chalmers, 2022), one important question remains: can they think? This question is crucial because the ability to think is necessary for using language in real life.

On the other hand, some researchers believe that the success of recent LLMs suggests that language learning can occur solely through linguistic input. The long-held notion that language and thought are interconnected, to the extent that much of language learning and use relies on thought, is now being challenged. Recent neuroscience research has shown that the human brain supports language and other cognitive abilities separately, which implies that language learning can develop without relying on non-linguistic cognitive skills. This may explain why LLMs have been so successful: they are trained on vast language datasets and can therefore use language with high accuracy even without real-world experience. The caveat is that LLMs may become highly skilled in formal language use through training yet struggle to reach the same level of skill in functional language use (Mahowald et al., 2023). This distinction mirrors the human brain, where separate neural networks are responsible for language and for other cognitive abilities such as executive function: the language network and the multiple demand network, respectively. In everyday life, people blend language skills with their general cognitive abilities, combining formal and functional proficiency.

Despite their success in many language-related areas, LLMs appear to fall short of humans in certain domains. Mahowald et al. (2023) identify several such areas, including formal and social reasoning, world knowledge, and situation modelling. Although these are not linguistic abilities per se, they are crucial components of any meaningful human conversation, and this is where LLMs seem to be deficient. We can therefore conclude that LLMs do not yet possess cognitive mechanisms comparable to those of humans.

Research areas in focus

Several lines of research related to LLMs are currently underway. These include communication between AI and humans with a shared theory of mind (ToM) (Wang & Goel, 2022), human reactions to AI-generated creative and factual texts (Köbis & Mossink, 2021), AI as a giver of advice or orders (Leib et al., 2021; Lanz, Briker, & Gerpott, 2023), the development of socially intelligent embodied agents in complex environments (Social NeuroAI) (Bolotta & Dumas, 2022), and comparisons of GPT with human agents on tasks such as annotation (Törnberg, 2023), among many other areas. A prevalent theme across this research is the direct comparison of GPT’s performance with human performance on identical tasks, with findings in some cases demonstrating the superiority of large language models over humans.

Research has examined human reactions to algorithm-generated texts and identified algorithmic transparency, that is, whether individuals are aware that a text was generated by AI, as an important predictor of behaviour. Studies indicate a general dislike of algorithmic decision-makers (Dietvorst, Simmons, & Massey, 2015; Burton, Stein, & Jensen, 2019). Likewise, people have expressed a preference for human-written over AI-generated poetry (Köbis & Mossink, 2021).

LLMs and the Turing test: Against this background, it would be intriguing to revisit the Turing Test with regard to LLMs. More specifically, we could address the growing challenge of distinguishing between human- and LLM-produced language. In particular, are there separate neural activation patterns linked to the two types of language output? How does transparency interact with the brain’s processing of AI-generated language? Do humans identify different types of texts differently depending on whether they were generated by humans or by AI systems? Do people perceive a lack of artificial general intelligence (AGI) in LLM-generated output? These questions are relevant and fascinating, particularly as AI-generated language becomes more advanced, and answering them could help us predict whether machines can outsmart humans.

LLMs and Embodied Cognition: The idea of Embodied Cognition (EC) questions the notion that thinking can be reduced to sophisticated semantic representation, such as abstract reasoning or language-based constructions. If this view is correct, it raises doubts about comparing LLMs with intelligent cognitive systems. Within the context of EC, even if LLMs were to reproduce human language almost perfectly, they would still not fully replicate human thinking. This leads to questions such as: Which features of human cognition differentiate it from that of LLMs? Can LLMs advance by incorporating embodied elements? What impact does the absence of physical experience have on their approaches to solving problems? What do EC’s views imply for educating and training LLMs? Investigating these questions will improve our understanding of the limitations and possible advances of LLMs within the wider context of EC theory.

We propose a special issue on the neuroscience of human- versus AI-generated language, comparing how the brain comprehends human-generated language with how it comprehends AI-generated language. Methodologically, this special issue aligns with the proposed approach of “artificial cognition” (Taylor & Taylor, 2021), in which LLMs are treated as “participants” or as tools of comparison and analysis, while conceptually it aims to understand the connection between human and artificial cognition (Siemens et al., 2022). We invite empirical contributions that analyse various linguistic levels, including phonemes, morphemes, lexemes, syntax, and discourse context, across multiple languages; that is, studies that include participants from different linguistic backgrounds to increase the generalisability of the results. We encourage a variety of approaches, including computational modelling, statistical learning, behavioural experiments, and various neuroimaging techniques, and we welcome submissions from interdisciplinary research teams that combine the expertise of cognitive neuroscientists, (experimental) philosophers, linguists, and computer and data scientists to ensure theoretically cross-cutting studies. Finally, case studies and small-sample studies involving participants with rare and non-rare brain disorders are strongly encouraged.

References

Bolotta, S., & Dumas, G. (2022). Social Neuro AI: Social interaction as the “dark matter” of AI. Frontiers in Computer Science. https://doi.org/10.3389/fcomp.2022.846440

Burton, J. W., Stein, M. K., & Jensen, T. B. (2019). A systematic review of algorithm aversion in augmented decision making. Journal of Behavioral Decision Making, 33(2), 220-239. https://doi.org/10.1002/bdm.2155.

Chalmers, D. (2022). Are large language models sentient?

Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114–126. https://doi.org/10.1037/xge0000033.

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

Köbis, N., & Mossink, L. D. (2021). Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry. Computers in Human Behavior, 114, 106553.

Lanz, L., Briker, R., & Gerpott, F. H. (2023). Employees adhere more to unethical instructions from human than AI supervisors: Complementing experimental evidence with machine learning. Journal of Business Ethics. https://doi.org/10.1007/s10551-023-05393-1

Leib, M., Köbis, N. C., Rilke, R. M., Hagens, M., & Irlenbusch, B. (2021). The corruptive force of AI-generated advice. arXiv. https://arxiv.org/abs/2102.07536

Mahowald, K., Ivanova, A., Blank, I., Kanwisher, N., Tenenbaum, J., & Fedorenko, E. (2023). Dissociating language and thought in large language models: A cognitive perspective. Preprint.

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 1–24.

Rahwan, I. (2018). Society-in-the-loop: Programming the algorithmic social contract. Ethics and Information Technology, 20(1), 5–14. https://doi.org/10.1007/s10676-017-9430-8

Siemens, G., Marmolejo-Ramos, F., Gabriel, F., Medeiros, K., Marrone, R., Joksimovic, S., & de Laat, M. (2022). Human and artificial cognition. Computers and Education: Artificial Intelligence, 3, 100107. https://doi.org/10.1016/j.caeai.2022.100107

Taylor, J. E. T., & Taylor, G. W. (2021). Artificial cognition: How experimental psychology can help generate explainable artificial intelligence. Psychonomic Bulletin & Review, 28(2), 454–475.

Törnberg, P. (2023). ChatGPT-4 outperforms experts and crowd workers in annotating political Twitter messages with zero-shot learning. arXiv. https://arxiv.org/abs/2304.06588

Wang, Q., & Goel, A. K. (2022). Mutual theory of mind for human-AI communication. Presented at the IJCAI-2022 Workshop on Communications in Human-AI Interaction (CHAI), Vienna, Austria, July 2022.
