LLM HALLUCINATIONS AND THE NATURE OF MACHINE UNDERSTANDING
We’re all hallucinating all of the time; when we agree about our hallucinations, that’s what we call reality. -Anil Seth
Large Language Models (LLMs) are revolutionizing AI and entering our lives, whether chatting with us online or being baked into our phones. They can generate human-like text, answer questions, create novel stories, and even write original Python code. After being trained on huge amounts of text, they appear to produce coherent and relevant outputs across a range of topics. However, as impressive as they are, they sometimes produce outputs that are inaccurate or inconsistent with known facts – often called hallucinations, confabulations, or even bullshit. These errors can have significant consequences, like citing nonexistent legal precedent or fabricating damaging falsehoods about real people.
While the terms might suggest intentional deception, it’s important to stress that these errors are not deliberate fabrications. LLMs are not lying because there is no intention to deceive. I think their errors are best viewed as the model’s genuine attempt to provide a response based on the understanding of reality it gained through training. Just as humans can hold beliefs that don’t align with objective truth, LLMs can generate responses that reflect inconsistencies or inaccuracies.
I think AI should stand for “Alien Intelligence” - it’s not like ours, it’s unlike anything we’ve seen before, but it’s still a kind of intelligence. -Greg Robison, PhD
UNDERSTANDING LLM HALLUCINATIONS
When talking about LLMs, “hallucinations” refer to instances where the model generates information that is incorrect, nonsensical, or inconsistent with known facts. These outputs may seem plausible and coherent at first glance, but upon closer inspection, they reveal inaccuracies or fabrications. As a fan of human hallucinations, I find AI’s to be similar: the model’s prediction doesn’t match real-world information. It’s a big challenge in AI development, as it can spread misinformation and erode trust in all of the model’s outputs. Many people who disparage LLMs rightfully point to the frequency of inaccuracies, especially on controversial topics. It’s a major barrier to trust in, and broad deployment of, LLMs.
LLMs hallucinate because of the complex nature of their architecture, training, and operation. These models are trained on large text datasets to predict the most likely next words, learning patterns and relationships between words and concepts. When asked to generate text or answer questions, LLMs don’t access a static database of facts – instead, they produce outputs based on the patterns they’ve learned, essentially making educated guesses about what should come next in a sequence of text. Sometimes this process leads to the model connecting ideas in a way that doesn’t align with reality. For example, training datasets may contain conflicting information about a controversial topic like climate change (despite clear facts, the online discourse around those facts has been anything but one-sided). Hallucinations are also more likely when a topic is underrepresented or ambiguously presented in the training data.
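To make the “educated guess” idea concrete, here is a minimal, purely illustrative sketch of next-word prediction by counting which word most often follows a given context. The toy corpus and the trigram counting are my own simplification; real LLMs learn far richer patterns with neural networks, but the core idea of predicting a plausible continuation from training patterns is the same:

```python
from collections import Counter, defaultdict

# Toy "training data" -- note the conflicting statements about the same topic.
corpus = [
    "climate change is driven by human activity",
    "climate change is driven by natural cycles",
    "climate change is driven by human activity",
]

# Count which word follows each two-word context (a simple trigram model).
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 2):
        counts[(words[i], words[i + 1])][words[i + 2]] += 1

def predict_next(w1, w2):
    """Rank candidate next words by how often they followed (w1, w2) in training."""
    return counts[(w1, w2)].most_common()

# The "model" simply echoes the statistical mix in its training data:
print(predict_next("driven", "by"))  # [('human', 2), ('natural', 1)]
```

When the training mix is conflicted, the prediction is conflicted too – the model isn’t deceiving anyone, it’s reflecting what it was fed.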
As you can see, training data is important in shaping an LLM’s outputs and thus the potential for hallucinations. The quality, diversity, and recency of the training data impact the model’s understanding of the world. If the training data contains biases, inaccuracies, or outdated information, these will be reflected in the model’s outputs. If a training dataset favors fluency and coherence over factual accuracy, the model’s outputs will be similarly affected. And larger, more complex models seem to “absorb” or retain more factual information from their datasets. But the model is not trying to deceive or bullshit us; it’s a product of its training dataset and complexity. But aren’t we all?
COMPARISON TO HUMAN COGNITIVE PROCESSES
Children's Theory of Mind
Here’s one of my favorite examples of people having different subjective realities of an objective situation: theory of mind. Theory of mind is a cognitive ability that develops in humans during childhood, allowing us to understand that others have beliefs, desires, and intentions that may differ from our own. This skill enables us to better predict and interpret others’ behavior, emotions, and thoughts. As children grow, their understanding of the mind becomes more sophisticated, progressing through various stages. Initially, young children struggle to understand that others may have different desires or knowledge than they do, often leading to errors in predicting people’s actions. For example, suppose we tell children a story in which Molly leaves her socks on her bed and heads off to school. While she is out, her dog picks up the socks and takes them outside to play. Where will Molly look for her socks when she returns home? Young children have trouble separating what they know from what others know and think Molly will look outside, where the dog left them. However, around age 5 or 6, they will correctly reason that Molly will look on her bed where she left them, recognizing that what Molly knows differs from reality. This early, egocentric view of mental states is like an alien intelligence compared to adults – not more or less, just different.
I think the development of theory of mind in children has some interesting parallels to how LLMs form representations of the world from their training data. Just as children at different developmental stages may make varying predictions about where to look for an object based on their understanding of the world, LLMs can produce different outputs depending on the patterns and information present in their training data. In both cases, they are not being intentionally deceptive; instead, they are predicting based on their current model of how the world works.
LLMs operate on a similar level. When they generate outputs that appear plausible but are factually incorrect, they are not deliberately lying or being deceptive. They are just producing outputs based on the patterns and associations learned from their training data, which may sometimes lead to inaccuracies and inconsistencies. Just as a child’s developing theory of mind may cause them to make incorrect predictions about others’ behaviors, an LLM’s internal representations may sometimes result in outputs that don’t align with objective reality. This parallel helps us view LLM hallucinations not as intentional misinformation, but as a reflection of the model’s current “understanding” of the world.
After all, our human consciousness is a hallucination of our sensory inputs filtered via attention. -Greg Robison, PhD
Differing Opinions about the 2020 U.S. Election
Just like children of different ages can hold different understandings of reality, adults are susceptible to having their beliefs about the world shaped by their input data. Americans hold two diametrically opposed views on the same objective reality about the winner of the 2020 US election (I really wish this example didn’t exist…). After the election, two narratives emerged: one accepting the official results that declared Joe Biden the winner (the objectively correct reality) and another claiming fraud and that Donald Trump was the real winner (objectively false, with no supporting evidence). The second narrative has persisted long after the election, with a significant portion of the population continuing to believe unsubstantiated claims of a stolen election, despite multiple recounts, audits, and court rulings finding no evidence of widespread fraud (again, flying in the face of objective reality).
The fact that millions still believe the false narrative can mostly be attributed to the information sources that individuals rely on. When we all have personalized news feeds and fragmented media, people often find themselves in “echo chambers” that reinforce their existing beliefs. Those who primarily consume conservative media like Fox News are likely exposed to more content questioning the election’s integrity, while those following real media outlets would encounter information supporting the official results. Social-media algorithms are designed to keep users engaged, so they amplify this effect by showing similar content that tends to produce outrage. As a result, people can maintain beliefs about the election that are fundamentally at odds with each other, not because they’re intentionally rejecting reality, but because their perception of reality has been shaped by their information environment.
The way humans can form these divergent beliefs based on their information sources looks a lot like how LLMs form their “understanding” of the world based on their training data. Just as a person exposed to one narrative about the election might firmly believe in that version of events, an LLM trained on a dataset with certain biases, inaccuracies, or conflicting data will reflect those biases in its outputs. For instance, if an LLM’s training data included a disproportionate amount of content questioning election integrity, it might generate responses that lean towards that perspective, even if it’s not reflective of the broader, factual consensus. The quality and diversity of information, whether consumed by humans or used to train AI models, plays a crucial role in shaping perceptions and beliefs about reality. It also highlights the importance of critical thinking and trustworthy data sources for both humans and AI.
THE NATURE OF REALITY REPRESENTATION
Our human reality emerges through a complex interplay of sensory experiences, cognitive processes, and social interactions. From infancy, we begin to form mental models of the world around us, constantly updating and refining these models as we encounter new information and experiences. Our brains excel at pattern recognition, allowing us to make sense of our environment and predict future events based on past experiences. However, this process is inherently subjective and prone to biases. Our personal experiences, cultural background, education, and emotional states all influence how we perceive and interpret information. This subjectivity means that individual understandings of reality can vary significantly, even when presented with the same set of facts.
While humans draw from personal experiences and sensory inputs to understand and predict language, LLMs rely solely on the text data they are trained on, without any real-world context. LLMs construct their “understanding” of reality through a process of statistical pattern recognition across text data. Unlike us, LLMs don’t have sensory experiences or consciousness. Instead, they learn to predict patterns in language, which indirectly encodes information about the world. The most efficient way to predict text is to understand syntax and represent semantic concepts, and neural networks tend to find the simplest workable path to their goal. To put it simply, the best way to predict language is to understand language. However, this understanding is not the same as ours.
LLMs don’t have beliefs or intentions - they produce outputs based on patterns. For example, if an LLM encounters the phrase “Knock knock,” it uses statistical patterns from its training data to predict that “Who’s there?” is likely the next couple of words, even though it doesn’t understand what a knock is, feels like, or sounds like. This process of statistical prediction, though lacking human-like consciousness, enables LLMs to form a type of representational understanding. Their “knowledge” is highly complex pattern matching (but again that’s kinda what we do too), which can lead to impressively accurate responses, but also factually incorrect outputs when the patterns in the data don’t accurately reflect reality. They form representations of complex ideas that can be manipulated mathematically for more accurate predictions – the very thing they are rewarded for. While traditional autocomplete can give reasonably accurate predictions, it is less likely that any representational understanding emerges in the same way it can in massively complex LLMs.
Hinton's quote suggests that predicting the next word forces the model to develop as intricate of an understanding of language patterns as it can. This predictive mechanism is the core of how LLMs generate seemingly intelligent responses.
The quality and diversity of information are crucial in shaping both human and AI understanding of reality – garbage in, garbage out. For humans, exposure to a wide range of perspectives and high-quality, factual information can lead to a more nuanced and accurate worldview. Similarly, the breadth, depth, and accuracy of the training data significantly impacts an LLM’s ability to generate reliable and contextually appropriate responses. In both cases, limited or biased information sources can result in skewed perceptions or outputs. There is a key difference, however – humans have the capacity for critical thinking (although not used enough) and can actively seek out diverse sources, while LLMs are entirely dependent on their training data. We’ll need to rely on high-quality training data to help reduce misinformation and biases.
IMPLICATIONS AND CONSIDERATIONS
All of this exposition is not meant to excuse LLM hallucinations – it helps us understand why they give us wrong outputs and what we can do to mitigate the errors. Grounding an LLM with a document or website via Retrieval Augmented Generation (RAG) can reduce the number or severity of hallucinations, especially for lesser-known topics. We can include a textbook or an academic article to provide an objective truth for the discussion – it’s like agreeing on what information can be considered factual evidence at a trial. However, even with RAG, there is still the question of transparency and accountability. These neural networks are incredibly complex black boxes that have trouble explaining their decision-making process. That’s why it feels like they are supremely confident children even when they’re obviously wrong.
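The sketch below shows the basic RAG pattern in a stripped-down form: retrieve the passage most relevant to the question from a trusted source, then put it in the prompt so the model answers against that text rather than from memory alone. I’m assuming scikit-learn for the retrieval step, and ask_llm is a placeholder for whatever model or API you actually use:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A trusted source split into passages -- e.g., paragraphs from a textbook.
passages = [
    "Theory of mind is the ability to attribute beliefs and desires to others.",
    "False-belief tasks are typically passed by children around age five.",
    "Large language models are trained to predict the next token in text.",
]

def retrieve(question: str, k: int = 1) -> list:
    """Return the k passages most similar to the question (TF-IDF + cosine)."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(passages + [question])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = scores.argsort()[::-1][:k]
    return [passages[i] for i in ranked]

def ask_llm(prompt: str) -> str:
    # Placeholder: swap in a real model or API call of your choice.
    raise NotImplementedError

def answer_with_rag(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using ONLY the context below. If the answer is not in the "
        f"context, say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)
```

The “answer only from the context” instruction is doing the trial-evidence work described above: it narrows the model’s job from recalling facts to reading them.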
Recognizing how LLMs represent language and reality also underscores the importance of diverse and high-quality training data. Diverse data helps ensure that the model’s outputs are representative of a wide range of perspectives and experiences, reducing the risk of bias and improving the models’ ability to generate appropriate responses across many contexts. High-quality data, free from errors and misinformation, is necessary for training models that produce accurate and reliable outputs. However, it is easier said than done. We need careful curation of training datasets, which is both time-consuming and expensive. “High quality” is also a subjective, moving target.
In addition to RAG, several other strategies can help reduce hallucinations. One is to implement fact-checking mechanisms that cross-reference the model’s outputs against reliable external sources (imagine an LLM having access to the entirety of Wikipedia for every response). Another is to fine-tune models on curated, factual datasets to improve their accuracy on a specific topic, like providing psychology or logic textbooks to build a solid base of information. Using reinforcement learning to reward models for generating factual information and penalize them for inaccuracies is showing some promise as well. I’m excited about ways to quantify uncertainty, allowing models to express varying levels of confidence in their outputs, which can help users gauge the reliability of the information. While no single strategy is likely to eliminate hallucinations, together they can reduce them and give us a confidence level for deciding how much to trust the outputs.
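There is no standard recipe for uncertainty quantification yet, but one cheap proxy is the probability the model itself assigns to the tokens it generates. Here is a rough sketch, again assuming transformers and GPT-2 as a stand-in; a low average token probability is only a weak signal of shaky ground, not a fact-checker:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def answer_with_confidence(prompt: str, max_new_tokens: int = 20):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            return_dict_in_generate=True,
            output_scores=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    new_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
    # Probability the model assigned to each token it actually chose.
    probs = [
        torch.softmax(score[0], dim=-1)[tok].item()
        for score, tok in zip(out.scores, new_tokens)
    ]
    answer = tokenizer.decode(new_tokens, skip_special_tokens=True)
    confidence = sum(probs) / len(probs)  # crude proxy, not a guarantee of truth
    return answer, confidence

text, conf = answer_with_confidence("The capital of Australia is")
print(f"{text!r} (avg token probability: {conf:.2f})")
```

A more robust (and more expensive) variant samples several answers to the same question and checks how often they agree, on the theory that confabulations tend to be inconsistent across samples.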
CONCLUSION
Parallels between LLMs and human cognitive processes show how LLM outputs, like a child’s developing understanding of mental states or an adult’s beliefs about a contentious election, are not attempts at deception but rather reflections of an “understanding” based on available information. We both construct representations of reality, highlighting the role that the quality and diversity of information play in shaping these understandings. AI hallucinations, while problematic, are not entirely different from human cognitive processes. We need to recognize that AI can produce convincing but inaccurate information, so we can approach AI-generated content with appropriate skepticism and encourage the development of critical thinking skills. We also need transparency in AI systems, allowing us to understand the reliability and potential biases of the information they receive. Innovation in training techniques, data curation, and model architectures may lead to AI with more accurate and nuanced understandings of the world. We may see systems that are better at distinguishing fact from speculation or that can communicate their certainty (I wish humans did…). Only then will AI systems truly complement and enhance human understanding.