THE QUEST FOR SYSTEM 2 THINKING IN ARTIFICIAL INTELLIGENCE
System 2 allocates attention to the effortful mental activities that demand it, including complex computations. The operations of System 2 are often associated with the subjective experience of agency, choice, and concentration. -- Daniel Kahneman, Nobel Prize Winner and author of “Thinking, Fast and Slow”
In his book “Thinking, Fast and Slow”, Daniel Kahneman proposes a dual-process theory of human thought that is divided into two distinct systems: System 1 and System 2. System 1 is characterized by fast, intuitive, and automatic thinking, operating with little or no effort and under no voluntary control. It’s responsible for our snap judgements, gut feelings, and unconscious biases like stereotypes. System 2, on the other hand, represents slow, deliberate, and effortful mental operations. This system is engaged when we perform complex calculations, plan into the future, exercise self-control, or engage in logical reasoning. Kahneman’s theory has been hugely influential in understanding human decision-making, the mistakes we make, cognitive biases, and the constant interplay between intuition and reason in our everyday lives.
The concepts of System 1 and System 2 thinking are now finding relevance in the field of artificial intelligence, particularly in the context of Large Language Models (LLMs). As these AI systems become more sophisticated, researchers and developers are finding similarities between human and machine cognition. LLMs, with their ability to generate human-like text rapidly and fluently, look a lot like System 1 thinking. However, the challenge of implementing true System-2-like capabilities – deliberate, analytical reasoning – remains a huge hurdle. Although researchers at Meta and OpenAI may be cracking the code on mimicking System 2 thinking, understanding the parallels and differences between human cognitive systems and AI processing can help us understand the current capabilities and limitations of LLMs, as well as guide us towards real reasoning abilities and perhaps Artificial General Intelligence (AGI).
Kahneman's System 1 and System 2 Thinking
Kahneman’s System 1 thinking is fast and automatic. This cognitive process operates continually, requiring minimal effort or conscious input. It is responsible for tasks we perform almost intuitively, such as recognizing faces, understanding simple sentences, or driving a car on an empty, well-known road. System 1 relies mainly on heuristics – mental shortcuts that allow for quick decision-making based on our past experiences and patterns. An unfortunate example of a heuristic is the stereotype, where we quickly make assumptions about people based on their appearance. System 1 is very efficient and may be accurate enough for routine tasks, but it is also prone to biases that can lead to errors when faced with complex or unfamiliar situations.
System 2 thinking is slow and deliberate, and it demands significantly more cognitive effort. This system comes into play when we encounter new problems, make complex decisions, or reason analytically through something, so activities like solving a tough math problem or learning a new skill engage System 2. This type of thinking allows for more nuanced and logical processing of information and can override the quick judgements of System 1 when appropriate. However, because it requires more mental effort, System 2 thinking is also more fatiguing and can’t be sustained for long periods. But it sure is handy to have an alternate, analytical process that checks our heuristic responses or plans a detailed, long-term strategy.
It’s the interaction between System 1 and System 2 that makes us smart, but fallible. While System 1 operates continually, System 2 remains on standby, stepping in when needed. In many situations, System 1 provides quick answers or solutions that System 2 then endorses with little or no modification. Other times, when System 1 encounters difficulty, it calls upon System 2 to provide more detailed and systematic processing. This interplay enables us to navigate our daily lives efficiently while still having the capacity for deep, analytical thought when needed. It allows us to effortlessly walk down the street dodging people while talking on the phone, but also to land on the moon. Kahneman showed that this interaction is the root of many of our cognitive biases, but it is also what makes our decision-making adaptable and, at its best, sound.
LLMs and System 1 Thinking
If you’ve spent much time with LLMs like ChatGPT or Anthropic’s Claude, you will recognize some similarities with System 1 thinking. The clearest is the speed of fluent text generation – LLMs can produce human-like text almost instantaneously, mirroring the rapid, automatic responses associated with System 1 thinking. This speed allows LLMs to engage in real-time conversations, generate quick summaries, and provide immediate answers to queries, much like how humans can quickly respond to familiar situations without conscious deliberation. You can see something similar in children, who rely mostly on System 1 thinking (the prefrontal cortex is crucial for System 2 thinking and is one of the last brain areas to develop).
The pattern recognition capabilities of LLMs are also like our System 1 thinking. These models are trained on text data, allowing them to recognize and replicate complex linguistic patterns and idioms, and even translate between languages. This ability resembles System 1’s capacity for rapid pattern matching of syntactic and semantic relationships. Just as humans can instantly recognize a face or understand the emotional tone of a conversation, LLMs can quickly identify and generate appropriate language patterns.
As we mentioned earlier, humans rely on heuristic-based responses in System 1 cognition, and LLMs similarly rely on correlations and patterns learned from their training data, analogous to our rules of thumb and other mental shortcuts. These heuristics allow for quick and often accurate responses in familiar scenarios but can lead to errors or biases in new situations. A heuristic for properly greeting today’s teenagers in everyday situations might not transfer to a different context, like greeting adults at a funeral. Using something analogous to our heuristics, LLMs can generate coherent sentences and paragraphs. However, because they operate like System 1 thinking, they are susceptible to similar limitations and pitfalls. Like human intuition, LLMs can sometimes produce confident-sounding but incorrect or biased responses, especially on lesser-known topics. They may struggle with tasks that require maintaining a long-term strategy or consistently applying factual rules (it is uncomfortable how often ChatGPT contradicts itself, asserting both positions with complete confidence), and this reliance on System-1-like thinking can lead to errors, often called hallucinations.
The Challenge of System 2 Thinking for LLMs
So it’s the hard part, System 2, that AI currently struggles with – it’s also the part that makes us most distinctly human (for now). LLMs, while great at System-1-like tasks, often struggle with the kind of slow, deliberate, and analytical reasoning associated with System 2. You can clearly see this deficit in tasks requiring multi-step logical reasoning or applying abstract rules to novel situations. Asking an LLM to solve complex mathematical problems, analyze logical arguments, or provide step-by-step explanations for complex processes will often lead you astray. This deficit is rooted in current transformer-based neural networks, whose architecture has no built-in mechanism for effortful, deliberate cognition.
A key limitation of LLMs in analytical reasoning is their linear processing, which prioritizes local coherence over global consistency (i.e., nearby words are made to fit together well, while words further away get less consideration). While they can generate text that appears logical and coherent in short segments, maintaining logical consistency across longer pieces of text or complex reasoning chains proves challenging. Perhaps architectures that can “look back” and revise earlier parts of their output based on later reasoning will emerge. These new architectures might be our steppingstone to System 2 thinking.
To build truly intelligent systems, they’d need to understand the physical world, be able to reason, plan, remember and retrieve. The architecture of future systems that will be capable of doing this will be very different from current large language models. --Yann LeCun, Chief AI Scientist at Meta
Another issue with current LLMs is a “snowballing of errors”: because of their auto-regressive nature, each token is generated sequentially based on the preceding tokens, so a small error early on can lead to a cascade of successive errors. It’s like making a small multiplication error early in a complex math problem: it can lead to a completely wrong final answer. The auto-regressive nature of these models means they lack the ability to easily backtrack and correct mistakes, leading to output that might start reasonably but become increasingly nonsensical as errors compound. The fundamental architecture of these models, optimized for rapid pattern matching and text generation, poses significant challenges for implementing the kind of slow, effortful, and self-aware cognition associated with System 2.
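To make the snowballing concrete, here is a toy sketch of what a greedy auto-regressive decoding loop does. The next_token_distribution function is a hypothetical stand-in for the model’s forward pass, not any real library call:

```python
from typing import List

def next_token_distribution(tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for the model's forward pass: P(next token | tokens so far)."""
    raise NotImplementedError

def greedy_decode(prompt_tokens: List[int], max_new_tokens: int) -> List[int]:
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # The model conditions only on the tokens generated so far; nothing
        # ever revisits an earlier token, so one bad choice stays in the
        # context and skews every prediction that follows.
        probs = next_token_distribution(tokens)
        tokens.append(max(range(len(probs)), key=probs.__getitem__))
    return tokens
```

Since each iteration simply appends the most likely next token given everything before it, there is no step in the loop where the model can go back and repair an earlier mistake.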
Potential Approaches to Enhance System 2-like Capabilities in LLMs
There have been several attempts to simulate or incorporate aspects of System 2 thinking into LLMs. Chain of Thought (CoT) prompting encourages the model to break down complex problems into smaller, manageable steps. By guiding the LLM through a structured reasoning process, this technique aims to mimic the deliberate, step-by-step approach of our System 2 thinking. Other strategies include few-shot learning, where models are provided with examples of the desired reasoning process, and the use of external memory or knowledge bases to supplement the models’ own representations, such as Retrieval-Augmented Generation (RAG).
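As a rough illustration, here is a minimal sketch of CoT prompting with a single few-shot example. The complete function is a hypothetical placeholder for whatever LLM client you use, and the prompt wording is only illustrative:

```python
def complete(prompt: str) -> str:
    """Hypothetical placeholder: send the prompt to your LLM of choice and return its text."""
    raise NotImplementedError

# One worked example showing the step-by-step format we want the model to imitate.
FEW_SHOT_EXAMPLE = (
    "Q: A train travels 60 km in 1.5 hours. What is its average speed?\n"
    "A: Let's think step by step. Speed = distance / time = 60 / 1.5 = 40 km/h.\n"
    "The answer is 40 km/h.\n"
)

def chain_of_thought(question: str) -> str:
    # The example demonstrates the desired reasoning format, and the trailing
    # cue nudges the model into spelling out intermediate steps before answering.
    prompt = (
        f"{FEW_SHOT_EXAMPLE}\n"
        f"Q: {question}\n"
        "A: Let's think step by step."
    )
    return complete(prompt)
```

The same prompt-assembly pattern extends naturally to RAG: retrieved passages are simply prepended to the prompt so the model’s quick, System-1-style generation is grounded in external knowledge.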
Iterative and agentic approaches are also promising for enhancing System 2-like capabilities in LLMs. These methods force models to break complex tasks into smaller, manageable steps and allow the model to iterate on and refine its responses through multiple passes. For example, an LLM might generate an initial response, then critique and improve that response in subsequent iterations, ending with a final draft. This process mimics the deliberate, self-reflective nature of our System 2 thinking. Agentic approaches take this strategy further by treating the LLM as an agent that can set goals, plan actions, evaluate outcomes, and update its conclusions. By incorporating planning and self-evaluation mechanisms, these approaches should provide more structured and purposeful reasoning, potentially overcoming the limitations of single-pass, auto-regressive generation.
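A rough sketch of this generate-critique-revise loop might look like the following (it reuses the hypothetical complete helper from the CoT sketch; the prompt wording and number of passes are illustrative assumptions):

```python
def iterative_refinement(task: str, passes: int = 2) -> str:
    # Reuses the hypothetical `complete` helper defined in the CoT sketch above.
    # First pass: a quick, System-1-style draft.
    draft = complete(f"Task: {task}\nWrite an initial answer.")

    for _ in range(passes):
        # Deliberate pass: the model critiques its own draft...
        critique = complete(
            f"Task: {task}\nDraft answer:\n{draft}\n"
            "List any factual errors, logical gaps, or missing steps in the draft."
        )
        # ...and then revises the draft in light of that critique.
        draft = complete(
            f"Task: {task}\nDraft answer:\n{draft}\nCritique:\n{critique}\n"
            "Rewrite the answer, fixing every issue raised in the critique."
        )
    return draft
```

An agentic framework wraps a similar loop around explicit goals and tool calls, but the core idea is the same: spend extra passes, and extra compute, to check the fast first answer.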
Several other techniques offer additional paths towards enhancing analytical capabilities. Meta’s distillation of the Rephrase and Respond (RaR) method aims to make System 2 reasoning more efficient, using a two-step process where the model first rephrases the original question and then responds to the rephrased question. Monte Carlo Tree Search (MCTS) has also shown promise in improving decision-making in some complex domains. By combining the pattern recognition and language generation capabilities of LLMs with the strategic planning abilities of MCTS, these systems might reason more effectively over multiple steps and consider many potential outcomes. Combining LLMs with symbolic AI systems could likewise draw on the strengths of different AI approaches to compensate for the limitations of pure language models on System 2 tasks.
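As a concrete illustration of the rephrase-then-respond idea, here is a minimal two-step sketch (again using the hypothetical complete helper; the prompts are my own paraphrase, not Meta’s exact formulation):

```python
def rephrase_and_respond(question: str) -> str:
    # Step 1: have the model restate the question as clearly and completely
    # as possible, resolving ambiguity before any answer is attempted.
    rephrased = complete(
        "Rephrase the following question so it is as clear and unambiguous as "
        f"possible, without answering it:\n{question}"
    )
    # Step 2: answer the clarified, fully spelled-out question.
    return complete(f"Answer the following question:\n{rephrased}")
```

Distillation then fine-tunes the model on answers produced by this slower two-step process so that comparable answers can be generated in a single pass, which is the sense in which the System 2 reasoning becomes more efficient.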
There will likely be architectural innovations that better address System 2 thinking. Concepts like Quiet-STaR and Q* (based on supposed insider information) suggest new directions in model design. It could be improved versions of transformer-like architectures, or it could be new sequence-processing mechanisms like Mamba or test-time training (TTT). Or maybe it’s a modular approach, where specialized components communicate and coordinate with each other – much like our own brains. More coherence over long-term reasoning and the ability to abstract general rules are key differentiators of System 2 thinking.
Implications and Future Directions
If we can truly develop deliberate AI systems capable of System 2 thinking, we could revolutionize many fields. In healthcare, such systems could assist in complex diagnoses, analyzing patient data and medical literature to propose unique treatment plans with a level of reasoning closer to that of an experienced physician. In scientific research, AI could help generate and test novel hypotheses, potentially accelerating innovation across fields. In business and finance, nuanced and context-aware strategic decision-making could be a huge differentiator. Lawyers could benefit from AI systems capable of analyzing complex case law and constructing logical arguments that hold up in court. The difference is that we could then apply real analytical reasoning to find insights and solutions, much as smart humans do.
If we can achieve this kind of reasoning in AI, we also need advances in transparency. As these models become better at analytical reasoning and complex decision-making, we need to figure out accountability, transparency, and the potential for unintended consequences. For example, if an AI system makes a critical decision based on complex reasoning that is not easily interpreted by humans, who is responsible? Is it the AI? Is it the developers? Users? We need ways to peer inside the black box and understand what information is being weighed, what options have been considered, and the detailed rationale for each logical step. Otherwise, we have no way of trusting the output – the benefit of System 2 thinking should be a clear and logical process.
Conclusion
Kahneman’s dual-process theory of human thought shows both how far we’ve come, with today’s LLMs able to mimic many aspects of System 1 thinking, and how far we still must go towards real System 2 thinking. LLMs can provide rapid, intuitive responses based on pattern recognition and heuristics; these models excel at tasks that require quick language generation, contextual understanding, or broad knowledge. However, they still fall short in areas associated with System 2 thinking, such as sustained logical reasoning, complex problem solving, and analytical thinking. The pursuit of System 2 capabilities in AI will expand the practical applications of AI technologies and, I think, deepen our understanding of our own cognition. Whether it is through iterative and agentic approaches, novel architectures, or integration with other AI techniques, the field is working to create more deliberate AI systems. Then we’ll be looking at the full potential of truly revolutionary AI.