Greg Robison

Smart Play

THE HIDDEN LINK BETWEEN CHILDREN'S GAMES AND SELF-TEACHING AI


Play is the work of childhood. -- Jean Piaget

If you’ve spent any time with young children, you’ve seen a toddler in deep concentration, perhaps carefully placing one Lego brick on top of another, learning through trial and error about balance, spatial relationships, and cause and effect. Now imagine an artificial intelligence (AI) system methodically evaluating its own responses, learning to distinguish good answers from bad ones without human guidance. These two disparate scenarios reflect a common theme – the power of self-directed learning. Recent developments in self-taught AI by Meta show some parallels with how human children naturally learn. Just as a child doesn’t need constant adult instruction to learn, perhaps AIs don’t either. When they can evaluate their own capabilities and iteratively self-improve rather than relying on human examples, their learning can take off. AI systems are starting to become more like playful children.


 

NOTE: We are continuing our experiment with an AI-generated podcast, created by Google’s NotebookLM, that summarizes this post. Listen here and let us know what you think:


 

THE NATURAL LEARNING LABORATORY: HOW CHILDREN PLAY

When you’re watching kids play, you are watching one of the most efficient natural learning laboratories ever designed. Kids don’t need formal instruction or structured guidance because they engage in trial-and-error learning, testing their hypotheses about the world and approaching new challenges with curiosity and persistence. Whether they are testing the effects of gravity by constantly dropping their spoon, making a tower of Legos more stable, or learning the precise motor movements necessary to catch a ball, children will test approaches until they find what works. Each attempt, whether successful or not, adds to their growing dataset about how the world around them works.

baby playing with lego

Unlike students in formal education, young children at play generate their own feedback mechanisms, developing sophisticated internal systems for evaluating their success. I’ve watched kids test their Lego tower by nudging it or even throwing other Legos at it, getting immediate feedback that, like little scientists, they use to revise and update their theories about the world. After dropping the spoon, goldfish crackers, and broccoli on the floor, the child has gathered plenty of data, leading to better predictions about gravity’s effects on the world. Children don’t need someone to tell them whether they’ve succeeded or failed at these self-play tasks – they can judge the outcomes for themselves and adjust their strategies and implementations.


Children also seem to have the ability to create challenges that are at just the right level of difficulty (or a little beyond, thanks Vygotsky). They engage in what educators call “progressive complexity,” starting with simple tasks and gradually increasing the difficulty as their skills improve. A child will start stacking Legos by getting that first one connected on top – they’re not trying to build the Millennium Falcon set right away. After success with basic structures, they will move on to more advanced architecture when they’re ready, like bridges or multi-part structures. This kind of self-directed challenge ensures that they’re almost always operating in a sweet spot where tasks are challenging enough to be engaging but not so difficult as to be discouraging. This natural calibration of difficulty enables children to learn about and begin to master their physical and social environments without an adult instructing them.

baby playing with lego

For children, failure plays a necessary role in providing actionable data. When the Legos don’t fit correctly, or their drawing doesn’t look quite right, they don’t view these as devastating setbacks but as interesting puzzles to solve. Each failure provides immediate, concrete feedback about what works and what doesn’t, leading to rapid iteration and experimentation. Sometimes it takes one failure, like falling backwards off a porch, and other times it takes repeated failures – both are crucial to learning. For some children, it may take a long time to recognize that Lego structures with larger bases are more stable – but that’s a massive insight once attained. The ability to self-correct and learn from experience without intervention is what makes play such a powerful learning mechanism and why it serves as a model for developing more autonomous AI learning systems.


THE AI PARALLEL: SELF-TAUGHT EVALUATORS

Recent research by Meta’s FAIR group on self-taught AI evaluators is inspired by children. Just as children create various scenarios during play, these AI systems generate their own test cases to learn from – a process that is both efficient and scalable. The system starts by creating pairs of responses to various prompts or questions, deliberately making one response better than the other – like a child trying different approaches with Legos, where some approaches work better than others. The key innovation is that the AI doesn’t need human-annotated examples to learn – it creates its own scenarios for learning. Much like a child.
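
To make the idea concrete, here’s a minimal sketch of what creating such a pair might look like. The `generate` function is a placeholder standing in for any LLM API call, and the corruption prompt is my own illustration, not Meta’s actual code:

```python
# Minimal sketch of synthetic preference-pair creation, loosely in the
# spirit of Meta's Self-Taught Evaluator. `generate` is a placeholder
# for any LLM completion call, not a real library function.

def generate(prompt: str) -> str:
    """Placeholder: wire this up to the language model of your choice."""
    raise NotImplementedError

def make_preference_pair(instruction: str) -> dict:
    # Sample a "good" response straight from the instruction.
    chosen = generate(instruction)

    # Ask the model for a plausible-but-flawed rewrite, so the pair is
    # labeled by construction: no human annotation required.
    rejected = generate(
        "Rewrite this answer with a subtle factual or logical error, "
        f"keeping the same style:\n\n{chosen}"
    )
    return {"instruction": instruction, "chosen": chosen, "rejected": rejected}
```

Because the system deliberately degrades one response, it always knows which of the two should win – that’s what lets it grade its own homework.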

toy robot playing with lego

The self-evaluation process in these AI systems mirrors the way children assess their own play. When the AI system examines a pair of responses, it develops what are called “reasoning chains” – step-by-step explanations of why one response is better than another. It’s akin to how a child might internally process why one Lego configuration is more stable than another. The system learns to judge quality by generating explanations, making decisions, then using the outcomes to refine its understanding. These reasoning chains act like a child’s inner dialogue while solving a puzzle: by articulating its reasoning explicitly, the system makes its decisions more transparent and understandable.
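
A rough sketch of that judging step, reusing the placeholder `generate` from the sketch above. The prompt template and verdict parsing here are assumptions for illustration, not the paper’s exact format:

```python
# The judge writes its step-by-step reasoning before its verdict;
# that trace is the "reasoning chain". The template is illustrative.

JUDGE_TEMPLATE = """Compare the two responses to the instruction below.
Reason step by step about accuracy, logic, and clarity, then finish
with exactly one line: 'Winner: A' or 'Winner: B'.

Instruction: {instruction}
Response A: {a}
Response B: {b}"""

def judge(instruction: str, a: str, b: str) -> tuple[str, str]:
    chain = generate(JUDGE_TEMPLATE.format(instruction=instruction, a=a, b=b))
    # Everything before the final line is the explanation; the final
    # line carries the decision.
    verdict = "A" if chain.strip().splitlines()[-1] == "Winner: A" else "B"
    return chain, verdict
```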


The iterative improvement process is where the parallel between AI and child's play is most striking. Starting from an initial model (like a child's basic understanding), the system repeatedly generates new examples, evaluates them, and learns from the results. Each iteration builds upon previous learning, gradually improving the system's ability to make accurate judgments, paralleling how children naturally increase the sophistication of their play activities. Through iteration, the system's accuracy improves from 75.4% to 88.3%, a learning curve that resembles how children show progressive improvement in their play-based skills. Like a child who gets better at building Lego towers through repeated attempts, the AI system becomes increasingly adept at evaluating responses through continuous practice and self-improvement.
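
Putting the pieces together, the loop might look something like this. It builds on the sketches above; `finetune` is another placeholder, and keeping only the chains that reached the correct verdict is a simplification of the published recipe:

```python
# One way to picture the self-improvement loop: generate pairs, judge
# them, keep only the reasoning chains that reached the correct
# verdict, and fine-tune on those traces before the next round.

def finetune(model, traces):
    """Placeholder for a supervised fine-tuning step."""
    raise NotImplementedError

def self_improve(model, instructions, iterations: int = 3):
    for _ in range(iterations):
        good_traces = []
        for instruction in instructions:
            pair = make_preference_pair(instruction)
            chain, verdict = judge(instruction, pair["chosen"], pair["rejected"])
            # We built the pair, so we know response A (the "chosen"
            # one) should win; keep only chains that got it right.
            if verdict == "A":
                good_traces.append((instruction, chain))
        # A slightly better judge produces slightly better training
        # data for the next round: the flywheel of self-teaching.
        model = finetune(model, good_traces)
    return model
```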

toy robot on golden gate bridge

The self-teaching systems start with simpler evaluations and gradually attempt more complex cases as they improve, building their own curriculum. An AI evaluator might begin by learning to judge straightforward factual responses (like "What's the capital of France?"), then progress to evaluating more nuanced responses that require logical reasoning (like solving multi-step math problems), and eventually master complex tasks like assessing creative writing or detecting subtle logical flaws in arguments.


This process mirrors how a child naturally progresses in their play-based learning: they might start with simple Lego stacking (two or three bricks), advance to building basic structures (a bridge or tower), and eventually create complex architectural designs with multiple components and counterbalances. Just as a child learns to crawl before walking before running, the AI system's evaluation capabilities evolve from basic to sophisticated – moving from simple true/false judgments to nuanced quality assessments across multiple dimensions. In both cases, this progression emerges naturally from the learning process itself. A child doesn't need to be told to make their Lego towers more complex, and the AI doesn't need to be explicitly programmed to tackle harder evaluations – both naturally expand their capabilities as their confidence and competence grow.
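
One hypothetical way to encode that progression is a simple tier-based curriculum, where the evaluator only advances once it is reliable at its current tier. The tiers and threshold below are illustrative assumptions, not part of the published method:

```python
# Toy curriculum scheduler: train on the easiest tier the evaluator
# hasn't yet mastered. Tiers and threshold are illustrative.

CURRICULUM = [
    "factual",     # "What's the capital of France?"
    "reasoning",   # multi-step math problems
    "open_ended",  # creative writing, subtle logical flaws
]

def next_tier(accuracy: dict[str, float], threshold: float = 0.85) -> str:
    """Return the first tier whose accuracy is still below threshold."""
    for tier in CURRICULUM:
        if accuracy.get(tier, 0.0) < threshold:
            return tier
    return CURRICULUM[-1]  # everything mastered: keep practicing the hardest

# Strong on facts, shaky on reasoning -> train on reasoning next.
print(next_tier({"factual": 0.93, "reasoning": 0.71}))  # prints "reasoning"
```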


Once the synthetic data is created and the curriculum is developed, the AI system can start learning autonomously – just like a child playing for hours by themselves. This autonomous approach enables rapid progress, letting the system discover patterns and principles on its own. The systems are no longer limited to what we humans can explicitly provide, nor do they require constant external validation, so they can develop a more robust and flexible understanding of their data. And by creating feedback loops, they can quickly learn what works and what doesn’t. Like a child whose tower stands or topples, or whose puzzle piece fits or doesn’t, AI systems can get feedback on whether they are on the right track. And by progressing from easier to more difficult tasks, the curriculum ensures the feedback is at the right level for optimal learning.


KEY DIFFERENCES AND LIMITATIONS

While self-teaching AI is inspired by children, the processes are not the same. Children are emotional creatures – they experience joy in discovery, frustration in failure, and pride in achievement. The smile on a child’s face when they get their first pun is priceless. This emotional investment drives their exploration and helps embody their intelligence. Self-taught AIs can identify patterns and make increasingly accurate judgments, but they lack the true understanding and creative spark that children have. A child’s curiosity is genuine, driven by an intrinsic desire to understand and master their environment. They don’t just pattern match – they imagine, create, and innovate in ways that current AI systems cannot replicate. This drive propels them through the world, learning along the way.

baby playing with toy robot

Research by Alison Gopnik’s team at the University of California, Berkeley (go Bears!) showed that previous generations of models, such as OpenAI’s GPT-3 and Google’s PaLM, failed to learn causal relationships as quickly as children do. When I was one of Gopnik’s graduate students, we created the “blicket detector” to examine children’s developing understanding of causality and categorization. When a child placed different objects on the detector, it would light up if the object was a “blicket” – a fictional characteristic that could be anything. The detector was controlled by a human who could decide what constitutes blicketness, whether that’s color (all red things are blickets), substance (metal objects are blickets), or something more conceptual, like whether the object can be used to build a bridge. Four-year-old children can grasp the rule and predict new blickets in about 20 trials, depending on the difficulty of the particular scenario. More recently, Gopnik found that LLMs struggle even to play the game well enough to determine how it works.
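
To give a flavor of the task, here’s a toy simulation of the game (my own illustration, not the lab’s actual protocol): the experimenter picks a hidden rule, and the learner – child or LLM – must infer it from a handful of trials:

```python
# Toy blicket-detector game. The hidden rule is known only to the
# "experimenter"; the learner sees objects and outcomes and must
# infer what makes something a blicket.

objects = [
    {"color": "red",  "material": "metal"},
    {"color": "blue", "material": "wood"},
    {"color": "red",  "material": "wood"},
    {"color": "blue", "material": "metal"},
]

def detector_lights_up(obj: dict) -> bool:
    # Hidden rule for this round: red things are blickets.
    return obj["color"] == "red"

# Each trial is one data point; children rapidly rule hypotheses in or
# out ("is it the color? the material?") from evidence like this.
for obj in objects:
    lit = detector_lights_up(obj)
    print(obj, "->", "lights up!" if lit else "nothing")
```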


Children and AI have different environments and constraints when learning and engaging in play. Children operate in a physical world with tangible objects, gravity, causality, and a lot of sensory feedback. They can feel the weight of Lego bricks, experience the immediate consequences of physical laws, and engage with their environment in 3D space. Children’s intelligence is embodied. AI systems operate only in a digital world with different laws (“There is no spoon”). And while children can engage in self-play, they are inherently social, as is their learning. They learn by observing others, engaging in collaborative play, and receiving explicit instruction from others. AI agents might interact with each other to achieve a goal, but they’re not learning the real-world consequences of actions in physical play.


IMPLICATIONS AND FUTURE DIRECTIONS

We will continue to see self-teaching AI in domains as diverse as medical imaging and video game creation. As LLMs gain better reasoning capabilities, they will be able to create better synthetic data to train on, learn more quickly from fewer examples, and potentially extrapolate what they learn to new domains. By letting AI play, we may see more creative problem solving that goes beyond pattern matching to develop more flexible and adaptive learning and reasoning capabilities. And by optimizing their play difficulty, they may be able to progress from simple to more difficult problems more efficiently, with reduced training time and more flexibility.

toy robots playing basketball

As we integrate AI into robotics, we will move closer to mimicking children’s play in the real world. Robots will be able to test their own predictions about gravity and physical causality, as children do. Being able to see, touch, smell, or even taste an apple will give AI-equipped robots a much more sophisticated and complete understanding of apples – much more akin to ours. Now throw a few of these intelligent robots in a room together and have them collaborate on building a Lego bridge, and they may come to a better solution than prompting an LLM for a plan would produce. Their depth of understanding of our physical world will be immensely greater than any understanding that could come from language and images alone.


CONCLUSION

Children are the best place to start when thinking about developing artificial intelligence, and play is what kids do best. Play is a powerful environment for self-directed learning, where autonomous exploration, internal feedback mechanisms, and a natural difficulty progression from easy to harder tasks lead to efficient learning. Continuing to use children’s innate learning abilities as an analogy for developing smarter AI should unlock advancements in reasoning skills and in the ability to understand us and how we see the world around us. We will likely see more developments in how AI can be given the opportunity to learn autonomously. Just as our children grow and travel their unique path to learn about the world around them, we can provide these kinds of unique growth opportunities for AI as well.


