shyamal's space

28 December 2024

the better lesson

by Shyamal Anadkat

We still underestimate how much we can squeeze out of large language models simply by giving them more time to think. Consider the lessons from AlphaGo, where progress emerged not from intricate hand-engineered features but from scaling simple methods, applying massive search at inference, and integrating deep reinforcement learning. As the “bitter lesson” in AI suggests, what often matters most is scaling what works. The simple insight here is that deep learning - combined with aligned scaling - tends to outperform more delicate approaches. We don’t need intricate architectures as much as we need more compute, data, and ways to let the model think longer and integrate well with the environment.

The recent framing of AI capabilities - chatbots, reasoners, agents, organizations - suggests a progression that matches what we’re seeing in the broader industry. We start with simple interfaces that can talk to you. Then, given time and context, these chatbots become reasoners that can break down problems. With more tools and integrations, they evolve into agents that don’t just provide answers but take actions. Eventually, as these agents interact and coordinate, they form organizations - distributed networks of intelligence working in parallel. Along this progression, we move from AGI as a model to AGI as a system, something that can coordinate, collaborate, and execute at scale.

OpenAI’s o1 model exemplifies this progression. It ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the U.S. in the USA Math Olympiad qualifier (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA). Similar to how a human may think deeply before responding to a difficult question, o1 employs a “chain of thought” when solving problems. Through reinforcement learning, it hones this chain of thought, refining its strategies, recognizing and correcting mistakes, breaking complex problems into manageable steps, and pivoting to new approaches when needed. This iterative process dramatically enhances its reasoning abilities, underscoring the potential of systems designed to think longer and adaptively.
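To make “thinking longer” concrete, here is a minimal sketch of one simple form of test-time compute, often called self-consistency: sample several independent chains of thought and keep the majority answer. The `generate_chain_of_thought` function is a hypothetical stub standing in for a temperature-sampled model completion; none of this reflects how o1 works internally.

```python
import random
from collections import Counter

def generate_chain_of_thought(question: str, seed: int) -> tuple[str, str]:
    """Hypothetical stand-in for one sampled reasoning trace from a model.

    In practice this would be a single temperature-sampled completion; here we
    fake plausible traces so the sketch runs on its own.
    """
    random.seed(seed)
    # Most sampled traces land on the right answer; a few go astray.
    answer = "42" if random.random() < 0.7 else random.choice(["41", "43"])
    trace = f"step-by-step reasoning for {question!r} -> {answer}"
    return trace, answer

def answer_with_more_thinking(question: str, n_samples: int = 16) -> str:
    """Spend more inference-time compute: sample many chains of thought,
    then aggregate their final answers by majority vote (self-consistency)."""
    answers = [generate_chain_of_thought(question, seed=i)[1] for i in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    # More samples -> more compute at inference -> usually a more reliable answer.
    print(answer_with_more_thinking("What is 6 * 7?", n_samples=16))
```

The point of the sketch is the knob, not the stub: holding the model fixed, raising `n_samples` trades more inference-time compute for more reliable answers.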

For enterprises, this shift is not abstract. Yesterday, we asked these models to summarize documents. Today, we’re seeing models that can reason over them. Tomorrow, we’ll have them serve as robust internal consultants, navigating complex codebases, reviewing contracts, and orchestrating entire workflows. This new form of “test-time compute” means giving them the capacity to marshal context, call specialized tools, and manage complexity across multiple steps. At the same time, we’re learning to solve the constrainment problem: how to build systems that are powerful yet stay aligned with human goals, remaining safe and useful as they scale. Software engineering offers a preview: it’s already cheaper and easier to generate code than it used to be, just as writing became cheaper once we had word processors. What used to be a painstaking craft can now be tackled at a higher level. You still need to know what you want to build, but turning that intent into code is no longer the hard part. The new difficulty lies in design, strategy, and making sure these systems behave in aligned ways. The craft shifts upward.
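That multi-step, tool-using loop can be sketched in a few lines. The toy example below is not any particular product’s API: the planner is a hard-coded stub standing in for the model’s reasoning step, and the tools (search_contracts, summarize, file_ticket) are hypothetical.

```python
from typing import Callable

# Hypothetical tool registry an enterprise "agent" might expose.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_contracts": lambda q: f"[3 contracts matching '{q}']",
    "summarize":        lambda text: f"[summary of {text}]",
    "file_ticket":      lambda note: f"[ticket created: {note}]",
}

def plan_next_step(goal: str, history: list[str]) -> tuple[str, str] | None:
    """Stand-in for the model's reasoning: decide which tool to call next.

    A real system would ask the model to choose; here a short fixed plan
    keeps the sketch self-contained and runnable.
    """
    plan = [
        ("search_contracts", goal),
        ("summarize", "search results"),
        ("file_ticket", "route summary for legal review"),
    ]
    return plan[len(history)] if len(history) < len(plan) else None

def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    """Multi-step loop: pick a tool, call it, fold the result back into context."""
    history: list[str] = []
    for _ in range(max_steps):
        step = plan_next_step(goal, history)
        if step is None:  # the plan is complete
            break
        tool_name, tool_input = step
        result = TOOLS[tool_name](tool_input)
        history.append(f"{tool_name}({tool_input}) -> {result}")
    return history

if __name__ == "__main__":
    for line in run_agent("indemnification clauses"):
        print(line)
```

Everything interesting in a production system lives in the parts stubbed out here: how the model chooses the next step, how results are folded back into context, and the constraints that keep the loop safe and aligned.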

A similar pattern emerges across many domains. In healthcare, we started with systems that merely recorded what the doctor said, but now we have systems performing actual differential diagnoses. Soon we’ll have them guide clinical decision-making more comprehensively, and we’ll need them to do so responsibly. In security, applications are moving from static bug-spotting to dynamic penetration testing and automated defense, scaling both the breadth and depth of what these systems can do, again ensuring alignment with human interests. In gaming, we’ll progress beyond chatting with NPCs to generating entire game worlds on the fly. In education, it won’t just be better tutoring; it’ll be personalized learning environments that adapt as a student struggles or excels, reshaping the presentation of knowledge to fit each individual mind.

All this progress rests on a key insight: intelligence is not just something trapped in biological brains. It’s a physical property we can engineer and scale. Deep learning just works. We learned to melt sand into silicon, then arranged that silicon into chips that store and process information. Now we’ve taught those chips how to think about the information they contain. We are entering an era where intelligence is available on demand, scaled up, and guided by the careful application of constraints and alignment. We’ve barely begun to tap the potential of what these systems can do if they’re allowed to think more deeply and remain anchored to human values. One thing seems certain: we’ll continue doing what humans have always done - build things and then build the tools that can build those things at scale and make them useful.

tags: Startups - AGI - Reasoning