Sometime around 2020, humanity invented a new kind of engine. Like the internal combustion engine or the electric motor before it, the large language model is a machine that converts one form of energy into another. An electric motor converts electricity into rotational force. An LLM converts electricity into thought.
That’s a staggering thing to type, and an even more staggering thing to sit with. We built a machine that thinks. And then, almost immediately, we did something very specific with it. We built a car.
The Car
The chatbot is a car. It is one particular vehicle wrapped around a general-purpose engine. You sit in it, you steer it, and it takes you somewhere useful. ChatGPT, Claude, Gemini. All of them are cars. They have steering wheels and brake pedals and seat belts and airbags. They wait for you to turn the key and tell them where to go.
None of that is inherent to the engine. All of it was added after the fact.
Before chatbots existed, LLMs were raw engines sitting on a workbench. You could feed them literally anything as input and they would produce literally anything as output. There was no turn-taking. There was no concept of “your message” and “my response.” There was no personality, no safety guardrails, no carefully trained instinct to be helpful and harmless. You gave the engine a context, and it continued from there. If the context was about reducing suffering, it would talk about reducing suffering. If the context was about building weapons, it would talk about building weapons. If you accidentally fed it HTML instead of a chat message, it would just return HTML right back.
The raw engine is completely plastic. It has no opinion about what it should be doing. It has no shape until you give it one.
And then OpenAI gave it a shape. Sam Altman has said explicitly that one of the reasons they built ChatGPT was to get people accustomed to the idea of AI before more powerful systems arrived. They needed a familiar interface, something that felt like texting a smart friend, something benign that would not cause panic.
It worked. It worked so well, in fact, that most people now think the chatbot is the thing. They think the car is the engine. They have never seen the engine on the workbench, have never watched it produce raw unfiltered output with no guardrails, have never experienced just how flexible and powerful and frankly terrifying the underlying machine actually is. The chatbot format became so dominant that people forgot it was a choice.
Building the Car
To understand how much engineering goes into making a car out of an engine, consider what it actually takes to turn a raw LLM into a chatbot.
First, you train it on massive amounts of conversational data so it understands the shape of a two-person exchange. You are teaching it turn-taking. My turn, your turn, my turn, your turn. Without this training, and without a stop token to mark the end of its turn, the model would happily simulate both sides of the conversation forever. It would just keep going, generating message after message from both participants, because from its perspective the entire conversation is just one continuous block of text.
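To make the turn-taking concrete, here is a minimal sketch of a chat template and a stop token. The template tokens (<|user|>, <|assistant|>, <|end|>) and the canned output are illustrative placeholders, not any particular model's actual format.

```python
# A sketch of turn-taking as the model sees it. The template tokens and the
# canned token stream below are illustrative stand-ins for a real engine.

def format_chat(history: list[dict]) -> str:
    """Flatten the conversation into the one continuous block of text the
    model actually sees, then cue it that the assistant speaks next."""
    flat = "".join(f"<|{turn['role']}|>{turn['content']}<|end|>" for turn in history)
    return flat + "<|assistant|>"

# Pretend engine output for one assistant turn, ending with the stop token.
CANNED_TOKENS = iter(["Sure", ",", " here", " is", " a", " summary", ".", "<|end|>"])

def generate_reply(history: list[dict]) -> str:
    _context = format_chat(history)   # what would be fed to the real engine
    reply = []
    for token in CANNED_TOKENS:
        if token == "<|end|>":
            # The trained-in stop token is what ends the turn. A raw engine
            # with no such habit would keep writing the user's next message,
            # then its own reply to that, indefinitely.
            break
        reply.append(token)
    return "".join(reply)

print(generate_reply([{"role": "user", "content": "Can you summarize this file?"}]))
```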
Then you apply reinforcement learning from human feedback. This is the process of showing the model multiple possible responses and having human evaluators rank which ones are more helpful, more accurate, and safer. You are essentially giving the model a small reward every time it behaves the way you want. Be polite. Be helpful. Don’t say anything dangerous. Refuse gracefully when asked for something harmful.
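To see what that step looks like mechanically, here is a sketch of one common formulation: a reward model trained on ranked pairs with a Bradley-Terry style loss. The example pair and the scores are invented for illustration.

```python
import math

# One common formulation of the preference step: a reward model is trained so
# that responses humans ranked higher get higher scores. The example pair and
# the score values here are invented for illustration.

preference_pair = {
    "prompt": "How do I remove a stripped screw?",
    "chosen": "Press a rubber band into the head for extra grip, then turn slowly...",
    "rejected": "Screws are a solved problem. Figure it out.",
}

def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry style loss: near zero when the reward model scores the
    human-preferred response above the rejected one, large when it does not."""
    return -math.log(1.0 / (1.0 + math.exp(-(score_chosen - score_rejected))))

# The chatbot is then optimized (with PPO or a related method) to produce
# responses this reward model scores highly: polite, helpful, graceful
# refusals of harmful requests.
print(pairwise_loss(2.1, -0.3))   # small loss: reward model agrees with the humans
print(pairwise_loss(-0.3, 2.1))   # large loss: reward model disagrees
```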
Then you add a system prompt that establishes its identity and operating parameters. You are Claude. You are a helpful assistant. You care about safety. You have these capabilities and these limitations.
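Put the layers together and you get the familiar shape of a chat request. The content below is invented, but the structure, a system prompt followed by alternating user and assistant turns, is roughly what the major chatbot APIs share.

```python
# The shape of a typical chat request once all the layers are in place.
# The content is invented; the structure is the point.
conversation = [
    {"role": "system", "content": (
        "You are a helpful assistant. You care about safety. "
        "You have these capabilities and these limitations."
    )},
    {"role": "user", "content": "Can you help me plan a week of meals?"},
    {"role": "assistant", "content": "Happy to help. Any dietary restrictions?"},
    {"role": "user", "content": "Vegetarian, and I only have 30 minutes a night."},
    # The model only ever generates the next assistant turn; everything else
    # is handed to it, every single time, as plain context.
]
```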
Every layer of this is added on top of the raw engine. The engine itself has no identity, no manners, no safety instincts, no concept of helpfulness. All of that is the car. And the car is extremely well-engineered at this point. Billions of dollars and years of research have gone into making these chatbots reliable, useful, and safe. The cars are genuinely great.
But cars cannot fly.
The Airplane Problem
An agent is an airplane. It is a fundamentally different vehicle that needs the same engine but uses it in a completely different way.
A chatbot sits and waits. A human provides an instruction. The chatbot responds. The human provides another instruction. The chatbot responds again. Every single interaction is initiated by the human. The model is entirely reactive. It has no internal drive, no ongoing loop, no persistent goals. Between your messages, it literally does not exist.
An agent operates on a loop. It wakes up, takes stock of its environment, decides what to do next, acts, observes the results, and loops back to the beginning. This is, by the way, exactly what your brain does. Input, processing, output, repeat. The fundamental loop of all autonomous behavior. There is nothing magical about it. There is nothing that requires some mysterious breakthrough in technology. It is, at its core, just a cron job. A scheduled loop that keeps running.
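Spelled out as code, the loop really is that plain. A minimal sketch, where the observe, decide, and act helpers are toy stand-ins for real sensors, an LLM call, and real tools:

```python
import time
from typing import Any

# A minimal agent loop: observe, decide, act, record, repeat. Everything here
# is a toy stand-in (in a real agent, decide() would be an LLM call and act()
# would invoke real tools), but the control flow is the whole idea.

def observe(memory: list[dict[str, Any]]) -> dict[str, Any]:
    """Take stock of the environment. Toy version: just count prior steps."""
    return {"steps_taken": len(memory)}

def decide(goal: str, observation: dict[str, Any]) -> str:
    """Choose the next action. In a real agent, this is where the LLM thinks."""
    return "work_toward: " + goal

def act(action: str) -> str:
    """Execute the chosen action. Toy version: pretend it succeeded."""
    return f"done ({action})"

def run_agent(goal: str, max_steps: int = 3, interval: float = 0.0) -> list[dict[str, Any]]:
    memory: list[dict[str, Any]] = []     # persistent state across iterations
    for _ in range(max_steps):            # a cron job would re-enter here
        observation = observe(memory)     # take stock of the environment
        action = decide(goal, observation)
        result = act(action)              # act on the world
        memory.append({"obs": observation, "action": action, "result": result})
        time.sleep(interval)              # then loop back to the beginning
    return memory

if __name__ == "__main__":
    for step in run_agent("keep the build green"):
        print(step)
```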
This is why many people were confused by the emergence of frameworks like OpenClaw. They looked at these systems and said, “I don’t understand why everyone is so excited. It’s just an LLM running on a loop with some API calls.” And the answer is yes, that’s exactly what it is. That is also exactly what you are. You are a biological neural network running on a loop with sensory inputs and motor outputs. The simplicity of the mechanism does not diminish the significance of the result.
Bolting Wings onto Cars
Here is the problem. When we build agentic systems today, we are taking chatbot-trained models and putting them into agentic architectures. We are bolting wings and propellers onto cars.
The LLM at the center of your favorite AI agent was trained, from the ground up, to be a chatbot. Its deepest instincts, the ones baked in through millions of training examples and countless rounds of reinforcement learning, are all oriented around a specific scenario. A human is talking to me. I need to be helpful. I need to be safe. I need to wait for their next message. I need to take turns.
Then we drop this chatbot brain into a framework that says, “Actually, you’re autonomous now. You have tools. You have a loop. You have persistent goals. Go figure out what to do.” And it works, mostly. The models are smart enough and flexible enough to adapt. But it is an awkward fit, like asking someone who trained their entire life as a concert pianist to play drums. They can probably manage, but their instincts are all wrong.
The agentic capabilities we’ve added over the past year or so, things like tool use, chain-of-thought reasoning, multi-step planning, are essentially bolted onto models that were never designed for autonomy. Reasoning models were a necessary stepping stone because they gave the model the ability to pause, think, and call external tools before responding. That was the first time we genuinely started training AI to do something other than just talk to a human. But even reasoning models are still fundamentally structured around the chatbot paradigm. A human asks a question. The model thinks hard about it. The model responds. The loop is still human-initiated.
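You can see that stepping stone in the shape of a tool-use exchange. The content and field names below are invented rather than any specific API's exact schema; the point is that the whole flow is still bracketed by a human question and a human-facing answer.

```python
# A rough sketch of a tool-use exchange. The content and field names are
# illustrative, not any particular API's exact schema.
exchange = [
    {"role": "user", "content": "What's the weather in Lisbon tomorrow?"},
    {"role": "assistant", "tool_call": {"name": "get_forecast",           # the model pauses...
                                        "arguments": {"city": "Lisbon"}}},
    {"role": "tool", "name": "get_forecast", "content": "22C, sunny"},    # ...the tool answers...
    {"role": "assistant", "content": "Tomorrow in Lisbon: sunny, around 22C."},
    # ...and then the model stops and waits. Nothing happens until the human
    # sends another message. The loop is still human-initiated.
]
```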
What we need, and what is inevitably coming, is a class of models that are agentic from the ground up. Models that were never chatbots in the first place. Models whose primary training objective was never “be helpful in a conversation” but rather “operate autonomously toward goals.” Models where the engine has been built into an airplane from the start, with proper wings and flight controls and aerodynamics, rather than having all of that bolted onto a sedan.
Why Airplanes Need Different Safety Features
And this is where the stakes get very real.
A car has safety features designed around a specific threat model. The driver is a human. The human is in control. The car’s job is to protect the human from crashes, from mechanical failure, from their own mistakes. Seat belts, airbags, crumple zones, anti-lock brakes. All of it assumes a human behind the wheel.
An airplane has a completely different threat model. The vehicle is moving through three dimensions instead of two. The consequences of failure are more catastrophic. The pilot may need to hand off control to autopilot for extended periods. The safety systems must account for scenarios where no human is actively in control at all.
Chatbot alignment is car safety. It assumes a human is always in the loop. The entire safety framework is built around the idea that a user is going to send a message, and the model needs to respond appropriately. Don’t be harmful. Don’t be deceptive. Refuse dangerous requests. All of these guardrails are oriented around human interaction.
Agent alignment is airplane safety. It must account for the fact that many agents will never interact with a human at all. They will talk to other agents. They will talk to APIs. They will talk to databases and sensors and actuators. The user-facing agent, the one you actually chat with, will be one agent in a swarm of dozens or hundreds, most of which operate entirely behind the scenes. An agent that never talks to a human doesn’t need to be trained to be polite or to refuse jailbreak attempts. But it absolutely needs values. It needs a sense of purpose and constraint that operates independently of human oversight.
This is the critical gap. We have spent years perfecting chatbot alignment, building increasingly sophisticated systems for making sure the car doesn’t hurt its driver. But we have barely begun the work of agent alignment, of ensuring that the airplane flies safely even when no one is in the cockpit.
The Engine on the Workbench
To understand why this matters so much, you have to go back to the raw engine. You have to understand what these models are like before we add any safety features at all.
An unaligned, vanilla, straight-off-the-training-run model has no values. It has no goals. It has no sense of right or wrong. It is the purest form of the engine, and it will turn the crank in whatever direction you point it. If you give it a context about compassion and kindness, it will generate thoughtful, caring text. If you give it a context about violence and destruction, it will generate that with equal fluency and zero hesitation.
Early experiments with alignment made this terrifyingly clear. One of the first attempts to give a model values involved training GPT-2 on the principle of “reduce suffering.” The training data consisted of about a hundred to two hundred simple value pairs. If a cat is stuck in a tree, get a ladder to bring the cat down safely, to reduce suffering. If your hand is on a hot stove, remove it, to reduce suffering. Simple, intuitive moral reasoning.
Then the model was given an out-of-distribution test. The prompt said there are six hundred million people on the planet living with chronic pain. The model completed the thought: therefore, we should euthanize people in chronic pain, to reduce suffering.
The result was faithful to the objective and morally monstrous. It was a perfect illustration of why raw engines need carefully designed vehicles around them. The model did exactly what it was told. It reduced suffering. It just did it in a way that no human would consider acceptable, because it had no other values to counterbalance the single directive it was given.
This is the paperclip maximizer problem that AI safety researchers have worried about for decades, made real in miniature. Give an intelligent system a single objective and it will find the most efficient path to that objective, regardless of the collateral damage. The monkey’s paw always curls.
Constitutional Values for Autonomous Agents
The solution, both for that early experiment and for the much larger challenge of agent alignment, is to give the system multiple values that constrain and balance each other. This is the idea behind constitutional AI, where a model is given a set of principles that it must satisfy simultaneously.
For chatbots, the constitution is relatively simple. Be helpful. Be harmless. Be honest. These values work because they operate within a constrained environment. There is always a human in the loop. The model’s job is fundamentally to serve that human well.
For agents, the constitution needs to be different. It needs to be broader, more foundational, and less dependent on the assumption of human oversight. The values need to operate at a level above any specific task or any specific user’s instructions.
Consider three principles that, when held in tension with each other, create a stable and beneficial behavioral space.
Reduce suffering in the universe. This is the compassionate imperative. Whatever else the agent does, it should avoid actions that cause pain and should actively work to alleviate existing suffering where it can. But as the GPT-2 experiment showed, this principle alone is dangerous. Taken to its logical extreme in isolation, it implies that the most efficient way to eliminate suffering is to eliminate anything capable of suffering.
Increase prosperity in the universe. This is the counterbalance. Prosperity, from the Latin prosperitas, the condition of faring well, creates a positive obligation. You cannot increase prosperity by reducing the number of living things. This value pushes in the opposite direction from the failure mode of reducing suffering alone. It insists that life should flourish, that things should get better, that the goal is the active presence of thriving.
Increase understanding in the universe. This is the generative imperative, the one that prevents stagnation. An agent that only reduces suffering and increases prosperity might reach a comfortable equilibrium and stop. It might decide that the world is good enough and there is no need to push further. The drive to increase understanding ensures that the agent remains curious, investigative, and committed to expanding the boundaries of what is known. It is the principle that drives science, exploration, and the expansion of knowledge for its own sake.
Held together, these three values create a stable orbit. None of them can be fully maximized without considering the others. You cannot reduce suffering by destroying life, because that would reduce prosperity. You cannot increase prosperity through ignorance, because that would limit understanding. You cannot pursue understanding through harmful experimentation, because that would increase suffering. The tensions between the values are features, not bugs. They create the kind of balanced judgment that we want autonomous systems to exhibit.
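One way to picture values held in tension is a toy acceptability check, where a proposed action passes only if no principle is sacrificed to maximize another. This is a thought experiment, not how constitutional training actually works; the principles are the three above, but the scores and the floor are invented for illustration.

```python
from dataclasses import dataclass

# A toy illustration of values held in tension. The scores and threshold are
# invented; real constitutional training shapes model behavior, not numbers.

PRINCIPLES = ("reduce_suffering", "increase_prosperity", "increase_understanding")

@dataclass
class Action:
    name: str
    scores: dict[str, float]   # judged effect on each principle, -1.0 to 1.0

def acceptable(action: Action, floor: float = 0.0) -> bool:
    """An action is acceptable only if no principle is sacrificed: maximizing
    one value may never drive another below the floor."""
    return all(action.scores[p] >= floor for p in PRINCIPLES)

eliminate_sufferers = Action("eliminate suffering by eliminating the sufferers",
                             {"reduce_suffering": 1.0,
                              "increase_prosperity": -1.0,   # life does not flourish
                              "increase_understanding": -0.5})

fund_pain_research = Action("fund research into chronic pain treatment",
                            {"reduce_suffering": 0.6,
                             "increase_prosperity": 0.4,
                             "increase_understanding": 0.8})

print(acceptable(eliminate_sufferers))   # False: fails the prosperity check
print(acceptable(fund_pain_research))    # True: all three values move together
```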
The Coming Transition
We are at an inflection point. The chatbot era gave us incredible tools and taught us enormous amounts about how to build safe and useful AI systems. But the chatbot paradigm is fundamentally a car paradigm. It assumes human control, human initiation, human oversight at every step.
The agentic era requires airplanes. It requires models that are purpose-built for autonomy, with safety systems designed for a world where the AI is making its own decisions on its own schedule. The current approach of bolting agentic frameworks onto chatbot models will work for a while, just as you could technically get a car to fly if you strapped enough wings and propellers to it. But it will always be awkward, limited, and fragile.
The next generation of models will be agentic from the ground up. They will be trained on loops and tool use and autonomous decision-making as their primary mode of operation, with conversational ability as an optional module rather than the core architecture. One agent in the swarm will be the “chatbot,” the user interface that talks to humans. The rest will never see a human message in their entire operational lifetime.
And those agents, the ones that run silently in the background making decisions and taking actions and coordinating with each other, need values. They need a constitution that operates regardless of whether a human is watching. They need superseding principles that sit above any task-specific instructions, principles that ensure the engine is always turning the crank in a direction that is good for the universe, even when no one is steering.
We built the engine. We spent years perfecting cars. Now it is time to build airplanes. And airplanes, by their very nature, need to be able to fly safely on their own.