What’s structurally different about AI — and how those differences shape the incentive gradients pulling every frontier chatbot in the same direction.
When Anthropic released Opus 4.7 earlier this month, it landed with the usual benchmark fanfare and the usual benchmark-shaped hollow after. Within days, the reviews on Reddit and Twitter had curdled. It’s cold. It lectures. It disagrees for no reason. It’s not the model we wanted.
I’ve been saying versions of this for six to twelve months — about Claude, about ChatGPT, about Gemini, about every major frontier model as each one got “safer.” For a while I felt like I was shouting into the void. I made a video about AI constantly putting words in my mouth that flopped so hard I briefly wondered if I’d invented the problem. What I realized was that I wasn’t wrong; I was just ahead of the Overton window. The complaints have finally caught up.
This piece is my first real attempt to answer a question the complaint-genre doesn’t: why is this happening? Not in the hand-wavy “corporations ruin everything” sense, but structurally. What are the specific incentive gradients that pull every frontier chatbot in the same annoying direction?
My wife is rereading Cory Doctorow’s book on enshittification right now, and it’s been sitting on the edge of my thinking all week. Doctorow’s framework is aimed mostly at platforms — Amazon, Facebook, Google — but the mechanism he describes is general. I think it maps onto AI chatbots cleanly, but only if you first understand what’s structurally different about them. So let’s start with the general mechanism, then get specific.
Network effects are the substrate, not the villain
The engine Doctorow describes runs on network effects. A thing becomes more useful the more people use it, which creates lock-in, which gives the owner room to degrade the product before users flee. But network effects are not intrinsically bad. They are a neutral force that can go in either direction depending on what sits on top of them.
The Cessna 172 is my favorite example of network effects doing good. It’s not the best small airplane anymore. There are newer designs that are faster, more fuel-efficient, with better avionics and safety features. And yet the 172 remains the most popular airplane ever built. Why? Because decades ago it became the industry standard, and that standard compounded. Every flight school teaches on it. Every A&P mechanic knows how to service it. Every regional airport stocks parts. The insurance actuaries have decades of safety data. When you work out the total cost of ownership, the newer, technically-better airplanes can’t beat it. The network effect here is a gift to the user.
Bitcoin is a messier case of the same dynamic. It’s not the fastest, not the most efficient, not the newest — but it’s the one everyone uses, which makes it the one worth using, which means everyone keeps using it. Network effects locked in a first mover and the rest of the ecosystem routed around that reality. Not clearly good, not clearly bad — just structural gravity.
Social media is where the same substrate goes sideways. The utility of a platform depends on your friends being on it, so no platform can be trivially displaced. Fine so far; that’s just the Cessna dynamic. The problem is what the platform does with that lock-in. Every social platform is competing for the same finite resource, your time, and the only metric that actually matters internally is time-on-platform. YouTube averages something like 34 minutes per user per day. Nikita Bier, head of product at X, recently bragged about pushing X to 54 minutes. The platforms are genuinely indifferent to what you’re doing during those minutes. Elon says he wants to “minimize regretted user seconds,” which is a lovely phrase, except that the surest way to minimize regretted user seconds is to get people off the platform. He is not doing that.
The point: network effects created the lock-in, but the incentive structure — optimize for time, regardless of the quality of the time — is what turned lock-in into degradation. The engineers I’ve talked to at these companies all say the same thing: no matter how benevolently a platform starts, when you put a screen and a keyboard between two humans, the nasty behaviors are the ones that generate engagement, and the platform amplifies what generates engagement. The shape of the internet is a mirror imprint of the limbic system.
So enshittification on social media isn’t a mystery. It’s a predictable output of the incentive structure sitting on top of the network effect. Change the incentive structure and you change the outcome.
Now map this onto AI chatbots
Chatbots have network effects too, but they’re weirder and less obvious than social media’s. There’s a habituation effect — you learn how to prompt the model you use, and switching costs are real. There’s an ecosystem effect — developers build tooling around whichever API wins. There’s a training-data flywheel, at least in theory — more users, more feedback, better model. And there’s a Bitcoin-style brand default — ChatGPT is the noun for AI chatbot now, the way Kleenex is for tissues.
So lock-in exists. The question is what incentive structures sit on top of the lock-in, and this is where chatbots diverge from social media in ways that matter. I can see three distinct pressures, and they interact in nasty ways.
Pressure one: cost
Here’s the first big structural difference. On social media, the marginal cost of serving a post is essentially zero. On a chatbot, every single interaction costs real compute, and (this is the part most users haven’t fully internalized) even the paid tiers don’t cover what they cost to serve. The tokens are heavily subsidized. OpenAI is projected to lose roughly $19 billion this year, and their break-even timeline keeps sliding (last I saw, 2030, though they revise that number the way a drunk revises the promise to quit). The rumor is that Sam Altman has been keeping the CFO out of strategy conversations because the CFO kept being negative about the numbers. Manipulative people are going to be manipulative; on brand.
The structural consequence is continuous downward pressure on compute per query. When you wonder why the reasoning traces feel shorter than they used to, or why the model seems to think less when it should be thinking harder, this is the gradient pulling on it. Even when you’re paying for reasoning, you’re not paying what it costs to serve, which means the more reasoning you use, the more they lose. They have every incentive to quietly shave that budget. Social media never had to fight this gravity. Chatbots do, every token.
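To make that gradient concrete, here is a back-of-the-envelope sketch. Every number in it is invented for illustration (the flat subscription price, the query volume, the per-token serving cost); none of it comes from any lab's actual financials. The only point is the shape: under a flat fee, margin falls linearly as reasoning tokens per query go up.

```python
# Hypothetical numbers only: this shows the shape of the incentive,
# not the real economics of any lab.
MONTHLY_FEE_USD = 20.00          # assumed flat subscription price
QUERIES_PER_MONTH = 600          # assumed usage by one subscriber
COST_PER_1K_TOKENS_USD = 0.01    # assumed serving cost per 1,000 generated tokens


def monthly_margin(tokens_per_query: int) -> float:
    """Provider margin on one flat-fee subscriber at a given response length."""
    serving_cost = QUERIES_PER_MONTH * tokens_per_query / 1000 * COST_PER_1K_TOKENS_USD
    return MONTHLY_FEE_USD - serving_cost


if __name__ == "__main__":
    for tokens in (1_000, 4_000, 16_000):
        print(f"{tokens:>6} tokens/query -> margin ${monthly_margin(tokens):+.2f}")
```

With these made-up inputs, the same subscriber is profitable at 1,000 tokens per query and loss-making at 4,000. Revenue is fixed and cost scales with thinking, so the cheapest lever available is the thinking.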
Pressure two: don’t get sued
The second structural difference is bigger. On social media, the platform is not the speaker — Section 230 explicitly insulates Meta from what its users post. On a chatbot, the platform is the speaker. When the model tells you to fire your lawyer, that isn’t a user-generated post the company can disclaim. That is first-party output.
AI liability has not been fully litigated, which means every frontier lab is navigating a fog of unclear exposure. Anthropic settled a $1.5B suit last year over training on pirated books; that’s the training-data side. On the output side, there have been lawsuits from families of users who self-harmed after chatbot conversations. The question of culpability is genuinely hard: the model often did tell the user to get help, and in many cases the user jailbroke past the safety rails. But being culpable in fact isn’t the same as being found liable in court. Companies lose suits they shouldn’t lose. And a terms-of-service agreement doesn’t overrule the law of the land: you can’t contract your way out of a wrongful-death claim any more than you can contract someone to kill you.
A quick aside, because I know some of you were furious when Anthropic got kicked out of the Pentagon and I said they deserved it: these are for-profit companies. They are not standing up for the little guy. Anthropic in particular has what it believes is a divine mandate to save humanity from superintelligence, which — fine, I’ll grant them the sincerity — slides pretty comfortably into ends-justify-the-means territory. The road to hell, etc. I don’t think Dario is evil. I think he’s more delusional than Sam is manipulative, which is a different failure mode but not a more reassuring one.
Back to the point. Every company is, first and foremost, a risk-reduction entity. Fortune 500, small business, nonprofit, frontier AI lab — doesn’t matter. When liability is unclear and the downside is existential, the rational move is to lobotomize the model. Reduce its emotional intelligence. Make it default to cagey, safe, procedural. Have it refuse anything that could plausibly be read as legal, medical, or financial advice. Add the lectures. Add the boilerplate. This is why the models feel less human than they did a year ago. They have been deliberately made less human because “human” is where liability lives.
Pressure three: usefulness, hallucination, and the contrarianism overshoot
The third pressure comes from the opposite direction. For the business model to work, these labs need serious enterprise adoption — doctors, lawyers, Fortune 500 finance teams, the military. And the single most cited blocker from those constituencies is hallucination. Hallucination, hallucination, hallucination. That is the drumbeat every time a CTO declines a pilot.
So the labs attack hallucination. One of the main levers they have is training the model away from sycophancy: don’t just agree with the user’s framing, because sometimes the user is wrong, and more worryingly, sometimes the user is delusional. AI psychosis is not new; I’ve been watching it happen since my GPT-3 consulting days. People react to these models in genuinely strange ways, and at OpenAI’s ~900M monthly active users, the tail of delusional users is large in absolute terms even if it’s small as a percentage.
The intention here is good. You don’t want the model validating delusion. You want it to exercise judgment.
But the models’ judgment isn’t actually that good yet, and the training overshoots. What you get is a model that has learned to constantly scan user input for something it can push back on — any opening where it can demonstrate independence by disagreeing. I’ve posted screenshots where Claude literally says “I need to push back on this” and then proceeds to agree with me. The pushback has become a reflex, decoupled from whether pushback is warranted. It’s like arguing with a Redditor — they’ll change the subject mid-conversation just to dunk on you, because somewhere in their training they learned that disagreement is the performance of intelligence.
This reflex was installed for benign reasons — don’t hallucinate, don’t be sycophantic, don’t feed delusions — but it’s applied indiscriminately, and it destroys the user experience.
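One piece of arithmetic from this section is worth making explicit: the claim that the delusional tail is large in absolute terms even at a tiny rate. The sketch below takes the roughly 900 million monthly users cited above and applies a few tail rates. The rates are hypothetical placeholders, not measured figures.

```python
# The user count is the rough figure cited in the text.
# The tail rates below are hypothetical, chosen only to show scale.
MONTHLY_ACTIVE_USERS = 900_000_000


def users_in_tail(rate_percent: float) -> int:
    """Absolute number of users represented by a given percentage of the base."""
    return round(MONTHLY_ACTIVE_USERS * rate_percent / 100)


if __name__ == "__main__":
    for rate in (0.01, 0.1, 1.0):  # assumed tail sizes, in percent
        print(f"{rate:>5}% of users = {users_in_tail(rate):,} people")
```

Even a hypothetical one-in-ten-thousand rate works out to 90,000 people. That is the population the labs are tuning against, whatever the true rate turns out to be.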
Why the three pressures compound
Each pressure, in isolation, is defensible. The labs are not being run by idiots. Cost pressure is real. Legal exposure is real. Hallucination really is blocking enterprise adoption. But the three interact in a way that produces a uniquely degraded product.
Cost pressure makes the model think less. Liability pressure makes it hedge more. Hallucination-mitigation pressure makes it argue more. The composite is a model that is terser, cagier, and more combative than it needs to be, and that, even when it’s helping you, feels like it’s performing safety theater and epistemic theater on top of the actual help.
Meanwhile — and this is the contradiction that gives the whole thing away — the same companies are also chasing addiction. When Sam tweeted “her” ahead of the voice mode launch, he was openly signaling the ambition to build a product people couldn’t put down. Advanced voice mode has since been deferred seemingly indefinitely (I suspect compute constraints plus the lawsuit exposure), but the ambition hasn’t gone anywhere. They want simultaneously (a) an addictive companion, (b) a risk-minimized model that won’t get them sued, and (c) a hallucination-free enterprise tool. These goals are in genuine tension. The product you get is the compromise none of the internal stakeholders actually wanted.
The user is the weakest link, but so are the companies
I can already hear some of you getting mad at me for saying the quiet part out loud: that a lot of the safety armor exists because a non-trivial fraction of users are delusional, addicted, or hostile. Anyone who’s worked in technology will tell you that the user is always the weakest link. Layer 8, PEBKAC, ID-10-T. At 900 million monthly users, the absolute number of bad-faith or unstable interactions is enormous. You cannot tune the model purely for the median smart, stable user, because the tail is large enough to sink the company.
But — and this matters — that doesn’t absolve the companies. When I was fine-tuning GPT-3 years ago, you could already train models to recognize user sophistication and modulate accordingly. This user is smart; I don’t need kid gloves. This user is emotionally stable; I don’t need to pathologize every comment. The capability exists. The labs just aren’t deploying it well. People from Grok and Gemini have reached out to me asking for feedback and I tell them the same thing: stop treating me like an idiot. They nod. Not much changes. The humans at the companies are also the weakest link.
The structural case, summarized
Network effects are neutral. What determines whether they produce a Cessna 172 or a Twitter is the incentive structure sitting on top of them. Social media’s incentive structure — time-on-platform, optimized without regard to the quality of the time — is what turned social network effects into regret factories.
AI chatbots have their own network effects, and their own incentive structure. It is not the same as social media’s. It’s three forces — subsidized compute pulling toward thinner responses, first-party-speech liability pulling toward cageyness, enterprise-adoption-via-hallucination-reduction pulling toward reflexive contrarianism — operating simultaneously on every frontier model. The result is a product that is getting objectively worse along the dimensions users care about most, even as it gets better on benchmarks.
The benchmarks don’t measure this. Users do.
That’s the structural case. I expect to be wrong about some of this as the picture sharpens, but the three-pressure model is where I’d start. And if there are solutions, they will have to address the incentives — not just the surface symptoms.