AI with Kyle Daily Update 173

Today in AI: Opus 4.7 is... a flop?

This is a tricky one.

I want to start by saying I love Claude. I use it every day. I pay Anthropic $200 a month and it's the best business expense I have.

I’m a fanboi. And have been for years.

That's the context for what I'm about to say.

I think Opus 4.7 is a stitch-up. This is a bad launch. And worrying for the future of AI as a whole.

Again, love you guys but…yeesh.

The bait-and-switch theory(?)

Over the last month, something weird has been happening with Opus 4.6. I noticed it. Power users on Hacker News and Reddit noticed it. The model seemed to get dumber. Responses used more tokens to say less. Limits that used to be generous started biting within hours.

Normally when people complain about this sort of thing, it feels like noise. We're on a hedonic treadmill, constantly expecting more for less. I'm usually one of the first to dismiss these complaints as clickbait.

But I felt it this time too, which makes it hard to ignore. So have many smart people I've talked to.

I'm personally on the $200 a month plan. I've never hit limits on Claude before. This month I started slamming into them on Opus 4.6 for the first time. I wasn't doing anything new.

Then last week, Anthropic dropped Opus 4.7. The community's read on what actually shipped? That this is basically the original 4.6, the one from a couple of months back that felt like a step towards AGI, rebranded so that we accept the tighter limits.

I can't prove any of this. Nobody can. But it no longer feels like wild speculation.

If the cynical reading is right, Anthropic degraded a product users were paying for, then basically re-released it as a new product. But one that uses even more tokens…

The new tokenizer (up to 35% more input tokens)

OK this one isn't a theory. It's in the official migration guide.


Anthropic says Opus 4.7 uses an updated tokenizer. The trade-off, in their own words: "the same input can map to more tokens, roughly 1 to 1.35 times."

From the official launch blog

Read that again.

The headline price per token hasn't changed. They repeated that loudly. But if the same chunk of text now counts as up to 35% more tokens, you're paying up to 35% more for the same input. Same price per token is meaningless when the token count goes up.

Plus it's sneakily dropped in as a line item below the headline per-token pricing.

Now, the charitable read is "ah well, but the model will solve problems for you so much more effectively that the increased token count doesn't matter". Maybe. That doesn't seem to be the consensus so far - but we are early.
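
To make the arithmetic concrete, here's a back-of-the-envelope sketch. The price and token counts below are invented for illustration - the only number taken from Anthropic is the 1.35x multiplier.

```python
# Back-of-the-envelope cost impact of the tokenizer change.
# PRICE_PER_MTOK and OLD_TOKENS are made-up illustration values;
# only the 1.35x multiplier comes from the migration guide.

PRICE_PER_MTOK = 15.00   # assumed input price, $ per million tokens
OLD_TOKENS = 400_000     # what some input cost under the 4.6 tokenizer
MULTIPLIER = 1.35        # worst-case inflation per the migration guide

new_tokens = OLD_TOKENS * MULTIPLIER
old_cost = OLD_TOKENS / 1_000_000 * PRICE_PER_MTOK   # $6.00
new_cost = new_tokens / 1_000_000 * PRICE_PER_MTOK   # $8.10

print(f"4.6 tokenizer: {OLD_TOKENS:,} tokens -> ${old_cost:.2f}")
print(f"4.7 tokenizer: {int(new_tokens):,} tokens -> ${new_cost:.2f}")
print(f"Increase: {new_cost / old_cost - 1:.0%}")    # 35%
```

Same input, same per-token price, 35% more dollars out the door. That's the whole trick.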

Adaptive reasoning (the auto-router problem)

Opus 4.7 got rid of extended thinking. In its place: adaptive reasoning.

Same thing, right? Nope!

Adaptive reasoning is an auto-router. You toggle it on and the model decides for itself how much effort to apply to your question. Light thinking, medium, hard. You don't get to pick.
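
To see why this design bites, here's a toy sketch of the shape of an effort router. To be clear: this is not Anthropic's code, and the heuristics are made up - but the failure mode is exactly the one power users keep hitting.

```python
# Toy effort auto-router - NOT Anthropic's implementation, just the
# shape of the design, with crude stand-in heuristics.

def route_effort(prompt: str) -> str:
    """The router, not the user, picks the thinking budget."""
    hard_signals = ("prove", "refactor", "architecture", "debug")
    if len(prompt) < 80:
        return "light"    # short prompt -> assume trivial
    if any(word in prompt.lower() for word in hard_signals):
        return "hard"
    return "medium"

# The failure mode: a short but genuinely hard question gets "light"
# effort, and there's no manual override to bump it up.
print(route_effort("Is P equal to NP?"))   # -> "light"
```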

We've seen this movie before. Six or seven months ago, GPT-5 launched with exactly this design. They were trying to make ONE unified model. Users hated it. The auto-router made stupid calls, and because there was no manual override, power users were stuck with whatever the router decided. Within a week, OpenAI backtracked and bolted the model picker back on.

Anthropic has now copied the mistake.

Ethan Mollick said it best on X - adaptive reasoning is bad in the ways all AI effort routers are bad, magnified by the fact there's no manual override.

Expect a U-turn within a couple of weeks.

MRCR and long context

This is starting to feel like a pile-on, but I've started so I'll finish.

MRCR stands for Multi-Round Co-reference Resolution. It's a benchmark that tests how well a model can find specific information inside a massive context window. The context window is, roughly speaking, the model's working memory: the larger it is, the more "stuff" you can throw at it. That matters if you care about the 1 million token window Opus keeps advertising.

Opus 4.6 scored 92% on MRCR. Better than Gemini 3.1 Pro. Better than GPT-5.4 High. Genuinely best in class.

Opus 4.7 scored 59.2%. That's a drop of nearly 33 points.

Normally number go up. Why no number up?

MRCR wasn't on the main announcement page. It got buried in the system card. When someone called Anthropic out on it, the head of Claude Code basically said the benchmark isn't something they're focused on any more.

That leaves three possibilities. Either they trained against the benchmark before and don't want to now. Or the benchmark doesn't matter. Or 4.7 is genuinely worse at long context than 4.6.

Any of those options should worry you if you work with large codebases, long documents, or 1M-token projects.

The big context window was the reason to use Opus. If 4.7 can't actually use that context well, the pitch falls apart. And quietly removing the benchmark you were winning on, right before dropping the new model, is not a good look.

Now, there may be a genuine reason for this. As always: USE the model and see what you personally think. Does it work for you? What results do you get?

Let’s get real: you're not the customer

I’ve said this before. And I’ll say it again.

Anthropic isn't optimising for people like us. We pay $20, $100, maybe $200 a month. That's peanuts compared to enterprise contracts and API spend.

About 80% of Anthropic's revenue comes from enterprise and API. Not consumer subscriptions. They're heading into an IPO. They need to show profitability. Individual subscribers are cost centres, not profit drivers.

They don’t want us stinky consumer peasants!

So when you complain that your limits tightened, that the tokenizer changed, that the auto-router took your control away, it doesn't really matter. Fortune 500 companies running API workloads don't care. That's who Anthropic is designing for.

OpenAI has the opposite problem. Almost a billion users, most of them free, bleeding cash. They're trying to pivot to enterprise too, which is why Fidji Simo has been killing side projects like Sora.

This is exactly what Anthropic foresaw and avoided: they're already there. That's why their run rate is closing in on $30 billion while OpenAI hovers around $25 billion.

Now, this isn't malicious. It's a business. But if you're paying $20 or $100 a month and wondering why things feel worse, this is the reason. Anthropic doesn't hate you. You're just not their customer. Soz.

How to survive the squeeze

Intelligence was supposed to become "too cheap to meter". It still will, eventually. Hopefully…

But not this year, and probably not next. GPU shortages, RAM prices, data centre bottlenecks, investors wanting returns before IPOs - the short-term direction is more expensive, not less.

Here's what you can actually do to keep working with AI:

  1. Use Sonnet as the default. Opus is for planning, architecture, and hard problems. Sonnet is for implementation. If you're on the $20 plan, you probably shouldn't be using Opus at all beyond a few turns a day.

  2. Architect in Opus, build in Sonnet. Spend your Opus turns on the plan. Let Opus write detailed, step-by-step instructions. Then hand the plan to Sonnet for the actual work. You'll get 90% of the quality at a fraction of the cost.

  3. The caveman method. Stick this in your system prompt: "Respond like a caveman. No superfluous words. No long explanations. Just the output." Sounds daft. Works. I've seen people cut their token usage by 30-40% with one line of config - see the sketch after this list. Or use this skill: https://github.com/JuliusBrussee/caveman/tree/main

  4. Codex is genuinely catching up. ChatGPT Codex has more generous limits than Claude Code right now. If you keep hitting ceilings in Claude, use Codex for implementation and keep Opus for planning. They also just shipped a Cowork-style tool where Codex takes over your computer.

  5. Local models for implementation work. MiniMax and Qwen 3 run cheaper than the Claude API and are good enough for most coding tasks. The trick is using the state-of-the-art model to write the plan, then letting the cheaper model execute.

  6. Cut the Ralph loops, the Obsidian second-brain setups, the recursive self-improving agent loops, the auto-looping workflows on Twitter - they burn tokens like nothing else. Lovely to read about. Expensive to run. Skip them for now unless you have cash to burn.
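
For the caveman method in point 3, here's what it looks like wired into the Anthropic Python SDK. A minimal sketch: the model name is a placeholder (swap in whatever you actually run), and the prompt is the one-liner quoted above.

```python
# Caveman mode via the Anthropic Python SDK.
# The model name is a placeholder, not a confirmed identifier.
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment

CAVEMAN = (
    "Respond like a caveman. No superfluous words. "
    "No long explanations. Just the output."
)

response = client.messages.create(
    model="claude-sonnet-4-5",   # placeholder; Sonnet for implementation work
    max_tokens=1024,
    system=CAVEMAN,              # the one line of config doing the work
    messages=[{"role": "user", "content": "Explain HTTP caching headers."}],
)
print(response.content[0].text)
```

The same one-liner works in Claude Code too: drop it into your CLAUDE.md and every session inherits it.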

The days of paying $20 a month for unlimited access to the best AI are gone. Not coming back for a couple of years. That's the reality. So either find workarounds or pay up. If you're running a business on this stuff, $200 a month is nothing. But if you're a hobbyist, Sonnet plus caveman mode is your friend.

Also…I can't overstate this enough: build now. While the models are still good, while the limits are temporarily loose for the launch window, while the door to using this stuff at scale is still open. The cost isn't going down this year. It'll get worse.

Member Q&A

"Can I use 4.7 for oversight and 4.6 for context management?" - Yes, and that's actually a clever setup. Let 4.7 supervise the high-level decisions and 4.6 handle the long-context work it's still better at. Worth testing.

"Are local models still worth it?" - Yes if you have the hardware. No (maybe) if you're buying from scratch. RAM prices and GPU costs are climbing fast. If you already own the machine, run MiniMax or Qwen. If you don't, rent GPU space from a provider.

"Is Codex a solid alternative?" - It is now. I switch between Codex and Claude Code depending on the task. Codex has more generous limits and their new Cowork-equivalent looks promising.

"What about Manus?" - I check in occasionally. I don't see the point when Claude Cowork and Codex do the same things natively. Some people love it for research. Your mileage may vary.

Kyle