AI with Kyle Daily Update 135

Today in AI: Opus 4.6 + Codex 5.3 oh my

Kyle Balmer
February 09, 2026

What’s happening in the world of AI:

https://youtube.com/shorts/Iaqi75zDKnQ?feature=share

Highlights

Opus 4.6 Dropped and I Haven't Slept

Discussed at [00:00]

I need to be honest with you. I'm rattled.

Opus 4.6 landed last week and I've been using it since. The version number makes it sound like a minor update from 4.5. It is not. I've been covering AI news every single day for the last couple of years. I see the gradual improvements. I take most releases in my stride. This one shook me. I am shooketh.

There's a concept from Ethan Mollick's book Co-Intelligence called the Three Sleepless Nights. His argument is that anyone who properly engages with what AI can do will go through various crises: what does this mean for my job, what does this mean for my kids, what does this mean for thinking itself.

If you are thinking about it properly AI will keep you awake.

Even if you come out the other side optimistic, you still have to go through that process. I thought I'd already had my three. Opus 4.6 is giving me another one.

The benchmarks are what you'd expect. 10s, 10s, 10s! It tops everything across the board in general knowledge work, reasoning, and coding.

But benchmarks don't really tell you anything. You have to use it. What I've noticed is that it's better at knowing when to think hard and when to move quickly. It stays on task for much longer stretches without losing the plot. And when it hits something it genuinely can't solve, it says so, clearly, and explains why. That last bit matters more than people realise. Previous models would always have a crack at it and hand you something that looked right but wasn't. This one will say "I don't know" and mean it.

Ethan Mollick asked it to build the Library Of Babel from the Borges short story. An hour later, Opus delivered a fully navigable 3D library where every possible combination of 410-page books exists in hexagonal chambers. You can search for any text and it computes the exact room, wall, shelf, volume, and page where it lives. I found my own name in there. Go and “play” with it. It’s utterly wild:

— (@)
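If you're wondering how "search for any text and get an exact location" can even work, here's a minimal sketch of one way to do it. This is not how Opus's build works (I haven't seen its internals). It's a toy that assumes a 29-symbol alphabet and shelf and wall counts that I've picked for illustration; only the 410 pages comes from the story as described above. The idea: read the text as one big number, then peel off page, volume, shelf, wall, and room with repeated division. All function names here are hypothetical.

```python
# Toy Library of Babel addressing. Every constant except the page count is an
# assumption made for illustration, not a detail of the real build.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."  # 29 symbols (assumed)
PAGES_PER_VOLUME = 410                      # from the 410-page books
VOLUMES_PER_SHELF = 32                      # assumed
SHELVES_PER_WALL = 5                        # assumed
WALLS_PER_ROOM = 4                          # assumed


def text_to_index(text):
    """Read the text as a number written in base len(ALPHABET)."""
    index = 0
    for ch in text:
        # Raises ValueError for characters outside the toy alphabet.
        index = index * len(ALPHABET) + ALPHABET.index(ch)
    return index


def index_to_text(index, length):
    """Inverse of text_to_index for a text of known length."""
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def locate(text):
    """Split the text's index into room, wall, shelf, volume, and page."""
    index = text_to_index(text.lower())
    index, page = divmod(index, PAGES_PER_VOLUME)
    index, volume = divmod(index, VOLUMES_PER_SHELF)
    index, shelf = divmod(index, SHELVES_PER_WALL)
    room, wall = divmod(index, WALLS_PER_ROOM)
    return {"room": room, "wall": wall, "shelf": shelf,
            "volume": volume, "page": page}


def location_to_index(loc):
    """Rebuild the index from a location, undoing locate()."""
    index = loc["room"]
    index = index * WALLS_PER_ROOM + loc["wall"]
    index = index * SHELVES_PER_WALL + loc["shelf"]
    index = index * VOLUMES_PER_SHELF + loc["volume"]
    index = index * PAGES_PER_VOLUME + loc["page"]
    return index


if __name__ == "__main__":
    where = locate("kyle")
    print(where)
    print(index_to_text(location_to_index(where), 4))
```

Nothing is stored anywhere. The location is computed from the text, and the text can be recomputed from the location, which is why a library of "every possible book" can fit in a browser tab.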

I’m seeing some very smart people asking whether this is AGI. I don't think it is, not quite, but for the first time it feels genuinely close rather than hypothetically close. That's the bit that's keeping me up. I can taste the AGI in the air…

Kyle's take: If you have access (realistically you need the $100 or $200/month plan, the $20 tier runs out of credits insultingly fast), throw some proper hard problems at it and leave it to work. You'll see what I mean. This is a step change.

GPT-5.3 Codex Dropped Ten Minutes Later

Discussed at [22:05]

Almost comically, OpenAI released GPT-5.3 Codex about ten minutes after Opus 4.6 went live. Felt very much like they'd been hovering over the publish button waiting for Anthropic to make their move.

5.3 Codex is a coding model specifically. It works inside the Codex web app, the Mac app, and the command line. I've been using Codex 5.2 all week and was already very impressed. I ran a side-by-side test porting a sales page between two of my websites, Codex against Claude Code. Codex won. Which I was not expecting! I’ve been a die-hard Claude Code fanboi for months.

Kyle's take: The timing was clearly strategic payback for Anthropic's Super Bowl ad. Fine, whatever. Competition is good. What matters is that we now have two seriously impressive new models and Anthropic probably still has Sonnet 5 ready to go. These companies fighting each other means better tools for all of us. I nearly wore my ChatGPT hoodie for the live stream but thought better of it. Anthropic still haven't sent me one, so if you're reading this, lads, the address is in London.

Source: OpenAI GPT-5.3 Codex

Anthropic Built a C Compiler with Agent Teams (and Mostly Walked Away)

Discussed at [31:34]

Anthropic set up a team of Opus agents, pointed them at building a C compiler, and then largely stepped back. Two weeks and 2,000 Claude Code sessions later, the agents produced a 100,000-line compiler that can build Linux 6.9.

The cost: about $20,000 in API fees.

Oh, and compilers are VERY difficult to build. This isn’t just a vibe coded website or iPhone app!

When the agents hit problems, they diagnosed them and worked through them. They branched, they experimented, they merged code. They did what a team of programmers would do, except nobody was managing them moment to moment.

Anthropic are clear that the compiler itself kinda isn't the point. It's a proof of concept for long-running autonomous agent teams. The question they were answering: can you give a group of AI agents a big, complicated project and have them coordinate across weeks without constant hand-holding?

The answer, apparently, is yes. Yes you can.

Kyle's take: This is what it looks like when you set up a team of AI agents with specific roles. One handles the database. One handles APIs. One does the front end. One manages quality. One oversees them all.

Then you step back. $20,000 sounds like a lot until you think about what a team of human developers would cost over two weeks. Not even close. And the humans wouldn’t have been working 24/7 either…
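To put rough numbers on that, here's a quick back-of-the-envelope sketch. The inputs at the top are the figures quoted above (about $20,000, 2,000 sessions, 100,000 lines). The team size and working days in the second half are hypothetical placeholders of mine, not anything Anthropic reported, so swap in your own.

```python
# Figures quoted in the write-up above (all approximate).
API_COST_USD = 20_000
SESSIONS = 2_000
LINES_OF_CODE = 100_000

cost_per_session = API_COST_USD / SESSIONS    # 10.0 dollars per session
cost_per_line = API_COST_USD / LINES_OF_CODE  # 0.2 dollars per line


def human_team_cost(developers, day_rate_usd, working_days):
    """Cost of a human team for the same stretch (all inputs hypothetical)."""
    return developers * day_rate_usd * working_days


def break_even_day_rate(developers, working_days):
    """Day rate at which a human team would cost the same as the API bill."""
    return API_COST_USD / (developers * working_days)


if __name__ == "__main__":
    print(f"Per session: ${cost_per_session:.2f}")
    print(f"Per line of code: ${cost_per_line:.2f}")
    # Hypothetical team: 5 developers over 10 working days (two work weeks).
    print(f"Break-even day rate: ${break_even_day_rate(5, 10):.2f}")
```

With that hypothetical five-person team over ten working days, the human team only matches the API bill if each developer costs $400 a day all-in. Plug in a realistic day rate for your market and see which way it tips.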

We're moving from "AI helps me write code" to "AI teams build entire projects while I sleep." That's not a small shift. That's everything changing.

Source: Anthropic C Compiler blog

Greg Brockman's Memo: How OpenAI is Retooling for Agent Engineering

Discussed at [34:38]

Greg Brockman, president of OpenAI, put out a detailed post about how they're restructuring their own internal teams around agentic workflows. Some of his engineers told him their jobs have fundamentally changed since December. Before that, Codex was useful for writing unit tests. Now it writes essentially all the code and handles most of the debugging.

Their target: by March 31st, the default first tool for any technical task at OpenAI will be interacting with an agent, not opening an editor or terminal. The editor and terminal still exist, but they're being pushed to the background. Chat and agent orchestration are the primary interface now.

He's also recommending every team designate an "agent captain" to figure out how agents fit into their specific workflows.

Kyle's take: Highly recommend reading the whole piece as an introductory guide to thinking about this next “stage” of using AI.

When the president of OpenAI says the editor is becoming a secondary tool and chat with agents is the primary interface, pay attention. This mirrors what Karpathy said about vibe coding evolving into agentic engineering. The code is still there. You can still look at it. But increasingly, you won't need to.

Source: Greg Brockman on X

The “SaaS Apocalypse” is Here and the Stock Market is Spooked

Discussed at [40:18]

Software stocks are getting hammered. Amazon dropped 12%. The Nasdaq had its worst two-day tumble since April. SaaS companies across the board are taking a beating.

The reason: the moat around building software has essentially disappeared.

There's a squeeze happening from two sides. On the supply side, anyone can now point Claude Code or Codex at a competitor's software, scrape the negative reviews, and build a better version. DocuSign has 7,500 employees for a document signing product. Seven and a half thousand people. All those salaries push costs onto customers. A solo developer with good AI tools can build something that does 90% of the same thing and charge a tenth of the price.

On the demand side, companies are starting to build their own tools instead of paying for subscriptions. I rebuilt my own version of LinkTree in three minutes fifty seconds after they put their prices up 50%. Anton from Lovable said one of the biggest use cases on their platform is people replacing SaaS subscriptions. Scheduling tools, project management, form builders, support portals. None of this is rocket science to build anymore.

Kyle's take: I'm completely fine with this. Paying $50 a month for DocuSign is silly. CRMs might be safer because they're more complex and mission-critical, but you never talk to a Salesforce customer who actually loves the product. They use it because they've been using it for 15 years and switching would be a nightmare.

That lock-in is real, but it won't protect them forever. Maybe for 5-10 years, as enterprises are risk-averse. But beyond that? Nah.

For SMEs and entrepreneurs, the move towards building your own tools or hiring someone to build them for you is already happening. The bloated SaaS companies that have been passing their inefficiencies onto customers are going to get squeezed out.

Oh well!

Here’s a great guide on replacing your SaaS subscriptions with AI, btw:

— (@)

Source: Business Insider coverage, CNBC coverage

Member Questions:

"If I wanted to offer automations to businesses, what's the best way to go about it?"

Kyle's response: Start with their problems, not your product. I see too many broccoli-headed boys on TikTok saying "here's 10 n8n automations you can sell for $5,000 a month." Nonsense. Go into the business. Sit down with the staff. Ask what's boring, what takes too long, what they'd love off their plate.

Honestly, most of the work is behavioural. You're implicitly threatening people's jobs when you automate their tasks. If someone spends 10 hours a week copying data from Excel into a database, that's part of their job. Even if it’s dumb work!

When you come in and automate the job so it takes milliseconds, they're left wondering what remains for them. You have to work with people, not against them.

Oh and finally, most businesses have their data scattered across random desktops in Excel files anyway. Getting the inputs right is half the battle before you even think about automation!

So: it’s about the people still!

This question was discussed at [28:07] during the live session.

"How would you compare Codex vs Claude Code?"

Kyle's response: Still working this out honestly! Codex is slower but comes back with polished results. Good for big projects where you can walk away for 20-40 minutes. Claude Code is faster and better for back-and-forth iterative work. I used Codex for a full security audit of my website and it was brilliant at that kind of autonomous task. Claude Code is better when I want to sit there and go back and forth with it in real time. They're complementary, not competitors, at least for how I work.

BUT we’ve just got a new Claude model and a new Codex model. So…I need to keep testing!

This question was discussed at [03:42] and [22:05] during the live session.

"Will companies like HubSpot etc go into decline in the coming years?"

Kyle's response: CRMs are probably a bit safer than simpler SaaS products because they're complex and mission-critical. Rolling your own CRM is tricky and it's the kind of thing where you might just want to pay someone. But competitors can now enter the market easily. Nobody loves Salesforce. Every customer I've ever spoken to tolerates it because switching is painful.

For the lower end of the market, SMEs and entrepreneurs, they'll absolutely get pulled off to newer, lighter, cheaper alternatives. Enterprise clients are buying safety and risk mitigation, so they'll stick around longer. But the threat is real.

Justine Moore’s tweet is relevant here. It’ll take a while before enterprise shifts:

— (@)

This question was discussed at [47:37] during the live session.

Want the full unfiltered discussion? Join me tomorrow for the daily AI news live stream where we dig into the stories and you can ask questions directly.

Streaming on YouTube (with full 4K screen share) and TikTok (follow and turn on live notifications).

Audio Podcast on iTunes and Spotify.