AI with Kyle Daily Update 144

Today in AI: Which AI should you use??

What’s happening in the world of AI:

Highlights

Which AI Should You Actually Use? Ethan Mollick's Guide to the Agentic Era

Ethan Mollick - professor at Wharton, author of "Co-Intelligence" (one of the very few AI books I recommend because it doesn't go out of date!), and one of the sharpest voices in this space - has just published his latest guide to which AI you should be using. He puts one of these out every few months and they're always worth reading. But this time he's done something different.

He's written eight of these guides since ChatGPT launched. This is the first one where he's changed the format entirely, because what it means to "use AI" has fundamentally shifted.

Until a few months ago, using AI meant talking to a chatbot. Back and forth, back and forth. That's no longer the whole picture. We're now in the agentic era, where you can assign an AI to a task and it goes off and does it, using tools as appropriate, without you sitting there for the entire process.

Subtle but cataclysmic change.

This is why things like OpenClaw went viral. This is why people are paying $100 or $200 a month for Claude. It's not because they're chatting more - it's because they have multiple agents running in the background doing work autonomously.

As I was doing my livestream on this topic, I had five different agents running on a laptop doing different things for me.

Because of this shift, Ethan now says you need to think about three things when choosing an AI: Models, Apps, and Harnesses.

Models: The AI Brains

The big three model creators are OpenAI (ChatGPT), Anthropic (Claude), and Google (Gemini). The specific models right now are GPT 5.2/5.3, Claude Opus 4.6, and Gemini 3 Pro.

Update: this literally changed moments after I finished writing this newsletter. Gemini dropped 3.1.

You don't need to remember every version number. What matters is knowing those three companies and using whatever their latest model is.

There's also Grok from Elon Musk's xAI. What they've done is genuinely impressive - coming from nowhere to create a frontier-level model in a very short time through brute-force spending. However, the model is not as good as the other three. There's no compelling reason to use it unless you specifically want access to Twitter data. Sorry!

The main takeaway: all three frontier models are remarkably close in capability. They're all very, very good. Like. Really good.

For 80% of use cases, it genuinely doesn't matter which one you pick. They can all handle the tasks most people throw at them.

Where it starts to matter is when you push the frontiers of what you're doing. If you've been doing the same task for two years, the models all handle it well now. It's when you start thinking "could I do this?" and push into edge cases that you'll discover one model is better than another for your specific needs.

But you need to pay. At minimum $20 a month. The free models are optimised for chat speed rather than accuracy - they're faster and more fun to talk to, but much less accurate. When someone posts an example of AI doing something stupid, it's almost always because they're using the free tier or haven't selected a smarter model. I get thousands of comments from people saying AI is rubbish, and when I ask if they're paying, the answer is always “no, lol, why would I pay for AI”. They're using a model from a year ago and wondering why it's not impressive…

That $20 gets you two things: the ability to choose which model you're using, and access to more powerful tools and features.

Personally, I use Claude for about 70% of my tasks and ChatGPT for the other 30%. I find Claude's writing better, I like that it pushes back when I'm heading in a questionable direction, and for business tasks it's excellent.

ChatGPT's unified memory is genuinely useful though - it remembers across conversations without me having to invoke it. When I was discussing learning Greek, it brought up my Mandarin studies unprompted and suggested I keep them on separate days to avoid mixing them up. That kind of holistic context is something Claude and Gemini don't do as smoothly yet.

Apps: What You Actually Use

The most common apps are the chatbot websites and phone apps — chatgpt.com, claude.ai, gemini.google.com. But each company bundles different features into their app, and these matter:

Gemini is doing the most interesting work here. They have Nano Banana (still the best AI image generator), Veo 3.1 for video, Guided Learning as a tutor mode, Deep Research, and they've just released Lyria, a music generation tool. Google's strategy is to spin off many product teams, see what works, and then consolidate. They have maybe six or seven different vibe coding tools alone. It's messy, but some of these are genuinely excellent.

ChatGPT is a hodgepodge. Deep Research, Shopping Research (surprisingly good and overlooked), Study and Learn mode, Agent mode, Canvas, an app store, image generation, Sora for video... there's a lot in there.

Most of it is fine without being exceptional; little of it is best in class. The image generator isn't as good as Nano Banana. Sora had a viral moment then disappeared — it's still limited in availability and I don't know anyone who still uses it regularly. The app store was hyped as the next billion-dollar opportunity; nobody cares about it. OpenAI goes for the viral moment, gets people to download the app, and then moves on.

Codex (more on this later) is an exception.

Claude has almost nothing. And that’s a good thing! There’s Chat, Deep Research, and you can access a “study mode” by creating a project. That's it. No images, no video creation, no app store. Just Claude. Very, very focused. It's a completely different experience.

Harnesses: Where Work Happens

This is the new and important category. A harness is the system that lets AI use tools, take actions, and complete multi-step tasks on its own.

Think of it like a horse harness - it takes the raw power of the animal and lets it actually pull a cart.

Previously you didn't need to think about this. The model was the product, the app was the website, the harness was minimal. It was all just ChatGPT or Claude. And that was sorta it. You typed, it responded.

Now the same model behaves completely differently depending on what harness it's in. Claude Opus 4.6 in a chat window is a very different experience from Claude Opus 4.6 inside Claude Code, autonomously writing and testing software for hours.

The key harnesses right now:

Claude Code, OpenAI Codex, and Google Antigravity are the big three coding harnesses. Claude Code and Codex are the most developed. Despite the names suggesting they're only for coders, they're not.

I use Claude Code for content creation, content management, writing, and tasks that have nothing to do with programming.

Ethan gives a brilliant example: he wanted to create a set of printed books containing all of GPT-1's internal weights and parameters. He asked Claude Code, and over about an hour it made 80 beautifully laid out volumes, designed covers for each one, built an elegant website with animations, hooked it up to Stripe for payment and Lulu for print on demand, tested the whole thing, and launched it. That's not “coding” - that's ideation, project management, business building, and marketing. He put 20 copies up at cost and sold out the same day.

I did something similar with my own genome. For funsies. I gave my DNA sequence to Claude Code and asked it to code one pixel per nucleotide in four different colours and create art from my actual human genome. It produced these beautiful static-like images. Done in minutes. Could easily be a business: upload your DNA securely, generate a unique piece of art, print on demand, ship to your house.
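Here's roughly what that looks like as a minimal sketch in Python, assuming a plain-text sequence file and an arbitrary four-colour palette (the file name and the colours are my choices, not the exact script Claude Code produced):

```python
# Minimal sketch: one pixel per nucleotide, four colours, rendered with Pillow.
from PIL import Image

PALETTE = {"A": (46, 204, 113), "C": (52, 152, 219),
           "G": (241, 196, 15), "T": (231, 76, 60)}

def genome_to_image(sequence: str, width: int = 1000) -> Image.Image:
    # Keep only A/C/G/T (drops headers, Ns and newlines), one pixel per base.
    bases = [b for b in sequence.upper() if b in PALETTE]
    height = max(1, len(bases) // width)
    img = Image.new("RGB", (width, height))
    img.putdata([PALETTE[b] for b in bases[: width * height]])
    return img

if __name__ == "__main__":
    with open("my_genome.txt") as f:                     # hypothetical raw sequence file
        genome_to_image(f.read(1_000_000)).save("genome_art.png")  # first 1M bases
```

The fun part is that you never write this yourself - you describe the outcome and the harness writes, runs, and iterates on it for you.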

Claude for Excel / PowerPoint - Ethan says this is potentially as impactful as Claude Code for those who work with spreadsheets for a living. Embarrassingly for Microsoft, it's better than their own Copilot in Excel. Google has some Sheets integrations but not as deep, and OpenAI doesn't have an equivalent product.

Claude Cowork - this deserves its own category. Released in January, built in about two weeks by Boris Cherny's team (the same people behind Claude Code) - and largely built by Claude Code itself. It's essentially Claude Code for non-technical work.

It runs on your desktop and works directly with your local files and browser, but it's much more secure than something like OpenClaw because it runs inside a virtual machine with hard isolation baked in.

You describe an outcome — "organise these expense reports," "pull data from these PDFs into a spreadsheet," "draft a summary" — and Claude makes a plan, breaks it into subtasks, and executes them on your computer while you watch (or don't).

Neither OpenAI nor Google has an equivalent, at least this week (as Ethan notes, that could change - in fact, it will). It's still a research preview and will eat through your usage limits fast, but it's a clear sign of where everything is heading: AI that doesn't just talk to you about your work but does your work for you.

Important note on cost: if something is expensive or burns through tokens quickly right now, don't panic. Intelligence is getting cheaper at a phenomenal rate. What costs a lot now will cost a hundredth of that next year!

NotebookLM - You’ve heard me talk about this one a lot. I LOVE NotebookLM.

It’s Google's free tool for making sense of large amounts of information. Feed it papers, YouTube videos, websites, books, or files and it builds an interactive knowledge base you can query.

It generates podcasts (where you can join the conversation), video explainers, mind maps, flashcards, quizzes, infographics, and slide decks. It also works in multiple languages — you can generate reports, podcasts, and slide decks in the language of your choice.

I covered this in depth in the previous issue on cognitive friction and I remain a huge fan. If you're a student, researcher, or anyone who regularly needs to make sense of a pile of documents, it's brilliant.

OpenClaw — Ethan mentions it with a caveat: you almost definitely shouldn't use it. I agree! If you're not comfortable in the terminal, if you're not comfortable setting up virtual machines with security hardening, just wait.

OpenAI or Anthropic will release something more secure soon enough. That said, OpenClaw is a preview of where we're heading - a persistent, proactive AI assistant that runs 24/7 on your machine, checks on things, reaches out to you with updates, and can be instructed like an actual human assistant. That’s why people got so excited.

What To Do Now

OK, nuts-and-bolts practical advice! This is Ethan's, and I fully agree with it:

If you're just getting started: Pick one of the three systems (ChatGPT, Claude, or Gemini), pay the $20, select the advanced model, and start using it for real work. Upload a document you're actually working on. Give it a complex task. Have a back-and-forth conversation and push it. This alone will teach you more than any guide. It's a skill, not knowledge — you have to use it to learn it. That's why when I give workshops, I make sure people are actually using AI, not just hearing about it.

If you're already comfortable with chatbots: Try NotebookLM (free and easy), then explore Claude Code, Claude Cowork, or Codex. Not as a demo - with real work. Don't just play around with it; give it a task you actually need done.

The shift from chatbot to agent is the most important change since ChatGPT launched. “Agents” was the big buzzword in 2025, but it's only now, in 2026, that we're starting to see what it actually means. An AI that does things is fundamentally more useful than an AI that says things. And learning to use it that way is worth your time.

Sam Altman and Dario Amodei Refuse to Hold Hands

This story is ridiculous and I love it. At the India AI Impact Summit, all the big AI leaders were on stage with Indian Prime Minister Modi. Sundar Pichai, Demis Hassabis, Sam Altman, Dario Amodei - everybody was there. Modi got everyone to raise their hands, hold hands, the whole triumphant solidarity thing.

Except Sam Altman and Dario Amodei.

The heads of OpenAI and Anthropic refused to hold each other's hands. Somebody (a comedic genius imo) had put them next to each other in the lineup. Modi was literally telling them to hold hands. They wouldn't do it. They ended up in this incredibly awkward position where they're both reaching up but crossing over to avoid touching each other. There's a brilliant frame where Sam looks to his left, realises who's standing there, and you can practically see the "oh shit" on his face.

It looks like petty high school drama. Between multibillionaires. But it reflects something much larger that's been building throughout 2026.

The Rivalry Behind this Silliness

The tension between these two companies has been escalating for weeks. At the Super Bowl, Anthropic ran an advert directly lampooning ChatGPT for adding advertising to its responses. Basically saying "We'd never do that to you" - a direct attack on OpenAI. OpenAI then rolled out adverts in America the very next day, which was either brave or a sign the rollout was already locked in…

Sam Altman pushed back with a genuinely good point: we are very different companies. ChatGPT has around 900 million monthly active users. Claude has maybe 20-30 million. Not even close.

OpenAI's argument is straightforward - they want intelligence freely available to billions of people, and that requires advertising revenue to support a free tier. Anthropic serves a much smaller, more elite user base who are willing to pay $100-200/month. One is B2C, the other is essentially B2B. Very different business models with very different dynamics.

Then came OpenClaw. Anthropic sent a cease and desist forcing "ClawdBot" to change its name (legally fair - you protect your trademarks). They then started banning people who used Claude subscriptions inside OpenClaw and restricting accounts.

From a legal standpoint, understandable. From a PR standpoint, it looked like a billion-dollar company stamping on a one-person open source project. OpenAI swooped in, hired Peter Steinberger (the OpenClaw creator, presumably for hundreds of millions), and immediately said: use your ChatGPT subscription with OpenClaw, officially supported.

The result: Anthropic looked like the bad guys, OpenAI looked like the good guys.

A significant shift given that Anthropic has historically been the "beloved" company in the AI community - the one focused on safety and ethics. That goodwill evaporated in about a week.

Anthropic are having one hell of a 2026. Their product is still fantastic. Claude is still excellent. But the PR damage has been significant. Anyone in their communications department right now... Godspeed!

Member Questions:

"What's the best AI for non-coders?"

It depends what you're doing. For writing and business tasks, probably Claude. For personal use where you want the AI to remember context across conversations, ChatGPT's unified memory is genuinely useful. If you already have a paid Google Workspace, you've got Gemini included - might as well use it. Try them all and go with whichever tone of voice and results you prefer. At this point it's a personal choice.

"What do you get with the $200 Claude subscription vs $20?"

Usage, basically. On $200/month you can use Opus 4.6 most of the day. On $100/month you'll hit a limit eventually. On $20/month, you'll get maybe five or six turns with Opus before it cuts you off. That's very disruptive if you're midway through building something. The way I see it: $200/month is a lot less than hiring somebody. Where it really matters is when you're running multiple agents simultaneously - if you have 10 Claude Code agents doing tasks at once, you're burning through tokens 10 times faster than if it's just you chatting.

"How do I set up Open Claw properly?"

Go to aiwithkyle.com/openclaw for guides. My recommendation: put it on a VPS (I use Hetzner, about €3.50/month for a cloud server or €12/month for dedicated). You do not need a Mac Mini. Most people are just moving text back and forth, and a $50 Raspberry Pi can handle that. Only buy a Mac Mini if you're running local models, doing video/image processing, or need the horsepower. And even a maxed-out Mac Mini struggles with the latest local models — it's not a future-proof solution.

"How do I structure OpenClaw models for different tasks?"

You can absolutely use different models for different subagents. Social posts might go to Grok. Research tasks to ChatGPT. Coding to Claude. MiniMax M2.5 is a solid option if you're not using Claude Opus 4.6. Personally, I'm lazy and route everything through Opus 4.6 because I have the budget, but I wouldn't recommend that for most people.
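For illustration, the routing idea boils down to a lookup table like the hypothetical Python sketch below - OpenClaw's actual configuration format, and these exact model identifiers, are assumptions on my part, so check its docs for the real syntax:

```python
# Illustrative only: map task types to models, with a sensible default.
ROUTES = {
    "social_post": "grok",
    "research": "gpt-5.2",
    "coding": "claude-opus-4.6",
    "default": "minimax-m2.5",
}

def pick_model(task_type: str) -> str:
    """Return the model assigned to a task type, falling back to the default."""
    return ROUTES.get(task_type, ROUTES["default"])

print(pick_model("coding"))   # claude-opus-4.6
print(pick_model("email"))    # minimax-m2.5 (no explicit route, so default)
```

The point is less the syntax and more the principle: send expensive frontier models only the work that needs them.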

"Since GPT-4o was removed, can I still use it?"

Yes, through the API at developers.openai.com. However, it won't have the same memory. You'd need to set up a secondary memory system, which is doable but technical. Alternatively, use something like Launch Lemonade, which already has GPT-4o connected and lets you set up a knowledge base and system prompt without the technical work. A tip from the chat: you can download your entire OpenAI history (all your chats), then feed that into your new setup as prior context. Do that before deleting your account if you're leaving OpenAI.
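As a rough sketch of what "through the API" looks like with OpenAI's Python SDK - assuming your API key is set as an environment variable and the "gpt-4o" identifier is still available to your account; the prior-context string is a placeholder for whatever you distil from your exported history:

```python
# Minimal sketch: calling GPT-4o via the API, supplying your own "memory".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# ChatGPT's built-in memory doesn't come with you - you provide prior context
# yourself, e.g. a summary distilled from your exported chat history.
prior_context = "The user is learning Greek and Mandarin on alternating days."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Known context about the user: {prior_context}"},
        {"role": "user", "content": "Suggest a study plan for this week."},
    ],
)

print(response.choices[0].message.content)
```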

"What is an API?"

Imagine you're at a restaurant. You're the user sitting at a table. There's a kitchen full of chefs (the system/server). You can't walk into the kitchen and start yelling at the chefs - you need a waiter.

The API is the waiter.

You look at the menu (the documentation), tell the waiter what you want (the request), the waiter takes it to the kitchen (they don't need to know how the stove works), and the kitchen sends your food back through the waiter (the response). In the computer world, an API is the messenger that takes your request to a system and brings back the answer.
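To make the analogy concrete, here's a tiny Python sketch using GitHub's public API as a stand-in for the kitchen (any documented API works the same way): the requests call is the waiter, the URL is your order, and the JSON that comes back is the plate of food.

```python
# Minimal API call: send a request to a documented endpoint, get structured data back.
import requests

# The "order": a request to an endpoint listed on the menu (the documentation).
response = requests.get("https://api.github.com/users/octocat")

# The "plate of food": the structured reply the waiter brings back.
data = response.json()
print(data["name"], "-", data["public_repos"], "public repos")
```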

"Can NotebookLM output in other languages?"

Yes. Reports, podcasts, slide decks, audio - all available in multiple languages including Spanish, Finnish, Slovenian, Filipino languages, and many more. You can also feed in sources in a different language and it'll work with those. Very cool and something I hadn't thought to check before someone asked in the live!

Streaming on YouTube (with full 4K screen share) and TikTok (follow and turn on Live notifications).

Audio Podcast on iTunes and Spotify.