AI with Kyle Daily Update 110

Today in AI: ChatGPT 5.2 + Marvel comes to Sora

Kyle Balmer
December 15, 2025

The free 5 Day AI Readiness Challenge is now open

The skinny on what's happening in AI - straight from the previous live session:

https://youtu.be/pnpTVu2xI5U

Highlights

🚀 GPT 5.2 Crushes Benchmarks - Beats Or Ties Humans On 71% of Work Tasks

Discussed at [00:00:00]

OpenAI's code red response delivers. A year ago this performance cost $4,500 per task - now it's $11.

Kyle's take: GPT 5.2 is here and it's good for higher end tasks.

First test this morning? I finally identified a computer game from my childhood that no other model could help me find - Virus 2000 for those who are wondering! Silly example but it shows the jump. I always have a “holdover” task that previous models get stuck on that I test with new models. And 5.2 nailed this one out of the gate!

The benchmarks are pretty good too! They've had to extend the Arc AGI-2 scale because 5.2 broke it. It's scoring 53% where Gemini 3 was at 32%.

They had to change the y-axis a lot for this!

More interestingly, on GDP-Val (tasks that contribute to GDP like spreadsheets, presentations, document creation), it beats or ties human experts on 71% of tasks. That should stop you in your tracks even if you're pro-AI. That’s an “oh shit” moment.

A year ago, OpenAI's o3-high scored 88% on Arc but cost $4,500 per task. Today's 5.2 scores 90.5% for $11.64 - that's roughly 400 times cheaper, and better. In a year. Yeah…
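For anyone who wants to check the "400 times" claim, here is a quick sketch of the arithmetic. It uses only the two per-task prices quoted above; the variable and function names are made up for illustration.

```python
# Per-task costs in US dollars, as quoted in the post.
cost_o3_high = 4500.00   # o3-high, a year ago
cost_gpt_5_2 = 11.64     # GPT 5.2, today

def cost_ratio(old_cost: float, new_cost: float) -> float:
    """Return how many times cheaper new_cost is than old_cost."""
    return old_cost / new_cost

ratio = cost_ratio(cost_o3_high, cost_gpt_5_2)
print(f"{ratio:.0f}x cheaper")  # prints "387x cheaper"
```

So the precise figure is about 387 times, which rounds to the "400 times" headline number.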

This is the exponential growth we're seeing. When people say "AI can't do this or that," point them to these facts. And if AI genuinely can’t do something right now…just wait a year or so. This stuff is moving fast.

Source: Arc Prize verification, GDP-Val leaderboard

🎬 Disney Invests $1 Billion in OpenAI - Marvel Characters Coming to Sora

Discussed at [00:41:35]

Three-year deal lets users generate Disney content and a whole lot more.

Kyle's take: This is an exciting drawing up of battle lines. Disney's investing $1 billion in OpenAI and allowing 200+ Disney, Marvel, Pixar and Star Wars characters in Sora.

Meanwhile Netflix is buying Warner Brothers (who own DC Comics) as an AI play to train on that IP.

So it's Disney/Marvel/OpenAI versus Netflix/DC/Warner Bros!

The Disney deal is fascinating - should OpenAI be paying Disney to use Spider-Man and Elsa, or should Disney pay OpenAI to be in Sora? It’s all quite complex. The $1bn is actually the least interesting part of this deal.

They settled on Disney investing $1 billion in OpenAI in return for equity and warrants (i.e. the right to buy more equity in OpenAI later - which could be a VERY good move for Disney), while OpenAI provides a reskinned Sora (for Disney+ short-form usage) and becomes Disney's AI provider.

A fascinating part of this deal is that user-generated Disney videos will stream on Disney+. Both Disney and Netflix are launching TikTok-style vertical video feeds. They want in on the short-form market (which drags attention away from their long-form content) and seem to be using AI to let their customers create content with their IP. This is the big play here, I think.

Also notice how this doesn't retroactively approve OpenAI's training on Disney IP without permission - which they 100% did! That would open the floodgates for every IP holder to come a-knocking on OpenAI’s door asking for a deal. So training seems to have been deliberately left out of the deal, at least publicly.

See! Super interesting deal. I’m looking forward to covering it more.

Strange timing though, just as Sora development's been paused for 8 weeks during code red!

Source: CNBC coverage, Netflix vertical feed announcement

🖱️ Cursor Adds Visual Editor - Click Elements to Edit Without Code

Discussed at [00:15:55]

Cursor catching up to Lovable's ease of use while maintaining power.

Kyle's take: Cursor just added a potential game-changer - you can now visually see your codebase running, click on elements, and directly talk to the LLM about them.

Previously Cursor had a steeper learning curve than Lovable, Bolt etc. because it looked like a traditional development environment while Lovable looked like ChatGPT. Cursor is a little intimidating.

Now you can click a button and say "I want more white space here" or "change this font" - they've added a visual editor directly into Cursor. Still more learning curve than Lovable but they're closing the gap.

Source: Cursor announcement

Member Questions:

"Do hallucinations still happen with all these improvements?"

Kyle's response: Hallucinations still happen but they're getting better rapidly. OpenAI are reporting a 40% reduction in hallucinations from 5.1 to 5.2. That obviously depends on how they define hallucinations so, as always, be on your guard.

Here's the thing - everything an LLM generates is technically a hallucination because that's how they work. They don't have a list of true and false facts - they're creating as tokens tumble out. Things that are “true” just have stronger connections in their training than things that are “false” - i.e. the connections between “Paris” and “capital of France” are stronger than those between “Paris” and “capital of Chad”.

Sometimes it's wrong and that's when we call it hallucination because we want those imaginations to be correct - we want a “fact”. But LLMs don’t have “facts” in that way - they’re not looking things up like Google might.
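To make the “stronger connections” idea a bit more concrete, here is a toy sketch. The scores are completely made up and no real model is involved; it only shows how raw association strengths could be turned into probabilities (a softmax), so that the “true” answer is just the most likely one rather than a stored fact.

```python
import math

def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw association scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Made-up scores for completing "Paris is the capital of ..."
made_up_scores = {"France": 6.0, "Texas": 2.0, "Chad": 0.5}

probs = softmax(made_up_scores)
best = max(probs, key=probs.get)  # the most likely completion

for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.3f}")
```

Notice that “Chad” never drops to exactly zero. The wrong answer is just unlikely, which is why a wrong completion can still occasionally tumble out.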

Once we give LLMs tools like internet search and agent workflows where they check each other's work, we reduce error rates even more. And all of these tricks and strategies are increasingly being built into the apps so we need to worry less.

"Which development platform for vibe coding?"

Kyle's response: For pure vibe coding when I don't want to see code, I use Lovable. When building a project and want more control, I use Cursor.

Start with Lovable to get excited about building - it handles your backend automatically, removes friction. Once you feel constricted, move to Cursor.

I personally use Claude Code inside Cursor as the agent. But I can only do that because I’m on the Max 20x plan which costs around $200/month. Without this I’d run into usage limits very quickly.

Don't dive into learning code first - it's dull. Get hooked on building first. It’s like learning to drive - you kind of need to just get in a car and move it around to “get” how to drive. Reading instruction manuals and watching videos about driving won’t help. Build stuff that's useful, show your friends, get excited, then learn fundamentals.

"Is vibe coding secure for public release?"

Kyle's response: Most security lives on the backend with something like Supabase. Lovable and Cursor have built-in security scanners that catch egregious problems like "you've published your API key doofus."
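As a rough idea of what a “you've published your API key” check might look like, here is a minimal sketch. This is not how Lovable or Cursor actually implement their scanners (that isn't covered here); the two patterns and the `find_secrets` helper are illustrative assumptions, and real scanners cover far more key formats.

```python
import re

# Illustrative patterns only; these are assumptions, not a complete list.
SECRET_PATTERNS = {
    "OpenAI-style key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
}

def find_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each suspicious line."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits

# Hypothetical front-end snippet with a hard-coded (fake) key.
sample = '''
const supabaseUrl = process.env.SUPABASE_URL;
const apiKey = "sk-abcdefghijklmnopqrstuvwxyz123456";
'''

for number, name in find_secrets(sample):
    print(f"line {number}: possible {name}")
```

The value read from an environment variable passes, while the hard-coded key gets flagged. That is the kind of egregious mistake these scanners are there to catch.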

For internal projects, don't worry too much initially. Start here for that very reason.

If you are building an app you will sell access to, then you need to get more serious. But that comes later.

If you're building something like a medical or legal app where security's crucial, hire a security consultant before launch. Hell, in this case you’ll probably build a proof of concept yourself then use funding to get it rebuilt from scratch.

In any case don't let security concerns stop you from starting. No use catastrophising when you haven’t even built yet - deal with problems as they come up and only when they are real. The problems of success can wait until you have success!

Kyle's Community Launch: The 5 Day AI Readiness Challenge is now open. Come to https://community.aiwithkyle.com/c/challenge/ to start.

Want the full unfiltered discussion? Join me tomorrow for the daily AI news live stream where we dig into the stories and you can ask questions directly.

Streaming on YouTube (with screen share) and TikTok (follow and turn on Live notifications).

Audio Podcast on iTunes and Spotify.