AI with Kyle Daily Update 059

Today in AI: OpenAI's GDPval Tests AI on Real Jobs + ChatGPT Pulse Wakes You With AI Briefings

Streaming news and updates live daily at ~10AM UK time + Vibe Coding an App live daily at ~11AM UK time | Both on YouTube (and all other platforms).
Catch me here: Subscribe to ‘AI with Kyle’ Channel

The skinny on what's happening in AI - straight from the previous live session:

Highlights

💼 OpenAI's GDPval: Testing AI on 44 Real Jobs

OpenAI launched GDPval (terrible name, as per usual), a new evaluation measuring AI on "real world economically valuable tasks" across 44 occupations. Unlike academic tests, it checks whether AI can actually produce nursing care plans, legal briefs and engineering blueprints. The list includes software developers, lawyers and journalists, but notably no doctors.

Kyle's take: Finally! Current evals are flawed - models train for the test like students cramming for exams. OpenAI released another paper a month ago that found models make stuff up rather than say "I don't know", because admitting uncertainty scores zero on tests.

GDPval focuses on actual job tasks, but there's a catch: they're only measuring GDP-contributing work. Childcare? Not economically valuable according to GDP, so AI won't improve at it.

Also of note: they picked these 44 specific occupations - if you're in one, maybe worry! These are the tasks AI models will be tuned to get better at. The full list is in the blog article below.

At least we're moving from "can AI pass the Math Olympiad" to "can AI write a real nursing care plan"!

⏰ ChatGPT Pulse: AI That Works While You Sleep

ChatGPT Pulse generates 5-10 personalized briefings while you sleep - news, calendar, emails, custom reports. Rolling out to $200/month Pro users first. Designed to make you check ChatGPT first thing in the morning, like social media.

Kyle's take: They want ChatGPT to be as sticky as Duolingo with its passive-aggressive owl. Connect your Gmail and calendar (privacy nightmare but cool) and it'll say "Hey, it's your fiancé's birthday, you went to this restaurant 2 years ago, want me to book it for old times' sake?"

The more data you feed it, the more powerful it gets. This also seems like the perfect backdoor for ChatGPT to add ads…

Source: The Verge

📊 90% of Developers Using AI (Google DORA Report)

Google's DORA report surveyed 5,000 tech professionals globally: 71% use AI for writing new code, 66% for modifying code, and 80% say it has enhanced their productivity. Not the 90% of code that Anthropic's CEO predicted would be AI-written, but way more than skeptics claim.

Kyle's take: Remember when Dario Amodei said 90% of code would be AI-written and everyone laughed? Well, 71% of developers are already using it for new code. This tech is ~3 years old!

Even if the magnitude is wrong, the trajectory is right. Betting against AI has been a losing game - everyone who said "it'll never do X" watches it do X six months later, and they are noticeably quiet when it does. Oh, and that MIT study claiming 95% of AI projects fail? Total garbage methodology - it only looked at top-down initiatives and ignored grassroots adoption. This DORA report shows that grassroots adoption (at least among tech professionals) is taking hold fast.

🎨 Twin Peaks Fans Flood Reddit with AI Slop in Protest

When the Twin Peaks subreddit moderators allowed AI art, fans protested by flooding it with intentionally terrible AI-generated Agent Cooper images and ChatGPT scripts. The mods broke after 2 days and reversed the policy.


A damn fine cup of coffee

Kyle's take: This is the best anti-AI protest I've ever seen - using AI to fight AI! It’s a reductio ad absurdum that I’m sure David Lynch would have enjoyed. They weaponised AI's efficiency at creating crap. Absolute chaos, I fully support it.

Source: 404 Media

Member Question: "What small AI companies to watch?"

Kyle's response: Foundation models? Maybe 10 companies globally can afford the hundreds of millions needed. No room for smaller companies really.

But the application layer is exciting - ElevenLabs (amazing voice tech), Gamma (AI slides that don't look like trash because they have taste), and vibe coding tools like Lovable (fastest company ever to $100M ARR). Lovable, Bolt, Replit, v0 - these are the interesting plays. Above the foundation layer is where innovation happens because you don't need $500M to compete.

Want the full unfiltered discussion? Join me tomorrow for the daily AI news live stream where we dig into the stories and you can ask questions directly.

Streaming on YouTube (with screen share) and TikTok (follow and turn on Live notifications).

Audio Podcast on iTunes and Spotify.