AI with Kyle Daily Update 179
Today in AI: Local AI
The best prompt engineers aren't typing. They're talking.
Power users figured this out early: speaking a prompt gives you 10x more context in half the time. You include the edge cases, the examples, the tone you want — because talking is fast enough that you don't skip them.
Wispr Flow captures everything you say and turns it into clean, structured text for any AI tool. Speak messy. Get polished input. Paste into ChatGPT, Claude, Cursor, or wherever you work.
89% of messages sent with zero edits. 4x faster than typing. Works system-wide on Mac, Windows, and iPhone.
You can run AI on your laptop. Right now. Today. Without the terminal, without writing code, without being technical. Five or ten minutes to get it set up.
I know it sounds intimidating. It isn't. Local LLMs (Large Language Models) are now a download-and-run job. Click an app, pick a model, type a prompt. And then you basically have a local ChatGPT, except it runs on your hardware, costs zero per token, and your data never leaves your machine.
This guide is the gentle introduction for those who think this sort of thing is only for techies. It’s not. So… no jargon, no scary command lines, no sending you out to buy a $10,000 computer. If you have a laptop made in the last three years with at least 8GB of RAM, you can do this today.
What Even Is A Local LLM?
First up: what’s an LLM? It’s a Large Language Model. GPT5.5, Opus 4.7 and Gemini 3.1 are LLMs. ChatGPT, Claude and Gemini are the chatbot applications built on top of those LLMs.
When you use ChatGPT or Claude, you type a prompt, it leaves your computer, travels to a data centre on the other side of the world, the model thinks about it on someone else's hardware (run by the AI company), and the answer comes back. That's cloud AI. It’s like renting. A local LLM is the same thing, except the model file sits on your machine and runs on your own hardware. That’s owning.

Why You Should Care (Four Reasons)

Why bother with local AI? Sounds like a lot of faff, right? Well… here are four compelling reasons.
Privacy. Your prompts and data never leave your machine. That client document you'd never paste into ChatGPT? Paste it into a local model. The medical letter, the legal contract, the salary spreadsheet, the personal journal entry. All fine. Nothing logged, nothing trained on, nothing leaked.
If you are running a model for your business this becomes VERY important.
Cost. £0 per token. The model is free to download. After that, the only cost is electricity. Run it 1,000 times a day, run it 10,000 times. Same bill.
Offline. On a plane, on a train, in a coffee shop with rubbish wifi, in a power cut with a charged laptop. Works. Cloud AI doesn't - you’re donezo.
Learning. If none of the above float your boat honestly it’s worth doing this just to build your confidence. Local AI seems like one of those complicated advanced topics. It’s worth deploying a local model just to see how easy it is.
Can Your Device Actually Run This?
Probably yes. The honest answer depends on your RAM. There are other factors too, but RAM is a good rule of thumb to start with.

If you have an Apple Silicon Mac (M1 or later), you're in better shape than equivalent Windows machines because the unified memory architecture is brilliant for local models. An M2 MacBook Air with 16GB RAM runs an 8-billion-parameter model very comfortably.
"I don't have the right hardware" is the most common excuse for not trying this. Almost always it's wrong. If you bought your laptop in the last three years and you didn't go for the cheapest possible spec, you can run something useful.
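If you'd rather not dig through system settings, here's a quick sketch in Python to check your RAM (Mac/Linux; Windows users can just open Task Manager). The thresholds mirror the rule of thumb above: 8GB is the floor, 16GB runs an 8B model comfortably.

```python
import os

def total_ram_gb():
    """Rough total physical RAM in GB (works on Mac and Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / 1024**3

ram = total_ram_gb()
print(f"Total RAM: {ram:.1f} GB")
if ram >= 16:
    print("Comfortable: 7B-8B models should run well.")
elif ram >= 8:
    print("Workable: stick to smaller models (3B-4B).")
else:
    print("Tight: try the very smallest models.")
```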
Pick Your Tool: LM Studio
There are about a dozen apps for running local LLMs. I’d recommend LM Studio to start. Free, polished, works on Mac and Windows. Nice and easy to use.

Why LM Studio over the alternatives:
Free, no signup, no credit card
Click an app, browse models, click download, run
Works exactly like the ChatGPT interface you already know
No terminal, no command line, no "config.yaml"
Built-in model recommendations based on your hardware
The other tools are fine but they're for people who already know what they're doing. LM Studio is for you, today.
Literally, here is your playbook:

Download LM Studio from lmstudio.ai.
Pick a model that fits your hardware (next section).
Run one prompt.
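One bonus worth knowing once you're set up: LM Studio can also run a local server with an OpenAI-compatible API, so your own scripts can talk to the model. A minimal sketch in Python, assuming the server is running on LM Studio's default port 1234 (the `ask` and `build_payload` helpers and the placeholder model name are mine, not part of LM Studio):

```python
import json
import urllib.request

# LM Studio's local server (enable it in the app) defaults to this address.
URL = "http://localhost:1234/v1/chat/completions"

def build_payload(prompt, temperature=0.7):
    """Build an OpenAI-style chat request for the local server."""
    return {
        "model": "local-model",  # LM Studio typically answers with whatever model is loaded
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask(prompt):
    """Send a prompt to the local model and return its reply text."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Summarise why local LLMs are useful, in one sentence."))
```

You don't need this to use the app — it's there when you outgrow the chat window.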
Pick Your First Model
Hugging Face (a repository of downloadable models) has 2M+ models. Yikes.
Which one should you use?

This is a moving target. Whatever I tell you now will be out of date next month. So we’ll just cover the basics of how to choose one.
LM Studio has a browser with all the models. And it tells you which models will work on your current system. Choose something and download it. Don’t overcomplicate this.
If you're really in doubt, find a Google model. Right now (April 2026) that’s Gemma 4. Gemma is the micro Gemini and works well on local devices. Even iPhones!
Don't try to download the biggest model that just about fits. Technically bigger models will work, but they’ll be slow as hell.
The model that fits comfortably and runs fast will actually get used. The one that takes 90 seconds per response won’t. Speed matters more than benchmark scores if you actually want to build this into your day-to-day.
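Why do bigger models choke? A back-of-the-envelope sketch: at 4-bit quantisation (the common default for local models), each parameter needs roughly half a byte, plus some runtime overhead — the 20% figure below is my assumption, not a spec:

```python
def model_ram_gb(params_billions, quant_bits=4, overhead=1.2):
    """Very rough RAM estimate for a quantised model.

    params_billions: model size, e.g. 8 for an 8B model
    quant_bits: bits per weight (4-bit is the common default)
    overhead: fudge factor for context and runtime (assumed ~20%)
    """
    bytes_per_param = quant_bits / 8
    return params_billions * 1e9 * bytes_per_param * overhead / 1024**3

for size in (3, 8, 14, 70):
    print(f"{size}B at 4-bit: ~{model_ram_gb(size):.1f} GB")
```

An 8B model at 4-bit lands around 4-5GB — comfortable on 16GB of RAM. A 70B model wants around 40GB, which is why "just about fits" turns into "90 seconds per response".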
What Local LLMs Are Not (Yet)
Honesty section. Local LLMs are not a Claude or ChatGPT replacement, and pretending otherwise will set you up for disappointment.
A 7B local model ain’t going to write Python as well as Claude Code. It won't do agent tasks like Codex. It won't stay coherent over a 100,000-token conversation. It will hallucinate more on factual questions. It will sometimes lose the thread mid-paragraph.
It will not be as good. Simple as.
But… sometimes your AI doesn’t need to be bleeding edge. We don’t need to bring an ICBM to a rock fight.
What it WILL do: chat, summarise, draft, translate, brainstorm, restructure, classify, extract, format. About 80% of what most people use AI for. All while being free and private. It’s a tradeoff.
I personally treat my local LLM as the workhorse for routine tasks and my cloud subscriptions as the frontier models I reach for when the task actually needs them. Different tools for different jobs.
Your next step? If you've never run a local model: Download LM Studio. Pick a model that fits your hardware. Run one prompt. That's it. That’s the homework!
Kyle


