• AI with Kyle
  • Posts
  • Saturday Sessions: It's getting a bit hairy

Saturday Sessions: It's getting a bit hairy

Germany Edition

Hallo!

I’ve been in Germany this week for Google I/O Connect Berlin. Thank you to the Google team for being gracious hosts - keep an eye out on the socials next week for vids from the event. I was very lucky to being able to interview team members from DeepMind, the Google Spark team and from AI Studio / Antigravity. Very cool stuff!

Whilst I’ve been in Germany we’ve had some rather worrying news.

Fable from Anthropic was pulled a few weeks ago on the request of the US Government.

We’ve now found out that the new ChatGPT model (GPT5.6) is being held back.

We are entering a period where the level of intelligence allowed to us is going to be restricted. And worse, services that we have access to may be suddenly taken from us.

That worries me more than any specific model being held back. It’s the general directionality of this that should concern us.

Learning how to use local models is more important than ever.

BUT many people think it is outside of their skill zone. It’s a hard technical task right?

No. It’s simple.

In fact we’re going to do it right now. This Saturday morning we’re going to get you set up. No excuses.

First up, let’s use your main computer. Desktop or laptop. Doesn’t matter. You can (and should set it up on all your devices eventually so start with whatever is at hand).

Go to https://lmstudio.ai/ and download LM Studio. It’s free.

As soon as you boot it you’ll get a screen like this:

LM Studio will basically have a look at your computer and decide a good started model for you. Here is happens to be Google’s Gemma 4 E4B. This isn’t a sponsored guide btw - it’s just that Google are the main (Western) lab releasing open source models!

Go ahead and download. This one I’m being shown is ~7GB.

Whilst that’s downloading go grab the mobile app too.

On iPhone it’s Locally:

On Android https://lmsa.app/ looks solid but I haven’t used it so cannot confirm!

ALSO download a local model onto your phone :

Here I’m downloading Gemma 4 E2B - it’s around ~4GB.

Whilst both of these models are downloading now is a good time to talk about the model sizes. This stuff isn’t vital so skip ahead to the bold section if you don’t care! I will not be (that) upset.

Notice I’ve just downloaded Gemma 4 E4B on my laptop (or whatever LM Studio suggested) and Gemma E2B on my iPhone.

What gives? What’s the 4 and the 2 mean specifically?

Time for a chart:

OK yeah.. I see the problem. No wonder people think this stuff is complex! Look at that mess. Let’s decipher it a bit. Again, skip this if you want.

The two blue header models here are “edge models”. Basically this means they work on “edge” devices - named so because they sit at the edge of a network. In normal terms basically it means your phone or laptop.

Notice that E2B has 2.3B active parameters. The B is Billion. 2.3 billion parameters is the model size. That’s the one I’ve just installed on my iPhone.

The E4B has 4.5 billion parameters. It’s bigger! That’s the one I just installed on my laptop.

Parameters here are basically numbers. When we download a model we are (very crudely!) downloading a .csv file with billions and billions of numbers in it. Think of those 4GB and 6GB files we just download as giant lists of numbers (parameters).

In general the more parameters the more intelligent the model. But the more parameters the more computing power you need to process it all.

These smaller models work on our phone and laptop because they are smaller. Much smaller than a model like Claude (if we could download it!).

But increasingly labs like Google Deepmind are doing a lot with a little. Small models that pack one hell of a punch. Whilst being fully controlled by you, on your devices, with 100% privacy. So, you know, a government can’t just pluck them away from you…

Those purple header models? Those are beefier. They are not going to work on today’s phones and laptops. They will however run on a chunkier rig - a desktop computer or server. The more oomph you have the larger the model you can load in. I have Gemma 4 12B Q4 / MLX on my MacMini personally.

OK models downloaded? Let’s continue!

LM Studio’s interface is a little confusing initially but the main chatbox should be familiar enough.

Use the dropdown at the top to select Gemma then type a message:

The model will respond. Just like with ChatGPT. Or Claude. Or Gemini. So?

Now turn off your WiFi and keep chatting.

It will still respond.

Hell. Hop on an airplane and chat. It’ll keep working mid-air.

Your chat is not zooming up to some cloud server sitting in California and then back to you.

It’s not being used for training.

It is not using electricity to run in a data centre.

It’s all happening right there on your device. You’ve captured a genie. A powerful one.

What’s more let’s say the US Government decide: “ok you aren’t allowed Gemma 4 anymore we’re removing it”.

Tough shit! It’s on our computer. That cannot be reverted. Ya-boo sucks for you. 😘 

Hell even if GOOGLE decide we can’t use it anymore: tough! There are 200M+ copies of Gemma 4 floating around out there. The whole model. Including the one on your computer. Once it’s out it’s out.

And yes … I had to check that figure. 200M is mad impressive. Suggests that people and businesses are very much moving in this direction…

OK solid we’re up and running on your laptop or desktop, let’s switch to the phone.

Once you’ve downloaded the Gemma 4 E2B model you’ll be able to run it directly in the Locally app. FYI you can also use Google’s official Edge Gallery app for this. But using Locally gives us an advantage here.

We can run a local model directly on our phone yes…very cool. But we can also run the model on our laptop (or computer) from our phone. There’s a feature in LM Studio called LM Link that basically connects your devices and let’s you run locally remotely … yes that’s confusing language!

This means you can have a set up bigger, stronger models on your laptop and home computer and then access them (when online) on your phone.

Here you can see my phone connected to both my laptop (here in Germany at the Einstein Kaffee I’m currently) and my MacMini (back in Cyprus):

2B on phone, 4B on laptop, 12B on desktop. All from my phone.

This gives us maximum flexibility. When at home in Cyprus or on the move with internet I can use the most powerful model via the MacMini.

When flying or otherwise without internet I can use a smaller model directly on my MacBook or at a pinch directly on my iPhone. Very cool.

And the kicker! The models get smaller and more efficient. Generally (and this is from the Deepmind team I chatted to) we’re talking about a 7-8 month gap.

What I can run locally on a decent laptop now is around the same as the state of the art model 8 months ago.

SO…technically…8 months from now we’ll be running Opus 4.8, ChatGPT 5.5, Gemini 3.5 level models locally on relatively cheap consumer level hardware. Or, heck, even if it’s a year behind it doesn’t really matter. Having that much firepower locally means we can do pretty much ALL our normal tasks without increasingly expensive cloud subscriptions.

This is why it’s useful to get your head wrapped around it now.

A quick recap:

  1. Download LM Studio on your computer, laptop and/or phone.

  2. Download the recommended model on each device.

  3. Use LM Link to chain everything together.

And enjoy your free, always accessible AI.

To the Task,

Kyle