Running LLMs on a Raspberry Pi — Step-by-Step Tutorial (2026)

Can a $60 computer run a large language model?

Yes. And it works better than you'd expect.

In 2026, you don't need a cloud GPU cluster to run Llama 2, Phi-3, or TinyLlama. A Raspberry Pi 5 with 8GB of RAM can handle small LLMs right at the edge — no internet required.

In this tutorial, I'll show you exactly how.

What You'll Need

Hardware:

- Raspberry Pi 5 (8GB) — or Pi 4 (4GB minimum)

- 64GB microSD card (Class 10)

- 5V/5A power supply

- Active cooler or fan

Total cost: $60–120 depending on options

Which LLMs Actually Run on a Pi?

TinyLlama 1.1B — 2-3GB RAM — Good quality — Best for beginners

Phi-3 mini (4-bit) — 3-4GB RAM — Very good — Reasoning and logic

Llama 2 7B (4-bit) — 5-6GB RAM — Great — Text generation

Recommendation for first-timers: Start with TinyLlama 1.1B — it's the easiest to run.

Step 1: Set Up Your Raspberry Pi

If you already have Raspberry Pi OS installed, skip to Step 2.

Fresh setup:

1. Download Raspberry Pi Imager from raspberrypi.com

2. Choose Raspberry Pi OS Lite (64-bit)

3. Flash to microSD card

4. Enable SSH (create empty 'ssh' file in boot partition)

5. Boot and connect via SSH

Step 2: Install Dependencies

Run these commands one by one:

sudo apt update && sudo apt upgrade -y

sudo apt install git cmake build-essential -y

sudo apt install python3-pip python3-venv -y

Install Ollama (easiest method):

curl -fsSL https://ollama.com/install.sh | sh

Step 3: Download and Run Your First LLM

Start Ollama service:

ollama serve

Open a second terminal and run:

ollama run tinyllama

First run will download the model — takes 5-15 minutes depending on your internet.

Expected speed: 2-5 tokens per second on Pi 5

Step 4: Run Better Models (Optional)

After TinyLlama works, try Phi-3:

ollama run phi3:mini

Or Llama 2 (7B) if you have 8GB RAM:

ollama run llama2:7b

Step 5: Create a Simple Chat Script (Python)

Save this as chat.py:

import subprocess

def ask_llm(prompt, model="tinyllama"):

result = subprocess.run(

["ollama", "run", model, prompt],

capture_output=True,

text=True

)

return result.stdout

response = ask_llm("Explain edge computing in one sentence")

print(response)

Run it:

python3 chat.py

Performance Benchmarks (Real Tests)

On Raspberry Pi 5 (8GB) with active cooler:

TinyLlama 1.1B: 4-6 tokens/sec — 1-2 sec first response

Phi-3 mini (4-bit): 3-4 tokens/sec — 2-3 sec first response

Llama 2 7B (4-bit): 1-2 tokens/sec — 5-8 sec first response

On Raspberry Pi 4 (4GB): TinyLlama only (2-3 tokens/sec)

Troubleshooting

Problem: ollama: command not found

Fix: Reinstall or add ~/.local/bin to PATH

Problem: Model downloads forever

Fix: Check internet connection (WiFi on Pi can be slow)

Problem: Pi freezes or throttles

Fix: Add active cooling — thermal throttling kills performance

Problem: Out of memory error

Fix: Use smaller model or 4-bit quantized version

The #1 mistake: Using a Pi 4 with 4GB and trying to run Llama 2 7B. Don't do it.

What Can You Actually Do With an LLM on a Pi?

Local chat assistant — Yes (slow but usable)

Text summarization — Yes

Code generation — Yes (short snippets)

Real-time translation — Borderline (2-3 second delay)

Long document analysis — No (memory limit)

Best use: Offline assistant for home automation, note-taking, or learning how LLMs work.

The Hybrid Setup (My Favorite)

Run the Pi as an edge LLM server:

1. Keep Ollama running on the Pi

2. Call it from any device on your local network

3. No cloud. No API fees. No privacy concerns.

API endpoint example:

curl http://raspberrypi.local:11434/api/generate -d '{

"model": "tinyllama",

"prompt": "What is 42?"

Now every device in your house has private LLM access.

Key Takeaway

Yes — you can run LLMs on a Raspberry Pi.

It won't match ChatGPT speed. But for $60, you get:

- Complete privacy (no data leaves your home)

- No monthly subscription

- Offline capability

- A fun weekend project that teaches real AI skills

Start with TinyLlama. Upgrade to Phi-3. Then build something useful.

What's Next?

In our next post: TinyML on a Microcontroller — AI for $5

New to Edge AI? Read my beginner's guide: [Edge AI vs Cloud AI: Which One Wins in 2026?]

https://theaiedgetech.blogspot.com/2026/06/edge-ai-vs-cloud-ai-which-one-wins-in.html

The AI Edge

Search This Blog

Running LLMs on a Raspberry Pi — Step-by-Step Tutorial (2026)

Labels

Comments

Post a Comment

Popular posts from this blog

Edge AI vs. Cloud AI: Which One Wins in 2026?