Running LLMs on a Raspberry Pi — Step-by-Step Tutorial (2026)
Can a $60 computer run a large language model?
Yes. And it works better than you'd expect.
In 2026, you don't need a cloud GPU cluster to run Llama 2, Phi-3, or TinyLlama. A Raspberry Pi 5 with 8GB of RAM can handle small LLMs right at the edge — no internet required.
In this tutorial, I'll show you exactly how.
What You'll Need
Hardware:
- Raspberry Pi 5 (8GB) — or Pi 4 (4GB minimum)
- 64GB microSD card (Class 10)
- 5V/5A power supply
- Active cooler or fan
Total cost: $60–120 depending on options
Which LLMs Actually Run on a Pi?
TinyLlama 1.1B — 2-3GB RAM — Good quality — Best for beginners
Phi-3 mini (4-bit) — 3-4GB RAM — Very good — Reasoning and logic
Llama 2 7B (4-bit) — 5-6GB RAM — Great — Text generation
Recommendation for first-timers: Start with TinyLlama 1.1B — it's the easiest to run.
Step 1: Set Up Your Raspberry Pi
If you already have Raspberry Pi OS installed, skip to Step 2.
Fresh setup:
1. Download Raspberry Pi Imager from raspberrypi.com
2. Choose Raspberry Pi OS Lite (64-bit)
3. Flash to microSD card
4. Enable SSH (create empty 'ssh' file in boot partition)
5. Boot and connect via SSH
Step 2: Install Dependencies
Run these commands one by one:
sudo apt update && sudo apt upgrade -y
sudo apt install git cmake build-essential -y
sudo apt install python3-pip python3-venv -y
Install Ollama (easiest method):
curl -fsSL https://ollama.com/install.sh | sh
Step 3: Download and Run Your First LLM
Start Ollama service:
ollama serve
Open a second terminal and run:
ollama run tinyllama
First run will download the model — takes 5-15 minutes depending on your internet.
Expected speed: 2-5 tokens per second on Pi 5
Step 4: Run Better Models (Optional)
After TinyLlama works, try Phi-3:
ollama run phi3:mini
Or Llama 2 (7B) if you have 8GB RAM:
ollama run llama2:7b
Step 5: Create a Simple Chat Script (Python)
Save this as chat.py:
import subprocess
def ask_llm(prompt, model="tinyllama"):
result = subprocess.run(
["ollama", "run", model, prompt],
capture_output=True,
text=True
)
return result.stdout
response = ask_llm("Explain edge computing in one sentence")
print(response)
Run it:
python3 chat.py
Performance Benchmarks (Real Tests)
On Raspberry Pi 5 (8GB) with active cooler:
TinyLlama 1.1B: 4-6 tokens/sec — 1-2 sec first response
Phi-3 mini (4-bit): 3-4 tokens/sec — 2-3 sec first response
Llama 2 7B (4-bit): 1-2 tokens/sec — 5-8 sec first response
On Raspberry Pi 4 (4GB): TinyLlama only (2-3 tokens/sec)
Troubleshooting
Problem: ollama: command not found
Fix: Reinstall or add ~/.local/bin to PATH
Problem: Model downloads forever
Fix: Check internet connection (WiFi on Pi can be slow)
Problem: Pi freezes or throttles
Fix: Add active cooling — thermal throttling kills performance
Problem: Out of memory error
Fix: Use smaller model or 4-bit quantized version
The #1 mistake: Using a Pi 4 with 4GB and trying to run Llama 2 7B. Don't do it.
What Can You Actually Do With an LLM on a Pi?
Local chat assistant — Yes (slow but usable)
Text summarization — Yes
Code generation — Yes (short snippets)
Real-time translation — Borderline (2-3 second delay)
Long document analysis — No (memory limit)
Best use: Offline assistant for home automation, note-taking, or learning how LLMs work.
The Hybrid Setup (My Favorite)
Run the Pi as an edge LLM server:
1. Keep Ollama running on the Pi
2. Call it from any device on your local network
3. No cloud. No API fees. No privacy concerns.
API endpoint example:
curl http://raspberrypi.local:11434/api/generate -d '{
"model": "tinyllama",
"prompt": "What is 42?"
}'
Now every device in your house has private LLM access.
Key Takeaway
Yes — you can run LLMs on a Raspberry Pi.
It won't match ChatGPT speed. But for $60, you get:
- Complete privacy (no data leaves your home)
- No monthly subscription
- Offline capability
- A fun weekend project that teaches real AI skills
Start with TinyLlama. Upgrade to Phi-3. Then build something useful.
What's Next?
In our next post: TinyML on a Microcontroller — AI for $5
New to Edge AI? Read my beginner's guide: [Edge AI vs Cloud AI: Which One Wins in 2026?]
Comments
Post a Comment