How to Run Local AI (Ollama) on a Mini PC — Setup Guide | Mini PC Lab
By Mini PC Lab Team · March 1, 2026 · Updated March 27, 2026
This article contains affiliate links. If you purchase through our links, we may earn a commission at no extra cost to you. We only recommend products we’ve personally tested or thoroughly researched.

Ollama makes running large language models locally as simple as ollama run llama3. On a mini PC with an AMD Radeon 780M, you can run 7B parameter models at 15–20 tokens/second with GPU acceleration — fast enough for a useful private AI assistant. This guide covers installation, GPU setup for AMD mini PCs, model selection, and adding a chat UI.
Before You Start
Requirements:
- Mini PC running Linux (Debian 12 or Ubuntu 24.04)
- 8GB+ RAM minimum; 16GB+ recommended for 7B models without swap
- For GPU acceleration: AMD Radeon 780M (Beelink SER9 PRO+, GMKtec K11, Minisforum UM790 Pro) or Intel Arc iGPU
- Estimated time: 20–30 minutes for basic setup; 30–60 minutes for GPU acceleration
Performance expectations by hardware:
| Hardware | Model | Mode | Speed |
|---|---|---|---|
| Intel N150 (EQ14) | Llama 3.2 3B Q4 | CPU only | ~8–12 tok/sec |
| Ryzen 7 H 255 (SER9 PRO+) | Llama 3.1 8B Q4 | CPU only | ~5–8 tok/sec |
| Ryzen 7 H 255 (SER9 PRO+) | Llama 3.1 8B Q4 | ROCm GPU | ~15–20 tok/sec |
| Ryzen 9 8945HS (K11) | Llama 3.1 8B Q4 | ROCm GPU | ~18–22 tok/sec |
| RTX 4060 via OCuLink (K11) | Llama 3.1 8B Q4 | CUDA GPU | ~50–70 tok/sec |
CPU-only inference is usable for 3B models. For 7B+ models, GPU acceleration makes the difference between “useful” and “barely tolerable.”
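To see where your own machine lands in this table, you can measure tokens/second directly: the final JSON object from Ollama's native /api/generate endpoint reports eval_count (tokens generated) and eval_duration (nanoseconds). A minimal sketch of the arithmetic, with illustrative sample numbers:

```python
def tokens_per_second(resp: dict) -> float:
    """Compute generation speed from an Ollama /api/generate final
    response, which reports eval_count (tokens generated) and
    eval_duration (nanoseconds)."""
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

# Illustrative numbers: 180 tokens generated in 10 seconds
sample = {"eval_count": 180, "eval_duration": 10_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/sec")  # 18.0 tok/sec
```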
Hardware recommendation: The Beelink SER9 PRO+ with Ryzen 7 H 255 is the best value for local AI on a mini PC. See our best mini PC for local LLM guide for full comparisons.
Step 1: Install Ollama
The official installer handles everything:
curl -fsSL https://ollama.com/install.sh | sh
This installs Ollama as a systemd service. Verify it’s running:
systemctl status ollama
# Should show: Active: active (running)
# Test with a small model
ollama run llama3.2:3b
# Downloads ~2GB, then starts an interactive chat session
# Type /bye to exit
Ollama listens on http://localhost:11434 by default.
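Under the hood, the /api/generate endpoint on that port streams newline-delimited JSON: each line carries a text fragment in a response field, and the last line has done: true. A small sketch of reassembling a streamed reply, using illustrative chunks in place of a live request:

```python
import json

def assemble_stream(ndjson_lines):
    """Concatenate the 'response' fragments from Ollama's streaming
    /api/generate output, stopping at the chunk with done: true."""
    text = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Illustrative chunks in the shape the API streams
lines = [
    '{"response": "Hello", "done": false}',
    '{"response": " world", "done": false}',
    '{"response": "", "done": true}',
]
print(assemble_stream(lines))  # Hello world
```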
Step 2: Enable GPU Acceleration on AMD Radeon 780M (ROCm)
This is the section most guides skip or get wrong. ROCm on Radeon 780M requires specific configuration.
2a. Install ROCm
# Add the ROCm repository
wget https://repo.radeon.com/amdgpu-install/6.3.1/ubuntu/jammy/amdgpu-install_6.3.1.60301-1_all.deb
# (Replace the version with the current release from repo.radeon.com, and match the distro codename to your OS: jammy = Ubuntu 22.04, noble = Ubuntu 24.04)
sudo apt install ./amdgpu-install_6.3.1.60301-1_all.deb
# Install ROCm
sudo amdgpu-install --usecase=rocm --no-dkms
For Debian 12 specifically, use the Debian package:
wget https://repo.radeon.com/amdgpu-install/6.3.1/debian/bookworm/amdgpu-install_6.3.1.60301-1_all.deb
sudo apt install ./amdgpu-install_6.3.1.60301-1_all.deb
sudo amdgpu-install --usecase=rocm --no-dkms
2b. Add Your User to Required Groups
sudo usermod -aG render,video $USER
# Log out and back in for group changes to take effect
2c. Verify ROCm Detection
# Check if the GPU is detected
rocm-smi
# Should show your Radeon 780M with temperature and utilization
# More detailed check
/opt/rocm/bin/rocminfo | grep -i "name"
# Should show gfx1103 (RDNA 3 iGPU)
2d. Set the HSA Override for Integrated GPU
The Radeon 780M is an integrated GPU. ROCm requires an environment variable to enable iGPU support:
# Add to /etc/environment for system-wide persistence
echo 'HSA_OVERRIDE_GFX_VERSION=11.0.0' | sudo tee -a /etc/environment
# Also set for the Ollama service
sudo systemctl edit ollama
In the systemd override editor that opens, add:
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
Save and close. Then restart Ollama:
sudo systemctl daemon-reload
sudo systemctl restart ollama
2e. Verify GPU Inference
# Watch GPU utilization while Ollama runs
watch -n 1 rocm-smi
# In another terminal, run a model
ollama run llama3.1:8b "Explain containerization in one paragraph"
If GPU utilization spikes to 80–100% during inference, ROCm is working. If it stays near 0%, the HSA override isn’t applied correctly — verify the environment variable and restart. You can also run ollama ps; its PROCESSOR column shows whether the loaded model is running on GPU or CPU.
Step 3: Choose the Right Models
Model selection matters significantly for mini PC hardware. The key constraint is VRAM — the Radeon 780M shares memory with the system and typically gets 512MB–2GB allocated in BIOS.
Wait — that’s not enough for a 7B model!
The 780M can use more than its VRAM allocation for GPU inference via unified memory. With 16–32GB system RAM (DDR5), the GPU can access several gigabytes for model weights. This is how 7B Q4 models run on the 780M — they use ~4.5GB of the unified memory pool.
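The ~4.5GB figure is just quantization arithmetic: weights take roughly params × bits-per-parameter ÷ 8 bytes, plus some headroom for the KV cache and runtime. A back-of-envelope sketch — the 4.5 bits/param and 0.5GB overhead are rough assumptions, not exact numbers:

```python
def model_memory_gb(params_billion: float, bits_per_param: float,
                    overhead_gb: float = 0.5) -> float:
    """Back-of-envelope memory estimate for a quantized model:
    weights (params * bits / 8) plus a rough allowance for the
    KV cache and runtime overhead."""
    weights_gb = params_billion * 1e9 * bits_per_param / 8 / 1e9
    return weights_gb + overhead_gb

# 7B-class model at ~4.5 bits/param (typical Q4 average)
print(f"{model_memory_gb(7, 4.5):.1f} GB")  # 4.4 GB
# Same model at ~8.5 bits/param (Q8)
print(f"{model_memory_gb(7, 8.5):.1f} GB")  # 7.9 GB
```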
Recommended models by RAM:
| System RAM | Recommended Model | VRAM Used | Speed (ROCm) |
|---|---|---|---|
| 16GB | Llama 3.2 3B Q4 | ~2.2GB | ~25 tok/sec |
| 16GB | Llama 3.1 8B Q4 | ~4.5GB | ~15–20 tok/sec |
| 32GB | Llama 3.1 8B Q4 | ~5.0GB | ~18–22 tok/sec |
| 32GB | Llama 3.1 8B Q8 | ~8.5GB | ~12–15 tok/sec |
| 32GB | Mistral 7B Q4 | ~4.5GB | ~15–20 tok/sec |
| 32GB | Gemma 2 9B Q4 | ~6.0GB | ~12–15 tok/sec |
Pull models:
# Fast and capable 8B model
ollama pull llama3.1:8b
# Smaller, faster for quick queries
ollama pull llama3.2:3b
# Code assistance
ollama pull codellama:7b
# Good instruction following
ollama pull mistral:7b
List downloaded models:
ollama list
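The same inventory is available over HTTP from the /api/tags endpoint, which returns a JSON object with a models array of name/size entries. A sketch that formats it for display — the sample payload is illustrative:

```python
def summarize_models(tags: dict) -> list[str]:
    """Turn an Ollama /api/tags response into 'name (size GB)' strings.
    Sizes in the API are reported in bytes."""
    return [
        f"{m['name']} ({m['size'] / 1e9:.1f} GB)"
        for m in tags.get("models", [])
    ]

# Illustrative payload in the shape /api/tags returns
sample = {"models": [
    {"name": "llama3.1:8b", "size": 4_700_000_000},
    {"name": "llama3.2:3b", "size": 2_000_000_000},
]}
print("\n".join(summarize_models(sample)))
```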
Step 4: Install Open WebUI (ChatGPT-like Interface)
The Ollama CLI is fine for testing, but for regular use you want a web UI. Open WebUI is the best option — it’s a full-featured chat interface that connects to Ollama.
mkdir -p ~/services/open-webui
cd ~/services/open-webui
# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - ./data:/app/backend/data
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped
docker compose up -d
Access Open WebUI at http://[YOUR-IP]:3000. Create an account (local — no external service). You’ll see all your Ollama models listed. Start a conversation exactly like ChatGPT.
Step 5: Expose Ollama to Your Local Network
By default, Ollama only listens on localhost. To use it from other devices on your network (phone, laptop, or other containers):
sudo systemctl edit ollama
Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl daemon-reload
sudo systemctl restart ollama
Now Ollama’s API is accessible at http://[YOUR-IP]:11434 from any device on your network. This allows:
- Open WebUI running on a different machine
- Direct API calls from scripts on other devices
- Home Assistant’s Ollama integration
Note that the Ollama API has no authentication of its own, so only expose it on networks you trust.
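Before wiring up those devices, it’s worth a quick check that the port actually answers over the network. A generic TCP probe sketch — the address is a placeholder for your setup:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace with your mini PC's LAN address, e.g. "192.168.1.50"
print(port_open("127.0.0.1", 11434))
```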
Step 6: API Usage
Ollama exposes an OpenAI-compatible API. Use it from Python scripts, Home Assistant, n8n, or any application that supports the OpenAI API format:
# Python example using the openai library
from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.1.50:11434/v1",
    api_key="ollama",  # required but not validated
)

response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "What are the best practices for Docker networking?"}
    ],
)
print(response.choices[0].message.content)
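For chat-style apps you’ll usually want streaming: the same client accepts stream=True and yields chunks whose text lives in choices[0].delta.content (None on the final chunk). A helper sketch for accumulating the fragments, demonstrated with stand-in objects instead of a live server:

```python
from types import SimpleNamespace as NS

def collect_stream(chunks) -> str:
    """Join the text deltas from an OpenAI-style streaming response.
    Each chunk exposes choices[0].delta.content; the final chunk's
    content is None and is skipped."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

# Stand-in chunk objects mimicking the client's streaming shape
fake = [
    NS(choices=[NS(delta=NS(content="Docker "))]),
    NS(choices=[NS(delta=NS(content="networking"))]),
    NS(choices=[NS(delta=NS(content=None))]),
]
print(collect_stream(fake))  # Docker networking
```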
Troubleshooting
Ollama runs on CPU despite ROCm being installed
Verify the HSA override is set: echo $HSA_OVERRIDE_GFX_VERSION should return 11.0.0. Check the Ollama service environment: sudo systemctl show ollama | grep Environment. If missing, re-edit the systemd override.
ROCm install fails with “package not found”
The ROCm version for Debian 12 vs. Ubuntu 22.04/24.04 differs. Verify you’re using the correct package URL from repo.radeon.com for your exact OS version.
Model runs out of memory
Use a lower-bit quantization: the default llama3.1:8b tag is Q4, which needs roughly half the memory of an 8-bit build such as llama3.1:8b-instruct-q8_0. Alternatively, set OLLAMA_MAX_LOADED_MODELS=1 to prevent multiple models loading simultaneously.
Open WebUI can’t connect to Ollama
Verify Ollama is listening on 0.0.0.0 (not just localhost) and that the OLLAMA_HOST environment variable is set. Check with curl http://localhost:11434/api/tags — if this returns a model list, Ollama is running. Then test curl http://[YOUR-IP]:11434/api/tags — if this fails, the host binding isn’t set.
Quick Price Summary
- Beelink SER9 PRO+ — ROCm GPU inference, best value
- GMKtec K11 — OCuLink eGPU, upgradeable to 64GB
- Beelink EQ14 — CPU-only inference, small 3B models
Recommended Hardware
→ Check Current Price: Beelink SER9 PRO+ on Amazon — Ryzen 7 H 255, Radeon 780M, best value for ROCm GPU inference
→ Check Current Price: GMKtec K11 on Amazon — Ryzen 9 8945HS, OCuLink for external GPU, upgradeable to 64GB DDR5
→ Check Current Price: Beelink EQ14 on Amazon — Intel N150, CPU-only inference for small 3B models, 6W idle
See also: best mini PC for local AI guide | GMKtec K11 review for OCuLink eGPU expansion