How to Run Local AI (Ollama) on a Mini PC — Setup Guide | Mini PC Lab
By Mini PC Lab Team · March 1, 2026 · Updated March 27, 2026
This article contains affiliate links. If you purchase through our links, we may earn a commission at no extra cost to you. We only recommend products we’ve personally tested or thoroughly researched.

Ollama makes running large language models locally as simple as ollama run llama3. On a mini PC with an AMD Radeon 780M, you can run 7B parameter models at 15–20 tokens/second with GPU acceleration — fast enough for a useful private AI assistant. This guide covers installation, GPU setup for AMD mini PCs, model selection, and adding a chat UI.
Before You Start
Requirements:
- Mini PC running Linux (Debian 12 or Ubuntu 24.04)
- 8GB+ RAM minimum; 16GB+ recommended for 7B models without swap
- For GPU acceleration: AMD Radeon 780M (Beelink SER9 PRO+, GMKtec K11, Minisforum UM790 Pro) or Intel Arc iGPU
- Estimated time: 20–30 minutes for basic setup; 30–60 minutes for GPU acceleration
Performance expectations by hardware:
| Hardware | Model | Mode | Speed |
|---|---|---|---|
| Intel N150 (EQ14) | Llama 3.2 3B Q4 | CPU only | ~8–12 tok/sec |
| Ryzen 7 H 255 (SER9 PRO+) | Llama 3.1 8B Q4 | CPU only | ~5–8 tok/sec |
| Ryzen 7 H 255 (SER9 PRO+) | Llama 3.1 8B Q4 | ROCm GPU | ~15–20 tok/sec |
| Ryzen 9 8945HS (K11) | Llama 3.1 8B Q4 | ROCm GPU | ~18–22 tok/sec |
| RTX 4060 via OCuLink (K11) | Llama 3.1 8B Q4 | CUDA GPU | ~50–70 tok/sec |
CPU-only inference is usable for 3B models. For 7B+ models, GPU acceleration makes the difference between “useful” and “barely tolerable.”
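To see where your own machine lands in this table, you can measure tokens/second directly: the final JSON object from Ollama's native /api/generate endpoint reports eval_count (tokens generated) and eval_duration (nanoseconds). A minimal sketch of the arithmetic, with illustrative sample numbers:

```python
def tokens_per_second(resp: dict) -> float:
    """Compute generation speed from an Ollama /api/generate final
    response, which reports eval_count (tokens generated) and
    eval_duration (nanoseconds)."""
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

# Illustrative numbers: 180 tokens generated in 10 seconds
sample = {"eval_count": 180, "eval_duration": 10_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/sec")  # 18.0 tok/sec
```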
Hardware recommendation: The Beelink SER9 PRO+ with Ryzen 7 H 255 is the best value for local AI on a mini PC. See our best mini PC for local LLM guide for full comparisons.
Step 1: Install Ollama
The official installer handles everything:
curl -fsSL https://ollama.com/install.sh | sh
This installs Ollama as a systemd service. Verify it’s running:
systemctl status ollama
# Should show: Active: active (running)
# Test with a small model
ollama run llama3.2:3b
# Downloads ~2GB, then starts an interactive chat session
# Type /bye to exit
Ollama listens on http://localhost:11434 by default.
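Under the hood, the /api/generate endpoint on that port streams newline-delimited JSON: each line carries a text fragment in a response field, and the last line has done: true. A small sketch of reassembling a streamed reply, using illustrative chunks in place of a live request:

```python
import json

def assemble_stream(ndjson_lines):
    """Concatenate the 'response' fragments from Ollama's streaming
    /api/generate output, stopping at the chunk with done: true."""
    text = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        text.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Illustrative chunks in the shape the API streams
lines = [
    '{"response": "Hello", "done": false}',
    '{"response": " world", "done": false}',
    '{"response": "", "done": true}',
]
print(assemble_stream(lines))  # Hello world
```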
Step 2: Enable GPU Acceleration on AMD Radeon 780M (ROCm)
This is the section most guides skip or get wrong. ROCm on Radeon 780M requires specific configuration.
2a. Install ROCm
# Add the ROCm repository
wget https://repo.radeon.com/amdgpu-install/6.3.1/ubuntu/jammy/amdgpu-install_6.3.1.60301-1_all.deb
# (Replace the version with the current release from repo.radeon.com, and match the distro codename to your OS: jammy = Ubuntu 22.04, noble = Ubuntu 24.04)
sudo apt install ./amdgpu-install_6.3.1.60301-1_all.deb
# Install ROCm
sudo amdgpu-install --usecase=rocm --no-dkms
For Debian 12 specifically, use the Debian package:
wget https://repo.radeon.com/amdgpu-install/6.3.1/debian/bookworm/amdgpu-install_6.3.1.60301-1_all.deb
sudo apt install ./amdgpu-install_6.3.1.60301-1_all.deb
sudo amdgpu-install --usecase=rocm --no-dkms
2b. Add Your User to Required Groups
sudo usermod -aG render,video $USER
# Log out and back in for group changes to take effect
2c. Verify ROCm Detection
# Check if the GPU is detected
rocm-smi
# Should show your Radeon 780M with temperature and utilization
# More detailed check
/opt/rocm/bin/rocminfo | grep -i "name"
# Should show gfx1103 (RDNA 3 iGPU)
2d. Set the HSA Override for Integrated GPU
The Radeon 780M is an integrated GPU. ROCm requires an environment variable to enable iGPU support:
# Add to /etc/environment for system-wide persistence
echo 'HSA_OVERRIDE_GFX_VERSION=11.0.0' | sudo tee -a /etc/environment
# Also set for the Ollama service
sudo systemctl edit ollama
In the systemd override editor that opens, add:
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
Save and close. Then restart Ollama:
sudo systemctl daemon-reload
sudo systemctl restart ollama
2e. Verify GPU Inference
# Watch GPU utilization while Ollama runs
watch -n 1 rocm-smi
# In another terminal, run a model
ollama run llama3.1:8b "Explain containerization in one paragraph"
If GPU utilization spikes to 80–100% during inference, ROCm is working. If it stays near 0%, the HSA override isn’t applied correctly — verify the environment variable and restart. You can also run ollama ps; its PROCESSOR column shows whether the loaded model is running on GPU or CPU.
Step 3: Choose the Right Models
Model selection matters significantly for mini PC hardware. The key constraint is VRAM — the Radeon 780M shares memory with the system and typically gets 512MB–2GB allocated in BIOS.
Wait — that’s not enough for a 7B model!
The 780M can use more than its VRAM allocation for GPU inference via unified memory. With 16–32GB system RAM (DDR5), the GPU can access several gigabytes for model weights. This is how 7B Q4 models run on the 780M — they use ~4.5GB of the unified memory pool.
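The ~4.5GB figure is just quantization arithmetic: weights take roughly params × bits-per-parameter ÷ 8 bytes, plus some headroom for the KV cache and runtime. A back-of-envelope sketch — the 4.5 bits/param and 0.5GB overhead are rough assumptions, not exact numbers:

```python
def model_memory_gb(params_billion: float, bits_per_param: float,
                    overhead_gb: float = 0.5) -> float:
    """Back-of-envelope memory estimate for a quantized model:
    weights (params * bits / 8) plus a rough allowance for the
    KV cache and runtime overhead."""
    weights_gb = params_billion * 1e9 * bits_per_param / 8 / 1e9
    return weights_gb + overhead_gb

# 7B-class model at ~4.5 bits/param (typical Q4 average)
print(f"{model_memory_gb(7, 4.5):.1f} GB")  # 4.4 GB
# Same model at ~8.5 bits/param (Q8)
print(f"{model_memory_gb(7, 8.5):.1f} GB")  # 7.9 GB
```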
Recommended models by RAM:
| System RAM | Recommended Model | VRAM Used | Speed (ROCm) |
|---|---|---|---|
| 16GB | Llama 3.2 3B Q4 | ~2.2GB | ~25 tok/sec |
| 16GB | Llama 3.1 8B Q4 | ~4.5GB | ~15–20 tok/sec |
| 32GB | Llama 3.1 8B Q4 | ~5.0GB | ~18–22 tok/sec |
| 32GB | Llama 3.1 8B Q8 | ~8.5GB | ~12–15 tok/sec |
| 32GB | Mistral 7B Q4 | ~4.5GB | ~15–20 tok/sec |
| 32GB | Gemma 2 9B Q4 | ~6.0GB | ~12–15 tok/sec |
Pull models:
# Fast and capable 8B model
ollama pull llama3.1:8b
# Smaller, faster for quick queries
ollama pull llama3.2:3b
# Code assistance
ollama pull codellama:7b
# Good instruction following
ollama pull mistral:7b
List downloaded models:
ollama list
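The same inventory is available over HTTP from the /api/tags endpoint, which returns a JSON object with a models array of name/size entries. A sketch that formats it for display — the sample payload is illustrative:

```python
def summarize_models(tags: dict) -> list[str]:
    """Turn an Ollama /api/tags response into 'name (size GB)' strings.
    Sizes in the API are reported in bytes."""
    return [
        f"{m['name']} ({m['size'] / 1e9:.1f} GB)"
        for m in tags.get("models", [])
    ]

# Illustrative payload in the shape /api/tags returns
sample = {"models": [
    {"name": "llama3.1:8b", "size": 4_700_000_000},
    {"name": "llama3.2:3b", "size": 2_000_000_000},
]}
print("\n".join(summarize_models(sample)))
```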
Step 4: Install Open WebUI (ChatGPT-like Interface)
The Ollama CLI is fine for testing, but for regular use you want a web UI. Open WebUI is the best option — it’s a full-featured chat interface that connects to Ollama.
mkdir -p ~/services/open-webui
cd ~/services/open-webui
# docker-compose.yml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    volumes:
      - ./data:/app/backend/data
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped
docker compose up -d
Access Open WebUI at http://[YOUR-IP]:3000. Create an account (local — no external service). You’ll see all your Ollama models listed. Start a conversation exactly like ChatGPT.
Step 5: Expose Ollama to Your Local Network
By default, Ollama only listens on localhost. To use it from other devices on your network (phone, laptop, or other containers):
sudo systemctl edit ollama
Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl daemon-reload
sudo systemctl restart ollama
Now Ollama’s API is accessible at http://[YOUR-IP]:11434 from any device on your network. This allows:
- Open WebUI running on a different machine
- Direct API calls from scripts on other devices
- Home Assistant’s Ollama integration
Note that the Ollama API has no authentication of its own, so only expose it on networks you trust.
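Before wiring up those devices, it’s worth a quick check that the port actually answers over the network. A generic TCP probe sketch — the address is a placeholder for your setup:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace with your mini PC's LAN address, e.g. "192.168.1.50"
print(port_open("127.0.0.1", 11434))
```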
Step 6: API Usage
Ollama exposes an OpenAI-compatible API. Use it from Python scripts, Home Assistant, n8n, or any application that supports the OpenAI API format:
# Python example using the openai library
from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.1.50:11434/v1",
    api_key="ollama",  # required but not validated
)

response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "What are the best practices for Docker networking?"}
    ],
)
print(response.choices[0].message.content)
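For chat-style apps you’ll usually want streaming: the same client accepts stream=True and yields chunks whose text lives in choices[0].delta.content (None on the final chunk). A helper sketch for accumulating the fragments, demonstrated with stand-in objects instead of a live server:

```python
from types import SimpleNamespace as NS

def collect_stream(chunks) -> str:
    """Join the text deltas from an OpenAI-style streaming response.
    Each chunk exposes choices[0].delta.content; the final chunk's
    content is None and is skipped."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

# Stand-in chunk objects mimicking the client's streaming shape
fake = [
    NS(choices=[NS(delta=NS(content="Docker "))]),
    NS(choices=[NS(delta=NS(content="networking"))]),
    NS(choices=[NS(delta=NS(content=None))]),
]
print(collect_stream(fake))  # Docker networking
```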
Troubleshooting
Ollama runs on CPU despite ROCm being installed
Verify the HSA override is set: echo $HSA_OVERRIDE_GFX_VERSION should return 11.0.0. Check the Ollama service environment: sudo systemctl show ollama | grep Environment. If missing, re-edit the systemd override.
ROCm install fails with “package not found”
The ROCm version for Debian 12 vs. Ubuntu 22.04/24.04 differs. Verify you’re using the correct package URL from repo.radeon.com for your exact OS version.
Model runs out of memory
Use a lower-bit quantization: the default llama3.1:8b tag is Q4, which needs roughly half the memory of an 8-bit build such as llama3.1:8b-instruct-q8_0. Alternatively, set OLLAMA_MAX_LOADED_MODELS=1 to prevent multiple models loading simultaneously.
Open WebUI can’t connect to Ollama
Verify Ollama is listening on 0.0.0.0 (not just localhost) and that the OLLAMA_HOST environment variable is set. Check with curl http://localhost:11434/api/tags — if this returns a model list, Ollama is running. Then test curl http://[YOUR-IP]:11434/api/tags — if this fails, the host binding isn’t set.
Quick Price Summary
- Beelink SER9 PRO+ — ROCm GPU inference, best value
- GMKtec K11 — OCuLink eGPU, upgradeable to 64GB
- Beelink EQ14 — CPU-only inference, small 3B models
Recommended Hardware
→ Check Current Price: Beelink SER9 PRO+ on Amazon — Ryzen 7 H 255, Radeon 780M, best value for ROCm GPU inference
→ Check Current Price: GMKtec K11 on Amazon — Ryzen 9 8945HS, OCuLink for external GPU, upgradeable to 64GB DDR5
→ Check Current Price: Beelink EQ14 on Amazon — Intel N150, CPU-only inference for small 3B models, 6W idle
See also: best mini PC for local AI guide | GMKtec K11 review for OCuLink eGPU expansion