Local LLMs - LLMbits

llama.cpp • 4 min read

Build llama.cpp From Source for CPU Inference

Build llama.cpp on a CPU-only Debian box, run llama-bench for real numbers, and serve an OpenAI-compatible endpoint. With a 4.8GB gotcha.…

J • 04 Jun 2026

Docker • 3 min read

Run Open WebUI in Docker as a local ChatGPT-style UI over Ollama: the exact command, the host-gateway fix, and gotchas to skip.…

J • 02 Jun 2026

Ollama • 3 min read

Install Ollama, pull a model, and benchmark CPU-only inference with real tokens/s numbers from an old 6-core Xeon.…

J • 01 Jun 2026