llama.cpp • 4 min read • Build llama.cpp From Source for CPU Inference
Build llama.cpp on a CPU-only Debian box, run llama-bench for real numbers, and serve an OpenAI-compatible endpoint. Includes a 4.8GB gotcha.…
Docker • 3 min read • Open WebUI + Ollama in Docker: a Local AI Chat UI
Run Open WebUI in Docker as a local ChatGPT-style UI over Ollama: the exact command, the host-gateway fix, and the gotchas to avoid.…
Ollama • 3 min read • Run a Local LLM with Ollama on Debian (CPU-Only)
Install Ollama, pull a model, and benchmark CPU-only inference with real tokens/s numbers from an old 6-core Xeon.…