llama.cpp • 4 min read
Build llama.cpp From Source for CPU Inference
Build llama.cpp on a CPU-only Debian box, run llama-bench for real numbers, and serve an OpenAI-compatible endpoint. With a 4.8 GB gotcha.…