Reference guides

Decision-focused AI engineering guides.

Practical reference material we've written for clients, partners, and ourselves. Each guide is decision-focused: what to pick, when to pick it, and what we've actually seen break in production. Written for engineers and decision-makers, not for readers of vendor blogs.

GPU sizing · RAM · storage · 2026

Local LLM Hardware Guide 2026 →

The definitive 2026 hardware guide for running local LLMs. GPU, RAM, and storage requirements for 7B through 70B+ models on consumer rigs and enterprise hardware. Real benchmarks, not spec-sheet guesses.
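As a taste of the sizing math the guide walks through, here is a rough back-of-envelope VRAM estimate. The bit-widths, overhead factor, and resulting figures below are illustrative assumptions, not the guide's benchmark numbers.

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float = 4.0,
                     overhead: float = 1.2) -> float:
    """Back-of-envelope VRAM estimate: weight memory plus a flat
    overhead factor for KV cache, activations, and runtime buffers.
    All values are illustrative assumptions, not measured benchmarks."""
    weight_gb = params_b * bits_per_weight / 8  # e.g. 7B at 4-bit ~ 3.5 GB of weights
    return weight_gb * overhead

# Rough examples with assumed quantizations:
print(f"7B  @ 4-bit  ~ {estimate_vram_gb(7):.1f} GB")       # ~4 GB
print(f"70B @ 4-bit  ~ {estimate_vram_gb(70):.1f} GB")      # ~42 GB
print(f"70B @ 16-bit ~ {estimate_vram_gb(70, 16):.1f} GB")  # ~168 GB
```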

When to retrieve · when to train

RAG vs Fine-Tuning →

A practical guide to when RAG is the right answer, when fine-tuning is, and when neither is. What we've shipped, what we've seen fail, and how to figure out which approach fits your use case before you've spent $50K on the wrong path.

Inference engine comparison

Ollama vs vLLM →

Practical comparison of Ollama and vLLM for local LLM deployment. Latency, throughput, operational simplicity, multi-GPU support, OpenAI-compatible API surface. Which one to pick for which workload.
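Because both servers expose an OpenAI-compatible endpoint, the same client code can target either one. A minimal sketch follows; the ports are the usual defaults and the model names are assumptions, so adjust both to your own deployment.

```python
from openai import OpenAI

# Ollama's OpenAI-compatible endpoint (default local port 11434); model tag is an assumption.
ollama = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
# vLLM's OpenAI-compatible server (default local port 8000); model name is an assumption.
vllm = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

for name, client, model in [("ollama", ollama, "llama3"),
                            ("vllm", vllm, "meta-llama/Llama-3-8B-Instruct")]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(name, resp.choices[0].message.content)
```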

These guides exist because we hit each decision in client work and wrote down what we learned. If you want a working session on your own deployment decisions, the free AI Readiness Assessment is the right next step.