// SERVICE — CLAUDE CODE PRACTICE

Private Claude Code Deployment

Run Claude Code or OpenCode against a private local AI model behind your firewall. One hundred percent private prompts and processing. Zero data egress. No cloud dependency. Built for healthcare, legal, financial services, government contractors, and IP-sensitive organizations that want agentic coding productivity without giving up data sovereignty.

The problem with cloud Claude Code for regulated work

Claude Code is one of the strongest agentic coding tools available in 2026. Run against Anthropic's cloud API, it can dramatically accelerate shipping production software compared with traditional development. But for many organizations, cloud execution is a non-starter: every prompt, every file of proprietary source code, and every piece of business context leaves the network for processing.

Private Claude Code Deployment solves this by pointing the Claude Code CLI at a local OpenAI-compatible endpoint served by a model running on your own hardware, inside your own network. The Claude Code interface is identical; the data never leaves your firewall.

Small (1-3 engineers): $40K
Mid (10-20 engineers): $65K
Large (multi-node, 70B+): $100K
Data egress: 0 bytes

How the deployment works

01. Backend inference

We stand up an OpenAI-compatible inference endpoint on your hardware. The runtime depends on your scale: Ollama for operational simplicity and single-workstation deployments, vLLM for multi-engineer throughput, llama.cpp for edge and CPU deployments. GPU allocation, batching, KV-cache sizing, and context-window policy are tuned per deployment.
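As a concrete sketch of what that endpoint looks like (model names, ports, and tuning flags here are illustrative; the actual runtime and settings are chosen per deployment):

```shell
# Single-workstation option: Ollama serves an OpenAI-compatible API
# at http://localhost:11434/v1 once a model is pulled.
ollama pull qwen2.5-coder:32b
ollama serve

# Multi-engineer option: vLLM serves the same OpenAI-compatible API
# (default port 8000), with tensor parallelism across GPUs and an
# explicit context-window cap.
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
  --tensor-parallel-size 2 \
  --max-model-len 32768
```

Either backend exposes the same `/v1/chat/completions` surface, which is what lets the coding agents in step 03 treat them interchangeably.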

02. Model selection

Candidate open-weight coding models (Qwen 2.5 Coder, DeepSeek Coder V2, Llama 3 Instruct, Mistral Large, Gemma 3, StarCoder 2) are benchmarked against your actual workloads, and the best performer becomes the deployment default.

03. Claude Code integration

We configure Claude Code to route to your local endpoint. Custom skills, hooks, and MCP servers specific to your stack are developed during the engagement. OpenCode, Aider, and Continue.dev can be wired to the same backend for teams that want multiple coding-agent options.
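A minimal sketch of the routing step, assuming an Anthropic-API-compatible gateway (for example, a LiteLLM proxy) sits in front of the OpenAI-compatible backend; the hostname, port, and token below are placeholders, not fixed values:

```shell
# Point the Claude Code CLI at the local gateway instead of api.anthropic.com.
# Placeholder URL and token -- substitute your deployment's values.
export ANTHROPIC_BASE_URL="http://gateway.internal:4000"
export ANTHROPIC_AUTH_TOKEN="local-dev-token"

# claude   # the CLI now sends all traffic to the in-network gateway
```

Because OpenCode, Aider, and Continue.dev speak the OpenAI-compatible protocol directly, they can skip the gateway and target the backend endpoint itself.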

04. Access control and audit

Per-engineer authentication (Mid and Large tiers integrate with your SSO), per-team rate limiting, and full audit logging of prompts, completions, tool invocations, and costs. Audit logs ship to your SIEM (Splunk, Elastic, Datadog, Grafana Loki) for compliance review.
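For a sense of what the audit trail enables, here is a toy query over a JSON-lines audit log; the field names and log shape are hypothetical, for illustration only, since the real schema is fixed during the engagement:

```shell
# Hypothetical audit-log lines: one JSON object per prompt/completion/tool event.
cat > audit.jsonl <<'EOF'
{"user":"alice","event":"completion","tokens":600}
{"user":"bob","event":"tool_call","tokens":0}
{"user":"alice","event":"tool_call","tokens":0}
EOF

# Count one engineer's audited events before shipping the log to the SIEM.
grep -c '"user":"alice"' audit.jsonl
```

In production the same log stream feeds Splunk, Elastic, Datadog, or Grafana Loki rather than ad-hoc shell queries.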

05. Knowledge transfer

Runbooks, operational training for your sysadmin or devops team, and 30 days of post-launch support. Deployments can be bundled with our Team Enablement Workshop at a reduced rate for teams that want engineer-level training as part of the deployment.

Deployment tiers

Small — $40,000 (1-3 engineers, 7B-30B models, single GPU such as an RTX 4090, L40S, or A100)

Mid — $65,000 (10-20 engineers, 30B-70B models, multi-GPU or a single H100; includes SSO integration)

Large — $100,000 (multi-node H100/H200 cluster, 70B+ models, high concurrency; air-gapped operation supported)

Who this is for

Healthcare, legal, financial services, government contractors, and any IP-sensitive organization that wants agentic coding productivity without giving up data sovereignty.

Relationship to our other services

This service builds on our existing Local AI Deployment service and adds the Claude-Code-specific integration layer on top. Existing Local AI Deployment clients can upgrade to Private Claude Code Deployment at a reduced rate. For teams that want training on the deployed infrastructure, add our Training & Enablement service. For ongoing operation by our team rather than yours, see the Claude Code Agency retainer.

Frequently asked questions

What is a private Claude Code deployment?

Running Claude Code or OpenCode against an open-weight AI model hosted on your own hardware, inside your own network. No prompts, context, or completions leave your infrastructure. The CLI points at a local OpenAI-compatible endpoint (Ollama, vLLM, or equivalent) instead of the Anthropic cloud API.

Which AI models can power it?

Any open-weight model with strong coding capability: Qwen 2.5 Coder, DeepSeek Coder V2, Llama 3 Instruct, Mistral Large, Gemma 3, StarCoder 2. We benchmark candidates against your actual workloads during the engagement.

Does it work with OpenCode and other coding agents?

Yes. Any coding agent with OpenAI-compatible endpoint support — Claude Code, OpenCode, Aider, Continue.dev — can be wired to the same private local backend.

What hardware do we need?

Small (1-3 engineers, 7B-30B models): one RTX 4090 / L40S / A100. Mid (10-20 engineers, 30B-70B models): multi-GPU or single H100. Large (70B+, high concurrency): H100 or H200 cluster. Apple Silicon Ultra works for Mac shops at Small/Mid. We spec exact hardware in discovery.
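As a rough back-of-envelope for why a 30B-class model fits a single 24 GB card, here is an illustrative VRAM estimate; the 4-bit figure and 20% overhead factor are simplifying assumptions, not a vendor sizing formula:

```shell
# Rough VRAM estimate: parameters (billions) x bytes per weight,
# plus ~20% headroom for KV cache and activations. Illustrative only.
params_b=32          # e.g. a 32B coding model
bytes_per_weight=0.5 # 4-bit quantization is roughly 0.5 bytes per weight
vram_gb=$(awk -v p="$params_b" -v b="$bytes_per_weight" \
  'BEGIN { printf "%.0f", p * b * 1.2 }')
echo "$vram_gb"      # ~19 GB: fits a single RTX 4090 (24 GB)
```

The same arithmetic shows why 70B+ models at higher precision and concurrency push into multi-GPU and H100/H200 territory.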

How is this different from cloud Claude Code?

Two differences: data governance (nothing leaves your network, versus Anthropic processing every prompt) and model choice (any open-weight model, versus Anthropic's Claude family). Cloud is easier to set up, and Claude models are arguably stronger at coding; private deployment gives you sovereignty over both data and model. For regulated or IP-sensitive work, private is often the only viable option.

Can we run fully air-gapped?

Yes. Fully air-gapped deployments are supported at the Large tier. Model weights are downloaded and verified ahead of installation; the deployment then runs with no outbound connectivity, and model updates arrive via manual air-gap transfer.
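The verification step is standard checksum discipline; a sketch with illustrative filenames:

```shell
# Stand-in for real model weights, so the checksum flow is demonstrable.
printf 'fake-weights' > model.gguf

# On the connected staging host: record the checksum of the weights.
sha256sum model.gguf > model.gguf.sha256

# On the air-gapped target, after manual transfer: re-verify before install.
sha256sum -c model.gguf.sha256
```

Any tampering or transfer corruption fails the `-c` check, so bad weights never reach the inference host.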

How long does deployment take?

Small: 1-2 weeks. Mid: 3-4 weeks. Large: 6-10 weeks. The variable is usually your internal process (procurement, security review, network access), not the technical work on our side.

Do you train our team on the deployment?

Every deployment includes runbooks, operational training for your devops team, and 30 days of post-launch support. For deeper engineer-level Claude Code training, add our Training & Enablement service (bundleable at reduced rate).

Ready to deploy privately?

Book a free 30-minute readiness call to discuss your team size, compliance posture, and hardware. We'll tell you honestly which tier fits and whether private deployment is the right answer for you.