// SERVICE — CLAUDE CODE PRACTICE
Private Claude Code Deployment
Run Claude Code or OpenCode against a private local AI model behind your firewall. One hundred percent private prompts and processing. Zero data egress. No cloud dependency. Built for healthcare, legal, financial services, government contractors, and IP-sensitive organizations that want agentic coding productivity without giving up data sovereignty.
The problem with cloud Claude Code for regulated work
Claude Code is one of the strongest agentic coding tools available in 2026. Run against Anthropic's cloud API, it ships production software ten times faster than traditional development. But for many organizations, cloud execution is a non-starter:
- Healthcare — HIPAA-protected health information (PHI) cannot leave the covered entity's controlled infrastructure. Every prompt that includes PHI context is a potential breach.
- Legal — attorney-client privilege attaches to every document your coding agent touches. Cloud execution puts that privilege at risk.
- Financial services — SOC 2, PCI DSS, and SEC requirements limit where customer data and trading systems can be processed.
- Government contractors and defense-adjacent firms — CUI, ITAR, EAR, and classification-adjacent work cannot run on commercial cloud APIs.
- IP-sensitive organizations — the codebase is the competitive advantage. Sending it to a third party for AI inference is unacceptable regardless of contractual assurances.
Private Claude Code Deployment solves this by pointing the Claude Code CLI at a local OpenAI-compatible endpoint serving a model that runs on your own hardware, inside your own network. The Claude Code interface is identical; the data never leaves your firewall.
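Before any agent is wired in, the endpoint itself can be smoke-tested with one HTTP request. A minimal sketch, assuming an Ollama backend on its default port with a qwen2.5-coder model already pulled (host, port, and model tag are placeholders for your deployment):

```python
# Smoke-test a local OpenAI-compatible endpoint before pointing a
# coding agent at it. Host, port, and model tag are assumptions:
# substitute whatever your deployment actually serves.
import requests

ENDPOINT = "http://localhost:11434/v1/chat/completions"  # Ollama's default port

resp = requests.post(
    ENDPOINT,
    json={
        "model": "qwen2.5-coder:32b",
        "messages": [{"role": "user", "content": "Reverse a string in Python, one line."}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

One wiring detail worth knowing: Claude Code natively speaks Anthropic's Messages API, so deployments like this typically put a thin translation proxy in front of the OpenAI-compatible backend, while agents such as OpenCode and Aider can talk to the endpoint above directly.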
How the deployment works
01. Backend inference
We stand up an OpenAI-compatible inference endpoint on your hardware. The runtime depends on your scale: Ollama for operational simplicity and single-workstation deployments, vLLM for multi-engineer throughput, llama.cpp for edge and CPU deployments. GPU allocation, batching, KV-cache sizing, and context-window policy are tuned per deployment.
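The reason vLLM wins at multi-engineer scale is continuous batching: one GPU serves many concurrent requests without serializing them. A rough sanity check, sketched with the openai client pointed at an assumed local vLLM port and model name (both placeholders):

```python
# Rough concurrency check: fire N requests at once and time the batch.
# On a well-tuned backend this completes in far less than N times the
# single-request latency. Endpoint and model name are assumptions.
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused-locally")

async def one_request(i: int) -> None:
    await client.chat.completions.create(
        model="Qwen/Qwen2.5-Coder-32B-Instruct",
        messages=[{"role": "user", "content": f"Summarize request {i} in one sentence."}],
        max_tokens=64,
    )

async def main(n: int = 8) -> None:
    start = time.perf_counter()
    await asyncio.gather(*(one_request(i) for i in range(n)))
    print(f"{n} concurrent requests in {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```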
02. Model selection
We shortlist candidate open-weight coding models and benchmark them against your actual workloads:
- Qwen 2.5 Coder (7B, 14B, 32B) — currently the strongest open-weight coding family per public benchmarks
- DeepSeek Coder V2 (lite + full) — strong on instruction-following and long-context code tasks
- Llama 3 Instruct (8B, 70B) — safe default for teams that want a general-purpose model
- Mistral Large — strong tool-use performance (weights ship under Mistral's research license rather than a fully open license)
- StarCoder 2 (15B) — code-specialized, lighter-weight
03. Claude Code integration
We configure Claude Code to route to your local endpoint. Custom skills, hooks, and MCP servers specific to your stack are developed during the engagement. OpenCode, Aider, and Continue.dev can be wired to the same backend for teams that want multiple coding-agent options.
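Because all of these agents speak the same OpenAI-compatible wire format, a quick way to confirm a shared backend is ready for them is to list the models it exposes (endpoint URL is a placeholder):

```python
# List the models a shared local backend exposes. Every agent wired to
# this endpoint (Claude Code via proxy, OpenCode, Aider, Continue.dev)
# selects from this same list.
import requests

models = requests.get("http://localhost:8000/v1/models", timeout=10).json()
for m in models["data"]:
    print(m["id"])
```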
04. Access control and audit
We add per-engineer authentication (Mid and Large tiers integrate with your SSO), per-team rate limiting, and full audit logging of prompts, completions, tool invocations, and costs. Audit logs ship to your SIEM (Splunk, Elastic, Datadog, Grafana Loki) for compliance review.
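To make the audit layer concrete, here is a minimal sketch (an illustration, not the production implementation, which depends on your gateway and SIEM): a FastAPI reverse proxy in front of an assumed vLLM backend that emits one JSON audit record per completion for a log shipper to pick up:

```python
# Minimal audit-logging reverse proxy. Forwards chat completions to an
# assumed upstream (vLLM on :8000) and prints one JSON record per call.
# Run with: uvicorn audit_proxy:app --port 9000
import hashlib
import json
import time

import httpx
from fastapi import FastAPI, Request

app = FastAPI()
upstream = httpx.AsyncClient(base_url="http://localhost:8000", timeout=120.0)

@app.post("/v1/chat/completions")
async def proxy(request: Request) -> dict:
    body = await request.json()
    start = time.time()
    resp = await upstream.post("/v1/chat/completions", json=body)
    payload = resp.json()
    record = {
        "ts": int(start),
        "engineer": request.headers.get("x-engineer-id", "unknown"),  # set by your auth layer
        "model": body.get("model"),
        # Hash the prompt here; full content can go to a separate
        # restricted store if your compliance posture requires it.
        "prompt_sha256": hashlib.sha256(
            json.dumps(body.get("messages", []), sort_keys=True).encode()
        ).hexdigest(),
        "usage": payload.get("usage"),
        "latency_s": round(time.time() - start, 3),
    }
    print(json.dumps(record))  # stdout -> log shipper -> SIEM
    return payload
```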
05. Knowledge transfer
We hand over runbooks, deliver operational training for your sysadmin or devops team, and provide 30 days of post-launch support. Deployments can be bundled with our Team Enablement Workshop at a reduced rate for teams that want engineer-level training as part of the deployment.
Deployment tiers
Small — $40,000
- Single workstation or small server, one GPU (RTX 4090 / L40S / A100)
- 1-3 engineers sharing the endpoint
- 7B-30B parameter open-weight models
- Ollama backend, simple authentication, local audit logging
- Custom skills and hooks for the client's stack
- 1-2 week deployment timeline
Mid — $65,000
- Dedicated inference server (multi-GPU or A100/H100)
- Shared endpoint for 10-20 engineers
- 30B-70B parameter models with quantization strategy
- vLLM backend for concurrency, SSO integration
- Per-engineer access control, SIEM-compatible audit logging
- MCP integrations for 2-3 client systems
- 3-4 week deployment timeline
Large — $100,000
- Multi-node, high-throughput inference cluster
- Load-balanced Claude Code endpoints
- 70B+ parameter models with production quantization
- Full observability stack (Prometheus, Grafana, cost tracking)
- CI/CD for model updates, automated eval harness
- Security review, threat model, governance framework
- 30-day post-launch support; optional maintenance retainer
- Air-gapped delivery available
- 6-10 week deployment timeline
Who this is for
- Healthcare — HIPAA-covered entities that want coding agents that never touch PHI
- Legal practices — attorney-client privileged codebases and discovery platforms
- Financial services — trading systems, customer data pipelines, compliance-heavy workflows
- Government contractors — CUI, ITAR/EAR-adjacent work, pre-classification environments
- Defense-adjacent — contractors who will eventually need the capability in fully air-gapped networks
- Biotech and pharma — IP-sensitive research codebases
- Robotics companies — proprietary control systems and simulation pipelines
- Any organization whose codebase is the competitive advantage
Relationship to our other services
This service builds on our existing Local AI Deployment service and adds the Claude-Code-specific integration layer on top. Existing Local AI Deployment clients can upgrade to Private Claude Code Deployment at a reduced rate. For teams that want training on the deployed infrastructure, add our Training & Enablement service. For ongoing operation by our team rather than yours, see the Claude Code Agency retainer.
Frequently asked questions
What is a private Claude Code deployment?
Running Claude Code or OpenCode against an open-weight AI model hosted on your own hardware, inside your own network. No prompts, context, or completions leave your infrastructure. The CLI points at a local OpenAI-compatible endpoint (Ollama, vLLM, or equivalent) instead of the Anthropic cloud API.
Which AI models can power it?
Any open-weight model with strong coding capability: Qwen 2.5 Coder, DeepSeek Coder V2, Llama 3 Instruct, Mistral Large, Gemma 3, StarCoder 2. We benchmark candidates against your actual workloads during the engagement.
Does it work with OpenCode and other coding agents?
Yes. Any coding agent with OpenAI-compatible endpoint support — Claude Code, OpenCode, Aider, Continue.dev — can be wired to the same private local backend.
What hardware do we need?
Small (1-3 engineers, 7B-30B models): one RTX 4090 / L40S / A100. Mid (10-20 engineers, 30B-70B models): multi-GPU or single H100. Large (70B+, high concurrency): H100 or H200 cluster. Apple Silicon Ultra works for Mac shops at Small/Mid. We spec exact hardware in discovery.
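The tier boundaries above follow from simple VRAM arithmetic: weight memory is roughly parameter count times bytes per parameter, plus headroom for KV cache and runtime overhead. A back-of-envelope sketch (rules of thumb, not a sizing guarantee):

```python
# Back-of-envelope VRAM estimate: weights = params * bits / 8, plus
# ~30% headroom for KV cache, activations, and runtime overhead.
# Rules of thumb only; real sizing happens in discovery.
def vram_estimate_gb(params_billion: float, bits_per_param: int, headroom: float = 1.3) -> float:
    weights_gb = params_billion * bits_per_param / 8  # 1B params at 8-bit ~ 1 GB
    return round(weights_gb * headroom, 1)

for params, bits in [(32, 4), (32, 8), (70, 4), (70, 8)]:
    print(f"{params}B @ {bits}-bit ~= {vram_estimate_gb(params, bits)} GB")
```

This is why a 32B model quantized to 4-bit fits a single 24 GB RTX 4090 only tightly, while 70B-class models at any useful precision push you onto 80 GB-class cards or multi-GPU rigs.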
How is this different from cloud Claude Code?
Two things: data governance (nothing leaves your network, versus Anthropic processing every prompt) and model choice (any open-weight model, versus Anthropic's Claude family). Cloud is easier to stand up, and Claude models are arguably stronger at coding; private gives you sovereignty over your data and your model. For regulated or IP-sensitive work, private is the only viable option.
Can we run fully air-gapped?
Yes. Fully air-gapped deployments are supported at the Large tier. Model weights are downloaded and verified ahead of installation; the deployment then runs with no outbound connectivity. Model updates happen via manual air-gap transfer.
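One concrete piece of the "downloaded and verified" step, sketched: hash every weight file against a manifest generated on the connected side before the transfer media is accepted (manifest name and format are assumptions; tools like sha256sum produce the same layout):

```python
# Verify model weight files against a SHA-256 manifest before an
# air-gap transfer is accepted. Manifest lines: "<hexdigest>  <path>",
# generated on the connected side (e.g. with sha256sum).
import hashlib
import sys
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

failures = 0
for line in Path("weights.sha256").read_text().splitlines():
    digest, name = line.split(maxsplit=1)
    if sha256_of(Path(name.strip())) != digest:
        print(f"MISMATCH: {name}")
        failures += 1

sys.exit(1 if failures else 0)
```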
How long does deployment take?
Small: 1-2 weeks. Mid: 3-4 weeks. Large: 6-10 weeks. The variable is usually your internal process (procurement, security review, network access), not the technical work on our side.
Do you train our team on the deployment?
Every deployment includes runbooks, operational training for your devops team, and 30 days of post-launch support. For deeper engineer-level Claude Code training, add our Training & Enablement service (it can be bundled at a reduced rate).
Ready to deploy privately?
Book a free 30-minute readiness call to discuss your team size, compliance posture, and hardware. We'll tell you honestly which tier fits and whether private deployment is the right answer for you.