Insights · Implementation layer

The harness is where value lives.

Why the trillion-dollar opportunity in agentic workflows isn't the model, and what to ask any AI vendor before you write the check.

By Marc Shade · May 15, 2026 · 9 minute read

Three things happened in the same week. Anthropic stood up a deployment company reportedly backed by around $1.5B from Blackstone, Helman & Friedman, and Goldman Sachs. OpenAI launched a venture going after the same opportunity, reportedly valued near $10B. And the people running mid-market deals quietly all started saying the same thing. The trillion-dollar opportunity in agentic workflows isn't in the model. It's in the harness around the model.

I've been building software for businesses since 1997. I founded 2 Acre Studios in Pittsburgh in 2010 as a small dev shop, and pivoted us into AI engineering in 2023. I've watched the AI shop landscape go from "a deck and a ChatGPT API key" in 2023 to a market where the labs themselves are publicly saying their models aren't where the bottleneck is. That's a significant moment. It's also a window. This essay is the argument for why we built our company around the harness instead of the model, why the rest of the market is converging on the same answer, and what to ask any AI vendor (us included) before signing.

What changed in spring 2026

For two years the AI conversation has been about chat. People paid $20 a month for ChatGPT, copied answers into emails, and called it "using AI at work." That phase wasn't trivial. It taught millions of people what large language models can and can't do. But it never produced the kind of business value the financial markets were pricing in.

What changed is that agents finally got good enough to complete entire workflows reliably. Not chat-style turn-taking. Not "draft me an email and I'll edit it." A real end-to-end process: reading a case file, drafting a response, getting a sign-off, sending it, logging the outcome. Doing that ten thousand times in a row without the model losing the plot. That capability arrived in stages over the last six months and crossed a usefulness threshold around January.

The instant it crossed that threshold, the financial gravity shifted. SaaS companies that were "AI-enabled" by virtue of having a chatbot in their app suddenly looked like horse-and-buggy operations next to a workflow that could actually do the job. Mid-market businesses that had spent two years lukewarm on AI started asking, urgently, how to deploy it for real. The private equity firms that own a thousand mid-market SaaS companies started panicking about their 2027 and 2028 exits. And the AI labs (OpenAI, Anthropic, Google) realized that selling a model API was no longer the business. The business was sitting in someone's office, three weeks in, helping their team operationalize an agent.

The four pressures everyone is now under

It's a four-way squeeze. Anyone building or buying AI in 2026 is feeling all four:

Frontier labs are moving down-stack. Anthropic and OpenAI used to sell the model. Now they're standing up deployment companies, hiring forward-deployed engineers, and shipping product into the same workflows where their customers used to use third-party wrappers. Claude Design competes with Figma. Claude Code competes with Cursor. The model lab is now your competitor in product, not just your supplier.
Big consultancies are moving up-stack. McKinsey, BCG, Accenture, Capgemini, PwC are all inside OpenAI's Frontier Alliance program. They're not just doing change management anymore. They have delivery engineers, agent practices, and four decades of relationships with the CFOs who sign these checks.
Systems of record are exposing structured interfaces. Salesforce, ServiceNow, Workday, SAP all opened up APIs and agent frameworks designed to keep your agent inside their platform. They don't want a startup sitting between their data and your workflow. They want the agent to call them directly, with their permission model and their decision log.
Private equity has become a distribution channel. A PE firm with 50 portfolio companies in finance, ops, support, procurement, and compliance is enormously motivated to find one implementation partner and run them across the whole portfolio. That's a fundamentally different sales motion than vendor-by-vendor enterprise sales, and it favors implementation specialists over product specialists.

The squeeze is real. It is what makes generic-AI-wrapper companies look terrible right now. But it also produces the opening. Because in all four directions of pressure, the actual work (the implementation) isn't going to the lab, isn't going to the consultancy, isn't going to the system of record, and isn't going to the PE firm. It's going to the small number of shops that have figured out how to build the harness.

What the harness actually is

"Implementation layer" is the consultant phrase. "Harness" is what we call it internally because it captures the right metaphor: the system around the system, the thing that lets a powerful but unpredictable engine do useful work without breaking the equipment.

A working harness has eight components, organized along the lifecycle of how the system gets built, runs in production, and ages in your business over time. Here they are. If a vendor is talking to you about AI and can't tell you their answer for each of these eight, you're being sold a model with a price tag on it, not a working system.

Before the system runs

01 · Lifecycle phase: Before

Workflow design

Which decisions belong to the model. Which stay with a human. Where the handoffs are. What "done" actually means. Every step in the process gets an owner, an input, and an output. This is not a prompt. It's a defined business process that an agent can act inside. Most AI projects skip this step entirely and ship a model bolted to a tool, then are surprised when it doesn't produce value. The model can't design the workflow. You can't outsource designing the workflow to the agent.

Prevents: shipping a model with no workflow behind it.

While the system runs

02 · Lifecycle phase: During

Guard rails

Automated checks that fire before and after every action the AI takes. They catch hallucinated facts, blocked file paths, injected prompts, off-policy commands, and silent failures. In our own development harness we have ninety-seven such scripts running every time we work on a system; the equivalent live in the production systems we ship for clients. Guard rails are what make the difference between an agent that fails loudly when something's wrong and an agent that confidently writes a wrong answer into your books.

Prevents: the AI confidently saying "done" when it isn't.

03 · Lifecycle phase: During

Verification gates

Nothing ships as "working" until a real end-to-end path has been run. Unit tests passing is not the same as the system working. "Should work" is never a load-bearing claim. Every output the system produces names at least one thing that was not verified, so you always know what's been checked and what hasn't. This is the discipline that separates a demo from a production system.

Prevents: demo theater.

04 · Lifecycle phase: During

Structured memory

Persistent, queryable, decay-aware memory of your business. Not a chat history. What worked, what didn't, who your clients are, what your team named the thing, the constraints you mentioned in passing six months ago. The system gets better not by being trained on more data. It gets better by remembering you. This is also where the leverage compounds; an agent that knows your business on day 365 is twenty times more useful than the same agent on day 1.

Prevents: repeating the same explanation every week.

05 · Lifecycle phase: During

Decision log

Every decision the system makes is logged and timestamped. If something goes wrong on a Tuesday at 2:47 PM, we can show you the exact input the system saw, the exact output it produced, what it was asked to consider, and what it chose to do. Required for healthcare. Required for finance. Required by us regardless. The alternative is a system whose failures are unexplainable, which is the same as a system you cannot improve.

Prevents: "we have no idea why it did that."

06 · Lifecycle phase: During

Rollback

Every change is committed to version control. Every deployment is checkpointed. Every autonomous decision the system makes is reversible. The harness has kill switches: file integrity monitors, HALT files, three autonomy tiers from "propose only" to "act freely within bounds." You are never one bad day away from a rebuild. The most underrated feature of any AI system is the ability to instantly undo what it just did.

Prevents: the AI making a confident change you can't get back.

07 · Lifecycle phase: During

Specialized sub-agent review

Twenty-six specialized roles in our own reference harness (a code reviewer, a security checker, a debugger, a testing agent, an architecture agent, a performance agent), each with a narrow remit. The agent that proposes a change does not approve it. The reviewer doesn't know the proposer's reasoning. Disagreement surfaces. Single-model overconfidence doesn't. This is how you escape the failure mode of one model talking itself into a bad call.

Prevents: one model talking itself into a bad call.

After you've signed off

08 · Lifecycle phase: After

Ongoing ownership

Who keeps the system tuned after launch. What documentation you have. What metrics get watched. How a recovery looks the first time something genuinely goes sideways twelve months in. We don't ship a system and ghost. But the system also can't depend on us being there forever. It has to belong to your people the day after launch. The single hardest part of selling AI services honestly is admitting that the work doesn't end on go-live; it changes shape.

Prevents: a "production" system that decays into nobody's-job.

Why a weekend of Claude Code isn't going to work

There are PE firms right now testing whether they can rebuild SaaS portfolio companies by giving an in-house team Claude Code and a weekend. They cannot. Not because the model can't write the code (it can write quite a lot of the code), but because the eight components above are not a code problem. They are a discipline problem. They are the embedded knowledge a senior engineer brings about how things fail in production, how regulated industries actually verify their work, what mid-market businesses actually need on day 400, what a customer escalation actually looks like at 2 AM.

You don't get that out of a weekend with a model. You get it out of having shipped enough systems to have been burned by every failure mode at least twice. That kind of expertise is what produced the harness in the first place.

Why this matters specifically for SMB and mid-market

The trillion-dollar framing in the macroeconomic conversation is for Fortune 500. Anthropic's deployment company isn't going to sit in your CPA firm. McKinsey's agentic practice isn't going to design the workflow for your 35-person manufacturing operation. The four-way pressure above is real for them, but they don't know it's there for you.

The seam in this market for the next 12 to 24 months is small senior shops that understand the harness, are willing to deploy it at SMB scale, and don't need a Fortune 500 deal size to keep the lights on. That's the opening. That's where we're built to operate.

What we do specifically: law firms from 10 to 150 attorneys, healthcare practices and specialty groups, CPA and accounting firms, and manufacturers, contractors, and insurance agencies from $5M to $75M in revenue. Pittsburgh first, regional second, remote where it makes sense. Senior people only. Marc Shade and Scott Frederick Laughlin write the code; there's no junior labor, no offshoring, no handoff chain. We charge a flat monthly retainer ($4,500 to start) or a fixed-fee diagnostic The Assessment at $1,995 for clients who want a punch list before committing.

The buyer-side assessment checklist

If you take nothing else from this essay, take this list. Before you write a check to any AI vendor (us, an agency, a consultancy, an OpenAI deployment partner, anyone), ask them these eight questions. If they don't have a clear answer to all eight, you are being sold a model, not a system.

Eight questions to ask any AI vendor

Print this. Bring it to the meeting. The vendors with good answers will get sharper. The ones without good answers will start hedging within three minutes.

Workflow. "Walk me through the specific business workflow this agent operates inside, step by step. Which steps are the model's, which stay with my team, and what does 'done' look like at the end of the workflow?"
Guard rails. "What automated checks fire before this agent takes an action, and what fires after? When the agent encounters something outside its training, how do I learn about it: loudly, or never?"
Verification. "Before this agent makes a change to my data, what verification has it run on its own output? What confidence threshold does it use to escalate to a human?"
Memory. "Show me where the agent stores knowledge about my business specifically. Not the model's training data, mine. How is that storage queryable? How does old context decay?"
Decision log. "Walk me through a single past decision the system made, end to end: the input, the output, the reasoning trace, the timestamp. If a regulator asks 'why did the agent do this in November?', what do I show them?"
Rollback. "What happens when the agent makes a wrong call that has already touched my data, my customer's inbox, or my GL? Specifically what is reversible? What command undoes it? Who has authority to issue that command?"
Review. "Does the agent that's proposing the action also approve the action, or is there a separate review path? If it's separate, what role does the reviewer play and how is it independent?"
Ownership. "Twelve months in, when your company has pivoted or the model has been upgraded twice and the original engineer has moved on, who at my company knows how to keep this system tuned? What documentation will exist? What metrics will my team be watching?"

What we're betting on

The case for 2 Acre Studios isn't that our models are better. They aren't. We use the same models everyone else uses. The case is that we understand the eight components above better than most shops our size, we hold ourselves to them as discipline, and we deploy at a scale and price point that the labs and big consultancies will not reach for at least another 12 to 24 months.

The macro frame above, with the labs and PE and the systems of record all converging on agentic workflows, is real. Trillions of dollars is real. The bottleneck is the implementation layer, not the model, and that's the part everyone is finally saying out loud. The window for shops our size to ship this work at SMB scale is open right now, and probably for the next 18 to 24 months.

If you run a Pittsburgh-area business in any of the industries above and you're tired of being pitched chatbots that don't ship, this is what we build. The first conversation is free; the diagnostic is $1,995 with a money-back guarantee; everything we do is documented, assessed, and reversible. The harness goes with the system. It doesn't go in the slide deck.

If you've read this far.

Three ways in, all free. Pick whichever fits how you work.

Free 5-minute self-assessment 30 minutes with Marc → The Assessment · $1,995 →

Marc Shade

Founder, 2 Acre Studios. Building software for businesses since 1997. Fortune 500 clients through Leo Burnett and Arc Worldwide, then Pittsburgh contractors and education companies, then AI engineering since 2023. Sixty-plus open-source repositories. LinkedIn · GitHub · marc@2acrestudios.com