A concept worth naming

The Harness

A model is raw intelligence. A harness is what makes it useful.

00 — The Analogy

No One Rides Bareback

A horse is a thousand pounds of speed and power. Humans figured that out thousands of years ago. But the horse wasn't the breakthrough. The saddle was.

Before the saddle, you could ride — badly, briefly, and at the mercy of the animal beneath you. The saddle gave you a seat. The stirrups gave you leverage. The reins gave you direction. Suddenly the most powerful thing on four legs was yours to direct. Not tamed. Not diminished. Connected.

We're at the same moment with AI. The model is the horse — fast, powerful, astonishing. It can write code, synthesize research, debug distributed systems — and yet, out of the box, it can't read a file. A brain in a jar. Raw capability isn't the bottleneck. The connection is. The software that wraps a model, gives it tools, points it at your work, and lets you steer — that's the saddle. We call it a harness.

fig. 01

Raw Model

┌─────────────┐
│             │
│    ◉ ‿ ◉    │
│   (brain)   │
│             │
│  "I could   │
│   write     │
│   great     │
│   code"     │
│             │
├─────────────┤
│ ░░░ JAR ░░░ │
└─────────────┘
no hands. no eyes. no filesystem.
Infinite knowledge.
Zero agency.

vs

Harnessed Model

┌─────────────┐
│    ◉ ‿ ◉    │
│   (brain)   │
├─┬─┬─┬─┬─┬─┬─┤
│ │R│W│B│G│S│E│
│ │e│r│a│r│e│d│
│ │a│i│s│e│a│i│
│ │d│t│h│p│r│t│
│ │ │e│ │ │c│ │
│ │ │ │ │ │h│ │
├─┴─┴─┴─┴─┴─┴─┤
│ FILESYSTEM  │
│ GIT │ SHELL │
└─────────────┘
hands. eyes. memory.
Same brain.
Now it can actually do things.
01 — The Concept

Enter the Harness

A harness is the software that wraps around a model and connects it to the world. It transforms raw intelligence into useful agency — giving the model hands, eyes, and memory.

The word carries exactly the right connotations. A harness is structural — engineered for a purpose, not bolted on as an afterthought. Connective — it links the powerful thing to the useful thing. Directional — it channels energy toward a goal. And respectful — it doesn't diminish what it wraps. A harness doesn't make a horse smaller. It makes the horse's power available.

Claude Code is a harness. Cursor is a harness. Codex is a harness. Every system that takes a model and makes it do real work is a harness. The question isn't whether you're using one — you always are. The question is whether it's a good one.

The harness doesn't make the model smarter. It makes the model present — in your codebase, in your workflow, in the real world where work actually happens.
02 — Anatomy

What a Harness Provides

When you use Claude Code, you're not talking to Opus. You're talking to Opus through a harness — one that provides a specific set of capabilities. Take them apart:

What the harness gives the model
Perception   Read files. Glob for patterns. Grep for content.
             See the codebase as it actually is right now.

Action       Write files. Edit code. Run shell commands.
             Make changes that persist in the real world.

Context      CLAUDE.md files. Git status. Conversation history.
             Know where it is and what matters here.

Memory       Project notes. User preferences. Past decisions.
             Build on previous work instead of starting fresh.

Judgment     Permission systems. Safety rails. Confirmation prompts.
             Know when to act and when to ask.

Feedback     Test results. Linter output. Build errors.
             See the consequences of its own actions.

None of this is the model. The model doesn't "have" tools — it's given tools by the harness. The model doesn't "remember" your project — the harness loads context into the conversation. The model doesn't "decide" to be careful with destructive commands — the harness enforces permission boundaries.

The model provides the reasoning. The harness provides everything else.
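That division of labor is concrete enough to sketch in code. Here is a minimal, hypothetical harness — the class, tool names, and dispatch shape are illustrative, not any real product's API:

```python
# Hypothetical sketch: the harness owns every capability; the model
# only emits requests. Nothing here is a real product's API.
import subprocess
from pathlib import Path

class Harness:
    """Maps the model's tool-call requests onto real-world effects."""

    def __init__(self, workdir: str):
        self.workdir = Path(workdir)
        # The model is told these names and their schemas; it never
        # holds the functions themselves.
        self.tools = {"read": self._read, "write": self._write,
                      "bash": self._bash}

    def dispatch(self, name: str, **args) -> str:
        # The model says {"tool": "read", "path": "main.py"}; the harness
        # decides whether and how that request touches the world.
        if name not in self.tools:
            return f"error: unknown tool {name!r}"
        return self.tools[name](**args)

    def _read(self, path: str) -> str:
        return (self.workdir / path).read_text()

    def _write(self, path: str, content: str) -> str:
        (self.workdir / path).write_text(content)
        return f"wrote {len(content)} chars to {path}"

    def _bash(self, command: str) -> str:
        done = subprocess.run(command, shell=True, text=True,
                              capture_output=True, cwd=self.workdir)
        return done.stdout + done.stderr
```

Every string that `dispatch` returns flows back into the conversation as an observation — which is exactly where the feedback layer in the table above comes from.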

03 — The Distinction

Not Just an App

There's an obvious objection: isn't a harness just an app? A wrapper? A thin layer of UI over an API? No. An app presents information and takes input. A harness manages a loop — perception, reasoning, action, feedback — and keeps it running autonomously. The architecture is fundamentally different. A chat interface is a harness, technically. It's just a bad one — it gives the model no eyes, no hands, and no memory. You become the hands, copying files in and pasting diffs back.

This is the default experience most people have with AI today, and it's why so many walk away underwhelmed. The model is powerful. The harness is anemic. A copy-paste workflow forces the human to be the perception layer, the action layer, and the feedback loop — all at once. The model sits in the middle, brilliant and helpless.
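That loop — perception, reasoning, action, feedback — is small enough to sketch. A toy version of what a harness keeps running, where every name and message shape is illustrative and `model` stands in for a call to a real LLM API:

```python
# Toy sketch of the loop a harness runs autonomously; the message
# shapes and field names are illustrative, not a real framework's API.

def run_agent(model, tools, task, max_steps=10):
    """Perception -> reasoning -> action -> feedback, until done."""
    transcript = [{"role": "user", "content": task}]      # initial perception
    for _ in range(max_steps):
        decision = model(transcript)                      # reasoning
        if decision["type"] == "answer":                  # model is finished
            return decision["content"]
        result = tools[decision["tool"]](**decision["args"])      # action
        transcript.append({"role": "tool", "content": result})    # feedback
    return "step budget exhausted"
```

In a chat window, the human performs every line of that loop except the `model(...)` call by hand. A harness automates the rest.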

And the pattern extends far beyond coding. NotebookLM is a research harness — it ingests your sources, lets the model read across them, and generates synthesis you couldn't get from a chat window. Perplexity is a search harness — it gives the model live access to the web and the ability to reason over results. Devin, Replit Agent, v0 — all harnesses, each connecting the same underlying intelligence to a different domain. The concept isn't specific to code. It's wherever a model needs to touch the world.

Nobody coined "harness" in a single moment. The word emerged the way good terminology always does — different teams solving the same problem reached for the same metaphor.

Anthropic was the first to formalize it. Through 2025, the team building Claude Code began describing its architecture as an "agent harness" — the SDK that wraps the model, manages tools, and runs the agentic loop. In November 2025, Justin Young published "Effective Harnesses for Long-Running Agents" on the Anthropic Engineering Blog, giving the concept its most rigorous public definition: how to engineer context, memory, and multi-session continuity around a model.

Then in February 2026, Mitchell Hashimoto named the discipline. The HashiCorp co-founder wrote: "I don't know if there is a broad industry-accepted term for this yet, but I've grown to calling this 'harness engineering.'" Within weeks, OpenAI adopted the term, Martin Fowler's team codified it, and "harness" became part of the core AI engineering vocabulary.

The physical metaphor matters. A harness is equipment fitted to something powerful — a horse, a climber — to direct its energy toward useful work. Not to constrain it. Not to diminish it. To make it available. That connotation, more than any formal definition, is why the word stuck.

Sources: Anthropic, "Building Effective Agents" (Dec 2024) · Anthropic, "Effective Harnesses" (Nov 2025) · Hashimoto, "My AI Adoption Journey" (Feb 2026)

If the harness is a real architectural layer, then building one is a real discipline. Mitchell Hashimoto gave it a name: harness engineering. Not prompt engineering — that's one small input. Harness engineering is the whole system: which tools to expose, how much context to load, when to let the model act and when to make it ask, how to recover when an agentic loop goes sideways, how to carry memory across sessions without drowning the context window.

These are design decisions, not implementation details. A harness that loads your entire codebase into context will choke. One that loads nothing will hallucinate. The right amount of context — retrieved at the right moment, in the right format — is the difference between an assistant that wastes your time and one that saves hours. The same tension runs through every layer: too many tools and the model gets confused, too few and it can't do the job. Too much autonomy and it makes destructive mistakes, too little and it asks permission to breathe.
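One of those dials is concrete enough to sketch: a permission layer that decides, per tool call, whether to act, ask, or refuse. The rule syntax below is hypothetical, loosely echoing the ask/allow/deny glob patterns some harnesses expose:

```python
# Hypothetical permission layer; the rule syntax is illustrative,
# loosely echoing ask/allow/deny glob patterns some harnesses use.
from fnmatch import fnmatch

POLICY = [                        # first matching rule wins
    ("deny",  "bash(rm -rf*)"),   # never run recursive deletes
    ("allow", "read(*)"),         # reading is always safe
    ("allow", "bash(git status*)"),
    ("ask",   "write(*)"),        # edits need human confirmation
    ("ask",   "*"),               # default: ask about everything else
]

def decide(call: str) -> str:
    """Map a rendered tool call like 'read(src/a.py)' to a verdict."""
    for verdict, pattern in POLICY:
        if fnmatch(call, pattern):
            return verdict
    return "ask"
```

Tuning that list is harness engineering in miniature: every rule trades autonomy against safety, and neither extreme is usable.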

See: Hashimoto, "My AI Adoption Journey" (Feb 2026) · OpenAI, "Harness Engineering" (2026) · Fowler, "Harness Engineering" (2026)
04 — The Evidence

Three Harnesses, One Pattern

If "harness" names a real architectural pattern, you should be able to find it in the wild — independently reinvented by teams with different goals and different constraints. You can.

Claude Code is Anthropic's proprietary CLI, tightly integrated with Claude models. OpenCode is an open-source terminal agent, model-agnostic, supporting 75+ providers. Codex is OpenAI's cloud-native coding agent, sandboxed and async. Different companies. Different constraints. Different philosophies.

Line them up and the anatomy is identical. Same capabilities, same architectural layers, same separation between harness and engine. This isn't copying — it's convergent design. The problem constrains the solution.

fig. 02 — same pattern, different harnesses

Tools        Claude Code  Read · Write · Edit · Bash · Glob · Grep · WebFetch · WebSearch · NotebookEdit
             OpenCode     read · write · edit · bash · glob · grep · patch · list · LSP
             Codex        bash · apply_diff · file_search · web_search

Agents       Claude Code  Explore · Plan · general subagents · code-simplifier
             OpenCode     Build · Plan · General · Explore
             Codex        autonomous tasks

Context      Claude Code  CLAUDE.md · git status · conversation history
             OpenCode     project scanning · @-references · image support
             Codex        repo clone · AGENTS.md · task prompt

Permissions  Claude Code  ask/allow modes · safety rails
             OpenCode     ask/allow/deny · glob patterns
             Codex        sandboxed container · network firewall

Extensions   Claude Code  MCP servers · hooks · auto-memory
             OpenCode     MCP servers · multi-session · session sharing
             Codex        GitHub PRs · async execution

(everything above is the harness; the row below is the engine)

Engine       Claude Code  Opus · Sonnet · Haiku
             OpenCode     Claude · GPT · Gemini · Llama · DeepSeek · local · 75+ more
             Codex        o3 · o4-mini · codex-1
Convergent design is evidence of a real pattern. When different teams, solving the same problem, arrive at the same architecture — you're looking at something structural, not accidental.
05 — Closing

Now You Can See It

Once you have the word, you start seeing the pattern everywhere. Every time a model reads a file, runs a command, or remembers a preference — that's the harness at work. Every time it can't — that's the harness missing.

The concept isn't new. People have been building this layer since the first developer plugged GPT into a script. What's new is the name — and with it, the ability to see the architecture clearly: model here, harness there, and between them, all the decisions that determine whether raw intelligence actually reaches the work.

A horse is fast. A saddle makes it yours. A model is brilliant. A harness makes it useful. That's the whole idea.