Executive Summary

90% of developers switch the AI model when their agent makes mistakes. That is the wrong strategy.

The AI agent is not the model. The agent is the harness - the software infrastructure that makes the model productive. This harness consists of instructions, tools, and user messages. It determines whether your agent delivers consistent results or fails systematically.

In this article you will learn:

  • What an agent harness is and why the model accounts for only 10%
  • The 3 core components that define every agent
  • Why 90% of teams fail (and how you can do better)
  • 4 concrete steps for a production-ready agent harness

Based on experience from training 400+ software developers and 25 years of IT/AI consulting.

Ready for consistent agents?

Option A: If you are at the beginning of your AI dev journey, take a look at our 12-week DEV AI Bootcamp and build real AI-first habits.

♦

Option B: If you are already good with AI and need to go faster, take a look at our Agentic Coding Hackathon.


What does a poorly configured AI agent really cost companies?

Last week at the DEV AI Bootcamp: a team of software developers told me about their "AI agent disaster".

They had invested six months:

  • Tested 3 different models (Claude, GPT-4o, Gemini)
  • Wrote 50+ custom prompts
  • Built an "AI-first" team
  • Carried out regular prompt optimizations

The result? Chaos.

The agent wrote code that was sometimes brilliant, sometimes completely off the mark. Sometimes it followed the conventions, sometimes it ignored them completely. Sometimes tests were included, sometimes not. The code review process became a rollercoaster.

The CTO told me: "We thought the model was the problem. So we switched. Three times. Nothing changed."

The real problem: The missing agent harness.

They had no system that consistently controlled the agent. No central Agent.md with clear rules. No standardized tools. No structured workflows.

They were like a cab driver without a steering wheel - the engine was powerful, but there was no control.

The model was interchangeable. The missing harness was not.

This problem is costing the industry millions. According to a 2024 McKinsey study, 70% of AI implementations fail because of missing integration and processes - not because of the technology itself.

And it is 100% avoidable.


Why are agent harnesses indispensable in 2025?

AI agents are no longer an experiment - they are a production standard.

According to the Stack Overflow Developer Survey 2024, 76% of developers already use AI tools in their daily work or are planning to do so. The GitHub Octoverse Report 2024 shows: projects with GitHub Copilot have 55% more pull requests per developer.

But here's the uncomfortable truth: most teams treat AI agents like advanced autocomplete tools. They prompt, hope, iterate. Without a system. Without a strategy.

That works for prototypes. Not for production.

In production you need:

  • Consistency: The agent writes code with the same style patterns, security practices, and best practices
  • Repeatability: The same input should (almost always) produce the same output
  • Scalability: 10 agents, 100 features, 1000 commits per month - all of this must be manageable

This is exactly where the agent harness comes into play.

Agent harness (definition):

The software infrastructure that turns an AI model into a productive agent. It comprises instructions (rules), tools (capabilities) and user messages (control).

The term comes from AI research. A Β«harnessΒ» is the infrastructure that makes a model productive. In self-driving cars, it is the sensor fusion, the safety layer, the decision framework. For software agents, it is the combination of instructions, tools and workflows.

In 25 years of IT consulting, I have seen many trends come and go. Agent harnesses are not a trend. They are the new foundation of modern software development.


What is an agent harness and how does it work?

An agent harness is the software architecture that turns an AI model into a productive agent. It consists of three inextricably linked components:

| Component | Function | Example |
|---|---|---|
| Instructions | Project-specific rules and guidelines | Agent.md with tech stack, code style, dos/don'ts |
| Tools | Available capabilities and integrations | GitHub, terminal, code search via MCP server |
| User Messages | The way you control the agent | Precise prompts with specific requirements |

Note: The model only accounts for 10%. The harness determines the other 90%.
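
To make these three components concrete, here is a minimal TypeScript sketch of how a harness could assemble a single model request: instructions loaded from the Agent.md, a tool list, and your user message. The `callModel` placeholder and the exact file path are illustrative assumptions, not a specific product API.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical shape of a harness request - real agent frameworks differ in detail.
interface HarnessRequest {
  instructions: string;                           // project rules from Agent.md
  tools: { name: string; description: string }[]; // capabilities the agent may call
  userMessage: string;                            // how you steer the agent right now
}

function buildRequest(userMessage: string): HarnessRequest {
  // 1. Instructions: loaded automatically at the start of every session
  const instructions = readFileSync(".ai/rules/agent.md", "utf8");

  // 2. Tools: what the agent is allowed to do (e.g. exposed via MCP servers)
  const tools = [
    { name: "run_tests", description: "Run the project test suite (npm run test)" },
    { name: "create_pr", description: "Open a pull request on GitHub" },
  ];

  // 3. User message: the short, precise task for this cycle
  return { instructions, tools, userMessage };
}

// Usage: the same harness works regardless of which model sits behind callModel().
const request = buildRequest(
  "Implement POST /login following the pattern in src/services/auth.ts."
);
// await callModel(request); // callModel is a placeholder for your model client
```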


What role do instructions play for consistent agent results?

Instructions are the operating system of your agent. They determine WHAT the agent should do - not generic prompt guidelines, but specific, measurable rules for YOUR project.

What belongs in a production-ready Agent.md:

# Agent.md: Project Auth-Service

## Tech Stack
- Language: TypeScript 5.x, strict mode
- Testing: Vitest, coverage target >80%
- Framework: Express.js 5.x

## Code style rules
- Use ES Modules (import/export)
- See src/auth/login.ts as a template for error handling
- Component structure: utils/ for generic functions, services/ for business logic

## Workflows
1. **Feature**: Write specs → Tests → Implementation → Review
2. **Bugfix**: Root cause analysis → Minimal fix → Tests → Review
3. **Refactor**: No functional change, tests stay green

## Dos
✓ Write tests BEFORE implementation
✓ Use the example files as reference
✓ Run typecheck and linting after every change

## Don'ts
✗ Do not use the any type
✗ No console.log in production code
✗ No breaking API changes without discussion

Why does it work?

Instead of vague instructions ("write good code"), you give concrete, measurable rules. The agent can reference these rules during each session. This makes its behavior predictable.

The most important thing: The Agent.md is loaded automatically with every session. The agent knows the rules without you having to explain them each time.


What tools does an AI agent need for maximum productivity?

Tools determine HOW the agent can work. Without the right tools, the agent cannot act: an agent without GitHub integration cannot create PRs, and an agent without a terminal cannot run tests.

Examples of tools (MCP server):

GitHub Integration:
  - Read file from repo
  - Create/Update pull requests
  - Check CI/CD status

Database Access:
  - Query database schema
  - Execute migrations
  - Check data models

Terminal:
  - Run tests (npm run test)
  - Linting (npm run lint)
  - TypeCheck (npm run typecheck)

Code Search:
  - Find similar patterns in codebase
  - Search for function definitions

MCP server (Model Context Protocol): An open standard from Anthropic that enables AI models to interact with external tools and data sources in a structured way. Find out more at modelcontextprotocol.io
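
As a rough sketch of what such a tool integration can look like, the snippet below uses the MCP TypeScript SDK (@modelcontextprotocol/sdk) to expose a single terminal capability ("run_tests") over stdio. It is a minimal example under the assumption of a Node.js/ESM project; exact method names can differ between SDK versions.

```typescript
import { execSync } from "node:child_process";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// A minimal MCP server that gives the agent one terminal capability.
const server = new McpServer({ name: "project-tools", version: "0.1.0" });

server.tool(
  "run_tests",
  { filter: z.string().optional() }, // optional test-name filter
  async ({ filter }) => {
    const cmd = filter ? `npm run test -- ${filter}` : "npm run test";
    const output = execSync(cmd, { encoding: "utf8" });
    return { content: [{ type: "text" as const, text: output }] };
  }
);

// Talk to the agent host (IDE, CLI, etc.) over stdio.
const transport = new StdioServerTransport();
await server.connect(transport);
```

Once this server is registered in your agent host's MCP configuration, the agent can run the test suite on its own instead of asking you to do it.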

Tools make the agent independent and productive.

The best combination: Agent.md (What) + MCP tools (How) + Your prompts (Why).


How do I formulate prompts that deliver consistent results?

Instructions and tools are static. User messages are the daily interface to your agent. The way you prompt determines success or failure.

Comparison: Vague vs. precise prompt

❌ Vague: "Implement a login feature"

✅ Precise: "Implement backend login according to the pattern in src/services/auth.ts. POST /login with email + password. Return a JWT token (15 min validity). Tests with Vitest, >80% coverage. See tests/auth.test.ts for the test pattern"

The precise prompt delivers 10x better results because it fully specifies the requirement.

Prompt engineering with Agent.md:

With a production-ready Agent.md you need fewer prompt details. The Agent.md provides the context.

💭 Old model (without Agent.md):
   Prompt: 300 words + explain all conventions

👍 New model (with Agent.md):
   Prompt: 50 words + Agent.md has the rest

Why do 90% of all agent harness implementations fail?

If you're thinking Β«That sounds easy, why don't all teams do it?Β» - here are the most common mistakes:

Mistake 1: Agent.md is too generic

Problem:

❌ Agent.md with 500 lines of copy-paste from other projects
   "Code style should be good, tests are important, DRY principle..."

Solution:

✅ Agent.md with 50 lines of concrete project rules
   Tech stack: TypeScript 5.x strict, Vitest
   Template: See components/Button.tsx for style
   Tests: Pattern from __tests__/button.test.ts, >80% coverage

Concrete beats generic by 100:1.

Mistake 2: No tool integration

Problem:
The agent has access to files, but no GitHub integration. Result: the agent can write code, but cannot push. You have to push manually.

Solution:
Set up MCP servers for GitHub, terminal, and code search. The agent becomes 10x more productive.

Mistake 3: Instructions are constantly changing

Problem:
You tell the agent one rule on Monday and another on Wednesday. Agent gets confused. No consistent behavior.

Solution:
Agent.md is the Single Source of Truth. Changes go into the Agent.md, not in the prompt.

Mistake 4: Too many prompts per session

Problem:
The feature is supposed to be built in 1 prompt, but it takes 10 iterations. Agent and human both lose the context.

Solution:
Structure the work in cycles:

  1. Write specs (Prompt 1)
  2. Tests (Prompt 2)
  3. Implementation (Prompt 3)
  4. Code review (Prompt 4)

Short, focused prompts with clear output.
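
One lightweight way to keep these cycles consistent is a small set of prompt templates, one per phase. The sketch below is a hypothetical TypeScript helper; the template wording is only an example and would be adapted to your own Agent.md and test patterns.

```typescript
// Hypothetical prompt templates for the four-step cycle: Specs → Tests → Implementation → Review.
type Phase = "specs" | "tests" | "implementation" | "review";

const templates: Record<Phase, (feature: string) => string> = {
  specs: (f) =>
    `Write a short spec for "${f}". List acceptance criteria only, no code yet.`,
  tests: (f) =>
    `Write Vitest tests for "${f}" based on the agreed spec. Follow tests/auth.test.ts as the pattern.`,
  implementation: (f) =>
    `Implement "${f}" with the minimal code needed to make the tests green. Follow Agent.md.`,
  review: (f) =>
    `Self-review the implementation of "${f}": check the Dos/Don'ts in Agent.md, then run lint and typecheck.`,
};

// Usage: one focused prompt per step instead of one 300-word mega-prompt.
console.log(templates.tests("backend login (POST /login)"));
```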

Ready for consistent agents?

Option A: If you are at the beginning of your AI dev journey, take a look at our 12-week DEV AI Bootcamp and build real AI-first habits.

♦

Option B: If you are already good with AI and need to go faster, take a look at our Agentic Coding Hackathon.


What does a successful agent harness look like in practice?

A team from the DEV AI Bootcamp came with the following problem:

Before (without production-ready harness):

  • 3 weeks per feature (with AI agent)
  • 40% of the PRs were rejected (code quality issues)
  • Agent made the same mistakes over and over again
  • Each prompt required 50+ words of instruction

After a 1-day bootcamp (harness workshop):

We built together:

  1. Agent.md with 12 clear rules for their project
  2. MCP servers for GitHub + database integration
  3. Prompt templates for Feature/Bugfix/Refactor

After 2 weeks with optimized harness:

| Metric | Before | After | Improvement |
|---|---|---|---|
| Feature development time | 3 weeks | 3-4 days | 6x faster |
| PR rejection rate | 40% | 8% | -80% |
| Repeated errors | Frequent | 0 | -100% |
| Prompt length | 100 words | 10 words | -90% |

The CTO (anonymized):

"We spent 6 months tweaking prompts. The Agent.md file did more in one day than all the prompt optimizations put together."

Business impact:

  • Code review time: -40%
  • Bugs in Production: -75%
  • Agent independence: +90%

How do you build a production-ready agent harness in 4 steps?

Step 1: Standardize your instructions (30 minutes)

Action:

  1. Create .ai/rules/agent.md in the project root
  2. Document:
  • Tech stack (language, versions, important libs)
  • Commands (Build, Test, Lint, Typecheck)
  • Code style (with a reference file: "See components/Button.tsx")
  • 3-5 Dos
  • 3-5 Don'ts

Example structure:

# Agent.md: Auth-Service

## Tech Stack
- TypeScript 5.x (strict mode)
- Express.js 5.x
- PostgreSQL 15

## Commands
npm run test # Run tests
npm run typecheck # Type-Check
npm run lint # ESLint

## Code Style
See src/auth/login.ts as a template.
Always use typed errors, never the any type.

## Dos
✓ Tests BEFORE implementation
✓ Think about error cases

## Don'ts
✗ No any types
✗ No console.log in production code

Success check: The agent should format code correctly without further prompting.


Step 2: Integrate your tools (45 minutes)

Action:

  1. List the available tools
  2. Make sure the agent has access to them
  3. Test each tool with a simple example

Typical tools:

  • GitHub (Create PR, Push Code)
  • Terminal (Run Tests, Lint)
  • Code-Search (Find Patterns)
  • Database (Check Schema)

Test:

"Agent: Create a pull request for the new feature"

If the agent can do this, tools are configured.
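
If you prefer an automated check over a manual prompt, a small smoke-test script can verify that the commands documented in your Agent.md actually run. This is a hypothetical helper; the command list is taken from the example Agent.md above.

```typescript
import { execSync } from "node:child_process";

// Smoke test: every command the Agent.md promises must actually work,
// otherwise the agent fails the moment it tries to use its tools.
const commands = ["npm run test", "npm run lint", "npm run typecheck"];

for (const cmd of commands) {
  try {
    execSync(cmd, { stdio: "pipe" });
    console.log(`OK   ${cmd}`);
  } catch {
    console.error(`FAIL ${cmd} - fix this before handing the tool to the agent`);
    process.exitCode = 1;
  }
}
```

Run it once after setup and again whenever the Agent.md command list changes.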


Step 3: Define standard workflows (60 minutes)

Action:
Document in the Agent.md how the agent should work.

## Workflow: Feature Implementation

1. **Specs**: Make the requirements clear, no ambiguities
2. **Tests**: Tests first, acceptance criteria as tests
3. **Implementation**: Minimal code until the tests are green
4. **Refactor**: Improve code quality, no functional change
5. **Review**: Self-review against best practices

## Workflow: Bugfix
1. **Root cause analysis** (do not just patch symptoms)
2. **Minimal fix** (no overengineering)
3. **Tests for the bug** (so it does not happen again)
4. **Verification**: the bug is gone

Success check: The next time you have the agent build a feature, it should automatically follow this workflow.


Step 4: Iterative improvement (continuous)

Action:
Ask after each agent session:

  • What rules did the agent ignore?
  • What mistakes does it make regularly?
  • What could be clearer?

→ Update the Agent.md.

Example:

  • Agent writes code without tests? → Add a rule to Agent.md: "Always write tests first"
  • Agent does not follow the code style? → Reference a more specific template file.

Pro Tip: An error is a feedback signal. Use it.


The 5 most common agent harness errors (and their solutions)

| Problem | Cause | Solution |
|---|---|---|
| Agent ignores the code style | Agent.md too generic, no concrete reference file | Pin a concrete example file: "See components/Button.tsx as a template. Use exactly this structure." |
| Inconsistent test coverage | No clear TDD rule | Add a rule: "Tests BEFORE implementation, do not change tests during the green phase" |
| Agent makes the same mistake repeatedly | Errors corrected only via prompt | Document the error as a don't rule in Agent.md |
| Tools are not used | Tools not configured/tested | Set up an MCP server, test a simple tool call |
| Context explosion after 10 prompts | Too many files pinned | Use the agent for code search, pin only reference files (max. 3) |

 


Summary: Agent Harness Essentials

The most important thing:

  • The agent harness (Instructions + Tools + User Messages) is more important than the model
  • 90% of the agent problems are harness problems, not model problems
  • A good harness makes results consistent, repeatable, scalable

Can be implemented immediately:

  1. Create your first Agent.md (30 min) - Project-specific rules, not generic copy-paste
  2. Define 3-5 clear rules per category (code style, tests, workflows)
  3. Iterate based on mistakes - If a mistake happens twice, it belongs in the Agent.md

Business impact:
Teams with production-ready harnesses ship 6-20x faster and have 75% fewer bugs (based on real production metrics).

This is not hype - it is measurable and reproducible.


🚀 Learn agent harnesses in practice

Free resources

📄 Agent.md template: github.com/obviousworks/agentic-coding-rulebook
Production-ready template with everything you need!

Our training

Option A: If you are at the beginning of your AI dev journey, take a look at our 12-week DEV AI Bootcamp and build real AI-first habits.

♦

Option B: If you are already good with AI and need to go faster, take a look at our Agentic Coding Hackathon.


Stay up to date

💬 LinkedIn Community: linkedin.com/in/matthiasherbert

πŸ™ GitHub: github.com/obviousworks


Do you need support with AI transformation?

At obviousworks.ch, we offer hands-on consulting and in-depth support - from strategic assessment to successful implementation. No theory, just tried-and-tested strategies for Swiss companies.

Let's talk: https://www.obviousworks.ch/kontakt/

 

Suitable training courses

Agentic Coding Hackathon

Get up to speed in 3-5 days!

FAQs

What is the difference between Agent Harness and Prompt Engineering?

Prompt Engineering optimizes individual inputs. Agent harness is the entire infrastructure - instructions, tools, workflows. A good harness makes intensive prompt engineering superfluous.

Does an agent harness work with all AI models?

Yes, the harness is model-agnostic. Whether Claude, GPT-4o, Gemini or Llama - the same instructions and tools work. That's why the harness is more important than the model.

How long does it take to build a production-ready harness?

4-8 hours for the basic structure. Then continuous improvement. Most teams see significant improvements after just 1 week.

What is an Agent.md file?

An Agent.md is a Markdown file in the project root that documents all rules, code style specifications and workflows for AI agents. It is loaded automatically with every session.

What are MCP servers for AI agents?

MCP (Model Context Protocol) is an open standard that enables AI models to interact with external tools (GitHub, terminal, databases). MCP servers are the concrete implementations of these integrations.

Do I need programming knowledge for an agent harness?

Basic knowledge is helpful, but not a prerequisite. The Agent.md is a simple Markdown file. MCP servers can often be configured with just a few clicks.

Matthias (AI Ninja)

Matthias puts his heart, soul and mind into it. He will make you, your team and your company fit for the future with AI!

About Matthias: Trainer profile
To his LinkedIn profile