Context Fabric: What the Agent Needs to Know Before It Writes a Single Line of Code
Open any README in your repository. The flagship one. The one that's 800 lines long, with a "Getting Started" section written in 2022. Read it with fresh eyes - as if you were a new developer or, better yet, an AI agent who's never been to a standup, never seen Slack, never heard the legend of why we don't touch the InvoiceReconciler class in the payment service.
Now ask yourself one question: based on this README, can you safely modify anything in this service?
Of course not.
In the previous post, we talked about the Fiat 500 with a Ferrari engine - about how the industry bought agents but never built the infrastructure for them. Today, we're taking apart the first of three pillars of that infrastructure: Context Fabric - what the agent needs to know before it writes even a single line of code. And we'll make a claim that might seem counterintuitive: documentation is not a culture problem. It's an infrastructure problem. And in the age of agents, it becomes Job Zero - the thing without which the rest of the pipeline is meaningless.
Tribal Knowledge - For Real
In every software organization, there are two layers of knowledge. The first is syntactic: what the code does. Anyone can read it - just open the IDE. The second is semantic: why the code exists in its current form. And that layer lives exclusively in the heads of the people who wrote it.
That's tribal knowledge. And I need to immediately switch off the reflex you have as an engineering manager: "We know, we know, we need to document better." Because tribal knowledge is NOT "things we haven't documented yet." That framing suggests the problem is solved by a wiki. It isn't.
Tribal knowledge is architectural decisions made at a meeting three years ago that no one documented. It's the workaround in the billing service that prevents a race condition, but it looks like a bug to anyone who wasn't there during the incident. It's the reason why the event bus uses a specific message schema that only makes sense if you know the constraint it was designed around. Every codebase has two layers of meaning: syntactic (what) and semantic (why) - and the second one is irreducibly human.
Until now, this was a human problem with human workarounds. A new developer asked colleagues on Slack, had coffee with a senior, read PRs, and, after roughly three months, got the lay of the land. Expensive? Yes. But it worked. An agent can't do that. And here we arrive at the paradox that is the central insight of this post:
Organizations that need agents the most - large, complex, with many teams and years of legacy - are precisely the ones where agents perform the worst. Because the bigger the organization, the more tribal knowledge. The more tribal knowledge, the bigger the gap between what the agent "sees" and what it needs to know. The bigger the gap, the more iterations, rework, and tokens burned on guessing.
This is not a problem that a better model will solve. GPT-5 won't help if the agent doesn't know that in this module, we don't make direct DB calls because a year ago, Linda was fixing a race condition until midnight, and since then, routing goes through a message queue.
Why This Is a Fundamentally Different Problem Than Onboarding
"But a new developer doesn't know the repo either" - I hear this argument regularly. And it sounds reasonable. Except that a new developer has something an agent doesn't: the ability to recognize their own ignorance.
A new developer knows they don't know. They see an unfamiliar convention and ask. They read code that looks weird and check with a senior. They see a TODO from 2019 and treat it with skepticism. An agent? An agent doesn't know that it doesn't know. An agent sees an incomplete context and treats it as complete. This is a fundamental asymmetry - and it's why the analogy "agent = new developer" is deeply misleading.
Stripe said it plainly in their post about the Minions architecture: "AI coding tools are fast, capable, and completely context-blind." Context-blind - not context-ignorant. Not "they don't know yet" - "they can't know that they don't know." That's why Stripe doesn't rely on the agent "figuring it out." Stripe has 400 internal tools on MCP, but doesn't expose all of them to the agent. Instead, the orchestrator does deterministic prefetching - scans the prompt, finds relevant docs and tickets, and curates a surgical subset of ~15 tools per task. Because they know that an agent that sees too much is just as useless as an agent that sees too little.
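Stripe hasn't published the Minions internals, but the prefetching idea can be sketched in a few lines: scan the prompt deterministically, map keywords to tools, cap the subset. Everything below - tool names, keyword heuristics - is a hypothetical Python illustration, not Stripe's implementation.

```python
# Hypothetical sketch of deterministic tool prefetching: scan the task
# prompt for known keywords and expose only the matching tools to the
# agent, capped at a small budget. Tool names and heuristics are
# illustrative, not Stripe's actual system.

TOOL_INDEX = {
    "invoice": ["billing.query_invoices", "billing.get_schema"],
    "deploy": ["ci.get_last_deploy", "ci.rollback_status"],
    "latency": ["obs.query_metrics", "obs.recent_incidents"],
    "schema": ["db.describe_table", "billing.get_schema"],
}

def curate_tools(prompt: str, budget: int = 15) -> list[str]:
    """Deterministically select a surgical subset of tools for a task."""
    selected: list[str] = []
    for keyword, tools in TOOL_INDEX.items():
        if keyword in prompt.lower():
            for tool in tools:
                if tool not in selected:
                    selected.append(tool)
    return selected[:budget]  # never hand the agent the full catalog

print(curate_tools("Investigate invoice latency after the last deploy"))
```

The key property: the same prompt always yields the same tools. No model call, no probability, no guessing - the curation happens before the agent ever sees the task.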
And here comes the second counterintuitive point: a bigger context window is not the solution. Research and practice show that at ~40% utilization of the context window, model performance starts to degrade. Dex Horthy sums it up brutally: "The more you use the context window, the worse the outcomes you'll get." An agent you dump an entire repo into won't "know more" - it will know worse, more expensively, and more slowly.
Framework: 5 Levels of Context
The industry is slowly starting to understand that "context" isn't a single word - it's an architecture. In our work with enterprises, five distinct layers have emerged. But - and this is key - these layers have dependencies. You can't effectively deliver a higher layer if you don't have the lower one. This isn't a menu you pick from. It's a pyramid.
Level 1: System Context - "What even is this?"
System architecture, dependency graphs, database schemas, API contracts. What ecosystem am I working in? Most organizations have this partially - in Confluence, in C4 diagrams, in Terraform files. The problem: it's scattered, outdated, and in a format the agent can't read. An agent that doesn't know the system is split into 14 microservices with event-driven communication will write you a synchronous REST call where you need a Kafka message.
Level 2: Code Context - "How do we write code here?"
Coding standards, conventions, preferred patterns, and agent configuration files. This is the layer addressed by CLAUDE.md and AGENTS.md - and the ecosystem is growing: over 60,000 repositories on GitHub use one of these formats. Next.js, LangChain, Excalidraw, and Deno take this seriously. But that's open source. In an enterprise with 200+ repos? You might have .editorconfig and ESLint rules. That's it. The gap between open-source best practice and enterprise reality is enormous - and nobody's talking about it.
Level 3: Org Context - "Who is responsible for what?"
Service ownership, SLAs, review policies, escalation paths. The agent needs to know that this module is owned by the Payments team and requires their approval before merging. This service has a 99.99% SLA. In this repo, there's one person doing 70% of reviews who's currently on vacation, so the agent should trigger a backup reviewer, not wait three days. This is the layer that AGENTS.md doesn't address at all, because it's not knowledge about code - it's knowledge about the organization. And without it, the agent generates PRs that technically merge but politically won't pass.
Level 4: Historical Context - "Why is it the way it is?"
ADRs, PR history, postmortem threads, meeting notes. Why we chose Kafka over RabbitMQ. Why that endpoint has a retry set to exactly 7 seconds. This is the hardest layer - because tribal knowledge in its purest form lives in tools that were never designed as knowledge sources: Slack threads, Jira comments, Google Docs, 1:1 notes. And it's simultaneously the most costly layer when absent. Every missing ADR means the agent makes an architectural decision in the dark - and a senior has to reverse it in review, burning their time explaining history.
Level 5: Operational Context - "How does it behave in production?"
Telemetry, logs, metrics, anomalies, and results of recent deploys. The agent should know that this service had two incidents last month, that latency increases after 4 PM, and that the last deploy was rolled back after 12 minutes. Operational Context closes the feedback loop - and it's the level where most organizations haven't even begun thinking about AI integration. Without it, the agent optimizes in a vacuum - it doesn't know whether its changes improved anything in production.
Why a pyramid, not a list: if you have Level 2 (Code Context: good CLAUDE.md) but lack Level 1 (System Context: the agent doesn't know what other services exist), the agent will write beautiful, idiomatic code... in the wrong service. If you have Levels 1 and 2 but lack Level 4, the agent will make an architectural decision that the organization deliberately avoided after an incident two years ago. The levels are interdependent, and most enterprises stop somewhere between 1 and 2.
The Economics of Ignorance: How Much Does Missing Context Cost?
In post #1 we calculated the TCO of a Copilot license. Now let's calculate something nobody puts on a dashboard: the cost of an agent iteration that goes in the wrong direction.
Research on coding agents shows that a single unconstrained solution to a software engineering task costs $5–8 in tokens. But that's the optimistic scenario - it assumes the agent knows what it's doing. A Reflexion loop, where the agent hits an error and iterates, can consume 50x more tokens than a single-pass solution. An agent that doesn't understand context doesn't "guess once" - it guesses in circles, burning budget every time.
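The arithmetic is worth making explicit. A back-of-envelope sketch using the $5-8 single-pass figure and the 50x Reflexion multiplier from above; the weekly task throughput is a made-up assumption for illustration:

```python
# Back-of-envelope cost of an agent that guesses in circles, using the
# figures from the text: $5-8 per single-pass task, up to 50x token
# consumption for a Reflexion-style retry loop.

single_pass_cost = 6.50      # midpoint of the $5-8 range, in dollars
reflexion_multiplier = 50    # worst-case token blow-up from looping
tasks_per_week = 100         # hypothetical team throughput

best_case = single_pass_cost * tasks_per_week
worst_case = single_pass_cost * reflexion_multiplier * tasks_per_week

print(f"context-rich (one-shot): ${best_case:,.0f}/week")
print(f"context-blind (looping): ${worst_case:,.0f}/week")
```

The gap between the two lines is the Unreliability Tax, made visible.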
Stripe solved it differently: Minions is a one-shot architecture. The agent gets a fully assembled context and makes a single LLM call. No loops, no iterations, no memory between calls. Over 1,300 PRs per week, zero code written by humans. But - and this is key - underneath Minions sits a context infrastructure that was built for humans before LLMs even existed: devboxes spinning up in 10 seconds, MCP with 400+ tools, deterministic prefetching, blueprints mixing fixed code with open agent loops. As one analysis put it: "The unglamorous parts of the architecture - the deterministic nodes, the two-round CI cap, the mandatory reviewer - are doing more work than the model is."
Stripe doesn't generate 1,300 PRs per week because it has a better model. It generates them because it invested in Context Fabric before it even called it that. Minions work because the agent never guesses - it always knows. And it knows because the system in front of the agent deterministically assembles the full context.
Your organization is the mirror image. You have the model (Copilot, Claude, Gemini - doesn't matter), but you don't have the infrastructure to feed it context. And you're paying the Unreliability Tax: the additional cost in compute, latency, and engineering that comes from the agent operating probabilistically where it should operate deterministically.
Autonomous Requirements - Or: What if the Jira Ticket Wrote Itself?
Let's go back to Bob from post #1. His seniors are complaining that agents generate code "adjacent" - technically correct but inconsistent with the architecture. Bob thinks: "maybe the problem is in the prompts?"
No. The problem starts the moment someone types into Jira:
"As a user, I want to be able to filter invoices by date range"
And that's the entire task description. No acceptance criteria. No information that "filtering by date" in this system means a cross-shard query. For a human developer this is normal - they'll ask, look at the code, talk to Kate from the database team. For the agent, that ticket is the entire world. And the agent will do exactly what you told it: a simple WHERE date BETWEEN, on the wrong schema, in the wrong service.
Compare this with how Stripe does it: when an engineer tags a Minion on Slack, the system ingests the entire thread - with stack traces, documentation links, prior discussion. Then it deterministically processes the links, extracts relevant documents, curates tools. Before the agent sees the task, the system already knows what context the agent is working in. The ticket never reaches the agent "raw."
Hence the concept we call Autonomous Requirements: an analytical layer between Jira and the agent. Not another form to fill out - but an analyst-agent that, before any developer (human or AI) sees the task, does what a good Business Analyst would:
- Reads the ticket and extracts what's missing.
- Queries context - System (which service? which schema?), Code (what conventions?), Historical (has anyone tried this before?).
- Generates a deterministic specification - not prose, but a contract: endpoint path, payload schema, expected behavior, edge cases, acceptance tests.
- Flags risks.
- And waits for human approval - because "autonomous" doesn't mean "uncontrolled."
Is this science fiction? Stripe does it implicitly (blueprints + MCP prefetching), and research from ICSE RAISE 2025 shows that practitioners see AI as a partner in requirements elicitation - though full autonomy meets pushback (only 2% of respondents believe in fully autonomous elicitation without a human). The Human-AI Collaboration model - agent drafts, human validates - is the realistic path, without waiting for AGI.
Autonomous Requirements, then, isn't a "new role" or a "new tool". It's a shift in the point at which context enters the process. Today, context arrives after the task is assigned - the developer gathers it themselves. In the AR model, context arrives before - the specification is already contextually complete before anyone (human or agent) starts writing code. This changes the economics of the entire pipeline.
Job Zero: Why Documentation Is Infrastructure, Not Culture
When I talk about context at conferences, someone always asks: "But how do you convince people to document?" For 30 years, the industry has treated this as a cultural problem. "You need to build a culture of documentation." "You need to motivate." "You need to set OKRs for docs."
It doesn't work. It didn't work in 2015; it won't work in 2026. Not because people are lazy, but because the incentive structure is wrong. Documenting slows you down now; the benefit is deferred and diffused. Classic tragedy of the commons.
In the age of agents, the dynamic changes - not because people suddenly love Confluence, but because missing context has a direct, measurable cost in tokens and iterations. Every ticket without acceptance criteria means an agent generating five variants instead of one. Every missing ADR means a senior spending an hour in review explaining history. Every outdated diagram means an iteration in the wrong direction - and at $5–8 per agent task, that's real spend, not abstract "inefficiency."
That's why we say "Job Zero" - not "nice to have," not "tech debt to address later." The thing without which the rest of the pipeline (CI, CD, agent loops, auto-evaluation) lacks a foundation to stand on. Just as you wouldn't run production without monitoring, you can't run an agentic workflow without Context Fabric. Nobody says "build a culture of monitoring" - you build the pipeline, set up Grafana, integrate alerts, and you're done. Context is the same kind of infrastructure: a pipeline that automatically collects, structures, and serves knowledge to agents - from ADRs, from commit history, from service topology, from telemetry. Not a wiki to write in, but a system to extract from.
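As a toy illustration of "extract, don't write": even commit history already encodes fragments of the "why" layer. A minimal sketch - in practice you'd feed it the output of `git log --pretty=%s`; the marker heuristics are illustrative assumptions, not a product:

```python
# Minimal sketch of mining candidate tribal knowledge from commit
# subjects instead of asking humans to fill a wiki. The marker list is
# an illustrative heuristic, not a complete extractor.

WHY_MARKERS = ("because", "workaround", "incident", "race condition")

def mine_history(commit_subjects: list[str]) -> list[str]:
    """Keep commit subjects that likely encode a 'why', not just a 'what'."""
    return [
        subject for subject in commit_subjects
        if any(marker in subject.lower() for marker in WHY_MARKERS)
    ]

subjects = [
    "Bump lodash to 4.17.21",
    "Route billing writes through queue (workaround for race condition)",
    "Add retry=7s because upstream SLA; see incident 2023-41",
]
print(mine_history(subjects))
```

Nobody wrote documentation here - yet two of three commits carry exactly the historical context a review would otherwise have to reconstruct.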
Quick Diagnostic: The Context Pyramid in Your Organization
Before you move on to post #3, locate yourself on the pyramid. Each level conditions the next:
Level 1 - System Context: Can the agent automatically (without asking a human) find out what services exist, how they're connected, and what their schemas are? If not, the agent is operating like a surgeon without an X-ray.
Level 2 - Code Context: Do you have CLAUDE.md/AGENTS.md in your key repositories? If not, the agent guesses conventions from the code, and guesses wrong 30% of the time (because legacy code is no role model).
Level 3 - Org Context: Does the agent know who owns which module, what the SLA is, and who reviews? If not, it generates PRs that technically merge but organizationally won't pass.
Level 4 - Historical Context: Are your ADRs less than 3 months old? If not, the agent makes architectural decisions that the organization consciously rejected.
Level 5 - Operational Context: Can the agent see what's happening in production after a deploy? If not, it optimizes in a vacuum.
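The pyramid property - a level only counts if everything below it holds - can be stated in a few lines. A toy sketch, not an assessment tool:

```python
# Toy encoding of the five-level diagnostic: answer each question
# bottom-up with yes/no, and your level is the highest CONTIGUOUS run
# of yes answers from the bottom - a level with a broken foundation
# doesn't count.

LEVELS = ["System", "Code", "Org", "Historical", "Operational"]

def pyramid_level(answers: list[bool]) -> int:
    """Return the highest contiguous level satisfied from the bottom."""
    level = 0
    for ok in answers:
        if not ok:
            break
        level += 1
    return level

# Example: good system map and AGENTS.md files, but no org ownership
# data - the ADRs you do have don't lift you past Level 2.
assert pyramid_level([True, True, False, True, False]) == 2
```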
Most enterprises are somewhere between 1 and 2. Stripe is at 4–5. That difference explains why Stripe merges 1,300 PRs per week while your team complains that "Copilot doesn't understand our code."
Next post: "Your CI Is an Oracle That Lies" - flaky tests, slow feedback, and the hidden tax you pay on every agent iteration.
This is post 2 of 6 in the VISDOM - The Agent-Ready SDLC series. The series builds its argument from diagnosis, through context and infrastructure, to a maturity model and reference workflow. If you missed post #1 - start with Ferrari.