TL;DR
- ADR-first development = write the decision record BEFORE writing the code
- The compounding is what surprised me; by ADR #15, agents write them unprompted + reference prior reasoning
- Four tiers: Strategic, Tactical, Operational, Quality Engineering (QE-ADRs for bugs are the part most people skip)
- I enforce it with hooks, not trust. T-ADR-073 fires on every file edit in governed directories.
- 200+ ADRs later, the registry IS the institutional knowledge. Any agent on any platform can access it.
What's an ADR + Why Should You Care?
If you've never encountered them: an ADR is a short document that captures a single architectural decision. Problem statement, decision, alternatives considered, consequences, status. That's it. Michael Nygard popularized the format in 2011; the adr.github.io community hub has templates + tooling if you want to go deep on the history.
Applying ADRs to AI agent workflows specifically isn't a new idea at this point. me2resh published a formal Agent Decision Record spec with YAML frontmatter for agent + model + trigger fields. Osmani writes about ADR discipline as part of agentic engineering. Multiple teams are building multi-agent systems that auto-generate ADRs from code repos.
I'm not claiming I invented this. What I am claiming is that I've been doing it for several months across 200+ iterations, and the compounding effects over that scale are what most writers don't cover. That's the part worth reading.
The Compounding Effect
The first 5 ADRs are painful. You're writing 500-word documents for decisions that feel obvious. "We decided to use Cursor as the primary IDE." Yeah, of course you did. Why does this need a formal record?
Here's why: by ADR #15, the agents are writing them for you. Not because the models got smarter, but because the discipline is reinforced at every layer of the stack: a /create-adr command walks agents through the type + problem + alternatives flow. An adr-reference-deployment skill teaches them how to deploy JSDoc-style @adr tags in governed files. An adr_governance_pre_block hook fires on every write/edit across Claude, Gemini, Windsurf, and Cursor. And a pre-commit hook catches anything that slips through. By ADR #15, agents have seen enough high-quality samples in context that they're drafting ADRs unprompted + citing prior art. You're spending 2 minutes reviewing what used to take 15 minutes writing.
By ADR #200 (where I am now), the ADR registry has become more useful than I expected it to be.
In some ways, it's more valuable than the code. It's certainly MORE valuable than fully-delegated, fully-vibe-coded code, because ADRs preserve your LLMs' discernment ABOUT the code, not just the raw diffs. ADRs capture the reasoning; reasoning transfers across contexts in a way that code doesn't. Osmani makes a similar point about planning consuming 70% of effort while execution compresses to 30%; ADRs are how I make that 70% persistent.
I view this as a method to RETAIN the inference layer you're burning tokens on. You're spending the tokens anyway; burn a handful more to put that reasoning to paper, add it to the codebase, and get more intelligent decisions as a result. I think it's a net gain in token efficiency even if the per-session burn rate looks higher: if ADRs let you cull even a few thrashing sessions and their remediations, they pay for themselves.
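To make that claim concrete, here's a hedged back-of-envelope. Every number below is an assumption for illustration, not a measurement from my repo:

```python
# All figures are illustrative assumptions, not measurements.
adr_overhead_per_session = 2_000   # extra tokens to draft + review one ADR
sessions = 100                     # sessions over some period
sessions_saved = 5                 # thrashing sessions the ADRs prevent (assumed)
thrash_cost = 50_000               # tokens one thrashing session burns (assumed)

extra_spend = adr_overhead_per_session * sessions  # 200,000 tokens on ADRs
saved = sessions_saved * thrash_cost               # 250,000 tokens not wasted
print(saved - extra_spend)  # prints 50000: a net win under these assumptions
```

The break-even point depends entirely on your own numbers, but even a modest cull of rework sessions can cover the ADR overhead.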
Four Tiers of ADRs
Not everything needs a full ADR. I'm not a lunatic. Here's the system:
| Tier | Type | When | Example |
|---|---|---|---|
| 1 | Strategic (S-ADR) | Foundational decisions that rarely change | "We use Notion as our knowledge management backbone" |
| 2 | Tactical (T-ADR) | Implementation decisions; where most action lives | T-ADR-038: Agent Action Hooks for Standards Enforcement |
| 3 | Operational | Admin changes, config, CHANGELOG-only stuff | Documentation fixes, minor dependency bumps |
| 4 | Quality Engineering (QE-ADR) | Bugs + errors. Postmortems with teeth. | QE-025: Silent security gate bypass from wrong event names |
Tiers 1 + 2 get the full treatment: problem statement, alternatives, decision, consequences. Tier 3 just needs a CHANGELOG entry. Tier 4 is where it gets interesting.
Three of the four tiers form a knowledge graph (Operational changes stay out of the loop). Strategic decisions inform tactical decisions. Tactical decisions generate quality events when they fail. Quality events drive corrections that create new tactical decisions. The feedback loop closes itself.
QE-ADRs: When Bugs Get Their Own ADR
When an agent encounters a bug, the natural impulse is to fix it immediately. Spot the error, write the patch, move on.
It's also how you fix the symptom and miss the root cause.
My QE-ADR protocol forces the agent to stop + investigate before fixing:
```
🔴 QE-ADR CHECKPOINT: Bug/Error Detected

I've encountered: [SPECIFIC ERROR MESSAGE]

Before attempting fixes, I will:
1. Create QE-ADR to document the investigation
2. Form and test hypotheses systematically
3. Identify the root cause with evidence
4. Propose solution only after RCA is complete
```
The QE-ADR requires the agent to form at least 2-3 hypotheses about what caused the bug, evaluate evidence for and against each one, and ONLY then identify the root cause + propose a fix.
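The discipline in that protocol can be sketched as a small data structure. The class and function names here are my own illustration, not part of any published QE-ADR spec:

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    claim: str
    evidence_for: list = field(default_factory=list)
    evidence_against: list = field(default_factory=list)

def pick_root_cause(hypotheses):
    """Enforce the QE-ADR discipline: at least two hypotheses, each with
    evidence weighed, before any one is promoted to root cause."""
    if len(hypotheses) < 2:
        raise ValueError("QE-ADR requires 2-3 hypotheses before RCA")
    # Keep only hypotheses where supporting evidence outweighs contradicting
    viable = [h for h in hypotheses
              if len(h.evidence_for) > len(h.evidence_against)]
    if len(viable) != 1:
        return None  # evidence is ambiguous; keep investigating
    return viable[0]
```

Returning None when the evidence is ambiguous is the point: the agent keeps investigating instead of patching the first plausible story.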
QE-025 is the one that made me stop questioning the overhead. I had assumed my hook config used Cursor's event names; they were actually Claude Code's naming convention, a category confusion that caused a silent security gate bypass. If the agent had just "fixed" the obvious symptom (the hook wasn't firing), it would have patched the wrong thing. The QE-ADR investigation found the real root cause.
Something I didn't expect: when you tell an agent to produce a structured output documenting its reasoning (the QE-ADR), the quality goes up compared to just asking it to "think harder." Asking a model to ULTRATHINK is one thing. Telling it to produce a structured record of that thought process? Different ballgame. They have to commit it to paper. Other people are finding the same thing; Piethein Strengholt's multi-agent ADR writer chains a writer → validator → formatter pipeline for the same reason: structured output with a validation step catches things a single pass misses.
The Metadata That Makes ADRs Agent-Friendly
A basic ADR has a title, status, and body. What makes mine work for agents is the metadata:
```yaml
id: T-ADR-038
status: accepted
depends_on:
  - T-ADR-009  # Agent Output Attribution Protocol
  - T-ADR-010  # Multi-Agent Edit Attribution Protocol
  - S-ADR-018  # Universal PARA Linking Discipline
related_to:
  - T-ADR-028  # Agent Event Hook System
  - T-ADR-108  # Cursor hooks.json schema alignment
supersedes: null
superseded_by: null
```
Pro-tip: use YAML headers on your ADRs, same pattern as SKILL.md files in the .agents/skills/ framework. Agents already know how to parse YAML frontmatter from skills; borrowing the same structure for ADRs means they don't have to learn a new format. The frontmatter goes at line 1 of every ADR file.
The depends_on and related_to fields are the graph edges. When an agent needs to make a decision about hook enforcement, it can search for ADRs related to T-ADR-038, find the dependency chain, and understand the full reasoning context in seconds. (These edges are part of the broader knowledge graph visualization that connects ADRs, projects, technologies, and proficiencies.)
The supersedes / superseded_by fields handle evolution. When T-ADR-108 corrected the Cursor event names that T-ADR-038 originally got wrong, T-ADR-038 wasn't deleted; it was amended with a cross-reference. Historical reasoning is preserved even when the specific implementation changes.
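As a sketch of how a tool could walk those edges, assuming the frontmatter above has been parsed into dicts keyed by ADR id (the function and registry shape are hypothetical, not my actual tooling):

```python
def dependency_chain(registry, adr_id, seen=None):
    """Walk depends_on edges transitively. If an ADR has been superseded,
    its successor's chain is returned instead, so the caller always lands
    on current reasoning."""
    seen = seen or set()
    adr = registry[adr_id]
    if adr.get("superseded_by"):
        return dependency_chain(registry, adr["superseded_by"], seen)
    chain = []
    for dep in adr.get("depends_on") or []:
        if dep not in seen:
            seen.add(dep)
            chain.append(dep)
            chain.extend(dependency_chain(registry, dep, seen))
    return chain
```

The `seen` set guards against cycles, which matter once related_to and depends_on edges start pointing both ways.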
The Enforcement Stack (Four Layers Deep)
For the first few months, ADR-first was a Cursor rule: a line in an .mdc file that said "you MUST create an ADR before modifying source code." And like all rules in context windows, agents sometimes forgot. Context would compact and the next file edit would happen without the ADR checkpoint. Dark period in the repo.
So I built four layers of reinforcement. Here's what the stack actually looks like:
| Layer | What It Does | Where It Lives |
|---|---|---|
| Command | /create-adr walks the agent through type + problem + alternatives, generates from templates with YAML frontmatter | .agents/commands/create-adr.yaml |
| Skill | adr-reference-deployment teaches agents to deploy @adr JSDoc tags in governed files + look up the directory→ADR mapping | .agents/skills/adr-reference-deployment/SKILL.md |
| Runtime hook | adr_governance_pre_block fires on every write/edit/replace/bash across all 4 agent platforms; checks whether a governing ADR exists | .agents/hooks/rollout-config.yaml |
| Git hook | Pre-commit catches anything that slipped through: UEAH completeness, dependency changes, protected files | .githooks/pre-commit |
The command gives agents a structured flow. The skill gives them the patterns. The hook catches them if they skip the first two. The pre-commit hook is the safety net. You need all four because each one covers a different failure mode.
@adr Tags in Code
Every governed file in the repo gets JSDoc-style ADR headers in the comments. When an agent touches a file in terraform/, the skill tells it to add:
```
# === ADR GOVERNANCE ===
# @adr {S-ADR-001} Hybrid Architecture
# @adr {T-ADR-004} Terraform-GAM Interop
# =====================
```
There's a directory→ADR mapping table in the skill so agents know which ADRs govern which directories. terraform/ → S-ADR-001 + T-ADR-004. .agents/hooks/ → S-ADR-011 + T-ADR-016. Commit messages get the same treatment: `Implements T-ADR-004 for custom schemas`.
The tags are searchable. `rg -l "@adr\s*\{" scripts/ terraform/ config/` tells you which files are governed and by what. An agent that's about to modify terraform/modules/users.tf can grep for its @adr tags and immediately understand the architectural constraints before writing a line of code.
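A runtime check over that mapping might look like this sketch. The GOVERNANCE table and pre_write_check function are my illustration of the idea, not the actual adr_governance_pre_block implementation:

```python
import fnmatch

# Hypothetical directory -> governing ADRs table (stand-in for the real one)
GOVERNANCE = {
    "terraform/*": ["S-ADR-001", "T-ADR-004"],
    ".agents/hooks/*": ["S-ADR-011", "T-ADR-016"],
}

def pre_write_check(path, file_text):
    """Return (allowed, message). Flags edits to governed files that
    don't cite their governing ADRs via @adr tags."""
    for pattern, adrs in GOVERNANCE.items():
        if fnmatch.fnmatch(path, pattern):
            missing = [a for a in adrs if f"@adr {{{a}}}" not in file_text]
            if missing:
                return False, f"Missing @adr tags for: {', '.join(missing)}"
    return True, "ok"
```

Whether "not allowed" means a log line, a warning, or a hard block depends on the rollout mode, which the next sections cover.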
The Registry Auto-Regenerates
A separate hook (adr_registry_regen) fires whenever an ADR file is modified and regenerates ADR_REGISTRY.json. That registry is the searchable index agents use when the /create-adr command asks them to find related prior decisions. The registry stays fresh without any human maintenance.
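A minimal sketch of what such a regeneration step could do, assuming ADRs are markdown files carrying the flat frontmatter shown earlier; the function name, parse approach, and output shape are all assumptions:

```python
import json
import re
from pathlib import Path

def regenerate_registry(adr_dir, out_file="ADR_REGISTRY.json"):
    """Scan ADR files, pull id/status from the frontmatter, and write
    a flat searchable index."""
    index = []
    for path in sorted(Path(adr_dir).glob("**/*.md")):
        text = path.read_text(encoding="utf-8")
        # Naive frontmatter parse: good enough for flat key: value pairs
        m = re.search(r"^id:\s*(\S+)", text, re.M)
        s = re.search(r"^status:\s*(\S+)", text, re.M)
        if m:
            index.append({"id": m.group(1),
                          "status": s.group(1) if s else "unknown",
                          "path": str(path)})
    Path(out_file).write_text(json.dumps(index, indent=2))
    return index
```

Wiring a function like this to a file-modified hook is what keeps the index fresh without a human in the loop.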
Rollout: Monitor → Warn → Enforce
Every hook family follows the same graduation path. The adr_governance_pre_block started in monitor (logs only, no blocking) for 30 days. Graduated to warn (agent sees the warning but isn't blocked) where it sits now. Heading toward enforce (hard block) once the regression tests pass. (Full details on that pattern in Hooks-Based Enforcement for AI Agents.)
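The graduation logic itself is tiny. A generic sketch, not the actual rollout-config.yaml schema:

```python
from enum import Enum

class Mode(Enum):
    MONITOR = "monitor"   # log only, no agent-visible effect
    WARN = "warn"         # agent sees the warning but isn't blocked
    ENFORCE = "enforce"   # hard block

def apply_hook_result(mode, violation):
    """Translate a hook violation into an action based on rollout mode."""
    if not violation:
        return "allow"
    return {Mode.MONITOR: "log", Mode.WARN: "warn", Mode.ENFORCE: "block"}[mode]
```

The value of the pattern is that the violation-detection code never changes between phases; only the mode does, so the monitor-phase logs are an honest preview of what enforce would have blocked.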
Anti-Patterns to Watch For
The practice only works if the ADRs are genuine. I actively watch for:
| Anti-Pattern | What It Looks Like | Why It's Dangerous |
|---|---|---|
| ADR Theater | Writing the ADR after the code is already written | Reasoning was reconstructed, not discovered. The whole point is ADR PRECEDES code. |
| Rubber Stamp | "Alternatives Considered" lists one option + dismisses it in a sentence | If you didn't genuinely evaluate alternatives, you didn't do the work. |
| Single-Hypothesis QE-ADR | Jumping to the first explanation without considering alternatives | If you can't explain WHY the bug happened (not just what was broken), you haven't done RCA. |
| Orphan ADR | ADR not linked to code, commits, or tickets | An ADR in isolation isn't participating in the knowledge graph. |
When I catch these, I flag them. Not punitively, but because the whole system depends on ADR quality. One rubber-stamp ADR that an agent later references as prior art propagates bad reasoning forward through the dependency graph.
Getting Started
You don't need 200 ADRs or a pre-edit enforcement hook to start.
Start with 5 ADRs for your 5 biggest decisions. What language are you using and why? What hosting platform? What database? What CI/CD pipeline? These are decisions you've already made; writing the ADR just documents the reasoning retroactively. Takes maybe an hour total.
Use a consistent template. Problem, Decision, Alternatives, Consequences, Status. Don't overcomplicate the format. The value is in the reasoning, not the formatting. Joel Parker Henderson's template repo has a dozen options to start from. AWS Prescriptive Guidance covers the review + adoption process if you need something more structured for a team.
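If you want to stamp that template out programmatically, a minimal generator could look like this; the section names mirror the template above, everything else is illustrative:

```python
ADR_TEMPLATE = """\
id: {adr_id}
status: proposed

# {title}

## Problem
{problem}

## Decision
{decision}

## Alternatives Considered
- ...

## Consequences
- ...
"""

def new_adr(adr_id, title, problem="TBD", decision="TBD"):
    """Render a skeleton ADR with frontmatter and the five sections."""
    return ADR_TEMPLATE.format(adr_id=adr_id, title=title,
                               problem=problem, decision=decision)
```

The skeleton deliberately leaves Alternatives as a stub: that's the section an agent (or you) has to genuinely fill, per the anti-patterns above.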
Put them where your agents can find them. A docs/ directory in your repo works. So does a Notion database. (Why not both?) Some people go further and add an instruction to agents.md so agents auto-create ADRs on architectural changes. Whatever you choose, what matters is agents can search + retrieve them.
Don't write thin ADRs just to hit a count. One high-quality ADR with genuine reasoning is worth ten one-paragraph ADRs that just say "we decided to use X." The Alternatives section is what makes an ADR valuable; skip it and you've just written a commit message with extra steps.
Establish discipline early. If you write shoddy ADRs to start, agents will mimic what they've seen; they are fabulous pattern replicators. By contrast, if you go overkill + design the shit out of your early ADRs, agents will write similarly high-quality ADRs as they read your historic ones. You're priming the context window with good samples of what you expect.
By 20-30 ADRs, you'll notice agents referencing them unprompted. By 50, you won't be able to imagine working without them.
ADR-first development for AI agents is becoming a recognized practice; this post is what 200+ iterations of it taught me. The ADR template, four-tier system, and metadata schema are all part of the Agentic Developer Toolkit.
John Click is a DevOps / IT Platform Engineer. He writes at johnclick.ai and johnclick.dev.
