The agentic governance toolkit I built to stop my agents from wrecking everything

No, you don't have to read all this. I get it, this is LONG. This page is as much for my agents as it is for you; they load it for context on the full system. If you just want the highlights, here's the TL;DR:

What	Why You Care
Hook-based enforcement across 3 platforms	Agents follow rules even when their context window "forgets" them
Over 50 enforcement scripts (Python + Bash + YAML)	Real procedural enforcement, not markdown suggestions
Nearly 50 reusable agent skills	Teach your agents once, benefit forever
Over 200 Architecture Decision Records	Persistent institutional memory that compounds over time
UEAH attribution on every artifact	You always know what was changed & why & by which agent
Per-agent sound notifications	Know which agent finished work without looking at the screen
Federation pipeline	Write rules ONCE, deploy to Cursor + Claude Code + Gemini CLI identically

The backstory

I manage cloud platforms (Google Workspace, GCP, AWS) for a large tech startup. A significant chunk of my job involves repetitive operations: managing user accounts, updating group settings, reviewing Jira tickets, checking Slack, updating Confluence documentation, composing emails, managing calendars, etc.

So I started delegating to AI agents. Cursor for orchestration & IDE work. Claude Code and Gemini CLI running in terminal sessions as parallel workers. MCP servers connecting them to Jira, Confluence, Slack, Gmail, Google Drive.

With MCPs + CLIs combined, I could put my time & cognitive energy into the things that mattered & had more impact, shifting from reactive (waiting for issues / bugs / tickets to surface) to proactive IT management.

However, that access cuts both ways.

I didn't want to recklessly YOLO & whitelist a bunch of commands for agents without first investigating + building systems to mitigate risks. If we're setting up agents to leverage CLI commands where one bad command could be destructive to repos, I knew I had to take great care to build guardrails, restrictions, & systems to protect our IP & our SaaS platforms.

Every one of those CLI tools is a loaded gun:

Tool	Risk / Vulnerability	Mitigations
GitLab CLI	Read + write access to repos based on local credentials	Hooks to limit commands & scopes; pre-execution scripts scan for safety; 1Password hooks for secrets; granular per-repo PATs
Atlassian CLI	Read + write to any Jira + Confluence assets	Granular API scopes; hooks + pre-execution scripts; 1Password hooks
1Password CLI	Injects secrets as env vars. Risk: `op read` prints raw secrets; compromised agents could extract via `op run -- env`	Token Exposure Guard blocks `op read` & inspects `op run` sub-commands; break-glass mechanism for emergency bypass
CLASP	OAuth'd access to any Apps Script project	Pre-execution scripts; 1Password hooks; Apps Script natively retains history
GAM	Requires dangerous domain-wide delegation + admin privileges	Hooks to limit commands; pre-execution scripts; 1Password hooks
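To make the Token Exposure Guard row concrete, here's a minimal sketch of the kind of pre-execution check it describes, assuming the hook hands the validator one raw shell command string. The deny-list and the `AGENT_BREAK_GLASS` variable are hypothetical stand-ins, not the toolkit's actual rules.

```python
import os
import shlex

# Hypothetical deny-list: programs that would print the secrets `op run` injects.
ENV_DUMPERS = {"env", "printenv"}


def check_op_command(command: str, environ=os.environ) -> tuple[bool, str]:
    """Return (allowed, reason) for one simple shell command that may call the 1Password CLI."""
    # Hypothetical break-glass switch; a real one should be logged and time-boxed.
    if environ.get("AGENT_BREAK_GLASS") == "1":
        return True, "break-glass override"
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False, "unparseable command"
    if len(tokens) < 2 or tokens[0] != "op":
        return True, "not an op command"
    if tokens[1] == "read":
        return False, "`op read` prints raw secrets"
    if tokens[1] == "run":
        # Everything after `--` is the child process that receives the injected secrets.
        child = tokens[tokens.index("--") + 1:] if "--" in tokens else []
        if not child:
            # Conservative choice: require an explicit `-- <command>` so the child is inspectable.
            return False, "`op run` without an explicit `-- <command>`"
        if os.path.basename(child[0]) in ENV_DUMPERS:
            return False, f"`op run` would expose secrets via `{child[0]}`"
    return True, "ok"
```

This only inspects one simple command; a real guard would first split `&&`, `;`, and pipelines, and would also need to handle `op run` children that are themselves shells (`bash -c ...`).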

So before I could unlock the productivity, I had to build the safety systems. That's what the Agentic Developer Toolkit is.

What's in the box

I've packaged the ~70% of my agentic harness that's generic into a standalone framework that any developer can adopt, regardless of what they're building, which IDE they use (or whether they run ONLY terminal CLIs with no IDE at all), or whether they think "vibe coding" is a compliment or an insult.

Component	What It Does	Why You Care
Constitutional AI (per Simon Willison)	Reasoning-rich "soul document" for your agents	Stops agents from doing dumb things when you're not looking
1Password Hooks	3-tier secret injection + token exposure guard	Your agents can use CLIs safely without leaking credentials to terminal logs
Security Harness	Unicode detection, injection scanning, homograph defense	Your agents won't get tricked by invisible characters in emails
Hook System	Auto-fires at action boundaries (pre-tool, post-edit, session end)	Agents follow rules even when context windows "forget" them
48 Reusable Skills	Procedural knowledge modules (Jira, Confluence, Git, Terraform, etc.)	Don't re-invent the wheel
Event Bus	Cross-agent coordination & message passing	Your agents can talk to each other without you playing telephone
Handoff System	Structured context transfer between sessions	No more "what was I doing?" when starting a new session
Session Governance	CHANGELOG completeness, ADR recency checks, plan materialization	Catches when agents skip documentation (they ALL do, ~20-40% of the time)
Sound Notifications	Audio personas for each agent (Warcraft III unit responses)	Know which agent finished by the alert alone

IDE support: I don't play favorites (even if I have one)

Per S-ADR-031 (Federated Agent Context Architecture), the toolkit uses `.agents/` as the canonical source, with IDE-specific discovery bridges:

IDE / Agent	Config Location	How It Works
Cursor	`.cursor/rules/`, `.cursor/skills/` → symlinks	Native discovery + `.agents/` systems
Windsurf	`.windsurf/rules/`, `.windsurf/skills/` → symlinks	Full Cascade support with hooks
VSCode + GitHub Copilot	`.github/copilot-instructions.md`, `.vscode/mcp.json`	Copilot instructions + MCP config
Claude Code (terminal)	`.claude/CLAUDE.md`, `.claude/settings.json`	Full hook coverage + constitutional summary
Gemini CLI (terminal)	`.gemini/GEMINI.md`, `.gemini/settings.json`	Full hook coverage + constitutional summary

For cross-IDE consistency I do what most multi-tool setups do: canonical configs live in `.agents/`, with symlinks at each IDE's expected location. Update the canonical source & every IDE sees the change. It's standard practice; the table above just shows the specific config locations if you're wiring it up.
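Here's a minimal sketch of that canonical-source-plus-symlinks layout, assuming `.agents/rules/` and `.agents/skills/` as the canonical directories and the Cursor/Windsurf locations from the table. The mapping and the `link_bridges` name are illustrative, not the toolkit's actual installer.

```python
import os
from pathlib import Path

# Illustrative mapping: IDE-specific discovery path -> canonical path under .agents/
BRIDGES = {
    ".cursor/rules": ".agents/rules",
    ".cursor/skills": ".agents/skills",
    ".windsurf/rules": ".agents/rules",
    ".windsurf/skills": ".agents/skills",
}


def link_bridges(repo: Path) -> list[Path]:
    """Create relative symlinks so every IDE reads the same canonical .agents/ content."""
    created = []
    for link_rel, target_rel in BRIDGES.items():
        link = repo / link_rel
        target = repo / target_rel
        target.mkdir(parents=True, exist_ok=True)
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            continue  # idempotent: leave existing links (or real directories) alone
        # A relative target keeps the link valid if the repo is cloned elsewhere.
        link.symlink_to(os.path.relpath(target, link.parent), target_is_directory=True)
        created.append(link)
    return created
```

On Windows, creating symlinks may require Developer Mode or elevated privileges.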

For CLI agents (Claude Code, Gemini CLI), we compile a constitutional summary directly into their context files so even with tight token budgets they get the essential security stance & decision framework.
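One way to do that compile step is to regenerate a marked block inside `CLAUDE.md` / `GEMINI.md` on every sync. A minimal sketch: the marker comments, paragraph-level trimming, and using characters as a stand-in for tokens are all my assumptions, and it assumes the constitution front-loads its essentials.

```python
import re
from pathlib import Path

# Hypothetical markers delimiting the generated section of a CLI context file.
BEGIN = "<!-- constitution:begin -->"
END = "<!-- constitution:end -->"


def compile_summary(constitution: str, budget_chars: int = 4000) -> str:
    """Keep whole paragraphs from the top of the constitution until the budget is hit."""
    kept, used = [], 0
    for para in constitution.strip().split("\n\n"):
        if used + len(para) > budget_chars:
            break
        kept.append(para)
        used += len(para) + 2  # account for the blank-line separator
    return "\n\n".join(kept)


def inject(context_file: Path, summary: str) -> None:
    """Replace the marked block in the context file, or append one if it's absent."""
    block = f"{BEGIN}\n{summary}\n{END}"
    text = context_file.read_text() if context_file.exists() else ""
    pattern = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if pattern.search(text):
        # A function replacement avoids backslash-escape surprises in the summary text.
        text = pattern.sub(lambda _m: block, text, count=1)
    else:
        text = (text.rstrip() + "\n\n" + block + "\n").lstrip()
    context_file.write_text(text)
```

Because the block is replaced rather than appended, re-running the sync never stacks duplicate summaries, and hand-written notes outside the markers are left alone.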

The enforcement architecture

I wrote extensively about this in the enforcement series (Part 1: why markdown rules are theater), but here's the core:

Every agent action passes through native hooks before it executes. Each platform has its own hook config (Cursor, Claude Code, Gemini CLI all have different event schemas), but they all route to the same central Python enforcement router. The hooks are the doorbell. They fire at action boundaries & route to enforcement machinery below. They contain zero logic themselves.
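To make the "hooks are the doorbell" idea concrete, here's a minimal sketch of a central router: each platform's hook passes its raw JSON payload, a small adapter normalizes it into one event shape, and the same validators run regardless of source. The payload field names are assumptions for illustration; check each platform's hook documentation for the real schemas.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActionEvent:
    """Platform-neutral view of 'an agent is about to do something'."""
    platform: str
    tool: str
    command: str


def _from_tool_payload(platform: str) -> Callable[[dict], ActionEvent]:
    # Assumed shape: {"tool_name": ..., "tool_input": {"command": ...}}
    def adapt(payload: dict) -> ActionEvent:
        tool_input = payload.get("tool_input") or {}
        return ActionEvent(platform, payload.get("tool_name", ""), tool_input.get("command", ""))
    return adapt


ADAPTERS: dict[str, Callable[[dict], ActionEvent]] = {
    "claude": _from_tool_payload("claude"),
    "gemini": _from_tool_payload("gemini"),
    # Assumed shape for a shell-specific hook: {"command": ...}
    "cursor": lambda p: ActionEvent("cursor", "shell", p.get("command", "")),
}

Validator = Callable[[ActionEvent], Optional[str]]  # returns a block reason, or None to allow


def route(platform: str, raw_payload: str, validators: list[Validator]) -> dict:
    """Normalize one hook payload, run every validator, return a platform-neutral verdict."""
    event = ADAPTERS[platform](json.loads(raw_payload))
    for check in validators:
        reason = check(event)
        if reason:
            return {"decision": "block", "reason": reason}
    return {"decision": "allow", "reason": ""}
```

Each platform's hook script would then translate the verdict into whatever that platform expects (an exit code, a JSON response, etc.). The adapters are the only platform-specific code; validators never see raw payloads.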

The enforcement core has three layers:

Layer	What It Is	What It Does
Hooks	Per-platform hook configs (one per IDE/CLI)	WHEN something fires (event triggers only)
Validators	Python modules: security, compliance, quality, orchestration	HOW to check things (enforcement logic)
Guard YAMLs	~20 config files defining blocked patterns, required formats, thresholds	WHAT to look for (rule definitions as data)
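A minimal sketch of the Validators / Guard YAMLs split: the rules live in data, and the validator only knows how to apply them. To stay dependency-free, the guard is shown as the Python dict a YAML loader (for example PyYAML's `yaml.safe_load`) would produce; the keys and rules are hypothetical examples.

```python
import re

# What one guard YAML might look like after loading. Keys and rules are hypothetical.
SHELL_GUARD = {
    "family": "security",
    "blocked_patterns": [
        {"id": "rm-root", "regex": r"\brm\s+-rf\s+/(\s|$)",
         "message": "Recursive delete of /"},
        {"id": "force-push-main", "regex": r"\bgit\s+push\b.*--force(\s|$).*\bmain\b",
         "message": "Force-push to main"},
    ],
}


def compile_guard(guard: dict) -> list[tuple[str, re.Pattern, str]]:
    """Compile each rule once, so a malformed regex fails at load time instead of mid-session."""
    return [(r["id"], re.compile(r["regex"]), r["message"]) for r in guard["blocked_patterns"]]


def validate(command: str, rules: list[tuple[str, re.Pattern, str]]) -> list[str]:
    """Return 'id: message' for every rule the command trips; an empty list means allowed."""
    return [f"{rule_id}: {msg}" for rule_id, pattern, msg in rules if pattern.search(command)]
```

Blocking a new class of command means appending an entry to `blocked_patterns`; `compile_guard` and `validate` don't change.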

This separation matters more than it seems. Last month I needed to block a new class of shell command (agents running `curl | bash` patterns). Without the three-layer split I would have had to edit Python code. Instead, I opened a guard YAML, added two lines, and the rule was live. No code changes, so no validator module to re-test.
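For the `curl | bash` case specifically, a guard entry could look something like this (again shown as the loaded dict rather than YAML). The regex is my own approximation, not the rule from the actual guard file: it covers `curl` / `wget` piped directly into `sh`, `bash`, or `zsh`, optionally via `sudo`, and it won't catch every obfuscated variant.

```python
import re

# Hypothetical guard entry: a download piped straight into a shell interpreter.
PIPE_TO_SHELL = {
    "id": "pipe-to-shell",
    "regex": r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b",
    "message": "Piping a download into a shell executes unreviewed code",
}

pipe_to_shell = re.compile(PIPE_TO_SHELL["regex"])


def is_pipe_to_shell(command: str) -> bool:
    """True if the command pipes curl/wget output directly into sh, bash, or zsh."""
    return pipe_to_shell.search(command) is not None
```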

Current inventory: ~20 guard YAML configs spread across four enforcement families (security, compliance, orchestration, quality). The Python validators that consume them total roughly 500 lines across four modules.

UEAH attribution: know which agent wrote what

Here's the problem: your IDE & terminal agents are using your OWN personal credentials, so their work shows up under your name in your local git history, your repo's history, your Jira comments, & your Confluence pages. How do you keep track of which agents wrote trash & which agents were spitting fire?

Enter the Universal Edit Attribution Header (UEAH): an idempotent system that includes & REQUIRES (via hooks + prompts + skills) an immutable, agent-specific attribution on every write, create, or modify action an agent performs.

Format: `UEAH-CUR-20260209-173000-cfmw` (`UEAH-IDE-DATE-TIME-RANDOM`)

What this gives you:

Every outbound Jira edit / comment includes a unique UEAH string traceable back to the specific session, model, & environment
Every edit to Confluence, CHANGELOG, or any external endpoint has the signature
Per-session, per-agent, per-IDE/terminal discoverability
If context / agents were corrupted earlier, you can trace & debug the upstream origin
Agents themselves can search UEAH strings for remediation
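A minimal sketch of generating, applying, and searching UEAH strings in the format above. Only the `CUR` code comes from the example ID; the other platform codes, the UTC timestamp, and the `[UEAH-...]` footer placement are my assumptions. Idempotency here means re-applying the same session's ID is a no-op, so retries don't stack headers.

```python
import re
import secrets
import string
from datetime import datetime, timezone
from typing import Optional

# "CUR" comes from the example ID; the other platform codes are hypothetical.
PLATFORM_CODES = {"cursor": "CUR", "claude-code": "CLD", "gemini-cli": "GEM"}
UEAH_RE = re.compile(r"\bUEAH-([A-Z]{3})-(\d{8})-(\d{6})-([a-z]{4})\b")


def new_ueah(platform: str, now: Optional[datetime] = None) -> str:
    """Build an ID shaped like UEAH-CUR-20260209-173000-cfmw (IDE-DATE-TIME-RANDOM)."""
    now = now or datetime.now(timezone.utc)  # assumption: UTC timestamps
    suffix = "".join(secrets.choice(string.ascii_lowercase) for _ in range(4))
    return f"UEAH-{PLATFORM_CODES[platform]}-{now:%Y%m%d-%H%M%S}-{suffix}"


def attribute(text: str, ueah: str) -> str:
    """Append this session's attribution; applying the same ID twice is a no-op."""
    if ueah in text:
        return text
    return f"{text.rstrip()}\n\n[{ueah}]"


def find_ueahs(text: str) -> list[dict]:
    """Extract every UEAH from a CHANGELOG, Jira comment, Confluence page, etc."""
    return [
        {"id": m.group(0), "ide": m.group(1), "date": m.group(2),
         "time": m.group(3), "random": m.group(4)}
        for m in UEAH_RE.finditer(text)
    ]
```

Usage: an agent tracing a bad edit runs `find_ueahs` over the CHANGELOG or a Jira export, then filters by `ide` and `date` to find the session that introduced it.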

Sound notifications: know your agents by ear

With 3+ agents running concurrently, tab-switching to check "who did what" wastes time, and most IDEs' OS notifications are generic & all look the same.

Solution: per-agent audio personas using classic Warcraft III unit responses (because my personal purchase of the software back when I still had hair means I have these sound files):

Agent	Persona	Example
Cursor	Peasant (eager, helpful)	"Job's done!"
Claude Code	Rifleman (professional, precise)	"Aye, sir"
Gemini CLI	Peon (hardworking, direct)	"Zug zug" / "Work work"

After a few days you begin to instinctively know which agent completed work by the alert alone. It becomes genuine ambient information, not noise. 398 sound files across three universes (Warcraft, Star Trek, Blade Runner).
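A minimal sketch of per-agent persona selection, including the kind of system-sound fallback the public toolkit uses in place of the copyrighted files. The directory layout (`sounds/<persona>/*.wav`), the agent-to-persona map, and the macOS fallback path are assumptions; playback is left to the caller.

```python
import random
from pathlib import Path
from typing import Optional

# Assumed agent -> persona mapping (mirrors the table above).
PERSONAS = {"cursor": "peasant", "claude-code": "rifleman", "gemini-cli": "peon"}
# Assumed fallback: a stock macOS alert sound.
SYSTEM_FALLBACK = Path("/System/Library/Sounds/Glass.aiff")


def pick_sound(agent: str, sound_root: Path, rng: Optional[random.Random] = None) -> Path:
    """Pick a random clip from the agent's persona folder, or fall back to a system sound."""
    rng = rng or random.Random()
    persona = PERSONAS.get(agent)
    clips = sorted((sound_root / persona).glob("*.wav")) if persona else []
    return rng.choice(clips) if clips else SYSTEM_FALLBACK
```

On macOS, `subprocess.run(["afplay", str(path)])` plays the chosen file; picking randomly within a persona keeps the alerts from getting monotonous while the voice still identifies the agent.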

The Sound MCP is being extracted as a standalone package.

200+ ADRs: the compound interest of documentation

I cannot stress this enough. Every ADR makes future decisions faster because agents can reference prior reasoning. The first 10 ADRs are painful (make them high quality anyway, or later ADRs will replicate the low-effort ones). By ADR #50 you're writing them in 5 minutes. By ADR #100 the agents are writing them for you, referencing the prior ones, & you're just reviewing.

Key ADRs in the harness:

ADR	Title	Why It Matters
S-ADR-031	Federated Agent Context Architecture	IDE-agnostic `.agents/` canonical structure
S-ADR-032	Universal Agent Token Exposure Prevention	System-wide secret protection
T-ADR-010	UEAH Attribution	Traceable edit chains across agents
T-ADR-038	Agent Action Hooks	4-category automated enforcement
T-ADR-057	CLI Agent Federation	Terminal agents = full citizens
T-ADR-064	Security Plan Audit	Cross-agent security review (Gemini audited Cursor's designs)

The P1 OAuth bypass in Gemini CLI

While building all of this, I stumbled into a significant security finding: Gemini CLI's OAuth flow bypasses Google Workspace Enterprise Admin API controls. Our org has "Don't allow users to access any third-party apps" enforced, but Gemini CLI authenticates enterprise users anyway without admin approval.

I filed it on GitHub (#12121) AND Google's internal Buganizer (#455605678). Originally triaged P0, later downgraded to P1. Assigned to a Google engineer. Added to the official Gemini CLI Public Roadmap. Triggered an org-wide Gemini CLI disable at our org, enforced via on-device monitoring.

That finding exists BECAUSE of this harness. The security mindset that goes into governing concurrent agents is the same mindset that noticed a Google-owned OAuth Client ID silently bypassing enterprise admin controls.

What I actually learned

After several months building this system:

Rules without enforcement are suggestions. Agents forget ~20-40% of compliance tasks. Hooks changed everything.
Sound notifications are not a gimmick. After 48 hours you unconsciously associate sounds with specific agents & specific events. Genuine information, not noise.
ADRs compound. The first 10 are painful. ADR #50 takes 5 minutes because you have so much prior art.
Constitutional AI works WAY better than expected. Doesn't need to cover every edge case; just needs to establish a reasoning framework.
CLI agents are first-class citizens. Gemini CLI & Claude Code in terminals deserve the same governance as the IDE agent. Any gap WILL be exploited (not maliciously, but through natural drift).
Context Engineering > Prompt Engineering. Building systematic context infrastructure is orders of magnitude more effective than clever one-off prompts.

What's NOT in the box

The toolkit is stripped of org-specific content (internal domains, Confluence links, copyrighted sound files, etc.), but I've worked to preserve all the essential architectural patterns, security mechanisms, & skill templates. Think of it as a car chassis that you finish with your own engine, drivetrain, interior & paint job:

In the IaC Monorepo	In the Toolkit
`corp-domain.tld`	`{{ORG_DOMAIN}}`
GitLab-specific URLs	`{{ORG_GITLAB_URL}}`
Atlassian Cloud ID	`{{ATLASSIAN_CLOUD_ID}}`
GAM/GWS-specific skills	Excluded (those are our special sauce)
Copyrighted sounds	System sound fallback + freesound.org helper

IF I've done this right (feedback is welcome!), you should be able to clone the repo, steal whatever's useful, & delete whatever isn't. Half these standards are subject to change drastically in 2-3 weeks anyway.

lmk if you've built something similar or want to compare notes.

The internal Confluence version of this system is 39KB, version 5.1, with 125 views & 18 unique readers. Same system, different audience. The source architecture is documented across 200+ ADRs.

John Click is a Senior IT Solutions Engineer. He writes at johnclick.ai & johnclick.dev.