No, you don't have to read all this. I get it, this is LONG. This page is as much for my agents as it is for you; they load it for context on the full system. If you just want the highlights, here's the TL;DR:
| What | Why You Care |
| Hook-based enforcement across 3 platforms | Agents follow rules even when their context window "forgets" them |
| Over 50 enforcement scripts (Python + Bash + YAML) | Real procedural enforcement, not markdown suggestions |
| Nearly 50 reusable agent skills | Teach your agents once, benefit forever |
| Over 200 Architecture Decision Records | Persistent institutional memory that compounds over time |
| UEAH attribution on every artifact | You always know what was changed & why & by which agent |
| Per-agent sound notifications | Know which agent finished work without looking at the screen |
| Federation pipeline | Write rules ONCE, deploy to Cursor + Claude Code + Gemini CLI identically |
The backstory
I manage cloud platforms (Google Workspace, GCP, AWS) for a large tech startup. A significant chunk of my job involves repetitive operations: managing user accounts, updating group settings, reviewing Jira tickets, checking Slack, updating Confluence documentation, composing emails, managing calendar, etc.
So I started delegating to AI agents. Cursor for orchestration & IDE work. Claude Code and Gemini CLI running in terminal sessions as parallel workers. MCP servers connecting them to Jira, Confluence, Slack, Gmail, Google Drive.
With the combination of MCPs + CLIs, I could invest my time & cognitive energy into the things that mattered + had more impact, shifting from reactive (waiting for issues / bugs / tickets to surface) to proactive IT management.
However, that access is a loaded gun.
I didn't want to recklessly YOLO & whitelist a bunch of commands for agents without first investigating + building systems to mitigate risks. If we're setting up agents to leverage CLI commands where one bad command could be destructive to repos, I knew I had to take great care to build guardrails, restrictions, & systems to protect our IP & our SaaS platforms.
Here's the risk each of those CLI tools carries & how I mitigate it:
| Tool | Risk / Vulnerability | Mitigations |
| GitLab CLI | Read + write access to repos based on local credentials | Hooks to limit commands & scopes; pre-execution scripts scan for safety; 1Password hooks for secrets; granular PATs per-repo |
| Atlassian CLI | Read + write to any Jira + Confluence assets | Granular API scopes; hooks + pre-execution scripts; 1Password hooks |
| 1Password CLI | Injects secrets as env vars. Risk: op read prints raw secrets; compromised agents could extract via op run -- env | Token Exposure Guard blocks op read & inspects op run sub-commands; break-glass mechanism for emergency bypass |
| CLASP | OAuth'd access to any Apps Script project | Pre-execution scripts; 1Password hooks; Apps Script natively retains history |
| GAM | Requires dangerous domain-wide-delegation admin privileges | Hooks to limit commands; pre-execution scripts; 1Password hooks |
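To make the 1Password row concrete, here is a minimal sketch of what a token exposure check could look like. It is not the toolkit's actual guard: the function name, the `ENV_DUMPERS` set, and the return shape are all assumptions for illustration. It only covers the two cases the table names (blocking `op read`, inspecting the sub-command after `op run --`) and does not attempt to handle pipes, `sh -c` wrappers, or aliases.

```python
import shlex

# Hypothetical list of sub-commands that would print injected secrets to stdout.
ENV_DUMPERS = {"env", "printenv"}


def check_op_command(command: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a single shell command string."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes etc.: refuse rather than guess.
        return False, "unparseable command"

    if not tokens or tokens[0] != "op":
        return True, "not a 1Password CLI call"

    if len(tokens) > 1 and tokens[1] == "read":
        return False, "op read prints raw secrets"

    if len(tokens) > 1 and tokens[1] == "run" and "--" in tokens:
        # Everything after "--" is the command that receives the secrets.
        sub = tokens[tokens.index("--") + 1:]
        if sub and sub[0] in ENV_DUMPERS:
            return False, f"op run sub-command '{sub[0]}' would dump secrets"

    return True, "ok"
```

A pre-execution hook could call `check_op_command` on the proposed command and refuse to run it when the first element of the result is `False`.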
So before I could unlock the productivity, I had to build the safety systems. That's what the Agentic Developer Toolkit is.
What's in the box
I've packaged the ~70% of my agentic harness that is generic into a standalone framework any developer can adopt, regardless of what they're building, which IDE they use (or whether they run ONLY terminal CLIs with no IDE at all), or whether they think "vibe coding" is a compliment or an insult.
| Component | What It Does | Why You Care |
| Constitutional AI (per Simon Willison) | Reasoning-rich "soul document" for your agents | Stops agents from doing dumb things when you're not looking |
| 1Password Hooks | 3-tier secret injection + token exposure guard | Your agents can use CLIs safely without leaking credentials to terminal logs |
| Security Harness | Unicode detection, injection scanning, homograph defense | Your agents won't get tricked by invisible characters in emails |
| Hook System | Auto-fires at action boundaries (pre-tool, post-edit, session end) | Agents follow rules even when context windows "forget" them |
| 48 Reusable Skills | Procedural knowledge modules (Jira, Confluence, Git, Terraform, etc.) | Don't re-invent the wheel |
| Event Bus | Cross-agent coordination & message passing | Your agents can talk to each other without you playing telephone |
| Handoff System | Structured context transfer between sessions | No more "what was I doing?" when starting a new session |
| Session Governance | CHANGELOG completeness, ADR recency checks, plan materialization | Catches when agents skip documentation (they ALL do, ~20-40% of the time) |
| Sound Notifications | Audio personas for each agent (Warcraft III unit responses) | Know which agent finished by the alert alone |
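As an illustration of the Security Harness row, the sketch below flags invisible format characters and likely homographs in a string using only the standard library. The function name and the choice to flag only Cyrillic and Greek letters are assumptions, not the toolkit's rules; a real scanner would need an allow-list, since legitimate non-Latin text and emoji joiner sequences would also trip these checks.

```python
import unicodedata


def find_suspicious_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, codepoint, reason) for each character worth flagging."""
    findings = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            # "Cf" covers zero-width spaces/joiners and bidi overrides.
            findings.append((i, f"U+{ord(ch):04X}", "invisible format character"))
        elif ch.isalpha() and ord(ch) > 127:
            # Letters from scripts that contain Latin look-alikes.
            name = unicodedata.name(ch, "")
            if name.startswith(("CYRILLIC", "GREEK")):
                findings.append((i, f"U+{ord(ch):04X}", "possible homograph"))
    return findings


# A zero-width space hidden inside an otherwise normal word.
print(find_suspicious_chars("pay\u200bpal"))
```

Running a check like this over inbound email or ticket text before it reaches an agent's context is one way to surface hidden characters for review.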
IDE support: I don't play favorites (even if I have one)
Per S-ADR-031 (Federated Agent Context Architecture), the toolkit uses .agents/ as the canonical source with IDE-specific discovery bridges:
| IDE / Agent | Config Location | How It Works |
| Cursor | .cursor/rules/, .cursor/skills/ → symlinks | Native discovery + .agents/ systems |
| Windsurf | .windsurf/rules/, .windsurf/skills/ → symlinks | Full Cascade support with hooks |
| VSCode + GitHub Copilot | .github/copilot-instructions.md, .vscode/mcp.json | Copilot instructions + MCP config |
| Claude Code (terminal) | .claude/CLAUDE.md, .claude/settings.json | Full hook coverage + constitutional summary |
| Gemini CLI (terminal) | .gemini/GEMINI.md, .gemini/settings.json | Full hook coverage + constitutional summary |
For cross-IDE consistency I do what most multi-tool setups do: canonical configs in .agents/ with symlinks to each IDE's expected location. Update the canonical source, all IDEs see the change. Standard practice, but the table above shows the specific config locations if you're wiring it up.
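If you're wiring up the symlink pattern yourself, a minimal sketch might look like this. The `.agents/rules` and `.agents/skills` subdirectory names and the `BRIDGES` mapping are assumptions (the post only names `.agents/` as the canonical root and the IDE-side paths), so adjust them to your layout.

```python
import os
from pathlib import Path

# Hypothetical mapping: canonical directory -> IDE discovery paths.
BRIDGES = {
    ".agents/rules": [".cursor/rules", ".windsurf/rules"],
    ".agents/skills": [".cursor/skills", ".windsurf/skills"],
}


def link_bridges(repo_root: Path) -> list[Path]:
    """Create relative symlinks from each IDE path to its canonical source."""
    created = []
    for canonical, targets in BRIDGES.items():
        source = repo_root / canonical
        source.mkdir(parents=True, exist_ok=True)
        for target in targets:
            link = repo_root / target
            link.parent.mkdir(parents=True, exist_ok=True)
            if link.is_symlink():
                link.unlink()  # re-running replaces stale links
            elif link.exists():
                # Never clobber a real directory the user created.
                raise FileExistsError(f"{link} exists and is not a symlink")
            # A relative target keeps the link valid if the repo is moved.
            relative = os.path.relpath(source, link.parent)
            link.symlink_to(relative, target_is_directory=True)
            created.append(link)
    return created
```

Calling `link_bridges(Path("."))` from the repo root is safe to repeat: existing symlinks are replaced, and real directories raise an error instead of being overwritten.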
For CLI agents (Claude Code, Gemini CLI), we compile a constitutional summary directly into their context files so even with tight token budgets they get the essential security stance & decision framework.
The enforcement architecture
I wrote extensively about this in the enforcement series (Part 1: why markdown rules are theater), but here's the core:
Every agent action passes through native hooks before it executes. Each platform has its own hook config (Cursor, Claude Code, Gemini CLI all have different event schemas), but they all route to the same central Python enforcement router. The hooks are the doorbell. They fire at action boundaries & route to enforcement machinery below. They contain zero logic themselves.
The enforcement core has three layers:
| Layer | What It Is | What It Does |
| Hooks | Per-platform hook configs (one per IDE/CLI) | WHEN something fires (event triggers only) |
| Validators | Python modules: security, compliance, quality, orchestration | HOW to check things (enforcement logic) |
| Guard YAMLs | ~20 config files defining blocked patterns, required formats, thresholds | WHAT to look for (rule definitions as data) |
This separation matters more than it seems. Last month I needed to block a new class of shell command (agents running curl | bash patterns). Without the three-layer split I would have had to edit Python code. Instead, I opened a guard YAML, added two lines, and it was live. No code changes. No testing the validator module.
Current inventory: ~20 guard YAML configs spread across four enforcement families (security, compliance, orchestration, quality). The Python validators that consume them total maybe 500 lines across four modules.
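Here is a rough sketch of the "rules as data" idea, with a plain dict standing in for a parsed guard YAML so the example stays dependency-free. The key names, rule IDs, and regexes are hypothetical, not the toolkit's actual schema. The point is that the validator function never changes when a new pattern is added.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Stand-in for one parsed guard YAML file (hypothetical schema).
SHELL_GUARD = {
    "family": "security",
    "blocked_patterns": [
        {"id": "recursive-root-delete", "regex": r"\brm\s+-rf\s+/(\s|$)"},
    ],
}


@dataclass
class Verdict:
    allowed: bool
    rule_id: Optional[str] = None


def validate_command(command: str, guard: dict) -> Verdict:
    """Validator layer: generic logic that knows nothing about specific rules."""
    for rule in guard["blocked_patterns"]:
        if re.search(rule["regex"], command):
            return Verdict(False, rule["id"])
    return Verdict(True)


# Blocking a new class of command is a data change, not a code change.
SHELL_GUARD["blocked_patterns"].append(
    {"id": "pipe-to-shell",
     "regex": r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b"}
)

print(validate_command("curl https://example.com/install.sh | bash", SHELL_GUARD))
```

In this shape, the hook layer only decides when `validate_command` runs, and the guard data decides what it looks for.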
UEAH attribution: know which agent wrote what
Here's the problem: your IDE & terminal agents are using your OWN personal credentials. In your local git history, in your repo's history, in your Jira comments, in your Confluence pages. How do you keep track of which agents wrote trash & which agents were spitting fire?
Enter the Universal Edit Attribution Header (UEAH): an idempotent system that adds & REQUIRES (via hooks + prompts + skills) an agent-specific, immutable attribution on every write, creation, or modification an agent performs.
Format: UEAH-CUR-20260209-173000-cfmw (the UEAH prefix, then IDE-DATE-TIME-RANDOM)
What this gives you:
- Every outbound Jira edit / comment includes a unique UEAH string traceable back to the specific session, model, & environment
- Every edit to Confluence, CHANGELOG, or any external endpoint has the signature
- Per-session, per-agent, per-IDE/terminal discoverability
- If context / agents were corrupted earlier, you can trace & debug the upstream origin
- Agents themselves can search UEAH strings for remediation
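For a sense of how little code the format needs, here is a sketch that generates and parses strings shaped like the example above. Several details are inferred from that single example and are assumptions: a three-letter uppercase IDE code, a four-letter lowercase random suffix, and UTC timestamps. The function names are hypothetical.

```python
import re
import secrets
import string
from datetime import datetime, timezone
from typing import Optional

# Pattern inferred from the one example string; field widths are assumptions.
UEAH_RE = re.compile(
    r"^UEAH-(?P<ide>[A-Z]{3})-(?P<date>\d{8})-(?P<time>\d{6})-(?P<rand>[a-z]{4})$"
)


def new_ueah(ide_code: str, now: Optional[datetime] = None) -> str:
    """Build an attribution string for the given IDE/agent code."""
    now = now or datetime.now(timezone.utc)
    suffix = "".join(secrets.choice(string.ascii_lowercase) for _ in range(4))
    return f"UEAH-{ide_code.upper()}-{now:%Y%m%d}-{now:%H%M%S}-{suffix}"


def parse_ueah(value: str) -> Optional[dict]:
    """Return the named fields, or None if the string is not a valid UEAH."""
    match = UEAH_RE.match(value)
    return match.groupdict() if match else None


print(parse_ueah("UEAH-CUR-20260209-173000-cfmw"))
```

A hook that requires attribution could reject any outbound write whose body contains no substring accepted by `parse_ueah`.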
Sound notifications: know your agents by ear
With 3+ agents running concurrently, tab switching to check "who did what" wastes time. Most IDEs' OS notifications are generic + all look the same.
Solution: per-agent audio personas using classic Warcraft III unit responses (because my personal purchase of the software back when I still had hair means I have these sound files):
| Agent | Persona | Example |
| Cursor | Peasant (eager, helpful) | "Job's done!" |
| Claude Code | Rifleman (professional, precise) | "Aye, sir" |
| Gemini CLI | Peon (hardworking, direct) | "Zug zug" / "Work work" |
After a few days you begin to instinctively know which agent completed work by the alert alone. It becomes genuine ambient information, not noise. 398 sound files across three universes (Warcraft, Star Trek, Blade Runner).
The Sound MCP is being extracted as a standalone package.
200+ ADRs: the compound interest of documentation
I cannot stress this enough. Every ADR makes future decisions faster because agents can reference prior reasoning. The first 10 ADRs are painful (but start with high quality so you don't replicate low-effort work). By ADR #50 you're writing them in 5 minutes. By ADR #100 the agents are writing them for you, referencing the prior ones, & you're just reviewing.
Key ADRs in the harness:
| ADR | Title | Why It Matters |
| S-ADR-031 | Federated Agent Context Architecture | IDE-agnostic .agents/ canonical structure |
| S-ADR-032 | Universal Agent Token Exposure Prevention | System-wide secret protection |
| T-ADR-010 | UEAH Attribution | Traceable edit chains across agents |
| T-ADR-038 | Agent Action Hooks | 4-category automated enforcement |
| T-ADR-057 | CLI Agent Federation | Terminal agents = full citizens |
| T-ADR-064 | Security Plan Audit | Cross-agent security review (Gemini audited Cursor's designs) |
The P1 OAuth bypass in Gemini CLI
While building all of this, I stumbled into a significant security finding: Gemini CLI's OAuth flow bypasses Google Workspace Enterprise Admin API controls. Our org has "Don't allow users to access any third-party apps" enforced, but Gemini CLI authenticates enterprise users anyway without admin approval.
I filed it on GitHub (#12121) AND Google's internal Buganizer (#455605678). Originally triaged P0, later downgraded to P1. Assigned to a Google engineer. Added to the official Gemini CLI Public Roadmap. Triggered an org-wide Gemini CLI disable via on-device monitoring at our org.
That finding exists BECAUSE of this harness. The security mindset that goes into governing concurrent agents is the same mindset that noticed a Google-owned OAuth Client ID silently bypassing enterprise admin controls.
What I actually learned
After several months building this system:
- Rules without enforcement are suggestions. Agents forget ~20-40% of compliance tasks. Hooks changed everything.
- Sound notifications are not a gimmick. After 48 hours you unconsciously associate sounds with specific agents & specific events. Genuine information, not noise.
- ADRs compound. The first 10 are painful. ADR #50 takes 5 minutes because you have so much prior art.
- Constitutional AI works WAY better than expected. Doesn't need to cover every edge case; just needs to establish a reasoning framework.
- CLI agents are full citizens, not second-class ones. Gemini CLI & Claude Code in terminals deserve the same governance as the IDE agent. Any gap WILL be exploited (not maliciously, but through natural drift).
- Context Engineering > Prompt Engineering. Building systematic context infrastructure is orders of magnitude more effective than clever one-off prompts.
What's NOT in the box
The toolkit is stripped of org-specific content (internal domains, Confluence links, copyrighted sound files, etc.). But I've worked to preserve all the essential architectural patterns, security mechanisms, & skill templates. Think of it like a car chassis: you add your own engine, drivetrain, interior & paint job.
| In the IaC Monorepo | In the Toolkit |
| corp-domain.tld | {{ORG_DOMAIN}} |
| GitLab-specific URLs | {{ORG_GITLAB_URL}} |
| Atlassian Cloud ID | {{ATLASSIAN_CLOUD_ID}} |
| GAM/GWS-specific skills | Excluded (those are our special sauce) |
| Copyrighted sounds | System sound fallback + freesound.org helper |
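The placeholder swap in the table is simple enough to sketch. The version below is an illustration, not the toolkit's export script: the `REPLACEMENTS` values (including the GitLab URL) and the `sanitize` name are hypothetical.

```python
# Hypothetical org-specific values mapped to the toolkit's placeholders.
REPLACEMENTS = {
    "corp-domain.tld": "{{ORG_DOMAIN}}",
    "https://gitlab.corp-domain.tld": "{{ORG_GITLAB_URL}}",
}


def sanitize(text: str, replacements: dict[str, str]) -> str:
    """Swap org-specific strings for placeholders."""
    # Longest keys first, so a full URL is replaced before the bare
    # domain it contains.
    for needle in sorted(replacements, key=len, reverse=True):
        text = text.replace(needle, replacements[needle])
    return text


sample = "clone https://gitlab.corp-domain.tld/infra and mail ops@corp-domain.tld"
print(sanitize(sample, REPLACEMENTS))
```

Going the other way when you adopt the toolkit is the same operation with the mapping inverted: each placeholder becomes your own org's value.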
IF I've done this right (feedback is welcome!) you should be able to clone the repo, steal whatever's useful, delete whatever isn't. Half these standards are subject to change drastically in 2-3 weeks anyway.
lmk if you've built something similar or want to compare notes.
Related reading: Hooks-Based Enforcement (Part 1) | Inside the Enforcement Engine (Part 2) | ADR-First Development | UEAH Attribution | Prompt Injection Defenses | My Knowledge Graph | Notion as Context Pipeline
The internal Confluence version of this system is 39KB, version 5.1, with 125 views & 18 unique readers. Same system, different audience. The source architecture is documented across 200+ ADRs.
John Click is a Senior IT Solutions Engineer. He writes at johnclick.ai & johnclick.dev.
