This is Part 2 of my series on agentic enforcement. Part 1 covered why rules files are theater and how hook-based enforcement at action boundaries changes that. This post goes deep into the actual machinery: the Python scripts, YAML configs, and adjudication logic that make it all work.
Where We Left Off
In Part 1 I described the three-layer architecture: hooks fire at action boundaries, validators contain the enforcement logic, and guard YAMLs define the rules as data. I showed that hooks by themselves are just event triggers — smoke detectors without sprinklers.
Now I want to open up each layer and show you what's actually inside.
The Directory Structure (and Why It's Not Arbitrary)
.agents/hooks/
├── executor.py
├── lib/
│   ├── adjudication_engine.py
│   ├── compliance_validators.py
│   ├── security_validators.py
│   ├── quality_validators.py
│   ├── orchestration_handlers.py
│   ├── token_circuit_breaker.py
│   └── ... (12 utility modules)
├── compliance/ (7 YAMLs)
├── security/ (7 YAMLs)
├── quality/ (4 YAMLs)
├── orchestration/ (4 YAMLs)
├── hook-manifest.yaml
└── rollout-config.yaml
The structure mirrors the four enforcement families. Each family has a lib/ Python module and a YAML config directory. The Bash scripts in the root are thin wrappers that handle shell-level plumbing and delegate to executor.py for actual logic.
executor.py: The Router
Every hook across all three platforms calls executor.py with JSON on stdin. The executor's job is straightforward:
- Parse the incoming JSON event
- Look up which validators apply (from hook-manifest.yaml)
- Run those validators, passing the event context
- Collect results (pass/fail/warn + messages)
- Run the adjudication engine to make a final decision
- Write the result as JSON to stdout with the appropriate exit code
The executor itself is maybe 80 lines of Python. It doesn't contain enforcement logic — it's a router.
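The routing loop above can be condensed into a sketch. Everything here is illustrative: the manifest shape, the validator signature, and the result schema are my inventions, not the toolkit's actual code, and the real executor reads the event from stdin and writes JSON to stdout.

```python
import json

# Hypothetical validator stub: the real ones live in the lib/ modules.
def ueah_check(event):
    return {"status": "pass", "message": ""}

# Hypothetical manifest shape: event type -> validator callables.
MANIFEST = {"afterFileEdit": [ueah_check]}

def adjudicate(results):
    # Simplified final decision: any "block" result wins.
    if any(r["status"] == "block" for r in results):
        return {"decision": "block", "exit_code": 2}
    return {"decision": "pass", "exit_code": 0}

def route(raw_event: str) -> dict:
    # Parse the event, look up applicable validators, run them, adjudicate.
    event = json.loads(raw_event)
    validators = MANIFEST.get(event.get("type"), [])
    results = [v(event) for v in validators]
    return adjudicate(results)
```

The point of keeping the router this thin is that adding a guard never touches it: you register the validator in the manifest and the loop picks it up.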
The Validator Modules: What They Actually Check
compliance_validators.py
Handles UEAH attribution and PARA cross-linking. The UEAH check triggers on any afterFileEdit event targeting CHANGELOG.md. It reads the diff, extracts new lines, and runs the regex from changelog-ueah.yaml. If the new content doesn't contain a valid UEAH tag, the validator returns block.
What makes this non-trivial is the edge cases I kept hitting in practice: agents that copy an old UEAH tag instead of generating a fresh one, agents that put the tag in a code block where it doesn't render, and agents that get the date format wrong. Each check exists because I hit that failure in production first.
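The core of the check is a few lines. The tag pattern below is a made-up placeholder — the real regex lives in changelog-ueah.yaml, which isn't reproduced here:

```python
import re

# Placeholder UEAH tag pattern; the actual format comes from changelog-ueah.yaml.
UEAH_RE = re.compile(r"\[UEAH:[A-Z0-9-]+ \d{4}-\d{2}-\d{2}\]")

def check_changelog_edit(new_lines: list[str]) -> dict:
    """Pass/block decision for an afterFileEdit targeting CHANGELOG.md."""
    joined = "\n".join(new_lines)
    if UEAH_RE.search(joined):
        return {"status": "pass"}
    return {"status": "block",
            "message": "CHANGELOG edits must include a valid UEAH tag."}
```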
security_validators.py
The most complex module. CLI command scanning reads cli-command-guard.yaml and applies pattern matching against shell commands. Unicode sanitization checks for zero-width joiners, bidirectional override characters, and homograph characters. Token exposure scanning uses both regex patterns (for known key formats like sk-..., AKIA..., ghp_...) and entropy analysis for random-looking strings in assignment contexts. Prompt injection detection scans for "ignore previous instructions", role-reassignment attempts, and system prompt extraction.
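The entropy half of the token scan might look like this. The prefix patterns echo the key formats named above, but the length cutoff and entropy threshold are illustrative guesses, not the module's actual tuning:

```python
import math
import re

# Known key prefixes (OpenAI-style, AWS access key, GitHub PAT).
KNOWN_KEY_RE = re.compile(
    r"\b(sk-[A-Za-z0-9]{20,}|AKIA[A-Z0-9]{16}|ghp_[A-Za-z0-9]{36})\b"
)

def shannon_entropy(s: str) -> float:
    # Bits per character over the string's own symbol distribution.
    if not s:
        return 0.0
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def looks_like_secret(value: str) -> bool:
    # Flag known key formats outright; otherwise flag long,
    # high-entropy strings typical of random tokens.
    if KNOWN_KEY_RE.search(value):
        return True
    return len(value) >= 20 and shannon_entropy(value) > 4.0
```

The two-pronged approach matters: regexes catch well-known formats with zero false positives, while entropy catches the long random strings that no prefix list will ever enumerate.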
quality_validators.py
The lightest module but the one my sanity depends on. Sound triggers play distinct audio notifications based on agent and event type. Context drift tracks semantic distance between original task and recent actions. This catches the classic "I asked you to fix the CSS and you're refactoring the database" failure mode.
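A crude stand-in for the drift metric, using word-set overlap in place of whatever semantic distance the real guard computes — Jaccard similarity is my substitution, not the toolkit's method:

```python
def drift_score(task: str, recent_actions: list[str]) -> float:
    """Token-overlap drift: 0.0 = on task, 1.0 = fully off task."""
    task_words = set(task.lower().split())
    action_words = set(" ".join(recent_actions).lower().split())
    if not task_words or not action_words:
        return 0.0
    # Jaccard similarity over word sets, inverted into a distance.
    overlap = len(task_words & action_words) / len(task_words | action_words)
    return 1.0 - overlap
```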
orchestration_handlers.py
Anti-spiral detection tracks action hashes over a sliding window and fires when it sees too many near-duplicates. Handoff validation ensures structured handoffs include required fields: task description, current state, files modified, blockers, and UEAH tag.
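The sliding-window hash idea sketches out like this. The window size and repeat threshold are illustrative defaults, not the toolkit's tuning:

```python
import hashlib
from collections import deque

class SpiralDetector:
    """Flag when too many near-duplicate actions land in a sliding window."""

    def __init__(self, window: int = 10, max_repeats: int = 3):
        self.recent = deque(maxlen=window)  # old hashes fall off automatically
        self.max_repeats = max_repeats

    def record(self, action: str) -> bool:
        # Hash the normalized action so long payloads compare cheaply.
        digest = hashlib.sha256(action.strip().lower().encode()).hexdigest()
        self.recent.append(digest)
        return self.recent.count(digest) >= self.max_repeats
```

The deque's maxlen does the window management for free: once the buffer is full, every append silently evicts the oldest hash.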
The Guard YAMLs: Schema and Conventions
Every guard config follows the same shape:
trigger: <event_type>
target_files: [<glob patterns>]
action: block | warn | monitor
message: "<human-readable>"
# Then family-specific fields:
pattern: "<regex>"
threshold: <number>
allowed_patterns: [...]
blocked_patterns: [...]
The consistency means validators share parsing code. One convention I wish I'd established earlier: every YAML has a message field that produces the exact text the agent sees on failure. Early on I was generating messages in Python code, which meant updating user-facing text required a code change.
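Shared parsing implies a common schema check somewhere. A minimal sketch of validating those common fields, with the field names taken from the shape above (the real shared code isn't shown in this post):

```python
# Common fields every guard YAML carries, per the schema above.
REQUIRED_FIELDS = {"trigger", "action", "message"}
VALID_ACTIONS = {"block", "warn", "monitor"}

def validate_guard(config: dict) -> list[str]:
    """Return a list of schema problems for one guard config (empty = ok)."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - config.keys())]
    if config.get("action") not in VALID_ACTIONS:
        problems.append(f"invalid action: {config.get('action')!r}")
    return problems
```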
The Adjudication Engine: Gradual Rollout
adjudication_engine.py reads rollout-config.yaml, which maps every guard to its current enforcement level:
guards:
  changelog-ueah:
    level: enforce
    since: 2026-02-10
  para-links:
    level: warn
    since: 2026-02-15
  context-drift:
    level: monitor
    since: 2026-03-01
  anti-spiral:
    level: warn
    since: 2026-02-28
The rollout path: monitor (logs only, watch for false positives) → warn (agent sees message, action proceeds) → enforce (hard block). I've had guards stay in monitor for a month because the false positive rate was too high. Without the adjudication layer, I'd have had to choose between "deploy and break things" or "don't deploy at all."
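The level-to-decision mapping is simple enough to sketch. The return shape here is my invention, not the engine's actual output schema:

```python
def adjudicate_level(level: str, validator_failed: bool) -> dict:
    """Map a guard's rollout level to the final hook decision."""
    if not validator_failed:
        return {"decision": "pass", "show_message": False}
    if level == "monitor":
        # Log only; the agent never sees the message.
        return {"decision": "pass", "show_message": False, "log": True}
    if level == "warn":
        # Agent sees the message, but the action proceeds.
        return {"decision": "pass", "show_message": True}
    # enforce: hard block.
    return {"decision": "block", "show_message": True}
```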
The Federation Pipeline
Rather than maintain three platform-specific configs that inevitably drift, I have compile scripts:
- compile-hook-settings.py — reads hook-manifest.yaml, generates platform configs
- compile-constitution.py — assembles per-platform rule files from shared source
- validate-federation.sh — checks all generated configs are in sync
These run as part of the session-start hook. Every new session confirms all platforms are in sync.
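One way a sync check like validate-federation.sh could work is to hash a canonical serialization of each platform's compiled guard section and compare digests; the real script's method isn't shown in this post, so treat this as a sketch:

```python
import hashlib
import json

def configs_in_sync(platform_configs: dict[str, dict]) -> bool:
    """True if every platform's compiled guard section is identical.

    Canonical JSON (sorted keys) makes the hash independent of key order.
    """
    digests = {
        hashlib.sha256(json.dumps(cfg, sort_keys=True).encode()).hexdigest()
        for cfg in platform_configs.values()
    }
    return len(digests) <= 1
```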
Health Checks
- check-hook-integrity.py — every guard in manifest has a YAML config and a validator
- check-executor-integrity.py — all validator modules import without errors
- check-policy-drift.py — compiled configs match what manifest says they should contain
These run in CI and on-demand. If a health check fails, it means someone (probably me, probably at 2am) edited a validator without updating the manifest.
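The first of those checks reduces to set differences between three sources of truth. A sketch under that assumption — the function name and report format are mine, not check-hook-integrity.py's:

```python
def integrity_problems(manifest_guards: set[str],
                       yaml_guards: set[str],
                       validator_guards: set[str]) -> list[str]:
    """Cross-check manifest, YAML configs, and validators; empty = healthy."""
    problems = []
    for guard in sorted(manifest_guards - yaml_guards):
        problems.append(f"{guard}: in manifest but no YAML config")
    for guard in sorted(manifest_guards - validator_guards):
        problems.append(f"{guard}: in manifest but no validator")
    for guard in sorted(yaml_guards - manifest_guards):
        problems.append(f"{guard}: YAML config not registered in manifest")
    return problems
```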
The Running Inventory
The full stack is a few dozen files: four Python validator modules, about a dozen utility modules, a handful of Bash wrappers, over 20 YAML guard configs, federation compile scripts, and health checks. All stdlib Python; zero external dependencies. Every file registered in Notion with correct file type, path, and cross-database relations.
Let Me Know How You're Handling This
I genuinely want to know: if you're running AI agents in production, how are you handling enforcement?
- Pure rules files and hoping for the best?
- Custom hooks?
- Something I haven't thought of?
Hit me up at johnclick.ai or johnclick.dev.
Part 2 of the Agentic Enforcement series. Part 1: Markdown is Agent Enforcement Theater. Based on T-ADR-038 and the Agentic Developer Toolkit enforcement stack.
John Click is a DevOps / IT Platform Engineer building agentic governance infrastructure for enterprise AI agent deployments.
