How I Run 7 Concurrent AI Agents on One Branch Without Worktrees

OS-level file locking + tiered write guards + read guards = zero conflicts across 5 months

A note on what you're reading: The architecture + design patterns described here are real and running in production. But I'm not publishing my actual code, file paths, regex patterns, or config structures. The code samples below were written specifically for this post to illustrate the concepts without exposing attack surface. Think of them as "faithful recreations" rather than copy-paste excerpts. The real implementation lives in the monorepo's ADR trail + a private Notion workspace.

TL;DR

  • Running 3-7 concurrent agents (Gemini CLI + Claude Code + Cursor + Codium) on the SAME branch, SAME repo, no worktrees
  • Zero file corruption across ~5 months of production use on a 150+ ADR monorepo
  • Three implemented layers: OS-level file locking, tiered write protection, read guards
  • A fourth layer (orchestrator-level locks with priority queuing) is designed but not yet needed

No, You Don't Need Worktrees

I get it. The instinct when you hear "multiple agents editing the same codebase" is to reach for git worktrees. Separate working directories, separate checkouts, merge when done. Clean. Safe. Boring.

Here's why that doesn't work for my setup:

| Factor | Worktrees | Single Branch |
|---|---|---|
| Context sharing | Each agent sees its own tree; changes invisible to others until merge | All agents see the same files in real-time |
| Merge conflicts | Guaranteed when agents touch overlapping areas | Prevented at write-time by locking |
| Disk space | N copies of the repo | One copy |
| Config files | Duplicated per worktree | One set of configs; agents share coordination state |
| IDE integration | Each worktree needs its own window/session | Agents operate in the same project context |

The whole point of my agentic harness is that agents share state. The task system, the handoff protocol, the event bus, and the session-state directory all live in a shared coordination layer, and every agent needs to read + write to it. Worktrees would mean N copies of that state, and now you're solving distributed consensus instead of file locking. No thanks.

What I Built Instead

Three layers, each solving a different problem. Each one is implemented independently; you don't need all three to get value.

| Layer | What It Does | Protects Against | Status |
|---|---|---|---|
| OS-level locks | Exclusive write access to shared data files | Two agents writing the same coordination file simultaneously | ✅ Implemented |
| Tiered write guard | Block / checkpoint / warn on writes to sensitive files | Agent overwriting credentials, governance configs, etc. | ✅ Implemented |
| Read guard | Block reads of sensitive files | Agent reading credential material or key files | ✅ Implemented |
| Orchestrator locks | Centralized lock management with priority queue + TTL | High-contention scenarios with 10+ agents | ⏳ Designed, not yet needed |

Layer 1: OS-Level File Locking

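This is the foundation. Every coordination script uses Python's fcntl.flock() to serialize access to shared files. These locks are advisory, so the protection holds only because every script takes the lock before it writes.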

Here's an illustrative version of the locking pattern (not my production code, but faithful to the concept):

import fcntl
import json
import signal
from pathlib import Path

class CoordinationLock:
    """
    Illustrative file locking for multi-agent coordination.
    Real implementation details differ from what's shown here.
    Unix-only (fcntl); signal.alarm() only works in the main thread.
    """
    def __init__(self, coordination_file: Path, timeout_seconds: int):
        self.target = coordination_file
        self.timeout = timeout_seconds  # whole seconds; signal.alarm() takes an int

    def _on_timeout(self, signum, frame):
        raise TimeoutError(
            f"Another agent is writing to {self.target.name}. "
            f"Retry in a few seconds."
        )

    def write_record(self, record: dict) -> bool:
        """Append a record with exclusive locking. Returns False on timeout."""
        previous = signal.signal(signal.SIGALRM, self._on_timeout)
        signal.alarm(self.timeout)

        try:
            with open(self.target, "a") as f:
                # Blocks until the lock is granted or SIGALRM interrupts the wait.
                fcntl.flock(f.fileno(), fcntl.LOCK_EX)
                signal.alarm(0)  # lock acquired: don't let the alarm interrupt the write
                try:
                    f.write(json.dumps(record) + "\n")
                    f.flush()
                    return True
                finally:
                    fcntl.flock(f.fileno(), fcntl.LOCK_UN)
        except TimeoutError:
            return False
        finally:
            # Always cancel the alarm and restore the old handler, even if open() fails.
            signal.alarm(0)
            signal.signal(signal.SIGALRM, previous)

The key design choices:

  • Fail-fast timeout: if you can't get a lock quickly, something is wrong. Blocking an agent for 30 seconds while another agent finishes a write is worse than failing + retrying. (The exact timeout was tuned after one of the worker agents reviewed the spec and said "that's too generous." Agents reviewing each other's ADRs is genuinely useful.)
  • Append mode: agents add records to a coordination file; they don't rewrite the whole thing. This keeps the locked window tiny (milliseconds, not seconds).
  • OS-managed cleanup: fcntl.flock() locks release automatically when the holding process exits. No stale locks from crashed agents. No cleanup scripts. This is the killer feature.
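To make the OS-managed cleanup concrete, here is a minimal sketch (written for illustration, not taken from the production code). The helper name try_lock and the temp file are assumptions. It takes a non-blocking exclusive lock with LOCK_NB, shows a second open of the same file being refused while the lock is held, and shows the lock becoming free the moment the holding descriptor is closed, which is also what the OS does when a holding process exits. Unix-only.

```python
import fcntl
import os
import tempfile
from typing import IO, Optional

def try_lock(path: str) -> Optional[IO[str]]:
    """Hypothetical helper: return an open, exclusively locked file, or None if busy."""
    f = open(path, "a")
    try:
        # LOCK_NB makes flock() fail immediately instead of waiting for the lock.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f

# Throwaway file standing in for a shared coordination file.
fd, demo_path = tempfile.mkstemp(suffix=".jsonl")
os.close(fd)

holder = try_lock(demo_path)         # first "agent" takes the lock
blocked = try_lock(demo_path)        # a second open of the same file is refused
holder.close()                       # closing the descriptor releases the lock
after_release = try_lock(demo_path)  # the lock is available again

print(holder is not None, blocked is None, after_release is not None)

after_release.close()
os.remove(demo_path)
```

No unlock call or cleanup script is involved in the release step; closing the descriptor is enough.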

The Lifecycle

Agent wants to write a coordination record
  ↓
Opens the shared file in append mode
  ↓
Attempts exclusive lock with timeout
  ├── Lock available → Write record → Release → Done
  └── Lock held → TimeoutError → Retry with backoff
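
The "retry with backoff" branch of that lifecycle could look something like the sketch below. The function name, the default attempt count, and the delays are assumptions chosen for illustration, not the tuned production values. It takes any callable that returns True on success and False on contention, which is the contract write_record() follows above.

```python
import random
import time
from typing import Callable

def write_with_backoff(
    attempt_write: Callable[[], bool],
    max_attempts: int = 5,          # assumption: illustrative default
    base_delay: float = 0.1,        # assumption: illustrative default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt_write() until it returns True or the attempts run out."""
    for attempt in range(max_attempts):
        if attempt_write():
            return True
        if attempt < max_attempts - 1:
            # Exponential backoff plus jitter so agents don't retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
    return False
```

Usage would be along the lines of write_with_backoff(lambda: lock.write_record(record)). The sleep parameter is injectable only so the loop can be tested without real waiting.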

Why This Is Enough (For Now)

The limitation of OS locks is no visibility; you can't see who holds the lock or for how long. For 3-7 agents, that hasn't mattered because lock hold times are measured in milliseconds. The orchestrator layer will add visibility when scale demands it.

Layer 2: Tiered Write Guard

OS locks protect shared data files. The write guard protects everything else that agents shouldn't be modifying: credential files, key material, governance configs, infrastructure state.

Three Tiers of Protection

| Tier | Action | Effect |
|---|---|---|
| Block | Hard stop, no exceptions | Hook returns a blocking exit code. Agent cannot proceed. |
| Checkpoint | Requires human approval | Agent stops + presents the proposed change for review. |
| Warn | Allowed but audited | Warning issued, audit trail created, agent continues. |

What falls into which tier? I'm not publishing that. (Publishing "here are the exact files we hard-block" also tells you "here are the files we DON'T hard-block.")
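
The mechanism itself is easy to sketch without revealing any real rules. In the example below, every pattern and path is HYPOTHETICAL and chosen only to show the shape of a first-match-wins tier lookup; none of it reflects the actual tier assignments.

```python
from enum import Enum
from fnmatch import fnmatch

class Tier(Enum):
    BLOCK = "block"            # hard stop, no exceptions
    CHECKPOINT = "checkpoint"  # requires human approval
    WARN = "warn"              # allowed, but audited
    ALLOW = "allow"            # not covered by the guard

# HYPOTHETICAL rules for illustration only; not the real tier assignments.
# The first matching pattern wins, so the strictest rules come first.
EXAMPLE_RULES = [
    ("*.pem", Tier.BLOCK),
    ("secrets/*", Tier.BLOCK),
    ("governance/*.yaml", Tier.CHECKPOINT),
    ("CHANGELOG.md", Tier.WARN),
]

def classify_write(path: str) -> Tier:
    """Map a target path to the tier of the first rule it matches."""
    for pattern, tier in EXAMPLE_RULES:
        if fnmatch(path, pattern):
            return tier
    return Tier.ALLOW
```

A hook would call classify_write() on the target of each write and translate the tier into its platform's block, ask, or warn response.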

Indirect Write Detection

Direct file writes via agent tools are easy to intercept. But agents also write via shell commands: redirects, pipe-to-file, inline sed, etc. The guard parses shell commands to detect these indirect writes.

Here's an illustrative example of how you might approach shell write detection (not my production patterns):

import re

# These are EXAMPLE patterns, not the real detection set.
EXAMPLE_PATTERNS = [
    r'>{1,2}\s*([^\s>&|;]+)',           # redirect: echo x > file, echo x >> file
    r'\btee\s+(?:-\S+\s+)*([^\s|;]+)',  # tee: ... | tee file, ... | tee -a file
]

def detect_shell_write_target(command: str) -> str | None:
    """
    Illustrative shell write detection.
    Production uses different patterns + additional layers.
    """
    for pattern in EXAMPLE_PATTERNS:
        match = re.search(pattern, command)
        if match:
            # Strip surrounding quotes so 'file' and file compare equal.
            return match.group(1).strip("'\"")

    # No write target found (descriptor duplication like 2>&1 is ignored).
    return None

The important design principle: this detection layer is complemented by other protections. No single layer needs to be perfect.

Self-Protection

The guard config protects itself from modification. If an agent tries to "helpfully" weaken the guard to unblock itself, the edit gets blocked. (Is this paranoid? Maybe. But I've seen agents try.)

Layer 3: Read Guard

The write guard blocks writes. The read guard blocks reads. You don't want agents reading credential material "just to check the format."

Two categories: hard deny (agent can't read the file, period) and soft ask (agent gets a warning + the human gets notified). I'm not publishing which files fall into which category.

The Unified Hook Executor

All three layers plug into a central hook executor. Every agent action fires through this before executing. It validates multiple categories in sequence:

| Category | Examples |
|---|---|
| Compliance | Attribution checks, changelog governance, ADR enforcement |
| Security | Write guards, read guards, CLI command scanning, injection detection |
| Quality | Context drift monitoring, notification routing |
| Orchestration | Handoff validation, anti-spiral detection |
| Budget | Per-session token consumption tracking |

If any validator in the chain returns a blocking result, the action stops. The executor runs across ALL agents (Cursor, Claude Code, Gemini CLI, Windsurf) with per-platform event mapping.
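
A chain like that can be sketched in a few lines. The Action and Verdict types, the run_hooks name, and the two validators below are all hypothetical stand-ins written for this post; the real executor and its per-platform event mapping are not shown.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    agent: str
    kind: str     # e.g. "write", "read", "shell"
    target: str

@dataclass
class Verdict:
    blocked: bool
    reason: str = ""

Validator = Callable[[Action], Verdict]

def run_hooks(action: Action, validators: list[Validator]) -> Verdict:
    """Run validators in order; the first blocking verdict stops the action."""
    for validate in validators:
        verdict = validate(action)
        if verdict.blocked:
            return verdict
    return Verdict(blocked=False)

# Two HYPOTHETICAL validators standing in for the real categories.
def no_env_reads(action: Action) -> Verdict:
    if action.kind == "read" and action.target.endswith(".env"):
        return Verdict(True, "read guard: credential file")
    return Verdict(False)

def no_guard_edits(action: Action) -> Verdict:
    if action.kind == "write" and action.target == "guard-config.yaml":
        return Verdict(True, "write guard: guard config is self-protected")
    return Verdict(False)
```

Because the chain short-circuits, ordering matters: cheap, high-severity checks are natural candidates to run first.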

Staged Rollout

Not every hook starts in "block mode." A rollout config controls graduated enforcement:

monitor (log only) → warn (surface to agent) → enforce (block on violation)

Some hooks start at monitor and earn their way to enforce after stabilization. Others start at enforce and stay there permanently.
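
Here is one way the graduated enforcement could be expressed, as a sketch under stated assumptions: the hook names, the ROLLOUT mapping, and the choice to default unknown hooks to monitor are all invented for illustration.

```python
from enum import Enum
from typing import Optional

class Mode(Enum):
    MONITOR = "monitor"   # log only
    WARN = "warn"         # surface to agent
    ENFORCE = "enforce"   # block on violation

# HYPOTHETICAL rollout config: hook name -> current enforcement mode.
ROLLOUT = {
    "read_guard": Mode.ENFORCE,
    "context_drift": Mode.MONITOR,
    "changelog_governance": Mode.WARN,
}

def apply_rollout(hook: str, violated: bool, log: list[str]) -> tuple[bool, Optional[str]]:
    """Return (blocked, message_for_agent) for one hook result."""
    if not violated:
        return False, None
    mode = ROLLOUT.get(hook, Mode.MONITOR)  # assumption: unknown hooks only log
    log.append(f"{hook}: violation ({mode.value})")  # every mode leaves a log entry
    if mode is Mode.MONITOR:
        return False, None
    if mode is Mode.WARN:
        return False, f"warning from {hook}"
    return True, f"blocked by {hook}"
```

Promoting a hook is then a one-line config change rather than a code change, which keeps the monitor → warn → enforce path cheap.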

The Planned Orchestrator Layer

For completeness: the orchestrator-level locking system has been fully designed (with its own ADR) but not implemented. It adds things the OS-level locks don't provide:

| Feature | OS Locks | Orchestrator |
|---|---|---|
| Lock types | Exclusive only | Read / Write / Intent-Write |
| Visibility | Opaque | State file shows all active locks + queue |
| Priority | None (no ordering guarantee among waiters) | Critical / High / Normal / Low |
| TTL | Process lifetime | Configurable with renewal protocol |
| Starvation prevention | None | Auto-promotion after wait threshold |

Lock Compatibility Matrix

| Held ↓ / Request → | read | intent_write | write |
|---|---|---|---|
| read | ✅ Grant | ✅ Grant | ❌ Queue |
| intent_write | ✅ Grant | ✅ Grant | ❌ Queue |
| write | ❌ Queue | ❌ Queue | ❌ Queue |

Multiple concurrent reads are fine. Any write is exclusive. intent_write lets an agent signal "I'm about to modify this" without blocking readers yet.
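
The matrix translates directly into a lookup table. This is a sketch of the designed (not implemented) behavior; the names COMPATIBLE and decide are hypothetical, and granting a request when nothing is held is an assumption the matrix implies but doesn't state.

```python
# (held, requested) -> can the request be granted alongside the held lock?
COMPATIBLE = {
    ("read", "read"): True,
    ("read", "intent_write"): True,
    ("read", "write"): False,
    ("intent_write", "read"): True,
    ("intent_write", "intent_write"): True,
    ("intent_write", "write"): False,
    ("write", "read"): False,
    ("write", "intent_write"): False,
    ("write", "write"): False,
}

def decide(held: list[str], requested: str) -> str:
    """Return "grant" if the request is compatible with every held lock, else "queue"."""
    # With no locks held, all() over an empty sequence is True, so the request is granted.
    if all(COMPATIBLE[(h, requested)] for h in held):
        return "grant"
    return "queue"
```

For example, decide(["read", "read"], "intent_write") grants, while decide(["read"], "write") queues until the reader releases.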

Why It's Not Implemented

The three current layers handle everything I've thrown at them for 3-7 concurrent agents. The complexity of orchestrator locks isn't justified at current scale. The design is ready for when it is.

How Agents Learn the Protocol

A lock protocol skill teaches agents the procedures: check lock state before writing, follow the decision matrix, always release on error, handle contention by working on something else. The skill is federated into every IDE config directory so all agents discover it automatically.

Real-World Concurrency Pattern

Terminal 1: Cursor (primary)  — editing a governance handler
Terminal 2: Claude Code       — writing a task coordination record
Terminal 3: Gemini CLI        — reviewing an ADR + writing feedback
Terminal 4: Cursor (parallel) — updating the changelog

What happens:
- Claude Code acquires lock on the coordination file, writes, releases. ~50ms.
- Cursor edits the handler directly (single writer, no lock needed)
- Gemini CLI reads the ADR (read-only, no lock needed)
- Cursor (parallel) writes the changelog (attribution check fires, no file lock needed)

Lock contention: zero. Each agent touches different files.

In practice, coordination writes take ~50ms and the fail-fast timeout has NEVER been hit in production. The lock is insurance, not a bottleneck.

The ADR Trail

Every design decision in this system is documented in its own ADR. The file locking protocol, the task architecture, the hook enforcement framework, the tiered write guard, the cross-agent hook wiring, and the orchestrator design each have full context + decision rationale + consequence analysis.

The ADRs aren't just documentation; they're executable context. Each hook references its governing ADR, and when an agent gets blocked, the error message includes the ADR reference so the agent (or the human) can look up why the rule exists.

What's Next

  • Orchestrator implementation: when scale demands it
  • Contention metrics: automated monitoring to surface patterns I can't see today
  • Cross-repo locking: when agents operate across multiple repos

The system is intentionally simple. The flock() call behind fcntl.flock() dates back to early-1980s BSD Unix. signal.alarm() is barely more sophisticated. But simple primitives composed into layered defense have kept 7 concurrent agents from destroying my monorepo for months. Sometimes the unsexy solution is the right one.