
How I built per-agent sound notifications with Warcraft voice lines

Cursor is the Peasant. Claude Code is the Rifleman. Gemini CLI is the Peon. After 48 hours you stop hearing words and start hearing status.

12 min read
Agentic AI · DevOps · Python · Open Source

The moment I knew this whole agentic engineering thing had gotten out of hand was when I heard a Warcraft II Peasant say "I don't wanna do this!" from my laptop speakers and instinctively thought: ah, Cursor's starting a new session and it's feeling salty about it.

That's not a joke. That's an actual notification in my system. When a Cursor agent session starts, there's a small chance it plays peasant-dont-want-to-do-this.mp3 instead of the standard greeting. It's an easter egg. It fires about 3% of the time. And every single time it catches me off guard and makes me laugh.

This post is about how I built a sound notification system that gives each of my numerous AI agents a unique audio persona, why it's simultaneously the silliest and most useful thing in my entire agentic stack, and how a childhood of playing Warcraft II taught me more about multi-agent observability than any SRE textbook.


The Problem That Sounds Trivial (And Isn't)

When you're running one agent, you don't need sound notifications. You're watching it work. You can see what it's doing.

When you're running dozens of agents across three platforms — Cursor as orchestrator with 4 parallel workers, five concurrent Claude Code instances in terminal sessions, Gemini CLI in a handful more — you physically cannot watch them all.

Is this all a bit ridiculous? Well, yes. But I've wired them up and gone ham-wild on their respective harnesses to enforce file locks, immutable handoffs, per-agent attribution, and more; you can read about that here: ___

My agents are spread across multiple terminal tabs (most often running inside IDEs), multiple IDE panels, and often multiple monitors.

Work is happening in parallel, and you have no idea which agent just finished, which one hit a blocker, which one is waiting for input, or which one might be stuck in a recursive loop burning tokens (I have hooks and scripts to stop exactly this, but it happens).

For a while, I tried just checking terminals periodically.

That lasted about two days, until I missed a critical error that had been sitting in a Claude Code session all day while I was stuck in back-to-back meetings and forgot to review agent outputs and HiTL questions.

I needed ambient awareness. I needed to hear my agents.

But we've solved this problem before. Why reinvent the wheel?

I found myself reminiscing about playing Warcraft II and III: hearing my workers and warriors — peasants, peons, swordsmen — deliver quick quips without looking at them, without clicking on them, just by the sounds they made, sometimes when I wasn't even paying attention to them. "Job's done!"


The Warcraft Insight

I've been playing Warcraft since I was a kid. Warcraft II, Warcraft III, thousands of hours. And one of the brilliant design decisions in those games is that every unit type has distinct voice lines.

When a Peasant finishes building something, it says "Job's done!" When a Peon acknowledges an order, it says "Zug zug." When you click on a Rifleman, it says "Aye, sir."

After hundreds of hours, this becomes completely automatic. You don't consciously process the words.

You just know — from the sound alone — that your gold mine is finished, that your barracks started training, that an Orc just acknowledged an order.

It's auditory anthropomorphization, and it's one of the most effective observability patterns ever designed.

So I did what any reasonable person would do: I ripped the voice lines from Warcraft II and Warcraft III and assigned them to my AI agents.


The Persona System

Each agent gets a persona — a character whose voice lines map to different agent actions. The mapping follows what I call actor-centric sound design: the sound tells you WHO is acting, then WHAT they're doing.

The original system used a single universe — Warcraft:

Agent                 | Warcraft Persona | Why This Character
Cursor (orchestrator) | Human Peasant    | The humble worker who does everything. "Ready to work." "Yes, me lord." "Job's done!"
Claude Code           | Dwarven Rifleman | Disciplined, professional, precise. "Aye, sir." "Brilliant." "My pleasure."
Gemini CLI            | Orc Peon         | The enthusiastic heavy lifter. "Something need doing?" "Zug zug." "Work, work."

The character assignments aren't random.

Cursor is the Peasant because Peasants in Warcraft are the versatile unit that does everything — they build, they mine, they chop wood.

Cursor is an orchestrator, the agent that does a bit of everything.

Claude Code is the Rifleman because Riflemen are precise, analytical, and disciplined.

Gemini CLI is the Peon because Peons are the raw-force workers who just put their heads down and go.

And the easter eggs? Those are the annoyed voice lines you get when you click on a unit too many times in Warcraft.

  • Peasant: "I don't wanna do this!"
  • Peon: "Me not that kind of Orc!"
  • Rifleman: "No!" (followed by uncomfortable silence).

These notifications fire at low probability on routine events, and they are the best part of this entire system.


Then it got out of hand

Warcraft was great for playful contexts. But sometimes I'm in a focused debugging session and a Peasant yelling "MORE WORK?!" is... not the vibe I want.

I needed a second register — something more measured, more analytical, for when the work is serious.

So I added Star Trek: The Next Generation notifications.

Agent       | Warcraft Persona | STNG Persona       | When Each Fires
Cursor      | Human Peasant    | Captain Picard     | Warcraft for routine work, Picard for delegation and significant decisions
Claude Code | Dwarven Rifleman | Commander Data     | Warcraft for quick tasks, Data for analytical work
Gemini CLI  | Orc Peon         | Lt. Geordi LaForge | Warcraft for grunt work, Geordi for infrastructure and engineering

Picard saying "Make it so" when a handoff chain completes? That fires at 3% probability and I'm telling you, the dopamine hit is real.

Data saying "I am fully functional" when Claude Code finishes a complex debugging task? Perfect.

Geordi saying "Structural integrity is holding" when Gemini CLI completes a health check? Chef's kiss.

And then — because apparently I have no self-control — I added a THIRD universe. Blade Runner.

Agent       | Blade Runner Persona | Archetype
Cursor      | Officer K (2049)     | Obedient replicant worker. Clean, precise. Baseline test cadence.
Claude Code | Rick Deckard (1982)  | Methodical detective. Deliberate. Low synth tones.
Gemini CLI  | Joi (2049)           | Holographic assistant. Ethereal, warm synth.
Sub-agents  | LAPD Spinner         | Vehicle ambient. Engine hum for spawn, landing for complete.

The sub-agent mapping is a particularly nice touch.

When a Cursor sub-agent spawns, you hear a Spinner engine starting up.

When it completes and returns, you hear the Spinner landing.

The sub-agent lifecycle has a physical sound that maps to takeoff and landing.

That metaphor works absurdly well.


The Actual Architecture

Behind the fun, there's a real system. The sound coordinator is a Python module that handles:

Actor-centric selection. Every sound request starts with WHO (which agent persona), then WHAT (what action). The coordinator looks up the persona's sound pool for that action and picks from the available options:

# actor-sound-mappings.yaml
personas:
  cursor:
    warcraft_character: peasant
    stng_character: picard
    action_sounds:
      completing_work:
        warcraft: [peasant-job-done.mp3, peasant-jobs-done-alt.mp3]
        stng: [STFC-picard-engage-intense.mp3, STFC-picard-make-it-so.mp3]
      easter_eggs:
        warcraft: [peasant-im-not-listening.mp3, peasant-dont-want-to-do-this.mp3]
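To make the actor-centric lookup concrete, here's a minimal sketch of the WHO-then-WHAT resolution, using an in-memory dict shaped like the YAML above (in practice the mapping would be loaded from the file with something like PyYAML). The structure mirrors the config; the function name is illustrative, not the real coordinator's API.

```python
import random

# In-memory equivalent of actor-sound-mappings.yaml above
PERSONAS = {
    "cursor": {
        "action_sounds": {
            "completing_work": {
                "warcraft": ["peasant-job-done.mp3", "peasant-jobs-done-alt.mp3"],
                "stng": ["STFC-picard-engage-intense.mp3", "STFC-picard-make-it-so.mp3"],
            },
        },
    },
}

def pick_sound(persona: str, action: str, universe: str = "warcraft") -> str:
    """Actor-centric lookup: resolve WHO (persona) first, then WHAT (action),
    then pick randomly from that persona's pool for the chosen universe."""
    pool = PERSONAS[persona]["action_sounds"][action][universe]
    return random.choice(pool)
```

The WHO-first ordering is the whole point: the same action ("completing work") sounds different per agent, so your ear identifies the actor before it identifies the event.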

Anti-fatigue logic. A recency window prevents any sound from repeating within the last 5 plays.

Per-agent cooldown prevents spamming.

Global rate limit caps at 10 sounds per minute.

Without this, 15 agents completing tasks simultaneously would be an audio war zone.
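The three anti-fatigue rules above compose into a single gate that every playback request passes through. Here's a sketch of one way to implement it; the class and parameter names are my own, not the coordinator's actual interface.

```python
import time
from collections import deque

class AntiFatigue:
    """Gate sound playback: recency window, per-agent cooldown, global rate cap."""

    def __init__(self, recency=5, cooldown_s=10.0, max_per_minute=10):
        self.recent = deque(maxlen=recency)   # last N sounds played anywhere
        self.cooldown_s = cooldown_s
        self.last_play = {}                   # agent -> timestamp of its last sound
        self.minute_window = deque()          # timestamps within the last 60s
        self.max_per_minute = max_per_minute

    def allow(self, agent, sound, now=None):
        now = time.monotonic() if now is None else now
        if sound in self.recent:              # recency: no repeats in last N plays
            return False
        if now - self.last_play.get(agent, float("-inf")) < self.cooldown_s:
            return False                      # per-agent cooldown
        while self.minute_window and now - self.minute_window[0] > 60:
            self.minute_window.popleft()      # drop timestamps older than a minute
        if len(self.minute_window) >= self.max_per_minute:
            return False                      # global rate cap
        self.recent.append(sound)
        self.last_play[agent] = now
        self.minute_window.append(now)
        return True
```

Checking recency before the rate cap matters: a rejected repeat shouldn't consume one of the 10 per-minute slots.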

Easter egg probability. Each easter egg has a configurable probability (1-8%) and a context trigger.

  • Data's "Humor, I love it" only fires when Claude Code completes work and the previous action was user-initiated.
  • The Borg's "Resistance is futile" only fires when a prompt injection is blocked.
  • "You must construct additional pylons" fires at 1% probability when a resource is exhausted (my token-circuit-breakers hook).
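Each easter egg, then, is really a (probability, context predicate) pair: both have to pass before the sound plays. A sketch of that shape, with hypothetical file names and context keys:

```python
import random

# Hypothetical easter-egg table: sound -> (probability, context predicate)
EASTER_EGGS = {
    "data-humor-i-love-it.mp3": (
        0.03, lambda ctx: ctx.get("previous_action") == "user_initiated"),
    "borg-resistance-is-futile.mp3": (
        0.05, lambda ctx: ctx.get("event") == "injection_blocked"),
    "additional-pylons.mp3": (
        0.01, lambda ctx: ctx.get("event") == "resource_exhausted"),
}

def maybe_easter_egg(context, rng=random.random):
    """Return an easter-egg file only if its context trigger matches AND its
    independent dice roll hits; otherwise None (play the normal sound)."""
    for sound, (probability, matches) in EASTER_EGGS.items():
        if matches(context) and rng() < probability:
            return sound
    return None
```

Gating on context first keeps the probabilities meaningful: the Borg line is 5% of blocked injections, not 5% of all events.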

Persona enforcement. A guard script ensures agents can NEVER play sounds from another agent's persona. Claude Code (Deckard/Rifleman/Data) cannot play K's sounds. Cursor (K/Peasant/Picard) cannot play Deckard's sounds. This is enforced by hook, not by trust. Because, as we've established, agents ignore rules.
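The real guard runs as a hook script outside the agents' control, but the core check is just an allowlist of personas per agent. A minimal in-process sketch, with an assumed registry layout:

```python
# Hypothetical persona registry; the real enforcement lives in a hook,
# not inside any agent's process.
AGENT_PERSONAS = {
    "cursor": {"peasant", "picard", "officer_k"},
    "claude_code": {"rifleman", "data", "deckard"},
    "gemini_cli": {"peon", "geordi", "joi"},
}

def guard_persona(agent: str, persona: str) -> None:
    """Refuse any request to play a sound belonging to another agent's persona."""
    if persona not in AGENT_PERSONAS.get(agent, set()):
        raise PermissionError(f"{agent} may not play {persona} sounds")
```

An unknown agent gets an empty set, so the guard fails closed by default.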


The Worf Problem

Some sounds are actor-agnostic. Security alerts, for example. When the injection scanner blocks a prompt injection attempt, you want to know about it regardless of which agent triggered it.

For these, I have ambient sounds — not tied to any persona. MCP operations get combadge chirps and tricorder sounds. Security events get transporter effects. And serious security blocks?

Worf says "They've adapted."

The Borg Collective says "Resistance is futile."

These are ambient sounds that any agent can trigger, and they hit different because they're rare. You hear the Borg exactly when something actually got blocked. It's an alarm, but it's an alarm that makes you grin while you investigate.


The Pavlovian Effect

I wrote that sound notifications are "not a gimmick," and I meant it, but I also want to be honest about why they're not a gimmick, because the reason is weirder than I expected.

After about 48 hours of using the system, I stopped consciously hearing the sounds. Instead, I developed automatic associations:

  • Peasant's "Job's done" → Cursor finished something routine → I don't need to look
  • Rifleman's "Aye, sir" → Claude Code acknowledged a handoff → It's working on it
  • Peon's "Something need doing?" → Gemini CLI started a new task → Expected
  • Red Alert sound → Security event → I should look NOW
  • Picard's "Make it so" → A delegation chain completed → That was a multi-step workflow

This is exactly what happens after thousands of hours of Warcraft.

You stop parsing the words and start parsing the meaning.

The sounds become compressed status updates. You get information density that would otherwise take 10-15 seconds to gather by checking terminals, reading outputs, and working out the "what's next" moment for myself.

Instead, all of that arrives in a tight 2-second audio cue you process without context-switching.

That's also a great delivery method for a small Pavlovian dopamine hit.

It's not about the nostalgia (although the nostalgia is nice).

It's about the information + cognitive bandwidth.

Sound is a sensory channel that's completely wasted in most development environments (though lately it seems some offices are full of people whispering, because everyone is dictating to their AI agents these days).

I could never: I prefer typing (I type rather fast), so I always, always, always have headphones on. Now, even during meetings, I know when to check on an agent that has finished something I wanted done, or that needs me to review its outputs or answer its questions.

I filled that auditory gap with operational data, and now I have ambient awareness of however many concurrent agents I want to run without looking at a single terminal until I'm ready to.


The Sound Library

For the completists among you: nearly 400 sound files across all three universes. A couple dozen Warcraft canonical sounds (Peasant, Peon, Rifleman, ambient work sounds), a larger set of STNG sounds (Picard, Data, Geordi, Worf, Borg, ambient bridge sounds), and a production-ready Blade Runner set (Deckard, K, Joi, Spinner). A handful of easter eggs with probability-weighted triggers. Anti-fatigue built in from day one: recency window, per-agent cooldown, global rate cap.

The MCP server that plays them is a standalone, local, open-source package.

Any developer can use it — you just point it at your own sound directory, spend a bit of time massaging the YAML mappings, and it works with any agent platform that supports MCP.


How to Build This For Yourself

You don't need 398 sound files and three cinematic universes. Here's the minimal version:

1. Pick 3-5 sounds per agent. Session start, task complete, error, needs input. That's enough. Use whatever sounds make you happy — game SFX, movie quotes, custom recordings, or even just specific system sounds.

2. Write a YAML mapping. Agent name → action type → sound file. Keep it simple.

3. Build a tiny MCP server (or a bash script) that reads the YAML and plays the right sound via afplay (macOS) or paplay (Linux).

4. Wire it to your hooks. session-start → greeting sound. afterFileEdit → completion sound. It's just a script call in your hooks.json.

5. Add anti-fatigue from day one. Even with 5 agents, hearing the same sound 30 times in an hour will make you disable the whole system. Recency tracking is not optional.
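Steps 2-4 fit in a few dozen lines. Here's a sketch of the minimal player from step 3, callable from a hook; the sound paths and the mapping are illustrative, and in practice the mapping would live in the YAML from step 2.

```python
import subprocess
import sys
from pathlib import Path

# Illustrative agent+action -> sound file mapping (step 2 in code form)
SOUNDS = {
    ("cursor", "session-start"): "sounds/peasant-ready-to-work.mp3",
    ("cursor", "task-complete"): "sounds/peasant-job-done.mp3",
}

def play(agent: str, action: str) -> None:
    """Look up the sound for this agent+action and fire off the platform player."""
    sound = SOUNDS.get((agent, action))
    if sound is None:
        return  # unmapped actions are silently ignored
    player = "afplay" if sys.platform == "darwin" else "paplay"
    subprocess.Popen([player, str(Path(sound))])  # fire and forget, don't block the hook

if __name__ == "__main__" and len(sys.argv) >= 3:
    play(sys.argv[1], sys.argv[2])  # e.g. python play_sound.py cursor session-start
```

Your hooks.json entry (step 4) then just invokes this script with the agent name and event type.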

The elaborate persona system, the three-universe architecture, the probability-weighted easter eggs — that's what happens when you let a system evolve over months.

Start with 15 sound files and a bash script. You'll naturally grow it once you feel how much better your ambient awareness gets.

Or you could just start with the Peasant saying "I don't wanna do this" for every session start. That alone is worth the entire setup.


The sound notification system is part of my own Agentic Developer Toolkit / Harness.

The MCP sound server is available as a standalone open-source package.

Architecture documented in S-ADR-021 (Actor-Centric Sound Design Philosophy), T-ADR-017 (Universal Sound Architecture), and T-ADR-027 (Multi-Universe Sound Architecture).

John Click is a DevOps / IT Platform Engineer who has definitely spent too much time on this and regrets nothing. He writes at johnclick.ai and johnclick.dev.