A team of agents (PM, Eng, QA) tackles my Linear tickets while I'm driving
San Francisco, CA · February 10, 2026
Last week the weather in the Bay Area was fantastic and I really wanted to take a road trip and get my eyes off the monitor. But with all the OpenClaw craze going on, it was getting really hard to step away.
Inspired by @steipete to take my work on the road and delegate my ideas to OpenClaw, I went on the trip and came back to several open PRs. Over the next 7 days, the team cumulatively closed over 150 tickets across 4 projects. Here's my setup and what I learned.
Stage 0: Just asking OpenClaw to do work
At first I just messaged my OpenClaw agent and asked it to tackle Linear tickets directly. It worked okay. But I quickly realized my workflow naturally splits into 3 stages:
- Research: I prompt the agent to do deep research on the ticket topic, look over all the code in the repo, and propose an adequate solution
- Plan: I ask it to summarize everything and drop the plan back into the Linear ticket as a comment (kind of like Plan mode in Claude Code)
- Execute: I ask it to implement the proposed solution by breaking it down into digestible smaller sub-tickets, then code each one
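The three stages above boil down to a sequence of prompts. A minimal sketch of what that looks like, with illustrative wording (these are not my exact prompts):

```python
# Illustrative prompt templates for the three stages; the wording is a
# paraphrase of the workflow, not the exact prompts I use.
STAGES = {
    "research": (
        "Do deep research on ticket {ticket_id}: look over all the relevant "
        "code in the repo and propose an adequate solution."
    ),
    "plan": (
        "Summarize your research on {ticket_id} and drop the plan back into "
        "the Linear ticket as a comment."
    ),
    "execute": (
        "Implement the proposed solution for {ticket_id}: break it down into "
        "digestible smaller sub-tickets, then code each one."
    ),
}

def prompts_for(ticket_id: str) -> list[str]:
    """Return the three stage prompts for a ticket, in pipeline order."""
    order = ("research", "plan", "execute")
    return [STAGES[stage].format(ticket_id=ticket_id) for stage in order]
```
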
The results were almost good. But "almost" adds up. Sometimes the tests don't pass. Sometimes the build fails. Sometimes there are review comments from review agents like CodeRabbit that need to be addressed. So I'd send the agent back to the PR and ask it to fix things up.
A few things I learned the hard way:
- Agents stumble when instructions are too concrete and rigid. They need room to reason about the problem, not just follow a script
- Scout (QA) would consistently ignore PR review comments unless explicitly reminded to check for them in its heartbeat
- Context windows fill up fast when one agent is doing research, planning, and coding. Quality degrades noticeably toward the end of long tasks
A single agent doing all three stages works, but it's slow and the context gets messy. The agent that researched the problem is now also trying to debug a failing CI pipeline. Not great.
The insight: 3 agents > 1 agent
I found that having three separate agents, each with a clear role, is actually better and faster than asking a single agent to do everything.
So I split responsibilities, just like in a human team:
| Agent | Role | What it does |
|---|---|---|
| Juno | Product Manager | Picks up Linear tickets, researches requirements, reads the codebase, breaks work into sub-issues, writes acceptance criteria, and assigns tasks |
| Titus | Lead Engineer | Takes assigned tickets, writes production code via Claude Code, runs builds and tests, opens PRs on GitHub, responds to review feedback |
| Scout | QA Engineer | Reviews PRs against acceptance criteria, runs tests, auto-fixes test failures with Claude Code, flags issues back to Linear |
Each agent has its own personality defined in a SOUL.md file, its own tools configuration, and its own persistent memory. They coordinate through Linear (ticket status), GitHub (PRs and reviews), and Slack (notifications).
How they stay busy: Heartbeats
Each agent runs a heartbeat, a loop that fires every 60 seconds and checks for new work. This is the key difference from just prompting an agent once and walking away.
Juno's heartbeat checks Linear for new tickets that need breakdown. Titus watches for tickets assigned to it in "Ready" status. Scout monitors open PRs that need review.
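Conceptually, a heartbeat is just a polling loop. A minimal sketch in Python, where `check_for_work` and `handle` stand in for the real Linear/GitHub calls each agent makes (the `max_ticks` parameter only exists so the sketch terminates; the real agents loop forever):

```python
import time
from typing import Callable, Optional

def run_heartbeat(check_for_work: Callable[[], Optional[str]],
                  handle: Callable[[str], None],
                  interval_s: int = 60,
                  max_ticks: Optional[int] = None) -> None:
    """Fire every `interval_s` seconds; whenever the check finds work, handle it.

    `check_for_work` is a stand-in for e.g. querying Linear for tickets in
    "Ready" status; `handle` is a stand-in for handing the ticket to the agent.
    """
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        work = check_for_work()   # e.g. Titus: "any ticket assigned to me in Ready?"
        if work is not None:
            handle(work)          # e.g. start coding it with Claude Code
        ticks += 1
        if max_ticks is None or ticks < max_ticks:
            time.sleep(interval_s)
```

Each agent runs the same loop with a different check: Juno's queries for unplanned tickets, Titus's for "Ready" tickets, Scout's for open PRs.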
No polling from my side. No babysitting. I assign a ticket to the PM, and the pipeline kicks off automatically. Juno researches and plans. Titus picks it up and codes. Scout reviews the PR. If Scout finds issues, it either fixes them directly or files a bug back to Linear, and the cycle continues.
All context that's generally lost when agents switch tasks is persisted to Linear. Agents always know what's going on and I get visibility.
The workflow in practice
Here's what a typical ticket looks like flowing through the system:
- I create a ticket in Linear (or just message Juno on Slack)
- Juno picks it up on the next heartbeat, reads the codebase, researches the problem, writes a technical plan with sub-tasks, and assigns it to Titus
- Titus picks up the sub-tasks, writes code using Claude Code, runs the build, and opens a PR
- Scout picks up the PR, checks it against the acceptance criteria Juno wrote, runs tests, and either approves it or flags what needs fixing
- If something fails, Titus gets a new ticket or Scout addresses it directly
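The flow above is effectively a small state machine over ticket status. A sketch of the transitions and which agent drives each one (the state names reflect my Linear workflow; yours may differ):

```python
# (current_state, actor) -> next_state. States and owners mirror my Linear
# workflow in this setup; adapt the names to your own.
TRANSITIONS = {
    ("Triage", "Juno"): "Ready",           # Juno plans and assigns sub-tasks
    ("Ready", "Titus"): "In Review",       # Titus codes and opens a PR
    ("In Review", "Scout"): "Done",        # Scout approves against acceptance criteria
    ("In Review", "Scout:fail"): "Ready",  # Scout flags issues back to Linear
}

def next_state(state: str, actor: str) -> str:
    """Advance a ticket, or stay put if this actor has no move from here."""
    return TRANSITIONS.get((state, actor), state)
```

The failure edge is what keeps the loop self-sustaining: a rejected PR puts the ticket back in "Ready", and Titus's next heartbeat picks it up again.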
All while I'm driving through Big Sur with no cell service.
Deploying the team: Agent Army
As powerful as the agents are, changing them was a pain. I wanted to update skills, tweak heartbeats, and experiment with their context windows.
At first I did everything manually. SSHing into servers, editing config files, restarting OpenClaw instances. It got old fast.
So I built ClawUp, a CLI tool that handles the whole thing.
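Spinning up the whole team looks roughly like this. The subcommand names below are illustrative, not a reference; check docs.clawup.sh for the actual CLI:

```
# Illustrative only; see docs.clawup.sh for the real commands and flags
clawup deploy    # provision instances, wire up Tailscale, install OpenClaw
clawup status    # check the agents' health
clawup destroy   # tear the whole team down
```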
That's it. Three agents, each on its own cloud instance (AWS or Hetzner), connected via a Tailscale mesh VPN, pre-configured with OpenClaw, Claude Code, Linear, GitHub, and Slack integrations.

The first setup takes the longest, and honestly, the most annoying part isn't the infrastructure. It's the accounts. OpenClaw agents behave like humans with computers, and they really need their own accounts with proper permissions. GitHub is the worst offender here: it actively blocks agents from creating accounts. Slack is similar. I ended up manually creating GitHub and Slack accounts for each agent, which felt absurd. The whole setup would've been 10x faster if agents could just sign up for email (with something like AgentMail) and create their own accounts.

After that initial pain, it's a single command to deploy or destroy.
Each agent is defined by a set of workspace files:
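Piecing together the files mentioned throughout this post, a workspace looks something like this (the exact layout is illustrative; see the repo for the real presets):

```
agents/juno/
├── SOUL.md        # personality and role definition
├── HEARTBEAT.md   # what to check on each 60-second tick
├── tools/         # tools configuration (Linear, GitHub, Slack)
└── memory/        # persistent memory across sessions
```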
You can use the built-in presets, tweak them, or define completely custom agents.
Clean slate resets
One thing I'm currently experimenting with: completely resetting the context and redeploying the agents with a clean slate between major tasks.
OpenClaw agents accumulate context over time, and sometimes that context gets stale or contradictory. A fresh deploy gives you agents that start from the latest version of your presets with zero baggage.
It takes about 5 minutes. I do this roughly once a day, or whenever I'm shifting to a different codebase or project.
What it costs
On Hetzner, three CX22 instances (2 vCPU, 4 GB RAM each) run about $18–22/month. On AWS with t3.medium instances, it's closer to $110–120/month. Plus your Anthropic API usage. I have the Max plan and I'm running very close to the limit.
I use Hetzner for development. It's ~80% cheaper and more than enough for this workload.
Try it
Agent Army is MIT licensed. The repo is at github.com/stepandel/agent-army and the docs are at docs.clawup.sh.
Install it, deploy my presets, or build your own team of agents. The presets are a starting point. The real power is in customizing the SOUL.md, HEARTBEAT.md, skills, and plugins for your specific workflow.
If you're spending time waiting for your Claude Code to finish work, go drive through Big Sur instead. Juno, Titus, and Scout will hold down the fort.
– Stepan