What happens when you give an AI a project file and walk away. Building karl, an autonomous multi-agent development loop that wrote its own code, tests, and architecture decisions, and the uncomfortable lessons about cost, control, and what engineering becomes next.
I did not write karl. karl wrote karl.
That sentence should bother you a little. It bothers me, and I'm the one who did it. karl is an autonomous multi-agent development loop, a glorified bash script that reads a product requirements document, selects the next unfinished ticket and orchestrates a bunch of AI agents to plan, review, architect, test, implement and deploy the feature. Then it merges and picks up the next ticket. It does this until every story passes or your API budget is gone. The entire project, 3,800 lines of bash, 660 tests, the README, the documentation, everything was produced by karl itself. My contribution was a JSON file describing what I wanted. I pressed enter and went to get coffee. Several coffees, actually.
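The outer loop described above can be sketched in a few lines of bash. This is a hypothetical reconstruction, not karl's actual code: the filename `prd.json` and the `stories`/`id`/`status` schema are assumptions made for illustration, and the agent pipeline is elided to a comment.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the outer loop; karl's real schema and field names may differ.
set -euo pipefail

PRD="prd.json"

# A toy PRD with two stories (karl's own had roughly two dozen).
cat > "$PRD" <<'EOF'
{"stories":[{"id":"S1","status":"done"},{"id":"S2","status":"todo"}]}
EOF

while true; do
  # First story not yet marked done; empty string once every story passes.
  ticket=$(jq -r '[.stories[] | select(.status != "done")][0].id // empty' "$PRD")
  [ -z "$ticket" ] && break

  echo "working on $ticket"
  # plan -> review -> architect -> test -> implement -> deploy gate would run here...
  # ...then the ticket is marked done and the loop picks up the next one:
  jq --arg id "$ticket" \
     '(.stories[] | select(.id == $id)).status = "done"' \
     "$PRD" > "$PRD.tmp" && mv "$PRD.tmp" "$PRD"
done
echo "all stories done"
```

The point of the sketch is how little machinery the loop itself needs: a JSON file, jq, and a while loop. Everything interesting happens inside the elided pipeline.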
karl is expensive as hell, and I want to be honest about that. Each ticket fires a minimum of six Claude invocations: planner, reviewer, architect, tester, developer, deployment gate. The rework loop, where developer and tester go back and forth until tests pass, can multiply that by ten. With --instances 3, you're running three of these pipelines in parallel across isolated git worktrees. Each invocation burns through a context window in no time.
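The numbers above make for a sobering back-of-envelope calculation. Using the figures from this article (24 stories, a six-invocation minimum per ticket, a worst-case tenfold rework multiplier; the variable names are mine):

```shell
#!/usr/bin/env bash
# Back-of-envelope invocation count, using the figures quoted in the article.
set -euo pipefail

TICKETS=24
BASE_CALLS=6        # planner, reviewer, architect, tester, developer, deployment gate
REWORK_FACTOR=10    # worst case: the developer/tester loop repeats

MIN=$((TICKETS * BASE_CALLS))
WORST=$((MIN * REWORK_FACTOR))
echo "best case: $MIN invocations, worst case: $WORST"
```

Somewhere between 144 and 1,440 full-context Claude invocations for a modest PRD. Running three instances in parallel doesn't change the total, only how fast you reach it.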
I didn't track the exact spend for karl building itself. I should have. What I can say is that 24 user stories across roughly ten days of autonomous execution consumed a non-trivial portion of a Max subscription. This is not a tool for the budget-conscious. It's a tool for the curious with coins to spare. If that makes you uncomfortable, good. It should. The economics of autonomous agent loops are genuinely unsettled. What's cheap enough to justify running unsupervised? I don't know yet. Nobody does.
Everyone asks: "Why the hell a bash script?" Because karl needs to orchestrate shell commands (git operations, file I/O, Claude CLI invocations) and persist state to disk between completely stateless agent runs. Bash does all of this natively. No runtime, no dependencies beyond git, jq, and the Claude CLI. It runs on macOS and Linux without a build step. The entire thing is chmod +x and go. There's a deeper reason. The core design constraint is fresh agents with no cross-ticket memory. Every agent invocation starts with a clean context window. All durable state lives in files. This is the same "persist to disk, forget everything else" philosophy behind Geoffrey Huntley's Ralph loop. When your persistence model is files and your orchestration model is shell commands, bash is not the wrong choice. It's the honest one. No framework abstractions hiding what's actually happening. Every agent invocation is a claude command. Every state transition is a file write. Every branch operation is a git command. You can read the entire system in an afternoon.
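One stateless agent step looks roughly like this. This is a sketch under stated assumptions: the `claude` function here is a stand-in stub so the example runs anywhere (in karl, the real Claude CLI sits in its place), and the `state/` layout and role names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of one stateless agent step. `claude` is stubbed so the sketch is
# runnable; in the real system it would be the Claude CLI binary.
set -euo pipefail

claude() { echo "(model output for: $*)"; }   # stand-in for the real CLI

run_agent() {
  local role="$1" ticket="$2"
  local state_dir="state/$ticket"
  mkdir -p "$state_dir"
  # Fresh context every invocation: the prompt is rebuilt entirely from
  # files on disk; nothing survives in memory between agent runs.
  claude -p "act as $role for ticket $ticket" > "$state_dir/$role.md"
}

run_agent planner T-1   # durable state = a file write
run_agent tester  T-1   # the next agent reads it back from disk
ls state/T-1
```

Each role's output lands in a file; the next agent's prompt is assembled from those files. That's the entire memory model.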
karl's own PRD has just over 20 user stories. I started karl with its own PRD pointed at its own codebase. It completed most of it in a single day — roughly ten hours of autonomous execution. By day two the core pipeline was functional end-to-end. I watched terminal output scroll. I did not write code. I occasionally killed the process when it was clearly stuck and restarted it, which felt uncomfortably like managing a junior developer. The architecture decision records are the most unsettling part. The architect agent made technical decisions and documented them in a format I'd accept from a human architect. Not because the reasoning was brilliant, but because it was sufficient. Clear trade-offs, clear rationale, clear consequences.
Single-instance karl works. You run it, it loops, tickets get done. The multi-instance mode is where things get interesting and occasionally terrible. Each worker claims a ticket atomically using POSIX locks, creates a git worktree for filesystem isolation, runs the full pipeline, then serializes the merge to main through a merge arbitrator. A coordinator periodically checks for file overlap between workers. On paper, this is elegant. In practice, it's a hot mess. No amount of upfront design catches a deadlock that only manifests when an LLM-generated implementation fails a test four times in a row on a ticket whose children block three other tickets across two different parent stories. If you want to play around with it, be aware that it will burn through your funds at a speed that is difficult to comprehend.
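The atomic-claim idea can be illustrated with a classic shell trick. To be clear, this is not karl's implementation (the article says it uses POSIX locks); the sketch below uses `mkdir`, which is atomic on POSIX filesystems, to show the shape of the mechanism, and the `locks/` directory and ticket names are made up. The worktree step is shown as a comment since it needs a real repository.

```shell
#!/usr/bin/env bash
# Sketch of atomic ticket claiming between parallel workers.
# Not karl's actual locking; mkdir stands in for a POSIX lock here.
set -euo pipefail

claim_ticket() {
  local ticket="$1"
  # mkdir either creates the directory or fails: exactly one worker wins.
  if mkdir "locks/$ticket" 2>/dev/null; then
    echo "claimed $ticket"
    # The winner would then get its own isolated working directory, e.g.:
    #   git worktree add "work/$ticket" -b "karl/$ticket"
    return 0
  fi
  echo "skipped $ticket (already claimed)"
  return 1
}

mkdir -p locks
claim_ticket T-7          # first worker wins the claim
claim_ticket T-7 || true  # a second worker sees the lock and moves on
```

The merge back to main is the part no lock can simplify: two workers can hold disjoint tickets and still produce conflicting diffs, which is why a serializing arbitrator sits in front of main at all.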
I've been writing about the shift from writing code to designing constraints. karl is the practical test of that thesis. No hand-holding. No code review. Just constraints and a loop. It worked better than I expected and worse than I'd like. This is either the future of software engineering or a very expensive way to generate bash scripts. Probably both.
If you are still wondering whether you should use this: If you have API credits to burn and a well-defined PRD, karl will ship it. Not perfectly. Not cheaply. But autonomously. If you expect production-grade reliability from a project entirely written by an AI in a ten-day loop, you will be disappointed. karl is an experiment in what autonomous agent loops can do today. It is not a tool for shipping your company's production code. It is a tool for exploring where the boundary is between "constraints I define" and "code that exists."
Link to GitHub: https://github.com/kayoslab/karl
I create, I explore, I learn — never full, always hungry.