Giving your agents just enough freedom

Egor Kraev

Claude Code writes most of my code now (under close supervision), so I am acutely aware of the bottlenecks in that flow. Right now, for me, there are two:

First, having only one copy of the codebase on disk meant the agents could only work on one branch at a time. That is a problem because my typical workflow has one Claude Code instance on a PR’s branch, iterating on the issues highlighted by CodeRabbit/SonarQube reviews, while another agent plows ahead on a different branch building completely new logic, or possibly several instances of each kind. I didn’t want to juggle git worktrees manually, or even with hand-rolled scripts (git is fiddly enough as it is).

Second, having to press “approve” on every little Python snippet or bash command the agent wants to run is a hassle for me and a major brake on the agents, but I’m not willing to let an agent run wild on my whole machine either. And no, Claude Code’s “auto” mode is not a solution here, as it just swaps one probabilistic, black-box mechanism for another.

Branch juggling with GitKraken

I solved the first challenge with GitKraken, which lets me spin up an agent with one click and automatically creates a worktree for it. I find it really handy for juggling branches: it shows all the branches in a clear diagram, and you can drag and drop to merge, at which point GitKraken gets the branch from its worktree if needed and performs the merge.


Sandboxing without giving up control

The second challenge was more subtle. What I wanted was precisely this: let Claude do whatever it wants inside a sandbox containing the freshly spawned worktree, and let it read from GitHub (e.g. CodeRabbit reviews) without approval, but always ask for approval before any git or gh command with external effects, such as git push or creating a PR (both of which have to run unsandboxed to get the credentials), and before any other unsandboxed command.

There is an option to sandbox Claude Code by just adding a couple of lines to settings.json, but Claude can bypass it at will by setting “DangerouslyBypassSandbox: true” on a tool call 😛. Fortunately, this can be controlled deterministically by a PreToolUse hook that checks for that flag.
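As a rough illustration, the flag check on its own could look like the sketch below. It is not my actual hook. It assumes the hook receives a JSON payload with tool_name and tool_input fields and can answer with a permissionDecision of “ask”, and it takes the flag name from the text above; check all of these against the Claude Code hooks documentation for your version.

```python
import json
from typing import Optional

# Assumption: flag name as written in this post; verify the exact key
# in the tool input of your Claude Code version before relying on it.
BYPASS_FLAG = "DangerouslyBypassSandbox"


def decide(payload: dict) -> Optional[dict]:
    """Return a response that forces a prompt, or None to stay out of the way."""
    if payload.get("tool_name") != "Bash":
        return None  # not a shell command: leave it to the normal flow
    tool_input = payload.get("tool_input") or {}
    if not tool_input.get(BYPASS_FLAG):
        return None  # sandboxed call: nothing to add
    # Assumed response shape for a PreToolUse hook that wants a user prompt.
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "ask",
            "permissionDecisionReason": "Command asks to run outside the sandbox",
        }
    }


def main(stdin, stdout) -> None:
    """Read one hook payload from stdin and write the decision, if any."""
    response = decide(json.load(stdin))
    if response is not None:
        json.dump(response, stdout)
```

A real hook script would end by calling main(sys.stdin, sys.stdout). Keeping decide as a pure function makes the rule easy to unit test without a running agent.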

So I got the behavior I wanted by combining the sandbox with a PreToolUse hook that checks for the “DangerouslyBypassSandbox” flag and, if it is set, forces CC to ask me unless the command is one of the whitelisted gh or git commands. The whitelist check is harder than it sounds: Claude Code often emits compound commands joined by &&, > or |, so the hook should ideally decompose each one into its component parts and auto-approve it only if every part is innocuous.
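To make the decomposition step concrete, here is a minimal sketch of one way to do it with Python’s standard shlex module. It is an illustration under assumptions, not my actual hook: the allowlist entries and the names split_command and is_auto_approvable are made up for the example, and the sketch deliberately gives up (so the hook would ask) on redirections, subshells, command substitution and multi-line input rather than trying to reason about them.

```python
import shlex

# Illustrative allowlist of command prefixes (an assumption; tune to taste).
SAFE_PREFIXES = [
    ("gh", "pr", "view"),
    ("gh", "pr", "list"),
    ("gh", "pr", "diff"),
    ("gh", "pr", "checks"),
    ("git", "status"),
    ("git", "log"),
    ("git", "diff"),
]
# Operators that only chain commands; any other operator forces a prompt.
CHAIN_OPERATORS = {"&&", "||", "|", ";"}
PUNCTUATION = set("();<>|&")


def split_command(command: str):
    """Split a shell line into simple commands, or return None when unsure."""
    # Substitutions and multi-line scripts are too easy to get wrong: give up.
    if any(marker in command for marker in ("`", "$(", "\n")):
        return None
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # treat '#' as an ordinary character
    try:
        tokens = list(lexer)
    except ValueError:  # e.g. unbalanced quotes
        return None
    parts, current = [], []
    for token in tokens:
        if token in CHAIN_OPERATORS:
            parts.append(current)
            current = []
        elif token and set(token) <= PUNCTUATION:
            return None  # redirection, subshell, background job, ...
        else:
            current.append(token)
    parts.append(current)
    return parts


def is_auto_approvable(command: str) -> bool:
    """True only if every chained part starts with an allowlisted prefix."""
    parts = split_command(command)
    if not parts:
        return False
    return all(
        any(tuple(part[: len(prefix)]) == prefix for prefix in SAFE_PREFIXES)
        for part in parts
    )
```

With this sketch, "gh pr view 12 --comments && git status" would be auto-approved, while "gh pr view 12 | rm -rf build" or "git log > out.txt" would fall through to a prompt. Erring on the side of asking is the design choice here: a false “ask” costs one click, a false “allow” runs something unsandboxed.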

Now I get the best of both worlds: each agent can run wild in the worktree that was spawned for it and can fetch data from GitHub without asking me, but I have to approve anything else.

The remaining unsolved flow

The one flow I haven’t found a good solution for yet is a project that legitimately spans multiple repos. My current example has three: a text-to-SQL benchmark repo, the SLayer repo that I’m benchmarking, and a repo with agent harnesses that actually use SLayer to convert text to SQL. I’m refining the last one and fixing bugs in the first two as part of the same flow.

I haven’t come across a single solution that gracefully juggles worktrees for all the repos involved, in sync (and I don’t mean hand-rolled scripts). Have you?