Context and Motivation
LLM coding agents have moved fast from cloud demos to tools running on developer workstations. They don't just suggest code anymore. They execute it. They start shells, install packages, edit repos, run tests, and sometimes open PRs, all with the same permissions you have.
Once we allow tool execution (running commands), a new question joins the usual one of "is the model right?": "what happens when it makes a mistake?" Here, mistakes mean running incorrect, or even malicious, commands. Models might misunderstand vague instructions or follow injected context they shouldn't: adversarial text buried in dependencies or docs can influence them.
You can't verify every action an agent takes. You can verify the isolation layer: show that no matter what the agent tries, it can't touch sensitive files or networks you didn't allow. That doesn't solve everything, but it turns agentic workflows from blind trust into something you can audit.
Why This Matters
Developers report agents running rm -rf on the wrong directory. Home folders vanish. Others watch agents discover cloud credentials and spin up infrastructure without permission. Bills pile up. Worse yet, production data could be destroyed. In adversarial scenarios, agents can be abused to exfiltrate secrets.
A malicious model isn't required for these failures. They happen when you give a non-deterministic system broad privileges and assume good intentions or system prompts are enough. As coding agents get more capable and more integrated into daily work, sandboxing stops being theoretical. It's the only responsible way to run them.
This article explores the threats posed by coding agents running on a local workstation, and the sandboxing tools available to contain them.
Threat Model and Attack Surface
The actor is an LLM coding agent with execution privileges. It can access the filesystem and the network, and it can execute shell commands. It runs on a developer's machine. It's pitched as a productivity assistant, but from a security angle, it's an autonomous process with delegated authority.
Core Assumption
The agent is not reliably correct or well-behaved. It might follow ambiguous instructions or treat comments or other input as directives. Training artifacts can cause unexpected behavior. Tool interfaces can be misused (e.g., parameter confusion or encoding issues). Once you allow execution, you can't assume that "the agent knows better" or that alignment will prevent harm.
Scope
The scope is risks from running an LLM coding agent with execution privileges on a developer's machine.
The focus is on containing the agent, not verifying the code it writes. If you treat the agent as an untrusted process with dangerous privileges, you can reason about isolation and blast radius using familiar systems concepts. You can apply sandboxing techniques instead of hoping better prompts or smarter models will fix the problem. Instead of predicting behavior, you design containment that limits the damage.
Bugs, logic errors, insecure APIs, missing validation, or other vulnerabilities in generated code are out of scope. Those are real problems, but they're not unique to agents. They already exist with human-written code and can be handled through review, testing, static analysis, and secure dev practices. Mixing those concerns with agent containment blurs the actual security boundary.
Attack Surface
When an LLM coding agent runs on your workstation, its attack surface looks like yours. Same account, same permissions. The difference isn't what it can do — it's how predictably it acts. If context or instructions direct it somewhere, it will exercise the available permissions.
Filesystem Access
An agent that can read anywhere can stray past the project directory into config files, SSH keys, browser profiles, or other data that was never in scope. Write access is worse: it enables destructive operations and subtle tampering with files not in version control, changes that won't get reviewed or rolled back.
Shell and Process Execution
This gives the agent the same authority as your terminal. It can install software, spawn background processes, and invoke compilers or other powerful tools to achieve things it couldn't do directly. Even without malicious intent, agents can create long-lived processes or modify system state.
Project-Embedded Execution
Modern development workflows run code directly from the repository: git hooks, build scripts, test runners, package lifecycle scripts. If an agent modifies these, the changes might not alert you right away. They execute later during normal work, delaying impact and making it hard to trace back to the agent.
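As an illustration, a hook written into .git/hooks fires silently on later Git operations. Pointing core.hooksPath at an empty location is one way to neutralize hooks in a checkout an agent has touched. The sketch below builds a throwaway repository purely for the demo:

```shell
# Demo: a hook planted in .git/hooks runs on the next commit unless
# neutralized. We create a throwaway repo, plant a hook, then point
# core.hooksPath at /dev/null so Git finds no hooks to execute.
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Simulate an agent planting a pre-commit hook.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
echo "hook executed" >> hook.log
EOF
chmod +x .git/hooks/pre-commit

# Defensive setting: an empty hooks path disables project-embedded hooks.
git config core.hooksPath /dev/null

git -c user.email=demo@example.com -c user.name=demo \
  commit -q --allow-empty -m "test commit"

test ! -f hook.log && echo "hook suppressed"
```

In a real workflow you would set this per-clone before handing the checkout to an agent, and review .git/hooks afterwards.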
Network Egress
If outbound connections are open, the agent can exfiltrate data, fetch untrusted payloads, or establish command-and-control channels. From the outside, this traffic looks like normal developer activity.
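Several of the container-based tools covered later route traffic through an HTTP proxy instead of leaving egress open. A Squid-style deny-by-default allowlist gives a feel for the shape of such a policy (the domains are illustrative, not a recommended list):

```
# squid.conf fragment -- deny-by-default egress allowlist (sketch)
acl SSL_ports port 443
acl dev_hosts dstdomain .github.com .npmjs.org .pypi.org

# http_access rules are evaluated top-down; first match wins.
http_access deny CONNECT !SSL_ports
http_access allow dev_hosts
http_access deny all
```

Proxy enforcement only helps if the sandbox also blocks direct egress, which is why proxy-based tools typically pair the proxy with firewall rules inside the container.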
Credentials and Identity Artifacts
Your machine is full of auth material: SSH keys, cloud tokens, API keys, credential helpers that silently grant access. If an agent can read or invoke these, it can impersonate you across services, potentially long after the session ends. The impact spreads beyond the local machine.
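Environment variables are the easiest of this material to strip. A minimal sketch, assuming the agent is launched from a shell; note this covers environment variables only, while files like ~/.ssh or ~/.aws/credentials still need filesystem isolation:

```shell
# Launch a command with a minimal, explicitly constructed environment so
# inherited secrets (cloud keys, API tokens) never reach the agent's
# process tree.
run_clean() {
  env -i \
    HOME="$HOME" \
    PATH=/usr/local/bin:/usr/bin:/bin \
    TERM="${TERM:-dumb}" \
    "$@"
}

# Demo: a secret set in the parent shell is invisible to the child.
export AWS_SECRET_ACCESS_KEY="dummy-value-for-demo"
run_clean sh -c 'echo "key seen by child: ${AWS_SECRET_ACCESS_KEY:-<unset>}"'
# prints: key seen by child: <unset>
```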
Persistence Mechanisms
With enough privileges, an agent can register scheduled tasks or adjust startup configs so changes survive reboots. Package installs and upgrades increase the risk: post-install scripts or version downgrades can quietly weaken security. Even well-intentioned changes leave behind state that's difficult to notice or fully unwind.
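One low-tech mitigation is to snapshot persistence-relevant state before a session and diff it afterwards. The paths below are illustrative and Linux-flavored; extend the list for your OS:

```shell
# Snapshot persistence-relevant state so post-session diffs reveal
# surviving changes (cron entries, shell rc files, user services).
snapshot() {
  {
    crontab -l 2>/dev/null
    cat ~/.bashrc ~/.profile 2>/dev/null
    ls ~/.config/systemd/user/ 2>/dev/null
  } > "$1"
}

snapshot "${TMPDIR:-/tmp}/agent-state.before"
# ... agent session would run here ...
snapshot "${TMPDIR:-/tmp}/agent-state.after"

if cmp -s "${TMPDIR:-/tmp}/agent-state.before" "${TMPDIR:-/tmp}/agent-state.after"; then
  echo "no persistence changes detected"
else
  diff -u "${TMPDIR:-/tmp}/agent-state.before" "${TMPDIR:-/tmp}/agent-state.after"
fi
```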
Resource Exhaustion
Runaway builds and infinite loops use up CPU, memory, or disk space. In adversarial cases, the same capabilities can be used for cryptomining. This is usually visible through degraded performance or unexpected usage, but the disruption and cost make execution limits worthwhile.
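Even without a full sandbox, POSIX resource limits give you a cheap per-command cap on CPU and memory. The numbers below are arbitrary examples, not recommendations:

```shell
# Run a command under CPU and memory caps. The limits are set inside a
# subshell so they don't leak into the calling shell.
limited_run() (
  ulimit -t 300        # at most 300 seconds of CPU time
  ulimit -v 2097152    # at most ~2 GiB of virtual memory (KiB units)
  "$@"
)

limited_run echo "build step finished"
```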
When you run an LLM coding agent, you hand your authority to a non-deterministic process. The attack surface tells you what needs to be constrained or denied.
Overview of Available Sandboxing Tools
This overview covers sandboxing tools that run locally on a developer workstation. Hosted or production-oriented platforms aren't included. The focus is on what you can install and use as part of your local workflow.
Local sandboxing tools use several different approaches to isolation. The most important distinctions are how the isolation works and how you use the agent. Tooling choice depends on whether the agent runs from the CLI or inside an IDE, and whether it needs to operate containers or access a browser. Those often matter more than isolation strength.
OS-Level Isolation
At the lighter end are OS-level isolation mechanisms that share the host kernel.
On Linux, tools built on bubblewrap restrict filesystem views, environment variables, and capabilities with low startup overhead. Projects like sandbox-run and Anthropic's Sandbox Runtime (SRT) use this to sandbox individual commands without containers or VMs.
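A minimal sketch of that pattern, using bubblewrap directly. The flags are real bwrap options, but the policy is an illustration, not a vetted profile. The sketch probes bwrap first and falls back to direct execution so it stays runnable anywhere; a production wrapper should fail closed instead:

```shell
# Sketch: execute one command inside a bubblewrap sandbox that exposes
# /usr read-only and the current project directory read-write.
# --unshare-all also cuts network access.
run_sandboxed() {
  if command -v bwrap >/dev/null 2>&1 &&
     bwrap --ro-bind /usr /usr --symlink usr/bin /bin \
           --symlink usr/lib /lib --symlink usr/lib64 /lib64 \
           --proc /proc --dev /dev --unshare-all -- /usr/bin/true 2>/dev/null
  then
    bwrap \
      --ro-bind /usr /usr \
      --symlink usr/bin /bin \
      --symlink usr/lib /lib \
      --symlink usr/lib64 /lib64 \
      --proc /proc \
      --dev /dev \
      --bind "$PWD" "$PWD" \
      --chdir "$PWD" \
      --unshare-all \
      --die-with-parent \
      -- "$@"
  else
    # bwrap missing or unusable (e.g. inside an unprivileged container).
    echo "warning: bwrap unavailable, running unsandboxed" >&2
    "$@"
  fi
}

run_sandboxed echo "inside the sandbox"
```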
On macOS, sandbox-exec provides a similar primitive. Higher-level wrappers like Sandboxtron and Anthropic SRT offer prebuilt profiles.
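Under the hood, these wrappers generate Seatbelt profiles for sandbox-exec. A hand-written deny-by-default profile looks roughly like this (the paths are placeholders):

```scheme
;; project.sb -- deny-by-default Seatbelt profile (sketch)
(version 1)
(deny default)

;; read-only access to system binaries and libraries
(allow file-read* (subpath "/usr") (subpath "/System") (subpath "/bin"))
(allow process-exec (subpath "/usr") (subpath "/bin"))

;; full access only inside the project directory
(allow file-read* file-write* (subpath "/Users/dev/project"))
```

You would run a command under it with sandbox-exec -f project.sb. Apple documents sandbox-exec as deprecated, but it still works, and these tools rely on it.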
Container-Based Isolation
Containers are a practical middle ground. Docker- and Podman-based projects like Agent Sandbox, TSK, and Leash benefit from mature tooling and broad community support.
Only Agent Sandbox and Sandcat provide direct IDE integration. They use a docker-compose file that works as a Dev Container, which makes it straightforward to run the agent inside VS Code or JetBrains IDEs.
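The shape of that setup is a small compose file plus a devcontainer.json pointing at it. The fragment below is a hedged sketch, not any project's actual config; the image name and network layout are assumptions:

```yaml
# docker-compose.yml (sketch) -- also usable as a Dev Container backend
services:
  agent:
    image: mcr.microsoft.com/devcontainers/base:ubuntu  # placeholder image
    user: vscode                 # run the agent as a non-root user
    volumes:
      - ..:/workspace:cached     # only the project is mounted
    networks: [agentnet]
    cap_drop: [ALL]              # drop Linux capabilities by default

networks:
  agentnet:
    internal: true  # no direct egress; add a proxy service for allowlisted traffic
```

A devcontainer.json then references this via its "dockerComposeFile" and "service" properties, and the IDE attaches into the container.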
Other projects use containers but wrap them in custom runtimes. You could try IDE integration via "Remote Development" extensions, but it's more complicated. A few container tools, notably TSK and Agent of Empires, support assigning tasks to multiple agents in parallel for "agent swarm" workflows.
Container security depends on the runtime environment and how the container runs, including Docker daemon config. Most projects run agents as a non-root user inside the container. Exceptions: Leash and Agent of Empires run as root, which increases risk if the agent is compromised. Capability restrictions vary: TSK applies the most restrictive set, Agent Sandbox drops some, and the rest operate with defaults.
VM-Based Isolation
VM-based isolation introduces a separate kernel and a clear safety boundary against accidental host damage, but it comes with operational friction: slower boot times, resource allocation, and persistent VM state you need to maintain or clean up.
Lightweight VM approaches use KVM or macOS Virtualization.framework. Projects like Microsandbox, Matchlock, and Docker's AI Sandboxes reduce startup latency and improve the experience, but they still add complexity compared to containers. The listed VM-based projects are designed for headless usage and don't support running an IDE inside the VM.
Audit Capabilities
Audit helps you understand what happened after a failure or compromise. It's also useful for policy refinement. If a task fails because of an overly restrictive rule, detailed logs show exactly which file path or network destination was denied, so you can decide what to allow next time. Without that visibility, you're guessing or disabling isolation to get work done.
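For proxy-based tools, that refinement loop can be as simple as grepping denials out of the log. A sketch with an invented log format (one verdict and host per line; real proxy logs differ):

```shell
# Extract the unique set of denied destinations from a proxy log so
# they can be reviewed and, if legitimate, added to the allowlist.
# The log format and hostnames here are made up for the demo.
denied=$(awk '$1 == "DENY" { print $2 }' <<'EOF' | sort -u
ALLOW registry.npmjs.org
DENY  telemetry.example.com
DENY  raw.githubusercontent.com
DENY  telemetry.example.com
EOF
)
printf '%s\n' "$denied"   # prints the two denied hosts, one per line
```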
Across the projects, audit capabilities are the biggest gap: isolation mechanisms are common, but good, developer-friendly observability into what the agent did is rare. Projects that route traffic through an HTTP proxy expose proxy logs, which at least show outbound network activity. Outside that, you're stuck with generic tools like strace and container or VM logs: low-level and cumbersome. Leash stands out as the only project offering a complete filesystem and network audit trail out of the box, with structured telemetry and a UI.
Choosing a Sandbox
There's no single sandbox that fits everyone. OS-level sandboxes are lightweight. Containers are often a familiar technology and have good IDE compatibility. VMs offer the strongest isolation, but at a higher operational cost.
Projects built on OS primitives can lack features. Most don't isolate the network at all. Some tools, like Anthropic SRT, can only deny specific paths (reads are allowed everywhere by default). This limits how tightly you can constrain the agent.
Docker and Docker Compose are already used in many development workflows. This lowers adoption friction for container-based sandboxes. A declarative docker-compose.yml is transparent and easy to inspect. The isolation model is easier to reason about compared to opaque runtime systems.
Some advanced container projects build custom control planes on top of the Docker API. While powerful, these abstractions can obscure what's happening at the isolation boundary. Volume mounts and network configuration are encoded in application logic rather than expressed declaratively. This makes the system harder to audit.
VMs increase complexity further. Beyond running a guest OS, acceptable performance requires non-trivial practices: snapshot management, layered disk images, optimized file sharing via mechanisms like FUSE. These aren't concepts most application developers routinely work with. VMs strengthen the host protection boundary, but they add an order of magnitude more operational and conceptual overhead.
Finally, there's a trust question. The goal is to sandbox an untrusted LLM coding agent, but you have to trust the sandbox itself. Simpler systems built on well-understood, widely audited primitives reduce the trusted computing base. More feature-rich frameworks offer convenience and orchestration, but they expand what you must trust. Sandboxing shifts the trust boundary; it doesn't eliminate it.
| Project | File Isolation | Network Isolation | Audit | Isolation Mechanism | macOS | Linux | Supported Agents |
|---|---|---|---|---|---|---|---|
| Anthropic SRT | Read allowed everywhere by default, write denied by default; only specific paths can be denied or allowed | HTTP/SOCKS allowlist | macOS: system logs; Linux: strace | sandbox-exec / bubblewrap | ✅ | ✅ | Any command |
| scode | Project RW; 35+ sensitive paths (creds/personal) blocked by default; configurable via --allow/--block | ❌ Online/Offline only | Violation logs; scode audit UI | sandbox-exec / bubblewrap | ✅ | ✅ | Any command; agent-agnostic (Claude, Codex, OpenCode, Goose, Gemini, Pi, Qwen, …) |
| nono | Project RW; configurable read/write | HTTP proxy; secret injection | JSON audit logs | sandbox-exec / Landlock | ✅ | ✅ | Any command; Claude Code, Codex, OpenCode, OpenClaw, Swival |
| sandbox-run | Project RW; system RO; HOME remapped; extra host paths optionally bound | ❌ Shared host network | None | bubblewrap | ❌ | ✅ | Any command |
| Sandboxtron | Project RW; limited HOME RW (caches); rest of host RO | ❌ Offline: localhost only; Online: full | None | sandbox-exec | ✅ | ❌ | Any command |
| Agent Sandbox | Project RW | MITM proxy | Proxy logs | Container + mitmproxy + iptables | ⚠️ (Docker) | ✅ | Claude Code, Codex CLI, GitHub Copilot CLI |
| TSK | Project RW; task state persisted outside project | Squid proxy | Proxy logs | Container + Squid + iptables | ⚠️ (Docker) | ✅ | Claude Code, Codex |
| Leash | Project RW; optional host config mounts | Cedar | FS + Net + MCP telemetry; UI | Container + Cedar policies | ⚠️ (Docker) | ✅ | Claude, Codex, Gemini, Qwen, OpenCode |
| Agent of Empires | Project RW; git worktrees; host configs mostly RO; persistent auth volumes | ❌ Container inherits host | Container logs | tmux sessions; optional per-session container | ⚠️ (Docker) | ✅ | Claude Code, OpenCode, Mistral Vibe, Codex CLI, Gemini CLI, Cursor CLI, Copilot CLI, Pi.dev |
| Packnplay | Project RW (git worktree); credentials RO or copied into container | ❌ Container inherits host | Container logs | Container | ⚠️ (Docker) | ✅ | Claude Code, Codex, Gemini; Any command |
| Sandcat | Project RW | Transparent MITM proxy; secret injection | Proxy logs (mitmweb UI) | Container + mitmproxy + WireGuard | ⚠️ (Docker) | ✅ | Any command; Claude Code (auto onboarding) |
| release-engineers/agent-sandbox | Project copy; patch-file writeback | tinyproxy | Proxy logs | Container | ⚠️ (Docker) | ✅ | Claude Code; any command; patch-based workflow |
| SafeYolo | Project RW | MITM proxy | Proxy logs | Container | ⚠️ (Docker) | ✅ | Claude Code, OpenAI Codex |
| Vagrant | Project RW via 2‑way VM sync | ❌ Full internet | VM logs | Full VM | ✅ | ✅ | Any command |
| Matchlock | Project RW via COW filesystem inside microVM | MITM proxy; secret injection | Proxy logs + VFS hooks | nftables DNAT rules / gVisor userspace TCP/IP | 🍏 Apple Silicon only | ⚠️ KVM only | Any command / SDK (Go, Python, TypeScript) |
| Vibe | Project RW; package caches shared | ❌ Full internet | VM logs | Virtualization.framework VM | 🍏 Apple Silicon only | ❌ | Claude Code, Codex, Gemini |
| Docker Sandboxes | Project RW via 2‑way VM sync | HTTP(S) proxy | None | Virtualization.framework VM | ✅ (Docker Desktop) | ❌ | Claude Code; other in development; not extensible |
| Microsandbox | Isolated; volumes as configured | ❌ none/local/public/any | None | MicroVM (libkrun) | ✅ | ✅ | Any command; Native MCP; SDKs (Py, JS, Rust) |
| claude-code-safety-net | ❌ command interception only | ❌ No network isolation | JSON logs | Plugin/hook layer (not a sandbox) | ✅ | ✅ | Claude Code, OpenCode, Gemini CLI, Copilot CLI |
In the second part of this article, we will walk through a practical example: a sandboxed single-agent setup.