Context and Motivation
LLM coding agents have moved fast from cloud demos to tools running on developer workstations. They don't just suggest code anymore. They execute it. They start shells, install packages, edit repos, run tests, and sometimes open PRs, all with the same permissions you have.
Once we allow tool execution (running commands), a new question joins the usual one of "is the model right?": "what happens when it makes a mistake?" Here, mistakes mean running incorrect, or even malicious, commands. Models might misunderstand vague instructions or follow injected context they shouldn't: adversarial text buried in dependencies or docs can influence them.
You can't verify every action an agent takes. You can verify the isolation layer: show that no matter what the agent tries, it can't touch sensitive files or networks you didn't allow. That doesn't solve everything, but it turns agentic workflows from blind trust into something you can audit.
Why This Matters
Developers report agents running rm -rf on the wrong directory. Home folders vanish. Others watch agents discover cloud credentials and spin up infrastructure without permission. Bills pile up. Worse yet, production data could be destroyed. In adversarial scenarios, agents can be abused to exfiltrate secrets.
A malicious model isn't required for these failures. They happen when you give a non-deterministic system broad privileges and assume good intentions or system prompts are enough. As coding agents get more capable and more integrated into daily work, sandboxing stops being theoretical. It's the only responsible way to run them.
This article explores the threats posed by coding agents running on a local workstation, and the sandboxing tools available to contain them.
Threat Model and Attack Surface
The actor is an LLM coding agent with execution privileges. It can access the filesystem and the network, and it can execute shell commands. It runs on a developer's machine. It's pitched as a productivity assistant, but from a security angle, it's an autonomous process with delegated authority.
Core Assumption
The agent is not reliably correct or well-behaved. It might follow ambiguous instructions or treat comments or other input as directives. Training artifacts can cause unexpected behavior. Tool interfaces can be misused (e.g., parameter confusion or encoding issues). Once you allow execution, you can't assume that "the agent knows better" or that alignment will prevent harm.
Scope
The scope is risks from running an LLM coding agent with execution privileges on a developer's machine.
The focus is on containing the agent, not verifying the code it writes. If you treat the agent as an untrusted process with dangerous privileges, you can reason about isolation and blast radius using familiar systems concepts. You can apply sandboxing techniques instead of hoping better prompts or smarter models will fix the problem. Instead of predicting behavior, you design containment that limits the damage.
Bugs, logic errors, insecure APIs, missing validation, or other vulnerabilities in generated code are out of scope. Those are real problems, but they're not unique to agents. They already exist with human-written code and can be handled through review, testing, static analysis, and secure dev practices. Mixing those concerns with agent containment blurs the actual security boundary.
Attack Surface
When an LLM coding agent runs on your workstation, its attack surface looks like yours. Same account, same permissions. The difference isn't what it can do — it's how predictably it acts. If context or instructions direct it somewhere, it will exercise the available permissions.
Filesystem Access
An agent that can read anywhere can stray past the project directory into config files, SSH keys, browser profiles, or other data that was never in scope. Write access is worse: it enables destructive operations and subtle tampering with files not in version control, changes that won't get reviewed or rolled back.
Shell and Process Execution
This gives the agent the same authority as your terminal. It can install software, spawn background processes, and invoke compilers or other powerful tools to achieve things it couldn't do directly. Even without malicious intent, agents can create long-lived processes or modify system state.
Project-Embedded Execution
Modern development workflows run code directly from the repository: git hooks, build scripts, test runners, package lifecycle scripts. If an agent modifies these, the changes might not alert you right away. They execute later during normal work, delaying impact and making it hard to trace back to the agent.
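As an illustration, a hook written into .git/hooks fires silently on later Git operations. Pointing core.hooksPath at an empty location is one way to neutralize hooks in a checkout an agent has touched. The sketch below builds a throwaway repository purely for the demo:

```shell
# Demo: a hook planted in .git/hooks runs on the next commit unless
# neutralized. We create a throwaway repo, plant a hook, then point
# core.hooksPath at /dev/null so Git finds no hooks to execute.
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Simulate an agent planting a pre-commit hook.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
echo "hook executed" >> hook.log
EOF
chmod +x .git/hooks/pre-commit

# Defensive setting: an empty hooks path disables project-embedded hooks.
git config core.hooksPath /dev/null

git -c user.email=demo@example.com -c user.name=demo \
  commit -q --allow-empty -m "test commit"

test ! -f hook.log && echo "hook suppressed"
```

In a real workflow you would set this per-clone before handing the checkout to an agent, and review .git/hooks afterwards.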
Network Egress
If outbound connections are open, the agent can exfiltrate data, fetch untrusted payloads, or establish command-and-control channels. From the outside, this traffic looks like normal developer activity.
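Several of the container-based tools covered later route traffic through an HTTP proxy instead of leaving egress open. A Squid-style deny-by-default allowlist gives a feel for the shape of such a policy (the domains are illustrative, not a recommended list):

```
# squid.conf fragment -- deny-by-default egress allowlist (sketch)
acl SSL_ports port 443
acl dev_hosts dstdomain .github.com .npmjs.org .pypi.org

# http_access rules are evaluated top-down; first match wins.
http_access deny CONNECT !SSL_ports
http_access allow dev_hosts
http_access deny all
```

Proxy enforcement only helps if the sandbox also blocks direct egress, which is why proxy-based tools typically pair the proxy with firewall rules inside the container.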
Credentials and Identity Artifacts
Your machine is full of auth material: SSH keys, cloud tokens, API keys, credential helpers that silently grant access. If an agent can read or invoke these, it can impersonate you across services, potentially long after the session ends. The impact spreads beyond the local machine.
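Environment variables are the easiest of this material to strip. A minimal sketch, assuming the agent is launched from a shell; note this covers environment variables only, while files like ~/.ssh or ~/.aws/credentials still need filesystem isolation:

```shell
# Launch a command with a minimal, explicitly constructed environment so
# inherited secrets (cloud keys, API tokens) never reach the agent's
# process tree.
run_clean() {
  env -i \
    HOME="$HOME" \
    PATH=/usr/local/bin:/usr/bin:/bin \
    TERM="${TERM:-dumb}" \
    "$@"
}

# Demo: a secret set in the parent shell is invisible to the child.
export AWS_SECRET_ACCESS_KEY="dummy-value-for-demo"
run_clean sh -c 'echo "key seen by child: ${AWS_SECRET_ACCESS_KEY:-<unset>}"'
# prints: key seen by child: <unset>
```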
Persistence Mechanisms
With enough privileges, an agent can register scheduled tasks or adjust startup configs so changes survive reboots. Package installs and upgrades increase the risk: post-install scripts or version downgrades can quietly weaken security. Even well-intentioned changes leave behind state that's difficult to notice or fully unwind.
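One low-tech mitigation is to snapshot persistence-relevant state before a session and diff it afterwards. The paths below are illustrative and Linux-flavored; extend the list for your OS:

```shell
# Snapshot persistence-relevant state so post-session diffs reveal
# surviving changes (cron entries, shell rc files, user services).
snapshot() {
  {
    crontab -l 2>/dev/null
    cat ~/.bashrc ~/.profile 2>/dev/null
    ls ~/.config/systemd/user/ 2>/dev/null
  } > "$1"
}

snapshot "${TMPDIR:-/tmp}/agent-state.before"
# ... agent session would run here ...
snapshot "${TMPDIR:-/tmp}/agent-state.after"

if cmp -s "${TMPDIR:-/tmp}/agent-state.before" "${TMPDIR:-/tmp}/agent-state.after"; then
  echo "no persistence changes detected"
else
  diff -u "${TMPDIR:-/tmp}/agent-state.before" "${TMPDIR:-/tmp}/agent-state.after"
fi
```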
Resource Exhaustion
Runaway builds and infinite loops use up CPU, memory, or disk space. In adversarial cases, the same capabilities can be used for cryptomining. This is usually visible through degraded performance or unexpected usage, but the disruption and cost make execution limits worthwhile.
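Even without a full sandbox, POSIX resource limits give you a cheap per-command cap on CPU and memory. The numbers below are arbitrary examples, not recommendations:

```shell
# Run a command under CPU and memory caps. The limits are set inside a
# subshell so they don't leak into the calling shell.
limited_run() (
  ulimit -t 300        # at most 300 seconds of CPU time
  ulimit -v 2097152    # at most ~2 GiB of virtual memory (KiB units)
  "$@"
)

limited_run echo "build step finished"
```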
When you run an LLM coding agent, you hand your authority to a non-deterministic process. The attack surface tells you what needs to be constrained or denied.
Overview of Available Sandboxing Tools
This overview covers sandboxing tools that run locally on a developer workstation. Hosted or production-oriented platforms aren't included. The focus is on what you can install and use as part of your local workflow.
Local sandboxing tools use several different approaches to isolation. The most important distinctions are how the isolation works and how you use the agent. Tooling choice depends on whether the agent runs from the CLI or inside an IDE, and whether it needs to operate containers or access a browser. Those often matter more than isolation strength.
OS-Level Isolation
At the lighter end are OS-level isolation mechanisms that share the host kernel.
On Linux, tools built on bubblewrap restrict filesystem views, environment variables, and capabilities with low startup overhead. Projects like sandbox-run and Anthropic's Sandbox Runtime (SRT) use this to sandbox individual commands without containers or VMs.
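A minimal sketch of that pattern, using bubblewrap directly. The flags are real bwrap options, but the policy is an illustration, not a vetted profile. The sketch probes bwrap first and falls back to direct execution so it stays runnable anywhere; a production wrapper should fail closed instead:

```shell
# Sketch: execute one command inside a bubblewrap sandbox that exposes
# /usr read-only and the current project directory read-write.
# --unshare-all also cuts network access.
run_sandboxed() {
  if command -v bwrap >/dev/null 2>&1 &&
     bwrap --ro-bind /usr /usr --symlink usr/bin /bin \
           --symlink usr/lib /lib --symlink usr/lib64 /lib64 \
           --proc /proc --dev /dev --unshare-all -- /usr/bin/true 2>/dev/null
  then
    bwrap \
      --ro-bind /usr /usr \
      --symlink usr/bin /bin \
      --symlink usr/lib /lib \
      --symlink usr/lib64 /lib64 \
      --proc /proc \
      --dev /dev \
      --bind "$PWD" "$PWD" \
      --chdir "$PWD" \
      --unshare-all \
      --die-with-parent \
      -- "$@"
  else
    # bwrap missing or unusable (e.g. inside an unprivileged container).
    echo "warning: bwrap unavailable, running unsandboxed" >&2
    "$@"
  fi
}

run_sandboxed echo "inside the sandbox"
```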
On macOS, sandbox-exec provides a similar primitive. Higher-level wrappers like Sandboxtron and Anthropic SRT offer prebuilt profiles.
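Under the hood, these wrappers generate Seatbelt profiles for sandbox-exec. A hand-written deny-by-default profile looks roughly like this (the paths are placeholders):

```scheme
;; project.sb -- deny-by-default Seatbelt profile (sketch)
(version 1)
(deny default)

;; read-only access to system binaries and libraries
(allow file-read* (subpath "/usr") (subpath "/System") (subpath "/bin"))
(allow process-exec (subpath "/usr") (subpath "/bin"))

;; full access only inside the project directory
(allow file-read* file-write* (subpath "/Users/dev/project"))
```

You would run a command under it with sandbox-exec -f project.sb. Apple documents sandbox-exec as deprecated, but it still works, and these tools rely on it.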
Container-Based Isolation
Containers are a practical middle ground. Docker- and Podman-based projects like Agent Sandbox, TSK, and Leash benefit from mature tooling and broad community support.
Only Agent Sandbox and Sandcat provide direct IDE integration. They use a docker-compose file that works as a Dev Container, which makes it straightforward to run the agent inside VS Code or JetBrains IDEs.
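The shape of that setup is a small compose file plus a devcontainer.json pointing at it. The fragment below is a hedged sketch, not any project's actual config; the image name and network layout are assumptions:

```yaml
# docker-compose.yml (sketch) -- also usable as a Dev Container backend
services:
  agent:
    image: mcr.microsoft.com/devcontainers/base:ubuntu  # placeholder image
    user: vscode                 # run the agent as a non-root user
    volumes:
      - ..:/workspace:cached     # only the project is mounted
    networks: [agentnet]
    cap_drop: [ALL]              # drop Linux capabilities by default

networks:
  agentnet:
    internal: true  # no direct egress; add a proxy service for allowlisted traffic
```

A devcontainer.json then references this via its "dockerComposeFile" and "service" properties, and the IDE attaches into the container.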
Other projects use containers but wrap them in custom runtimes. You could try IDE integration via "Remote Development" extensions, but it's more complicated. A few container tools, notably TSK and Agent of Empires, support assigning tasks to multiple agents in parallel for "agent swarm" workflows.
Container security depends on the runtime environment and how the container runs, including Docker daemon config. Most projects run agents as a non-root user inside the container. Exceptions: Leash and Agent of Empires run as root, which increases risk if the agent is compromised. Capability restrictions vary: TSK applies the most restrictive set, Agent Sandbox drops some, and the rest operate with defaults.
VM-Based Isolation
VM-based isolation introduces a separate kernel and a clear safety boundary against accidental host damage, but it comes with operational friction: slower boot times, resource allocation, and persistent VM state you need to maintain or clean up.
Lightweight VM approaches use KVM or macOS Virtualization.framework. Projects like Microsandbox, Matchlock, and Docker's AI Sandboxes reduce startup latency and improve the experience, but they still add complexity compared to containers. The listed VM-based projects are designed for headless usage and don't support running an IDE inside the VM.
Audit Capabilities
Audit helps you understand what happened after a failure or compromise. It's also useful for policy refinement. If a task fails because of an overly restrictive rule, detailed logs show exactly which file path or network destination was denied, so you can decide what to allow next time. Without that visibility, you're guessing or disabling isolation to get work done.
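For proxy-based tools, that refinement loop can be as simple as grepping denials out of the log. A sketch with an invented log format (one verdict and host per line; real proxy logs differ):

```shell
# Extract the unique set of denied destinations from a proxy log so
# they can be reviewed and, if legitimate, added to the allowlist.
# The log format and hostnames here are made up for the demo.
denied=$(awk '$1 == "DENY" { print $2 }' <<'EOF' | sort -u
ALLOW registry.npmjs.org
DENY  telemetry.example.com
DENY  raw.githubusercontent.com
DENY  telemetry.example.com
EOF
)
printf '%s\n' "$denied"   # prints the two denied hosts, one per line
```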
Across the projects, audit capabilities are the biggest gap: isolation mechanisms are common, but good, developer-friendly observability into what the agent did is rare. Projects that route traffic through an HTTP proxy expose proxy logs, which at least show outbound network activity. Outside that, you're stuck with generic tools like strace and container or VM logs: low-level and cumbersome. Leash stands out as the only project offering a complete filesystem and network audit trail out of the box, with structured telemetry and a UI.
Choosing a Sandbox
There's no single sandbox that fits everyone. OS-level sandboxes are lightweight. Containers are often a familiar technology and have good IDE compatibility. VMs offer the strongest isolation, but at a higher operational cost.
Projects built on OS primitives can lack features. Most don't isolate the network at all. Some tools, like Anthropic SRT, can only deny specific paths (reads are allowed everywhere by default). This limits how tightly you can constrain the agent.
Docker and Docker Compose are already used in many development workflows. This lowers adoption friction for container-based sandboxes. A declarative docker-compose.yml is transparent and easy to inspect. The isolation model is easier to reason about compared to opaque runtime systems.
Some advanced container projects build custom control planes on top of the Docker API. While powerful, these abstractions can obscure what's happening at the isolation boundary. Volume mounts and network configuration are encoded in application logic rather than expressed declaratively. This makes the system harder to audit.
VMs increase complexity further. Beyond running a guest OS, acceptable performance requires non-trivial practices: snapshot management, layered disk images, optimized file sharing via mechanisms like FUSE. These aren't concepts most application developers routinely work with. VMs strengthen the host protection boundary, but they add an order of magnitude more operational and conceptual overhead.
Finally, there's a trust question. The goal is to sandbox an untrusted LLM coding agent, but you have to trust the sandbox itself. Simpler systems built on well-understood, widely audited primitives reduce the trusted computing base. More feature-rich frameworks offer convenience and orchestration, but they expand what you must trust. Sandboxing shifts the trust boundary; it doesn't eliminate it.
| Project | File Isolation | Network Isolation | Audit | Isolation Mechanism | macOS | Linux | Supported Agents |
|---|---|---|---|---|---|---|---|
| Anthropic SRT | Read allowed everywhere by default, write denied by default; only specific paths can be denied or allowed | HTTP/SOCKS allowlist | macOS: system logs; Linux: strace | sandbox-exec / bubblewrap | ✅ | ✅ | Any command |
| scode | Project RW; 35+ sensitive paths (creds/personal) blocked by default; configurable via --allow/--block | ❌ Online/Offline only | Violation logs; scode audit UI | sandbox-exec / bubblewrap | ✅ | ✅ | Any command; agent-agnostic (Claude, Codex, OpenCode, Goose, Gemini, Pi, Qwen, …) |
| nono | Project RW; configurable read/write | HTTP proxy; secret injection | JSON audit logs | sandbox-exec / Landlock | ✅ | ✅ | Any command; Claude Code, Codex, OpenCode, OpenClaw, Swival |
| sandbox-run | Project RW; system RO; HOME remapped; extra host paths optionally bound | ❌ Shared host network | None | bubblewrap | ❌ | ✅ | Any command |
| Sandboxtron | Project RW; limited HOME RW (caches); rest of host RO | ❌ Offline: localhost only; Online: full | None | sandbox-exec | ✅ | ❌ | Any command |
| Agent Sandbox | Project RW | MITM proxy | Proxy logs | Container + mitmproxy + iptables | ⚠️ (Docker) | ✅ | Claude Code, Codex CLI, GitHub Copilot CLI |
| TSK | Project RW; task state persisted outside project | Squid proxy | Proxy logs | Container + Squid + iptables | ⚠️ (Docker) | ✅ | Claude Code, Codex |
| Leash | Project RW; optional host config mounts | Cedar | FS + Net + MCP telemetry; UI | Container + Cedar policies | ⚠️ (Docker) | ✅ | Claude, Codex, Gemini, Qwen, OpenCode |
| Agent of Empires | Project RW; git worktrees; host configs mostly RO; persistent auth volumes | ❌ Container inherits host | Container logs | tmux sessions; optional per-session container | ⚠️ (Docker) | ✅ | Claude Code, OpenCode, Mistral Vibe, Codex CLI, Gemini CLI, Cursor CLI, Copilot CLI, Pi.dev |
| Packnplay | Project RW (git worktree); credentials RO or copied into container | ❌ Container inherits host | Container logs | Container | ⚠️ (Docker) | ✅ | Claude Code, Codex, Gemini; Any command |
| Sandcat | Project RW | Transparent MITM proxy; secret injection | Proxy logs (mitmweb UI) | Container + mitmproxy + WireGuard | ⚠️ (Docker) | ✅ | Any command; Claude Code (auto onboarding) |
| release-engineers/agent-sandbox | Project copy; patch-file writeback | tinyproxy | Proxy logs | Container | ⚠️ (Docker) | ✅ | Claude Code; any command; patch-based workflow |
| SafeYolo | Project RW | MITM proxy | Proxy logs | Container | ⚠️ (Docker) | ✅ | Claude Code, OpenAI Codex |
| Vagrant | Project RW via 2‑way VM sync | ❌ Full internet | VM logs | Full VM | ✅ | ✅ | Any command |
| Matchlock | Project RW via COW filesystem inside microVM | MITM proxy; secret injection | Proxy logs + VFS hooks | nftables DNAT rules / gVisor userspace TCP/IP | 🍏 Apple Silicon only | ⚠️ KVM only | Any command / SDK (Go, Python, TypeScript) |
| Vibe | Project RW; package caches shared | ❌ Full internet | VM logs | Virtualization.framework VM | 🍏 Apple Silicon only | ❌ | Claude Code, Codex, Gemini |
| Docker Sandboxes | Project RW via 2‑way VM sync | HTTP(S) proxy | None | Virtualization.framework VM | ✅ (Docker Desktop) | ❌ | Claude Code; other in development; not extensible |
| Microsandbox | Isolated; volumes as configured | ❌ none/local/public/any | None | MicroVM (libkrun) | ✅ | ✅ | Any command; Native MCP; SDKs (Py, JS, Rust) |
| claude-code-safety-net | ❌ command interception only | ❌ No network isolation | JSON logs | Plugin/hook layer (not a sandbox) | ✅ | ✅ | Claude Code, OpenCode, Gemini CLI, Copilot CLI |
In the second part of this article, we will walk through a practical example: a sandboxed single-agent setup.