Editor's Note: for the Context, Motivation and overview of tools, please visit part 1.
Practical Example: Sandboxed Single-Agent Setup
This section uses Agent Sandbox as an example. It shows how to run a single LLM coding agent with meaningful isolation on a developer workstation. Agent Sandbox is built on Docker and Docker Compose, which gives it a low barrier to entry. An important feature is that it integrates with IDE workflows via devcontainers.
Setup Overview
The setup treats the agent as an untrusted process. It minimizes what it can access.
The container gets the project workspace and a project-scoped state volume (for agent credentials and settings); broad host mounts are avoided. The developer's home directory isn't exposed by default, and sensitive config isn't implicitly shared. Optional integrations like dotfiles or shell customizations can be enabled during setup, but they're mounted read-only. This rules out large classes of accidental damage and makes it easier to reason about what the agent can and can't see.
Network access is similarly constrained. Outbound traffic is blocked by default and forced through an HTTP proxy that enforces an explicit allowlist. In the default configuration, the agent can only reach the LLM API endpoint. Access to other protocols and the host network is blocked. If the agent tries to pull in unexpected dependencies or exfiltrate data, those actions fail loudly instead of succeeding silently.
Initialization
The entry point is the agentbox CLI. From the project root:
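Initialization is a single command:

```shell
# Initialize the sandbox configuration for the current project
agentbox init
```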
The interactive prompt asks for agent type, runtime mode, and IDE integration. In this example, the Claude agent is selected, followed by a choice between a CLI workflow and a devcontainer. When devcontainer mode is selected, the prompt also asks which IDE you use so it can allowlist the right domains automatically.

Initializing Agent Sandbox with agentbox init command
Running init creates a .agent-sandbox/ directory containing the network policy definition and a Docker Compose configuration that matches the chosen runtime mode. You can examine the generated files in the companion repository.
Runtime Modes
Command Line
When working from an existing terminal, the CLI runtime mode is the most lightweight path. Start the sandbox and enter a shell with:
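Assuming the wrapper follows the same verb style as `agentbox init`, this looks roughly like the command below; the subcommand name is an assumption, so check `agentbox --help` for the exact spelling:

```shell
# Start the sandbox containers and open a shell inside the agent container
# (subcommand name is an assumption -- consult the agentbox help output)
agentbox shell
```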
This places you in a Debian-based container where only the repo workspace is mounted and all outbound traffic routes through the sandbox proxy. Because the CLI wrapper delegates to Docker Compose, you can also start the same environment manually:
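With the generated Compose file in `.agent-sandbox/`, the manual path is standard Docker Compose. The service name `agent` below is an assumption; check the generated file for the actual name:

```shell
# Bring up the stack defined by the generated Compose file...
docker compose -f .agent-sandbox/docker-compose.yml up -d

# ...and open a shell in the agent container (service name assumed)
docker compose -f .agent-sandbox/docker-compose.yml exec agent zsh
```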
Inside the container shell, the agent can be invoked normally:
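For example, to start an interactive Claude Code session in the workspace:

```shell
# Launch Claude Code in the mounted project workspace
claude
```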
The command can also be executed from the host through the wrapper:
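Something along these lines; as above, the subcommand name is an assumption, so verify it against `agentbox --help`:

```shell
# Run the agent from the host through the wrapper (subcommand name assumed)
agentbox run claude
```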

Devcontainer (IDE-integrated)
For IDE integration, devcontainer mode runs the development backend inside the sandbox. The generated devcontainer config installs the Claude plugin automatically inside the container. Use "Reopen in Container" from VS Code or activate it using "Create Dev Container" in a JetBrains IDE.

opening devcontainer in vscode

opening devcontainer in idea
Once running, the plugins and terminals opened through the IDE operate inside the container context.

IDE integration for VS Code

IDE integration for JetBrains
Authenticating the Agent
Claude Code expects browser authentication on the same machine. This won't work when the agent runs inside a container. During first-time auth, you need to manually copy the login URL from the sandbox terminal, open it in the host browser, complete the login, and paste the authorization code back into the container session.

Copying authentication URL in IDE

Copying authentication URL in terminal
Credentials persist in a named Docker volume, so you only need to do the handshake once per project.
Language Runtimes
At this stage the agent has file access, but the base image intentionally excludes language runtimes. Build and test workflows fail until you install tooling. Below is an example of installing Java; similar steps apply to other languages.
Using Devcontainer Features
One option is to use Devcontainer Features to add a Java runtime by inserting this config into devcontainer.json:
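A minimal sketch using the upstream `java` Devcontainer Feature; the option names follow that feature's documentation, and `installMaven` is included because the examples below use Maven:

```json
{
  "features": {
    "ghcr.io/devcontainers/features/java:1": {
      "version": "17",
      "installMaven": "true"
    }
  }
}
```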
Recreating the container installs Java and IDE integrations automatically.
Unfortunately, the "Java" Devcontainer feature does not import system certificates into the Java keystore, so we have to do this manually:
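The import uses the `keytool` utility that ships with the JDK. The path of the proxy's CA certificate is an assumption; use wherever the sandbox places it inside the container:

```shell
# Import the sandbox proxy's CA into the JVM trust store.
# The certificate path is an assumption; "changeit" is the default
# cacerts password. May require root depending on keystore ownership.
keytool -importcert -noprompt -trustcacerts \
  -alias sandbox-proxy \
  -file /usr/local/share/ca-certificates/proxy-ca.crt \
  -keystore "$JAVA_HOME/lib/security/cacerts" \
  -storepass changeit
```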
Using Dockerfile
If you use a CLI workflow or don't want to import the certificates manually, you can extend the agent image with a custom Dockerfile.
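A sketch of such a Dockerfile, assuming the published base image (the tag is illustrative) and its `dev` user. On Debian, installing the `openjdk` package triggers the `ca-certificates-java` hook, which imports the system CA bundle, including the proxy CA, into the JVM trust store:

```dockerfile
FROM ghcr.io/mattolson/agent-sandbox-base:latest

USER root
# Install a JDK and Maven; the Debian package's ca-certificates-java hook
# imports system CAs (including the proxy CA) into the Java keystore
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-17-jdk maven && \
    rm -rf /var/lib/apt/lists/*
USER dev
```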
Then adjust the Docker Compose config so the agent service builds from this local Dockerfile instead of referencing the prebuilt image:
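For example, where the service name `agent` and the file layout are assumptions to be matched against the generated Compose file:

```yaml
services:
  agent:
    # build from the local Dockerfile instead of pulling the prebuilt image
    build:
      context: .
      dockerfile: Dockerfile
```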
Editing the Compose file through the CLI wrapper recreates the Compose stack automatically once the file is saved:
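```shell
# Open the Compose file in $EDITOR; the stack is recreated on save
agentbox edit compose
```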
After rebuilding the environment, Java becomes available:
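```shell
# Verify the runtime inside the container
java -version
```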
Despite having the runtime installed, strict network rules prevent dependency retrieval:
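Attempting a build makes the block visible. The exact error text depends on the Maven version, but it names the unreachable repository host:

```shell
# Fails while resolving dependencies: the proxy blocks repo.maven.apache.org
mvn -B package
```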
The failure reveals the blocked host, and the same information can be confirmed through proxy logs:
Managing Network Policy
To allow required outbound domains, edit the sandbox policy:
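```shell
# Open the network policy in $EDITOR; the proxy restarts on save
agentbox edit policy
```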
After opening the configuration, adding the Maven repository host to the allowlist enables downloads:
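The exact policy syntax is whatever `agentbox init` generated; the fragment below is illustrative of adding the Maven Central host to an allowlist:

```yaml
# Illustrative allowlist entries -- match the structure of the generated policy
allowed_domains:
  - api.anthropic.com      # LLM API endpoint (already present)
  - repo.maven.apache.org  # Maven Central
```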
Saving the file triggers a proxy restart to enforce the updated rules. Because the policy is mounted read-only inside the container, the agent cannot modify its own network permissions.
Running Maven again demonstrates that shell tooling works through the proxy, but Java tooling does not honor proxy environment variables:
Resolving this requires passing the proxy settings to the JVM as system properties. This can be done by adding an environment variable to the agent service definition (use agentbox edit compose to have the stack reloaded automatically):
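One way is `JAVA_TOOL_OPTIONS`, which every JVM picks up on startup. The proxy host and port below are assumptions; match them to the sandbox's proxy service:

```yaml
services:
  agent:
    environment:
      # JVMs ignore http_proxy/https_proxy; pass system properties instead.
      # Proxy host/port are assumptions -- use the values from your stack.
      JAVA_TOOL_OPTIONS: >-
        -Dhttp.proxyHost=proxy -Dhttp.proxyPort=3128
        -Dhttps.proxyHost=proxy -Dhttps.proxyPort=3128
```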
With the stack reloaded, Maven execution proceeds:
Basic Volume Persistence
To prevent repeated downloads when containers restart, add persistence. Edit the Compose configuration to add a named Docker volume at /home/dev/.m2 for Maven cache:
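For example (service name assumed, as before):

```yaml
services:
  agent:
    volumes:
      - m2-cache:/home/dev/.m2   # persist the Maven artifact cache

volumes:
  m2-cache:
```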
The Dockerfile is also adjusted to create the .m2 directory ahead of time so ownership aligns correctly when the volume is mounted:
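A one-line addition to the Dockerfile from earlier, assuming the base image's `dev` user and group:

```dockerfile
# Pre-create the mount point so the named volume inherits dev's ownership
RUN mkdir -p /home/dev/.m2 && chown dev:dev /home/dev/.m2
```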
With those changes, the artifact cache persists across container restarts.
Examples and Advanced Patterns
The steps above show how to configure Agent Sandbox for Java projects. You can examine the results in a companion repository.
The repository also demonstrates advanced patterns not covered in this article, including using host Maven repository for faster dependency resolution, and an offline mode that works without accessing the remote repositories.
Other Agents
The Agent Sandbox project also supports GitHub Copilot in a very similar way. Adding other CLI-based agents is straightforward: extend the base image (ghcr.io/mattolson/agent-sandbox-base) with the agent installation and add the required web domains to the allowlist.
An interesting property of Devcontainers is that any supported IDE plugin can be used inside the sandbox. You don't need an agent-specific image; you only need to adjust the network allowlist. The agent "runtime" is provisioned by the relevant plugin.
For example, we can install a Codex extension in VS Code. (Agent Sandbox has recently added support for Codex as a preview, but we'll do it manually here for demonstration purposes).
After installing the extension, we will be blocked from authenticating with the OAuth flow.

Screenshot of Codex not working due to authentication block
This is expected since Codex is not yet on the allowlist, and the process inside the container is trying to exchange the OAuth code for a token.
We can use agentbox edit policy to add the required domains:
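The hosts below are assumptions based on the OAuth flow; confirm the exact set against the proxy logs:

```yaml
# Assumed hosts for the Codex login flow -- verify via the proxy logs
allowed_domains:
  - auth.openai.com
  - api.openai.com
  - chatgpt.com
```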
After restarting the proxy, we can continue the login process and work with Codex inside the sandbox.

Screenshot of Codex working successfully after authentication
Careful readers might be surprised that the login flow worked at all. The "trick" is that VS Code extensions can forward additional ports from the Devcontainer to the host. While this makes the onboarding experience easier, it also shows that using Devcontainers can weaken the isolation in unexpected ways.
Similarly, we can use Junie in a JetBrains IDE, though the process is a bit more involved. Because the JetBrains Devcontainer implementation is built on the "Remote Development" feature, the terminology can be misleading. The "host" in this context means the remote host, i.e., a backend process running inside the container. The "client" means a process running directly on the local machine that connects to that remote host.
To complicate things further, the setup changes depending on the patch version of the IDE. The "Remote Development" feature downloads and installs an IDE backend independently of the main IDE version. The version you see in the "About" dialog when running the remote client is what matters.
What works on version 2025.3.3: the plugin must be installed on both the "host" and the "client" side. The setup process sometimes fails; if you don't see the plugin installed as in the screenshot below, close the IDE, stop the containers, and try again from scratch.

Screenshot of JetBrains plugin installed in devcontainer
We can install the plugin because JetBrains marketplace domains are already allowlisted (assuming we've chosen the Devcontainer mode and JetBrains during project initialization). However, Junie will not work because its API is blocked.

Screenshot of JetBrains plugin not working in devcontainer
Required domains can be identified through the proxy logs, but Agent Sandbox ships a predefined service definition called jetbrains-ai that saves us the effort.
Updating the policy and restarting the proxy will allow you to use Junie inside IntelliJ.

Screenshot of Junie working in JetBrains
An alternative approach is to use the "JetBrains AI assistant" plugin:

Screenshot of AI assistant installed on host
This plugin can work when installed only on the host and requires the same network allowlist configuration.

Screenshot of AI assistant working
The AI assistant plugin also offers other agents, but since they are not allowlisted by default, they will fail to run.

Screenshot of codex not working
The domain allowlist can be extended if you want to use the other agents. Note that an agent failing to run doesn't prove the extension is completely sandboxed. The pitfalls of Devcontainers integration will be discussed later.
The step of installing the extension manually can be automated via the customizations.vscode.extensions and customizations.jetbrains.plugins arrays in the devcontainer.json file. You can check the generated devcontainer.json file for the Claude example.
Extra features
Personal shell configuration can be brought into the sandbox through dotfiles and shell.d scripts. Both are mounted read-only to prevent agent tampering. You enable them by uncommenting the relevant lines in the generated docker-compose.yml file. For example:
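Once uncommented, the relevant mounts look roughly like this; the container-side paths are illustrative, and both are read-only so the agent can't tamper with them:

```yaml
services:
  agent:
    volumes:
      # personal customizations, mounted read-only (paths illustrative)
      - ~/.dotfiles:/home/dev/.dotfiles:ro
      - ~/.config/agent-sandbox/shell.d:/home/dev/.config/agent-sandbox/shell.d:ro
```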
To forward your Git configuration into the container, create a dotfiles directory and the .gitconfig file:
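The name and email below are placeholders:

```shell
# Create the dotfiles directory and a minimal .gitconfig
# (name and email are placeholders)
mkdir -p ~/.dotfiles
cat > ~/.dotfiles/.gitconfig <<'EOF'
[user]
    name = Jane Developer
    email = jane@example.com
EOF
```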
The entrypoint recursively symlinks everything from ~/.dotfiles into $HOME, so .dotfiles/.gitconfig becomes ~/.gitconfig inside the container. Commits made by the agent carry your identity.
Shell customizations work through ~/.config/agent-sandbox/shell.d/:
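For example, a script that adds a couple of aliases; the file name is arbitrary, anything matching `*.sh` in that directory is picked up:

```shell
# Add a shell.d script with personal aliases (file name is arbitrary)
mkdir -p ~/.config/agent-sandbox/shell.d
cat > ~/.config/agent-sandbox/shell.d/10-aliases.sh <<'EOF'
# sourced at shell startup, before ~/.zshrc
alias gs='git status'
alias gl='git log --oneline -n 20'
EOF
```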
These *.sh files are sourced at shell startup before ~/.zshrc.
Summary
Storing the sandbox configuration in a Git repo lets all team members use the same setup and audit it. Personal preferences live in dotfiles, so each developer can customize their experience.
Once the sandbox is set up, you use it via the CLI tool, directly through Docker Compose, or through the IDE Devcontainers integration.
From inside the container, the agent has a functional dev environment. From outside, you get familiar tooling, but mistakes or misbehavior are contained. Once configured for your stack, it runs without friction.
Deeper Analysis: What's Protected and What Isn't
Agent Sandbox draws a crisp boundary around what an LLM coding agent is allowed to touch. Understanding that boundary is required for correct use. The protections it offers are meaningful, but they're not absolute. They work best when paired with disciplined workflows.
What's Protected
The sandbox protects the host filesystem. By limiting read-write access to a single project directory and avoiding broad mounts of the developer's home or system paths, it prevents an agent from casually inspecting or damaging unrelated files. This rules out many of the common accidental failures.
Persistent credentials are shielded by default. SSH keys, cloud tokens, and credential helpers never enter the container unless explicitly forwarded.
Network access is also constrained. All outbound traffic goes through a proxy and is gated by allowlists. This reduces the risk of unintended data exfiltration and makes network usage observable instead of implicit.
Unavoidable Exposure
The agent can read anything inside the allowed project directory. Unless specifically protected by read-only mounts, the contents are also mutable. This includes config files, test fixtures, and any sensitive information that happens to live there.
The full prompt/response logs are available inside the sandbox, so the agent can access past conversations unless they are manually deleted.
The LLM API key is available inside the sandbox. If this is a concern, there are projects that inject secrets at the proxy level.
Subtle Risks
Some subtle risks come from ways the sandbox can be bypassed without breaking out of the container itself.
Misconfigured volume mounts can quietly undo the isolation guarantees. Adding convenience mounts for caches or shared directories can expand the agent's authority in ways you forget later.
Overly permissive network rules have a similar effect. Once allowlists grow broad enough, the network boundary becomes largely symbolic. Agent Sandbox keeps the configuration inside the project directory, so at least the changes are easy to audit.
The most important escape route is project-embedded execution. Even if the agent never leaves its container, it can modify files that execute later on the host, like git hooks, build scripts, or test runners.
One mitigation is to mount the .git directory read-only and perform git operations on the host. This reduces some obvious abuse paths, but it's not practical to protect every executable surface in a modern project. The safest rule is to never execute code modified by the LLM outside the sandbox until it's been reviewed. Treat it like an untrusted pull request.
Configuration Persistence
Access to project files allows the agent to modify the environment configuration. This extends its influence beyond the current runtime.
Notable examples include Docker Compose files when using CLI workflow, devcontainer.json and associated container build artifacts when using devcontainers, and .vscode/ and .idea/ directories that affect the IDE. Changes to these files can alter mounts, forwarded resources, or execution behavior on later launches. Agent Sandbox mitigates this class of persistence by mounting those directories as read-only, preventing modification by the agent.
IDE Integration Risks
VS Code Remote integration adds a second boundary that exists independently of container isolation. Its devcontainer workflow deploys a vscode-server inside the container. Extensions then communicate with the host editor over a persistent RPC channel.
Research has shown that workspace-side extensions can use the RPC channel to invoke host-routed commands to open a local terminal and inject keystrokes. This results in command execution on the developer machine without breaking container isolation.
The same bridge exposes other host capabilities, including clipboard access and URL handling. Git config and authentication-agent sockets are also forwarded into the container, allowing processes to request signatures or authenticate through unlocked host keys. Extensions can also open additional ports, as shown in the previous section.
Comparable remote-development architecture in JetBrains IDEs is more complex and likely exposes similar cross-boundary interactions, although public research at this level of detail is currently lacking.
MCP Servers
Model Context Protocol servers deserve special attention. An MCP server is an out-of-process capability provider. If it's not sandboxed, it can do almost anything the host can do, regardless of how tightly the agent itself is constrained. From a threat-model perspective, MCP servers are part of the trusted computing base and must be sandboxed or audited separately.
Kernel and Runtime Vulnerabilities
The sandbox is vulnerable to kernel or container runtime issues. Exploiting a flaw in the host kernel, container runtime, or virtualization layer can allow a process to escape its intended isolation entirely. These risks, while rare, shouldn't be ignored, but they're outside the scope of this article.
Relaxing Constraints for More Complex Workflows
Strict sandboxing is a good default, but dev workflows often require selectively loosening constraints. The coding agent may need access to external systems, shared resources, or richer execution environments.
Treat these relaxations as deliberate, incremental decisions. Don't use one-time switches that revert the sandbox to "basically the host."
Dependency Management
One common reason to relax constraints is dependency management. Many projects rely on downloading packages and other tools from the internet. A sandbox that only allows access to an LLM API endpoint quickly becomes impractical.
Broader network access may be required, but access should be carefully scoped. Allowlist specific package registries or mirrors instead of enabling unrestricted outbound traffic. That preserves some visibility and control while acknowledging that modern builds are inherently networked.
A common compromise is to grant read-only access to shared directories. Dependency caches or build outputs let the agent reuse the existing state without modifying it. This can improve performance and ergonomics while keeping write access tightly scoped.
Collaboration Workflows
Collaboration-oriented workflows are another frequent driver. Agents may need to open pull requests or interact with code hosting platforms. Supporting this often means allowing outbound access to services like GitHub or GitLab and injecting credentials that let the agent authenticate.
The same pattern applies to agents that need to call internal APIs or services. In all these cases, scoped credential injection is better than forwarding your personal credentials. Limit credentials by role and lifetime to reduce the blast radius if the agent misbehaves or is compromised.
Container Operations
When additional container build capabilities are needed, mounting the host Docker socket into the sandbox may seem convenient, but it breaks the isolation model. The socket exposes control over the host container runtime, letting workloads launch with arbitrary mounts or privileges and collapsing the sandbox boundary.
A safer approach is to run a nested runtime inside the container. For example, rootless Docker-in-Docker, so container operations remain scoped to the sandbox. This introduces performance and caching tradeoffs, but it preserves the trust boundary. If container orchestration inside the sandbox becomes a core requirement, consider a VM-based sandbox.
Browser Automation
Running browsers inside a container can be challenging. There are extra dependencies, shared memory, and the browser's own sandboxing to worry about. For workflows which rely heavily on browser automation, this is another reason to consider a VM-based sandbox, where the browser runs on a full OS and these limitations largely disappear.
Gradual Expansion
Expand gradually with explicit intent. Each relaxation should answer a concrete question: what capability is missing, and what's the narrowest way to provide it?
Sandboxing remains valuable even when constraints are loosened, as long as defaults stay restrictive and exceptions are easy to audit and revoke. Treat sandbox configuration as part of the workflow, not a static setup step. This helps ensure that increased power for the agent doesn't quietly turn into unbounded trust.
Conclusion: Balancing Safety and Productivity
LLM coding agents are powerful, but their autonomy comes with real risks. Once they execute commands, mistakes can happen fast.
Sandboxing turns blind trust into auditable containment. Start with strict limits. Watch what the agent does. Expand privileges when needed, then watch again. The goal isn't zero risk. It's risk you can understand and control.
Treat sandbox configuration as part of your workflow, not a one-time setup. When constraints are declared and versioned alongside your code, you can use AI safely instead of gambling with your environment.




