MicroPython in WASM: How Simon Willison Built a 362KB Sandbox for Agent Code Execution

Simon Willison released micropython-wasm as an alpha package and immediately shipped it to production in datasette-agent-micropython. The implementation solves a specific problem: how to execute agent-generated Python code without giving it full system privileges. The sandbox runs a 362KB MicroPython interpreter inside WebAssembly, maintains persistent state across multiple eval() calls, and exposes host functions through 78 lines of C.

This is not a research project. It is a working implementation with known unknowns.

Why WebAssembly for Agent Code Execution

Willison needed a sandbox that installs cleanly from PyPI, enforces memory and CPU limits, controls file and network access, and exposes host functions. JavaScript engines like V8 are too complicated to embed safely. Pyodide only runs in browsers or Node.js. MicroPython compiled to WASM meets all of these requirements.

The wasmtime Python library provides binary wheels and active maintenance. WebAssembly’s security model is battle-tested in browsers. If the C bridge code fails, the worst outcome is a WASM exception, not a system compromise.

The Persistent State Problem

A naive WASM sandbox starts the interpreter, runs code, and stops. Variables disappear between calls. For agent workflows, you need persistent state so a variable defined in one session.run() call survives into the next.

Willison’s solution uses a thread-based queue architecture:

Python code starts a thread and creates request and reply queues
Inside WASM, MicroPython runs an eval() loop that blocks on __session_next__()
The host function waits on the request queue until Python sends new code
MicroPython evaluates the code and calls __session_result__() with the result
The host function posts to the reply queue, and Python’s session.run() returns

This keeps the interpreter alive across multiple code blocks:

from micropython_wasm import MicroPythonSession

with MicroPythonSession() as session:
    print(session.run("x = 10\nprint(x)").stdout)  # 10
    print(session.run("x += 5\nprint(x)").stdout)  # 15
    print(session.run("print(x * 2)").stdout)      # 30

The blocking mechanism coordinates between Python’s request queue and the WASM interpreter loop without restarting the interpreter.
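
The handshake described above can be sketched in plain Python. This is a minimal illustration under loud assumptions, not the package's code: an ordinary thread stands in for the WASM guest, exec() stands in for the MicroPython eval() loop, and ToySession and its reply dictionary are hypothetical names. It provides no isolation at all; it only shows how two queues keep one long-lived namespace alive between calls.

```python
import queue
import threading


class ToySession:
    """Stand-in for a persistent session; a Python thread plays the guest."""

    def __init__(self):
        self._requests = queue.Queue()
        self._replies = queue.Queue()
        self._thread = threading.Thread(target=self._guest_loop, daemon=True)
        self._thread.start()

    def _guest_loop(self):
        output = []
        # The namespace outlives each request, so variables persist.
        # A custom print collects output instead of writing to real stdout.
        namespace = {"print": lambda *args: output.append(" ".join(map(str, args)))}
        while True:
            code = self._requests.get()  # plays the role of __session_next__()
            if code is None:  # sentinel: shut the loop down
                break
            output.clear()
            try:
                exec(code, namespace)
                error = None
            except Exception as exc:
                error = repr(exc)
            # Plays the role of __session_result__()
            self._replies.put({"stdout": "\n".join(output), "error": error})

    def run(self, code):
        """Send one code block to the guest loop and block until it replies."""
        self._requests.put(code)
        return self._replies.get()

    def close(self):
        self._requests.put(None)
        self._thread.join()
```

Calling run("x = 10") and then run("print(x)") on the same ToySession returns "10" in the second reply, because the guest loop never exits between requests.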

Host Function Bridge Architecture

The 78 lines of C are compiled together with MicroPython into the 362KB WASM blob. This code lets Python functions on the host be registered as callable from inside MicroPython. The bridge handles:

Function registration from the host environment
Argument marshaling between Python and MicroPython types
Return value serialization back to the host
Exception propagation across the WASM boundary

The C layer exposes __session_next__() and __session_result__() as blocking host functions. MicroPython calls these during its eval() loop. The host Python code controls what functions are available inside the sandbox.
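
The host side of such a bridge can be pictured as a dispatch table. The sketch below is hypothetical: HostBridge, its methods, and the choice of JSON as the wire format are assumptions made for illustration, since the article does not describe how the real bridge marshals values. It shows the four responsibilities listed above in miniature: registration, argument decoding, return value encoding, and turning host exceptions into data the guest can inspect.

```python
import json


class HostBridge:
    """Hypothetical host-side dispatch table; not the package's real API."""

    def __init__(self):
        self._functions = {}

    def register(self, name, func):
        """Make func callable from the guest under the given name."""
        self._functions[name] = func

    def call(self, name, args_json):
        """Handle one guest call: decode args, run the function, encode the outcome."""
        func = self._functions.get(name)
        if func is None:
            # Unregistered names are rejected instead of being looked up anywhere else.
            return json.dumps({"ok": False, "error": f"unknown host function: {name}"})
        try:
            result = func(*json.loads(args_json))
            return json.dumps({"ok": True, "value": result})
        except Exception as exc:
            # Host exceptions cross the boundary as plain data, not as live objects.
            return json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})
```

Because the lookup goes through an explicit dictionary, the guest can only reach what the host chose to register, which is the property the security table below relies on.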

Security Boundaries and Limits

Boundary	Mechanism	Failure Mode
Memory	wasmtime memory limits	Hard stop, exception raised, no recovery
CPU	wasmtime “fuel” concept (20M default)	Hard stop, exception raised, no recovery
Filesystem	No access unless explicitly exposed	Cannot read or write files
Network	No access unless explicitly exposed	Cannot open sockets
Host functions	Only registered functions callable	Cannot escape to arbitrary Python

The fuel-based CPU limit is the weakest link. Wasmtime counts WebAssembly operations, not wall-clock time or Python bytecode instructions, so a given budget does not map to a predictable duration. The 20 million default is experimental. Set it too low and legitimate code is cut off before it finishes. Set it too high and a loop such as while True: s += "x" runs for a long time before it is stopped.
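
To make the idea of fuel concrete, here is a toy interpreter that charges one unit per executed instruction. It is not wasmtime and does not reflect wasmtime's actual accounting; the instruction set, run_with_fuel, and OutOfFuel are invented for this sketch. The point is only that the budget counts operations, so an infinite loop is stopped after a fixed number of steps regardless of how long each step takes.

```python
class OutOfFuel(Exception):
    """Raised when the instruction budget is used up."""


def run_with_fuel(program, fuel):
    """Run a toy stack program, charging one unit of fuel per instruction."""
    stack, pc = [], 0
    while pc < len(program):
        if fuel == 0:
            raise OutOfFuel(f"stopped at instruction {pc}")
        fuel -= 1
        op, arg = program[pc]
        if op == "push":
            stack.append(arg)
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "jump":
            pc = arg  # unconditional jump; a jump to itself loops forever
            continue
        pc += 1
    return stack, fuel
```

A three-instruction program finishes with fuel to spare, while a one-instruction loop such as [("jump", 0)] always ends in OutOfFuel. Choosing the budget is the hard part, which is why the default is described as experimental.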

Memory limits are straightforward. You set a maximum heap size and WASM enforces it.

MicroPython vs. Full CPython Trade-offs

MicroPython is a subset of Python 3 optimized for constrained environments. It lacks most of the standard library. No requests, no pandas, no numpy. This is a feature for sandboxing: less surface area means fewer ways to break out.

The interpreter is small enough to compile to a 362KB WASM blob. Full CPython would be orders of magnitude larger and harder to wire up.

The downside is compatibility. Agent-generated code that assumes CPython libraries will fail. You must design prompts and tool schemas around MicroPython’s limitations.
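
One way to surface the compatibility gap early is a pre-flight check on the host, run before code is sent to the sandbox. The sketch below is an assumption-heavy illustration: disallowed_imports is a hypothetical helper, the allowlist is made up, and the modules a given MicroPython build actually ships depend on its configuration. It uses CPython's ast module to find imports, so it only works for code that also parses as CPython. It is a usability aid for giving the agent a fast, readable error, not a security boundary.

```python
import ast

# Hypothetical allowlist; the modules actually available depend on how the
# MicroPython build was configured.
ALLOWED_MODULES = {"math", "json", "re", "time"}


def disallowed_imports(source):
    """Return the sorted top-level module names imported outside the allowlist."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return sorted(found - ALLOWED_MODULES)
```

A non-empty result can be returned to the agent as a tool error such as "requests is not available in this sandbox", which tends to be cheaper than letting the import fail inside the interpreter.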

Production Deployment Shape

Willison shipped this in datasette-agent-micropython, a plugin for Datasette Agent. The plugin:

Installs from PyPI with no extra build steps
Runs agent-generated code in response to user queries
Exposes selected Datasette functions as host functions
Maintains interpreter state across a conversation session

The alpha label is honest. Willison tested it by locking GPT-5.5 xhigh inside the sandbox and challenging it to break out. It has not succeeded yet. That is not a security audit.

The package requires Python 3.8+ and wasmtime-py 28.0.0 or later. Binary wheels are available for Linux, macOS, and Windows on x86_64 and ARM64.

Observability and Failure Modes

The sandbox returns structured results with stdout, stderr, and exception details. You can log every session.run() call and inspect what code the agent generated.
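
A thin wrapper is enough to get an audit trail. This is a sketch, not part of the package: logged_run and the logger name are hypothetical, and the wrapper assumes only that the result object carries stdout and stderr attributes as described above, falling back to None if it does not.

```python
import json
import logging
import time

logger = logging.getLogger("sandbox.audit")


def logged_run(run, code):
    """Call run(code), then log the code, the captured output, and the duration."""
    started = time.monotonic()
    result = run(code)
    entry = {
        "code": code,
        "stdout": getattr(result, "stdout", None),
        "stderr": getattr(result, "stderr", None),
        "seconds": round(time.monotonic() - started, 4),
    }
    # default=str keeps logging from failing on values JSON cannot encode.
    logger.info(json.dumps(entry, default=str))
    return result
```

With a session object like the one shown earlier, the call would look like logged_run(session.run, "print(x)"). One JSON line per call makes it easy to replay exactly what the agent generated.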

Known failure modes:

Fuel exhaustion: code stops mid-execution, exception raised
Memory exhaustion: the WASM runtime stops the guest and raises an exception, with no recovery for that session
Invalid Python syntax: MicroPython raises SyntaxError
Host function not registered: AttributeError inside the sandbox
C bridge bug: WASM exception, not a host system compromise

The thread-based queue can deadlock if the host function never returns or the WASM side never calls __session_result__(). Timeouts on both sides mitigate this.
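
On the host side, a timeout can be as simple as a bounded wait on the reply queue. The helper below is a hypothetical sketch using the standard library's queue module; wait_for_reply is not a function from the package.

```python
import queue


def wait_for_reply(replies, timeout_seconds):
    """Wait for one reply, raising TimeoutError instead of blocking forever."""
    try:
        return replies.get(timeout=timeout_seconds)
    except queue.Empty:
        raise TimeoutError(f"no reply within {timeout_seconds}s") from None
```

A timeout like this only unblocks the caller. The guest thread may still be stuck, so the safer response is to discard the session and start a new one instead of sending it more code.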

Technical Verdict

Use this when:

You need to execute agent-generated Python code in a server-side Python application
You can constrain the code to MicroPython’s standard library
You want a single PyPI install with no external dependencies
You are comfortable with alpha-quality software and willing to test thoroughly
You need persistent state across multiple code execution calls
You can tolerate imprecise CPU limits (fuel is experimental)

Avoid this when:

You need full CPython compatibility or third-party libraries like requests or pandas
You require precise, predictable CPU time limits for billing or SLA enforcement
You need a security-audited sandbox for high-stakes environments (financial, healthcare, PII)
You cannot tolerate occasional WASM runtime exceptions or deadlocks
You need sub-100ms execution latency (WASM startup overhead is non-trivial)

The persistent state mechanism is clever. The C bridge is small enough to audit. The fuel-based CPU limit is the biggest unknown. This is a working alpha that needs hardening before you bet a business on it. If you are building agent tooling and need safe code execution today, it is worth testing. If you need guarantees, wait for more battle-testing or hire a security team to audit the C bridge.