Simon Willison released micropython-wasm as an alpha package and immediately shipped it to production in datasette-agent-micropython. The implementation solves a specific problem: how to execute agent-generated Python code without giving it full system privileges. The sandbox runs a 362KB MicroPython interpreter inside WebAssembly, maintains persistent state across multiple eval() calls, and exposes host functions through 78 lines of C.
This is not a research project. It is a working implementation with known unknowns.
Why WebAssembly for Agent Code Execution
Willison needed a sandbox that installs cleanly from PyPI, enforces memory and CPU limits, controls file and network access, and exposes host functions. JavaScript engines like V8 are too complicated to embed safely. Pyodide only runs in browsers or Node.js. MicroPython compiled to WASM meets all of these requirements.
The wasmtime Python library provides binary wheels and active maintenance. WebAssembly’s security model is battle-tested in browsers. If the C bridge code fails, the worst outcome is a WASM exception, not a system compromise.
The Persistent State Problem
A naive WASM sandbox starts the interpreter, runs code, and stops. Variables disappear between calls. For agent workflows, you need persistent state so a variable defined in one session.run() call survives into the next.
Willison’s solution uses a thread-based queue architecture:
- Python code starts a thread and creates request and reply queues
- Inside WASM, MicroPython runs an eval() loop that blocks on __session_next__()
- The host function waits on the request queue until Python sends new code
- MicroPython evaluates the code and calls __session_result__() with the result
- The host function posts to the reply queue, and Python’s session.run() returns
This keeps the interpreter alive across multiple code blocks:
from micropython_wasm import MicroPythonSession

with MicroPythonSession() as session:
    print(session.run("x = 10\nprint(x)").stdout)  # 10
    print(session.run("x += 5\nprint(x)").stdout)  # 15
    print(session.run("print(x * 2)").stdout)  # 30
The blocking mechanism coordinates between Python’s request queue and the WASM interpreter loop without restarting the interpreter.
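The queue handshake described above can be sketched with the standard library alone. This is a toy model, not the package's code: a plain exec() loop in a thread stands in for the MicroPython interpreter inside WASM, so nothing here is sandboxed. The names ToySession, _session_next, and _session_result are hypothetical stand-ins for the real session and its __session_next__() and __session_result__() host functions.

```python
import contextlib
import io
import queue
import threading

_STOP = object()  # sentinel telling the interpreter loop to exit


class ToySession:
    """Toy stand-in for a persistent session. Uses plain exec(): NOT sandboxed."""

    def __init__(self):
        self._requests = queue.Queue()
        self._replies = queue.Queue()
        self._worker = threading.Thread(target=self._interpreter_loop, daemon=True)
        self._worker.start()

    def _session_next(self):
        # Plays the role of __session_next__(): block until the host sends code.
        return self._requests.get()

    def _session_result(self, stdout, error):
        # Plays the role of __session_result__(): hand the outcome back to run().
        self._replies.put((stdout, error))

    def _interpreter_loop(self):
        namespace = {}  # lives as long as the thread, so variables persist
        while True:
            code = self._session_next()
            if code is _STOP:
                break
            buffer = io.StringIO()
            error = None
            try:
                # redirect_stdout is process-wide; acceptable here because the
                # caller is blocked in run() while the code executes.
                with contextlib.redirect_stdout(buffer):
                    exec(code, namespace)
            except Exception as exc:
                error = repr(exc)
            self._session_result(buffer.getvalue(), error)

    def run(self, code):
        """Send one code block to the loop and wait for (stdout, error)."""
        self._requests.put(code)
        return self._replies.get()

    def close(self):
        self._requests.put(_STOP)
        self._worker.join()
```

Because the namespace dictionary belongs to the long-lived loop rather than to a single run() call, a variable assigned in one call is still there in the next, which is the property the real session provides.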
Host Function Bridge Architecture
The 78 lines of C are compiled together with MicroPython into the 362KB WASM blob. This code registers Python functions as callable from inside MicroPython. The bridge handles:
- Function registration from the host environment
- Argument marshaling between Python and MicroPython types
- Return value serialization back to the host
- Exception propagation across the WASM boundary
The C layer exposes __session_next__() and __session_result__() as blocking host functions. MicroPython calls these during its eval() loop. The host Python code controls what functions are available inside the sandbox.
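To make the four bridge responsibilities concrete, here is a host-side sketch in Python. It is not the actual C bridge: the HostBridge class, its method names, and the use of JSON as the wire format are all assumptions chosen for illustration.

```python
import json


class HostBridge:
    """Hypothetical host-side registry; JSON is an assumed wire format."""

    def __init__(self):
        self._functions = {}

    def register(self, name, func):
        # Only functions registered here are reachable from the sandbox.
        self._functions[name] = func

    def call(self, name, args_json):
        """Dispatch a call from the sandbox; always returns a JSON string."""
        try:
            if name not in self._functions:
                raise LookupError(f"host function {name!r} is not registered")
            args = json.loads(args_json)  # marshal arguments in
            result = self._functions[name](*args)
            return json.dumps({"ok": True, "value": result})  # serialize result out
        except Exception as exc:
            # Report the failure as data so the sandbox side can re-raise it.
            return json.dumps(
                {"ok": False, "error": type(exc).__name__, "message": str(exc)}
            )
```

Returning errors as data instead of letting them unwind through the boundary keeps the host in control of what crosses it, at the cost of limiting arguments and results to JSON-serializable values.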
Security Boundaries and Limits
| Boundary | Mechanism | Failure Mode |
|---|---|---|
| Memory | wasmtime memory limits | Hard stop, exception raised, no recovery |
| CPU | wasmtime “fuel” concept (20M default) | Hard stop, exception raised, no recovery |
| Filesystem | No access unless explicitly exposed | Cannot read or write files |
| Network | No access unless explicitly exposed | Cannot open sockets |
| Host functions | Only registered functions callable | Cannot escape to arbitrary Python |
The fuel-based CPU limit is the weakest link. Wasmtime measures WebAssembly operations, not wall-clock time or Python bytecode instructions, so the same fuel budget buys a different amount of Python work depending on what the code does. The 20 million default is experimental. Set it too low and legitimate code is cut off before it finishes. Set it too high and while True: s += "x" runs too long before stopping.
Memory limits are straightforward. You set a maximum heap size and WASM enforces it.
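A toy model shows why a fuel budget is hard to tune. This is not how wasmtime implements fuel; FuelMeter and run_metered are hypothetical names, and the per-step cost is a made-up stand-in for the number of WebAssembly operations one Python-level step happens to need.

```python
class OutOfFuel(Exception):
    """Raised when the budget cannot cover the next charge."""


class FuelMeter:
    def __init__(self, fuel):
        self.remaining = fuel

    def charge(self, units=1):
        if units > self.remaining:
            self.remaining = 0
            raise OutOfFuel("fuel budget exhausted")
        self.remaining -= units


def run_metered(steps, cost_per_step, fuel):
    """Try to run `steps` iterations, charging `cost_per_step` units each.

    Returns how many iterations completed before the fuel ran out.
    """
    meter = FuelMeter(fuel)
    completed = 0
    try:
        for _ in range(steps):
            meter.charge(cost_per_step)
            completed += 1
    except OutOfFuel:
        pass  # hard stop: the remaining iterations never run
    return completed
```

With a budget of 50, a loop whose steps cost 1 unit completes 50 iterations, while a loop whose steps cost 10 units completes only 5. The budget says nothing about elapsed time or about how many Python statements ran.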
MicroPython vs. Full CPython Trade-offs
MicroPython is a subset of Python 3 optimized for constrained environments. It lacks most of the standard library. No requests, no pandas, no numpy. This is a feature for sandboxing: less surface area means fewer ways to break out.
The interpreter is small enough to compile to a 362KB WASM blob. Full CPython would be orders of magnitude larger and harder to wire up.
The downside is compatibility. Agent-generated code that assumes CPython libraries will fail. You must design prompts and tool schemas around MicroPython’s limitations.
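One way to surface the compatibility gap early is a pre-flight check on the host that flags imports the sandbox is unlikely to have. The sketch below is an illustration only: the ALLOWED_MODULES set is a hypothetical allowlist and should be replaced with the modules actually present in your MicroPython build. It parses with CPython's ast module, so it is a hint for producing better error messages, not a security control, and it does not catch dynamic imports.

```python
import ast

# Hypothetical allowlist; check which modules your MicroPython build includes.
ALLOWED_MODULES = {"json", "math", "re", "time"}


def disallowed_imports(code, allowed=ALLOWED_MODULES):
    """Return sorted top-level module names imported by `code` but not allowed."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return sorted(found - set(allowed))
```

A non-empty result can be fed back to the agent as a tool error before any sandbox fuel is spent.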
Production Deployment Shape
Willison shipped this in datasette-agent-micropython, a plugin for Datasette Agent. The plugin:
- Installs from PyPI with no extra build steps
- Runs agent-generated code in response to user queries
- Exposes selected Datasette functions as host functions
- Maintains interpreter state across a conversation session
The alpha label is honest. Willison tested it by locking GPT-5.5 xhigh inside the sandbox and challenging it to break out. It has not succeeded yet. That is not a security audit.
The package requires Python 3.8+ and wasmtime-py 28.0.0 or later. Binary wheels are available for Linux, macOS, and Windows on x86_64 and ARM64.
Observability and Failure Modes
The sandbox returns structured results with stdout, stderr, and exception details. You can log every session.run() call and inspect what code the agent generated.
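A thin wrapper is enough to get an audit trail of what the agent ran. The sketch assumes only what the text states: session.run() returns an object with stdout and stderr. The run_logged helper, the logger name, and the FakeSession stub are hypothetical, and the stub exists only so the wrapper can be exercised without the real package.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("sandbox.audit")


def run_logged(session, code):
    """Run code through session.run() and log the code plus its output."""
    logger.info("sandbox code: %r", code)
    result = session.run(code)
    stdout = getattr(result, "stdout", "")
    stderr = getattr(result, "stderr", "")
    logger.info("sandbox stdout: %r", stdout)
    if stderr:
        logger.warning("sandbox stderr: %r", stderr)
    return result


class FakeSession:
    """Stub used only to exercise the wrapper without the real package."""

    def run(self, code):
        return SimpleNamespace(stdout="10\n", stderr="")
```

Logging the code before running it means the record survives even if the call never returns.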
Known failure modes:
- Fuel exhaustion: code stops mid-execution, exception raised
- Memory exhaustion: the WASM runtime stops the sandboxed interpreter, exception raised
- Invalid Python syntax: MicroPython raises SyntaxError
- Host function not registered: AttributeError inside the sandbox
- C bridge bug: WASM exception, not a host system compromise
The thread-based queue can deadlock if the host function never returns or the WASM side never calls __session_result__(). Timeouts on both sides mitigate this.
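On the host side, a timeout can be as simple as bounding the wait on the reply queue. This sketch uses the standard library's queue.Queue.get(timeout=...); the wait_for_reply helper and the SandboxTimeout exception are hypothetical names, not part of the package.

```python
import queue


class SandboxTimeout(Exception):
    """Raised when the sandbox does not reply in time."""


def wait_for_reply(replies, timeout_seconds):
    """Return the next reply, or raise SandboxTimeout if none arrives in time."""
    try:
        return replies.get(timeout=timeout_seconds)
    except queue.Empty:
        raise SandboxTimeout(
            f"no reply from sandbox within {timeout_seconds}s"
        ) from None
```

A timeout only unblocks the caller. The interpreter thread may still be stuck, so it is probably safest to discard the session and start a fresh one rather than reuse it.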
Technical Verdict
Use this when:
- You need to execute agent-generated Python code in a server-side Python application
- You can constrain the code to MicroPython’s standard library
- You want a single PyPI install with no external dependencies
- You are comfortable with alpha-quality software and willing to test thoroughly
- You need persistent state across multiple code execution calls
- You can tolerate imprecise CPU limits (fuel is experimental)
Avoid this when:
- You need full CPython compatibility or third-party libraries like requests or pandas
- You require precise, predictable CPU time limits for billing or SLA enforcement
- You need a security-audited sandbox for high-stakes environments (financial, healthcare, PII)
- You cannot tolerate occasional WASM runtime exceptions or deadlocks
- You need sub-100ms execution latency (WASM startup overhead is non-trivial)
The persistent state mechanism is clever. The C bridge is small enough to audit. The fuel-based CPU limit is the biggest unknown. This is a working proof of concept that needs production hardening before you bet a business on it. If you are building agent tooling and need safe code execution today, this is worth testing. If you need guarantees, wait for more battle-testing or hire a security team to audit the C bridge.