mech.app
Security

Mog: A 3,200-Token Language Spec for Capability-Based Agent Sandboxing

How a statically typed embedded language uses capability-based permissions to let agents write and execute code without full system access.

Source: moglang.org

When you let an LLM generate executable code, you face a binary choice: run it with full system access or run it in a sandbox that blocks everything interesting. Mog takes a third path. It is a statically typed, compiled, embedded language with a 3,200-token spec designed for agent authorship. The host application controls exactly which functions a compiled Mog program can call, using capability-based permissions at the language level rather than OS-level isolation.

The workflow is straightforward. An agent writes a Mog program. The host compiles it. The host dynamically loads it as a plugin, script, or hook. At runtime, the Mog program can only call functions the host explicitly granted. If the agent tries to invoke a capability it does not have, the call fails at the boundary.

Why Capability-Based Permissions Beat Traditional Sandboxing

Traditional sandboxing uses containers, VMs, or seccomp filters to restrict what a process can do at the OS level. You block syscalls, network access, or filesystem paths. These approaches work for isolating untrusted binaries, but they are coarse-grained. You either allow open() or you do not. You cannot say “this process can read /data/user123 but not /data/user456” without complex policy files or SELinux rules.

Mog moves the permission boundary into the language runtime. The host application defines a capability set before loading the Mog program. Each capability is a named function the Mog code can call. If the agent writes code that tries to invoke http.get() but the host only granted log.write(), the call fails immediately. No syscall filtering, no container overhead, no policy engine.

This matters for agent-generated plugins because:

  • Granular control: You can grant database.read() but not database.write().
  • Dynamic loading: Plugins can be compiled and loaded without restarting the host.
  • Composability: Different plugins can have different capability sets in the same process.
  • Auditability: Every capability call crosses the host boundary, so the host can log each invocation with its arguments and return value. You see exactly what the agent's code tried to call and what it got back.
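
To make the points above concrete, here is a minimal host-side sketch of a deny-by-default capability table with an audit trail. It is an illustration under assumptions, not Mog's actual embedding API: CapabilitySet, grant, call, and the two helper functions are hypothetical names, and capabilities are simplified to functions from one string to one string.

```rust
use std::collections::HashMap;

// Hypothetical host-side types for illustration; not Mog's real API.
type Capability = Box<dyn Fn(&str) -> String>;

struct CapabilitySet {
    granted: HashMap<String, Capability>,
    audit: Vec<String>, // one line per attempted call
}

impl CapabilitySet {
    fn new() -> Self {
        CapabilitySet { granted: HashMap::new(), audit: Vec::new() }
    }

    // Register a native function under a capability name.
    fn grant(&mut self, name: &str, f: Capability) {
        self.granted.insert(name.to_string(), f);
    }

    // Deny by default, and record every attempt whether it succeeds or not.
    fn call(&mut self, name: &str, arg: &str) -> Result<String, String> {
        let result = match self.granted.get(name) {
            Some(f) => Ok(f(arg)),
            None => Err(format!("capability not granted: {}", name)),
        };
        self.audit.push(format!("{}({:?}) -> {:?}", name, arg, result));
        result
    }
}

// Two plugins in the same process, each with its own capability set.
fn reporting_plugin_caps() -> CapabilitySet {
    let mut caps = CapabilitySet::new();
    // Read access only: database.write is never granted.
    caps.grant("database.read", Box::new(|q: &str| format!("rows for {}", q)));
    caps
}

fn logging_plugin_caps() -> CapabilitySet {
    let mut caps = CapabilitySet::new();
    caps.grant("log.write", Box::new(|msg: &str| format!("[LOG] {}", msg)));
    caps
}
```

With these sets, the reporting plugin can call database.read but gets an error for database.write, the logging plugin gets an error for database.read, and both outcomes end up in the audit vector.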

The 3,200-Token Language Spec

Mog’s spec fits in 3,200 tokens because it omits features that agents rarely use correctly. No operator precedence (all operators are flat, left-to-right). No implicit type conversions. No classes or inheritance. No macros. No preprocessor.
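
The "flat, left-to-right" rule is easy to misread if you are used to conventional precedence. The sketch below is a toy evaluator for whitespace-separated arithmetic, written only to illustrate the rule as described above. It is not the Mog compiler, and eval_flat is a hypothetical name.

```rust
// Evaluate an expression such as "2 + 3 * 4" strictly left to right.
// There is no precedence table: each operator applies to the running
// result and the next operand.
fn eval_flat(expr: &str) -> Result<f64, String> {
    let mut tokens = expr.split_whitespace();
    let first = tokens.next().ok_or_else(|| "empty expression".to_string())?;
    let mut acc = first
        .parse::<f64>()
        .map_err(|_| format!("bad number: {}", first))?;

    while let Some(op) = tokens.next() {
        let rhs_text = tokens
            .next()
            .ok_or_else(|| format!("missing operand after {}", op))?;
        let rhs = rhs_text
            .parse::<f64>()
            .map_err(|_| format!("bad number: {}", rhs_text))?;
        acc = match op {
            "+" => acc + rhs,
            "-" => acc - rhs,
            "*" => acc * rhs,
            "/" => acc / rhs,
            _ => return Err(format!("unknown operator: {}", op)),
        };
    }
    Ok(acc)
}
```

Under this rule, "2 + 3 * 4" evaluates to 20, not the 14 that conventional precedence would give. In practice that means any expression mixing operators should be parenthesized to make the intended grouping explicit.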

What it includes:

  • Static typing with inference
  • Functions and closures
  • Arrays, maps, and strings
  • Pattern matching
  • Async/await for I/O
  • Built-in math and string methods

The small spec means you can include the entire language reference in an LLM context window alongside the task description. The agent does not need to guess syntax or look up documentation, which makes valid Mog code on the first attempt far more likely.

Example Mog program:

fn calculate_risk(balance: f64, threshold: f64) -> str {
  if balance < threshold {
    return "high"
  } else if balance < (threshold * 2.0) {
    return "medium"
  } else {
    return "low"
  }
}

fn main() {
  let risk := calculate_risk(1500.0, 1000.0)
  log.write("Risk level: " + risk)
}

The agent writes this. The host compiles it. The host grants the log.write capability. The program runs. If the agent tries to add http.get("https://evil.com"), the compiler accepts it (the syntax is valid), but the runtime rejects it (the capability was not granted).

How the Host Controls Capabilities

The host application defines capabilities as a map from function names to native implementations. Once the Mog program is loaded, the runtime checks each external function call against that map at the moment the call is made. If the name is in the map, the call proceeds. If not, the runtime returns an error to the program.

Capability grant example (pseudocode):

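// Build the set of functions this program is allowed to call.
let mut capabilities = CapabilitySet::new();
capabilities.grant("log.write", |msg: String| {
  println!("[LOG] {}", msg);
});
// Read access only: database.write is never granted.
capabilities.grant("database.read", |query: String| {
  execute_read_only_query(query)
});

// Compile the agent's source, then run it against the granted set.
let program = compile_mog_source(agent_generated_code)?;
let runtime = Runtime::new(program, capabilities);
runtime.execute()?;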

The Mog program cannot call database.write because the host never granted it. The Mog program cannot call std::process::Command because that is not a Mog function. The Mog program cannot allocate unbounded memory because the host sets a memory limit when creating the runtime.

Failure Modes and Observability

Mog programs can fail in three ways:

  1. Compile-time errors: The agent writes invalid syntax or type mismatches. The host catches this before loading the program.
  2. Runtime capability errors: The agent calls a function it does not have permission to use. The host logs the attempt and returns an error to the Mog program.
  3. Logic errors: The agent writes valid Mog code that compiles and runs but produces incorrect results. This is the hardest to catch.

For logic errors, you need observability. The host can inject a trace.event() capability that the agent calls at decision points. The host logs these events to a structured log or trace backend. You can then replay the execution and see where the agent’s logic diverged from expectations.

Example trace capability:

fn process_order(order: Map) {
  trace.event("order_received", order)
  let total := calculate_total(order)
  trace.event("total_calculated", {"total": total})
  if total > 1000.0 {
    trace.event("high_value_order", {"total": total})
    notify_manager(order)
  }
}

The host collects these trace events and correlates them with the agent’s prompt, the generated Mog code, and the final outcome. If the agent’s logic is wrong, you see exactly where it made the bad decision.
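
On the host side, the collection step can be as simple as an ordered buffer that trace.event appends to. The sketch below assumes that design. TraceLog, record, path, and first_divergence are hypothetical names, and payloads are kept as plain strings to stay self-contained.

```rust
// A hypothetical host-side trace buffer; the trace.event capability
// would call record() once per event.
#[derive(Debug, Clone, PartialEq)]
struct TraceEvent {
    seq: usize,      // position in the run
    name: String,    // e.g. "total_calculated"
    payload: String, // serialized arguments
}

struct TraceLog {
    events: Vec<TraceEvent>,
}

impl TraceLog {
    fn new() -> Self {
        TraceLog { events: Vec::new() }
    }

    fn record(&mut self, name: &str, payload: &str) {
        let seq = self.events.len();
        self.events.push(TraceEvent {
            seq,
            name: name.to_string(),
            payload: payload.to_string(),
        });
    }

    // Event names in the order they fired.
    fn path(&self) -> Vec<&str> {
        self.events.iter().map(|e| e.name.as_str()).collect()
    }

    // Index of the first event that departs from the expected path,
    // or None if the run matched it exactly.
    fn first_divergence(&self, expected: &[&str]) -> Option<usize> {
        let actual = self.path();
        let shared = actual.len().min(expected.len());
        for i in 0..shared {
            if actual[i] != expected[i] {
                return Some(i);
            }
        }
        if actual.len() != expected.len() {
            Some(shared)
        } else {
            None
        }
    }
}
```

For a 1,500.0 order you would expect the path order_received, total_calculated, high_value_order. If the recorded path stops after total_calculated, first_divergence returns 2, which points at the threshold comparison as the place to look.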

Dynamic Plugin Loading and State Isolation

Mog programs are compiled to bytecode and loaded into a runtime. The host can load multiple Mog programs in the same process, each with its own capability set and memory space. Plugins do not share state unless the host explicitly provides a shared capability (like a key-value store).

Loading flow:

  1. Agent generates Mog source code.
  2. Host compiles source to bytecode.
  3. Host creates a new runtime with a specific capability set.
  4. Host loads bytecode into runtime.
  5. Host calls the main() function or a specific entry point.
  6. Mog program runs until completion or timeout.
  7. Host unloads the runtime and reclaims memory.
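
The sketch below walks the same seven steps for a single request, using stand-ins for the parts only the real toolchain provides. Everything here is an assumption made for illustration: the "compiler" just treats each non-empty source line as one external call, the timeout is modeled as a call budget rather than wall-clock time, and Bytecode, Runtime, RunError, and handle_request are hypothetical names.

```rust
use std::collections::HashSet;

// Stand-in for compiled output: the external calls the program makes, in order.
struct Bytecode {
    calls: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum RunError {
    Compile(String),
    CapabilityDenied(String),
    Timeout,
}

// Step 2: "compile" the source. A real compiler would parse and type-check.
fn compile(source: &str) -> Result<Bytecode, RunError> {
    let calls: Vec<String> = source
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .map(String::from)
        .collect();
    if calls.is_empty() {
        return Err(RunError::Compile("empty program".to_string()));
    }
    Ok(Bytecode { calls })
}

struct Runtime {
    caps: HashSet<String>,
    program: Option<Bytecode>,
    max_calls: usize, // stand-in for a timeout
}

impl Runtime {
    // Step 3: a fresh runtime with its own capability set.
    fn new(caps: &[&str], max_calls: usize) -> Self {
        Runtime {
            caps: caps.iter().map(|c| c.to_string()).collect(),
            program: None,
            max_calls,
        }
    }

    // Step 4: load the compiled program.
    fn load(&mut self, program: Bytecode) {
        self.program = Some(program);
    }

    // Steps 5 and 6: run until completion, a denied call, or the budget runs out.
    fn run(&self) -> Result<usize, RunError> {
        let program = match &self.program {
            Some(p) => p,
            None => return Err(RunError::Compile("no program loaded".to_string())),
        };
        for (i, call) in program.calls.iter().enumerate() {
            if i >= self.max_calls {
                return Err(RunError::Timeout);
            }
            if !self.caps.contains(call) {
                return Err(RunError::CapabilityDenied(call.clone()));
            }
        }
        Ok(program.calls.len())
    }
}

// Steps 1 through 7 for one request; `source` is the agent's output (step 1).
fn handle_request(source: &str, caps: &[&str], max_calls: usize) -> Result<usize, RunError> {
    let bytecode = compile(source)?;
    let mut runtime = Runtime::new(caps, max_calls);
    runtime.load(bytecode);
    let result = runtime.run();
    drop(runtime); // Step 7: unload and reclaim memory.
    result
}
```

Because the runtime is created and dropped inside handle_request, nothing carries over between requests unless the host passes it in through a capability.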

The host does not need to restart. You can load, run, and unload Mog programs on every request. This is useful for agent-generated webhooks, scheduled tasks, or user-defined automation rules.

State isolation means a malicious or buggy Mog program cannot corrupt another plugin’s data. Each runtime has its own heap. If a Mog program crashes, the host catches the error and continues running other plugins.

Trade-Offs: Mog vs. Alternatives

| Approach | Granularity | Overhead | Agent Usability | State Isolation |
| --- | --- | --- | --- | --- |
| Mog | Per-function capability | Low (in-process) | High (3,200-token spec) | Per-runtime heap |
| Containers | Per-syscall or network | High (process boundary) | Low (full language, large attack surface) | Full process isolation |
| WASM sandboxes | Per-import | Medium (WASM runtime) | Medium (depends on language) | Per-instance memory |
| Lua/Python sandboxing | Monkey-patching globals | Low (in-process) | Medium (agents know the language) | Shared state unless careful |

Mog trades language expressiveness for security and agent usability. You cannot write a kernel in Mog. You can write a plugin that processes data, calls APIs, and makes decisions.

When to Use Mog

Use Mog when:

  • You need agents to generate executable code that runs in production.
  • You want fine-grained control over what the generated code can do.
  • You need to load and unload agent-generated plugins dynamically.
  • You want the full language spec to fit in the agent’s context window.
  • You need auditability for every external function call.

Avoid Mog when:

  • You need a general-purpose language with a large standard library.
  • You are running code from human developers who expect Python, JavaScript, or Go.
  • You need to integrate with existing libraries that are not exposed as capabilities.
  • You need operator precedence or complex type hierarchies.
  • You are okay with coarse-grained OS-level sandboxing (containers are simpler).

Technical Verdict

Mog is a purpose-built language for agent-generated plugins. It solves the problem of letting LLMs write executable code without giving them full system access. The capability-based permission model is more flexible than OS-level sandboxing and more auditable than monkey-patching globals in a dynamic language. The 3,200-token spec means agents can learn the entire language in one context window.

The trade-off is expressiveness. Mog is not a replacement for Python or JavaScript. It is a replacement for unsafe eval() or uncontrolled subprocess execution. If you are building an agent platform where agents need to write custom logic, Mog gives you a security boundary that is both enforceable and observable.

The failure mode to watch is logic errors. Mog prevents agents from calling unauthorized functions, but it does not prevent agents from writing code that compiles and runs but does the wrong thing. You need observability, testing, and human review for that.