Strudai's Client-Side Agent Loop: Why Browser-Based Music Coding Agents Run Without a Backend

Strudai wraps the Strudel live-coding music environment with an agentic layer that runs entirely in the browser. No backend except model inference. No server-side state. No database. The agent loop, code execution, and real-time collaboration all happen client-side, with users bringing their own Anthropic or OpenRouter API keys.

This architecture exposes the plumbing constraints of browser-based agent orchestration: single-threaded JavaScript event loops, no persistent state, API keys in localStorage, and synchronous code execution in a sandbox.

Why Client-Side Agent Orchestration

Most agent frameworks assume server-side orchestration. LangGraph, AutoGen, and CrewAI are typically run on a backend that manages state, coordinates tool calls, and persists conversation history. Strudai inverts this by running the entire agent loop in the browser.

Why this works for live-coding music:

No latency tolerance. Music generation needs immediate feedback. Routing each interaction through an intermediate backend adds another network hop (roughly 50-200ms) on top of the model call itself.
Ephemeral sessions. Live sets are one-off performances. No need to persist state beyond the browser session.
User-owned compute. The browser tab is the runtime. No server scaling, no cold starts, no queue management.
BYOK (bring your own key) security model. Users paste their own API keys. No server-side credential storage or proxy layer.

What this breaks:

No cross-session memory by default. Refresh the page and the in-memory conversation history is gone unless a snapshot was written to localStorage first.
No multi-user state sync. Collaboration happens in a single browser instance, not across devices.
API key exposure. Keys live in localStorage. If an attacker gets XSS, they get the key.

Architecture: Agent Loop in a Single Thread

Strudai’s agent loop runs in the browser’s main JavaScript thread. The Strudel environment already runs there, so the agent shares the same event loop.

Core components:

Agent orchestrator (JavaScript module): Manages prompt construction, model API calls, and response parsing.
Strudel runtime (existing): Executes live-coding DSL, generates audio, renders visuals.
Code editor (CodeMirror): Displays agent-generated code in real time.
Model API client (fetch wrapper): Streams responses from Anthropic or OpenRouter.

Flow for a single agent turn:

async function agentTurn(userPrompt, context) {
  // 1. Build the system prompt from the current code state and history
  const systemPrompt = buildSystemPrompt(context.currentCode, context.history);

  // 2. Stream the model response. Anthropic takes the system prompt as a
  //    top-level field, not as a message with role 'system'.
  const stream = await fetchModelStream({
    model: 'claude-3-5-sonnet',
    system: systemPrompt,
    messages: [{ role: 'user', content: userPrompt }],
    apiKey: localStorage.getItem('anthropic_key')
  });

  // 3. Accumulate code-block chunks and mirror them into the editor
  let codeBuffer = '';
  for await (const chunk of stream) {
    if (chunk.type === 'code_block') {
      codeBuffer += chunk.content;
      // Live preview only: a partial buffer may not parse yet
      editor.setValue(codeBuffer);
    }
  }

  // 4. Execute once the block is complete, unless the user turned autoExecute off
  if (context.preferences?.autoExecute ?? true) {
    await strudel.evaluate(codeBuffer);
  }

  // 5. Store the turn in session memory, in the same shape the model API uses
  context.currentCode = codeBuffer;
  context.history.push(
    { role: 'user', content: userPrompt },
    { role: 'assistant', content: codeBuffer }
  );

  return codeBuffer;
}

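Key constraint: the agent shares the main thread with the editor and the Strudel runtime, and that thread runs one task at a time. Waiting on the network does not block it, because for await yields to the event loop between chunks, but every synchronous step (parsing a chunk, updating the editor, evaluating code) does. Strudai uses async/await to interleave model streaming with code execution, so each per-chunk step has to stay short.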

State Management Without a Backend

The agent needs to track conversation history, current code state, and user preferences. Without a server, all state lives in browser memory or localStorage.

State storage layers:

Layer	Scope	Persistence	Use Case
In-memory object	Single session	Lost on refresh	Conversation history, current code buffer
localStorage	Per-origin	Survives refresh	API keys, user preferences, last session snapshot
IndexedDB	Per-origin	Survives refresh	Large session history, audio samples (future)
URL params	Shareable	Stateless	Share a specific code state via link
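
The URL-params row can be sketched as a pair of helpers that pack the current code into the URL fragment and read it back. This is an illustration only: the function names and the code parameter name are assumptions, not Strudai’s actual link format.

```javascript
// Hypothetical helpers: pack editor code into a URL fragment and read it back.
// The fragment is not sent in HTTP requests, so the link works on a static host.
function codeToShareUrl(baseUrl, code) {
  const url = new URL(baseUrl);
  // URLSearchParams percent-encodes quotes, parentheses, spaces and newlines.
  url.hash = new URLSearchParams({ code }).toString();
  return url.toString();
}

function codeFromShareUrl(href) {
  const fragment = new URL(href).hash.slice(1); // drop the leading '#'
  return new URLSearchParams(fragment).get('code'); // null when no code is present
}
```

On load, the app would call codeFromShareUrl(location.href) and, if the result is not null, use it as the initial editor content instead of the localStorage snapshot.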

Example state object:

const sessionState = {
  history: [
    { role: 'user', content: 'make a techno beat' },
    { role: 'assistant', content: 's("bd sd").fast(2)' }
  ],
  currentCode: 's("bd sd").fast(2)',
  preferences: {
    model: 'claude-3-5-sonnet',
    temperature: 0.7,
    autoExecute: true
  }
};

Persistence strategy:

On every agent turn: Append to sessionState.history in memory.
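On page hide: Serialize sessionState to localStorage as a snapshot. The visibilitychange and pagehide events are more reliable triggers than unload, which mobile browsers often skip.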
On page load: Restore from localStorage if available.
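
The three steps above can be sketched as a save/restore pair. This is a minimal sketch, not Strudai’s code: the storage key and function names are hypothetical, and the storage argument stands in for window.localStorage so the helpers can be exercised outside a browser.

```javascript
// Hypothetical snapshot helpers. `storage` is anything with the Web Storage
// getItem/setItem interface, e.g. window.localStorage in the browser.
const SNAPSHOT_KEY = 'strudai_session_snapshot'; // assumed key name

function saveSnapshot(storage, state) {
  // Persist only what a reload needs; anything else on `state` is left out.
  const { history, currentCode, preferences } = state;
  storage.setItem(SNAPSHOT_KEY, JSON.stringify({ history, currentCode, preferences }));
}

function loadSnapshot(storage, fallback) {
  const raw = storage.getItem(SNAPSHOT_KEY);
  if (raw === null) return fallback; // first visit: nothing saved yet
  try {
    return { ...fallback, ...JSON.parse(raw) };
  } catch {
    return fallback; // corrupt or truncated snapshot: start fresh
  }
}
```

In the browser, loadSnapshot(localStorage, defaults) would run once at startup and saveSnapshot(localStorage, sessionState) whenever the page is about to go away. Falling back on a parse error matters because a snapshot written while the tab is closing can be cut short.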

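What breaks: If the user opens two tabs, each keeps its own in-memory state, but both write the same per-origin localStorage snapshot, so the last tab to save wins. There is no cross-tab sync and no conflict resolution. This is acceptable for live-coding sessions but would break collaborative editing.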

BYOK Security Boundaries

Strudai asks users to paste their Anthropic or OpenRouter API key into a text field. The key is stored in localStorage and sent directly to the model provider from the browser.

Why BYOK:

No backend credential storage. The developers never see user keys.
No proxy layer. Requests go straight from browser to Anthropic/OpenRouter.
No rate limiting infrastructure. Users manage their own quotas.
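
What a request that goes straight from the browser looks like can be sketched as a small request builder. This is a sketch based on Anthropic’s public Messages API, not Strudai’s client: the function and option names are made up for illustration, and header requirements can change, so check the current API docs before relying on it.

```javascript
// Builds the arguments for a direct browser-to-Anthropic fetch() call.
// Function name and option names are illustrative, not Strudai's code.
function buildAnthropicRequest({ apiKey, model, system, messages, maxTokens = 1024 }) {
  return {
    url: 'https://api.anthropic.com/v1/messages',
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-api-key': apiKey,
        'anthropic-version': '2023-06-01',
        // Opt-in header for calls made from a browser with a raw API key.
        'anthropic-dangerous-direct-browser-access': 'true',
      },
      body: JSON.stringify({
        model,
        max_tokens: maxTokens,
        system,        // the system prompt is a top-level field, not a message
        messages,      // [{ role: 'user' | 'assistant', content }]
        stream: true,  // ask for a server-sent event stream
      }),
    },
  };
}
```

A caller would pass the result to fetch(request.url, request.init) and read the event stream from the response body. The name of the opt-in header is a fair summary of the trade-off: the key is visible to every script running on the page.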

Security risks:

XSS exposure. If an attacker injects JavaScript, they can read localStorage and exfiltrate the key.
No key rotation. Users must manually update keys if compromised.
No scoped permissions. The key has full account access, not just Strudai-scoped access.

Mitigation strategies:

Content Security Policy (CSP). Restrict inline scripts and external script sources.
Subresource Integrity (SRI). Pin hashes of third-party scripts.
Session-only storage option. Store keys in sessionStorage instead of localStorage (lost on tab close).
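
To make the CSP item concrete, here is a sketch of a policy builder for a static BYOK front end. The directive set is a starting point under stated assumptions, not Strudai’s deployed policy: the function name is hypothetical, the origin lists depend on the deployment, and a real editor will likely need further directives (styles, workers, media).

```javascript
// Assembles a Content-Security-Policy value for a static, BYOK front end.
// Origins are parameters because the real set depends on the deployment.
function buildCsp({ apiOrigins, sampleOrigins = [], allowEval = true }) {
  const scriptSrc = ["'self'"];
  // Assumption: the live-coding runtime evaluates generated code with eval or
  // the Function constructor, which CSP only permits under 'unsafe-eval'.
  if (allowEval) scriptSrc.push("'unsafe-eval'");
  const directives = {
    'default-src': ["'self'"],
    'script-src': scriptSrc,
    // Restricts where fetch() may connect, which narrows exfiltration paths.
    'connect-src': ["'self'", ...apiOrigins, ...sampleOrigins],
    'object-src': ["'none'"],
    'base-uri': ["'none'"],
  };
  return Object.entries(directives)
    .map(([name, values]) => `${name} ${values.join(' ')}`)
    .join('; ');
}
```

For example, buildCsp({ apiOrigins: ['https://api.anthropic.com', 'https://openrouter.ai'] }) yields a policy that blocks third-party scripts while still allowing the model calls. Needing 'unsafe-eval' is the weak point: the policy limits what an injected script can reach, but it cannot stop evaluated code from running.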

What this doesn’t solve: If the Strudai domain or its deploy pipeline is compromised, attackers can ship modified JavaScript that steals keys, and because the same host delivers the CSP, they can weaken the policy too. BYOK shifts trust from the backend to the frontend delivery mechanism.

Real-Time Collaboration: Human and Agent in the Same Editor

Strudai lets you edit code while the agent is generating. Both human and agent write to the same CodeMirror instance.

Conflict scenarios:

Agent writes line 5 while human edits line 3. No conflict in principle, but only if the agent applies line-level edits rather than replacing the buffer.
Agent overwrites entire buffer while human is typing. Human’s changes are lost.
Agent streams code slowly while human executes manually. Strudel runtime sees partial code.

Current resolution strategy:

Agent writes win. When the agent streams a new code block, it replaces the entire buffer.
Manual execution gate. User can toggle autoExecute to prevent the agent from running code automatically.

Better strategy (not implemented):

Use Operational Transformation (OT) or CRDTs to merge agent and human edits. CRDT libraries like Yjs or Automerge could handle this, but they add roughly 50-100KB to the bundle and require the agent to emit incremental edits instead of whole-buffer replacements.

Trade-off table:

Approach	Complexity	Bundle Size	Conflict Handling
Agent overwrites	Low	0 KB	Human loses
Manual execution toggle	Low	0 KB	Human controls timing
OT/CRDT merge	High	50-100 KB	Both edits preserved

Browser Sandbox Constraints

Running in the browser means no file system access, no native process spawning, and no network access outside CORS-allowed origins.

What the agent can’t do:

Read local audio files. Unless the user uploads via <input type="file">.
Write to disk. Can only trigger downloads via blob URLs.
Call arbitrary APIs. Limited to CORS-enabled endpoints or proxies.
Access system audio devices directly. Must use Web Audio API.

What the agent can do:

Generate and play audio. Strudel uses Web Audio API to synthesize sound in real time.
Render visuals. Canvas and WebGL for graphics.
Fetch from allowed origins. Strudel can load samples from CDNs with CORS headers.

Workaround for file access:

If the agent needs a custom sample, the user has to make it available first, either by uploading it through a file input or by hosting it somewhere CORS-enabled. The agent can then reference it by name:

// Agent-generated code
samples('github:user/repo')
s("kick snare")

The samples() function fetches a sample map from the GitHub repo over CORS (Strudel looks for a strudel.json file there). No local file system required.

Streaming Model Responses and Code Execution Sync

The agent streams code from the model and executes it in Strudel as it arrives. This creates a synchronization problem: partial code is invalid.

Example streaming sequence:

Chunk 1: s("
Chunk 2: bd sd
Chunk 3: ").fast(2)

If you execute after chunk 1, Strudel throws a syntax error. If you wait until chunk 3, there’s no real-time feedback.

Current strategy:

Buffer until code block complete. Wait for the model to close the code fence (```) before executing.
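Execute on newline. If the agent generates multi-line code, execute after each complete line. This can still fail when an expression spans lines, such as a method chain broken across them.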
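
The buffer-until-fence-closes rule can be sketched as a small incremental extractor. This is a hypothetical helper, not Strudai’s parser: it assumes the model wraps code in triple-backtick fences and that a fence marker never appears inside the code itself.

```javascript
const FENCE = '`'.repeat(3); // three backticks

// Hypothetical incremental extractor: feed it raw text chunks and it returns
// any code blocks whose closing fence has arrived.
function createFenceExtractor() {
  let pending = '';   // text received but not yet consumed
  let inside = false; // true between an opening and a closing fence
  return {
    push(chunk) {
      pending += chunk;
      const completed = [];
      for (;;) {
        const at = pending.indexOf(FENCE);
        if (at === -1) break; // no full fence yet (it may be split across chunks)
        if (!inside) {
          // Opening fence: wait for the rest of its line (the language tag).
          const lineEnd = pending.indexOf('\n', at);
          if (lineEnd === -1) break;
          pending = pending.slice(lineEnd + 1);
          inside = true;
        } else {
          // Closing fence: everything before it is one complete block.
          completed.push(pending.slice(0, at).replace(/\n$/, ''));
          pending = pending.slice(at + FENCE.length);
          inside = false;
        }
      }
      return completed;
    },
  };
}
```

An orchestrator would call push() for every streamed text delta and hand each returned block to the editor and, if autoExecute is on, to the runtime. Because unconsumed text stays in pending, a fence split across two chunks is still detected.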

Better strategy:

Use a parser to detect syntactically complete expressions. Execute when the parser confirms validity. Libraries like Acorn (JavaScript parser) could do this, but add overhead.
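
As a dependency-free stand-in for Acorn, the same check can be sketched with the engine’s own parser. Note the swap: this is not Acorn, it uses the AsyncFunction constructor, which compiles the source without calling it. The function name is hypothetical, and the approach only works where the page’s CSP allows eval-style compilation.

```javascript
// AsyncFunction is not a global; this is the usual way to obtain it.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Returns true when `source` parses as a complete script body.
// Constructing the function compiles the text and throws SyntaxError on
// incomplete input; the function is never called, so nothing executes.
function isSyntacticallyComplete(source) {
  if (source.trim() === '') return false;
  try {
    new AsyncFunction(source); // async wrapper so top-level await parses
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error; // anything else (e.g. a CSP violation) is not a parse result
  }
}
```

Parse validity is necessary but not sufficient: s("bd sd") already parses before .fast(2) has arrived, so a valid prefix can still be an unfinished thought. A parser check works best as a guard in front of execution, combined with the fence-close signal, rather than as the only trigger.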

Failure mode:

If the model generates invalid code, Strudel throws an error. The agent doesn’t see the error unless the orchestrator captures it and feeds it back in the next prompt.

Error feedback loop:

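async function runWithFeedback(prompt, context, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // agentTurn evaluates the generated code, so evaluation errors reject
      // here; await is what makes asynchronous failures catchable
      return await agentTurn(prompt, context);
    } catch (error) {
      if (attempt === maxAttempts) throw error; // stop after a bounded number of fixes
      // Feed error back to agent
      prompt = `The code threw an error: ${error.message}. Fix it.`;
    }
  }
}

This creates a retry loop. The agent sees the error and generates a fix. Without this, the agent is blind to execution failures. The attempt cap matters: without it, a model that keeps producing broken fixes retries forever and burns the user’s API credits.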

Deployment Shape

Strudai is a static site. No server-side rendering, no API routes, no database. The entire app is HTML, CSS, and JavaScript served from a CDN or static host.

Deployment options:

Vercel/Netlify. Push to GitHub, auto-deploy on commit.
GitHub Pages. Free hosting for public repositories, such as this AGPL-3.0 project.
Self-hosted CDN. Nginx serving static files.

Cost structure:

Hosting: $0 (GitHub Pages) to $20/month (Vercel Pro).
Model inference: User pays via their own API key.
Bandwidth: Negligible. The app is ~500 KB gzipped.

Scaling:

No backend means no scaling problem. Each user runs the agent in their own browser. The only shared resource is the static file CDN.

Observability Gaps

Client-side agents are hard to observe. No centralized logs, no trace IDs, no metrics aggregation.

What you can’t see:

How many users are running agents. No analytics unless you add a tracking script.
What prompts users send. No server-side logging.
Where the agent fails. Errors happen in the user’s console, not your logs.

What you can add:

Client-side error tracking. Sentry or LogRocket to capture exceptions.
Anonymous telemetry. Send sanitized usage stats to a lightweight backend (Plausible, PostHog).
Replay sessions. Record user interactions for debugging (privacy concerns).

Trade-off:

Adding observability reintroduces a backend dependency. If you want zero backend, you accept zero visibility.

Likely Failure Modes

1. Rate limit or credit exhaustion.

User hits rate limit or runs out of credits mid-session. The agent stops working. No fallback.

2. Model generates infinite loop.

Agent writes code that runs forever. Because evaluation happens on the main thread, the whole tab freezes, including the editor and any stop control. User must kill the tab.

3. CORS blocks sample loading.

Agent tries to load audio from a non-CORS origin. Fetch fails. No audio plays.

4. XSS via malicious code generation.

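Generated code is ordinary JavaScript evaluated in the page. If Strudel’s eval is not sandboxed, a malicious or prompt-injected generation runs with the page’s privileges and can read the API key from localStorage directly, with or without injecting a script tag.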

5. State loss on accidental refresh.

User refreshes mid-session. Conversation history is lost unless persisted to localStorage.

Technical Verdict

Use Strudai’s architecture when:

Sessions are ephemeral and don’t need cross-device sync.
Users are willing to manage their own API keys.
You want zero backend infrastructure and hosting costs.
Real-time feedback is more important than robust error handling.
Your domain has strong CSP and you trust your static file delivery.

Avoid this architecture when:

You need persistent state across sessions or devices.
API key security is a hard requirement (enterprise, compliance).
You need centralized observability and error tracking.
Users expect collaborative editing with conflict resolution.
Your agent needs file system access or non-CORS APIs.

Client-side agent orchestration works for constrained, single-user, ephemeral workflows. It breaks when you need durability, security, or observability. Strudai proves the pattern is viable for live-coding music. Whether it scales to other domains depends on how much you’re willing to sacrifice.

Source Links

Primary: Strudai
GitHub: strudai/strudai
Discussion: Show HN