CHAP: The Protocol That Lets Humans and Agents Negotiate Responsibility Boundaries in Production

Foundation models are moving from response generation into operational roles. They plan across steps, call tools, request human input, coordinate with other agents, and increasingly carry responsibility for work that affects customers, claims, code, contracts, and clinical decisions.

Production deployments are no longer one human supervising one model. They are multi-human, multi-agent collaborations that cross teams, time zones, and trust boundaries. The technical surface for this collaboration remains weakly specified.

When an agent drafts a response and a human edits it before it ships, the moment of human judgement is the most valuable signal in the system. In current practice it is recorded, if at all, in application code, chat threads, ticket comments, and tribal memory.

CHAP (Collaborative Human-Agent Protocol) addresses this gap. It defines structured handoff points, approval gates, and escalation paths for agents that do real work.

The Problem CHAP Solves

Two protocol standards address adjacent concerns:

MCP (Model Context Protocol) standardizes agent access to tools and data
A2A (Agent2Agent Protocol) standardizes agent-to-agent interoperability

Neither defines the shared workspace in which humans and agents perform accountable work together.

CHAP fills that gap. Under CHAP:

The override that used to vanish into a chat thread becomes a structured event carrying a diff, a rationale, and a content hash
The handoff between shifts becomes a portable envelope rather than a pinned message
The human approval of an agent’s draft becomes a non-repudiable signed decision that can be replayed years later

Core Protocol Components

CHAP achieves this through a small core and composable profiles.

Core Primitives

Workspaces are the boundary of shared context. A workspace contains:

Participants (humans and agents)
Tasks (units of work with state)
Artefacts (versioned outputs)
An append-only evidence log

Participants have roles and capabilities. The protocol distinguishes:

Humans with approval authority
Agents with execution authority
Observers with read-only access

Tasks move through a state machine:

PROPOSED → ASSIGNED → IN_PROGRESS → REVIEW_REQUIRED → APPROVED → COMPLETED
                                   ↓
                                REJECTED → REASSIGNED
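
The transitions above can be sketched as a lookup table. This is a minimal illustration rather than anything CHAP itself prescribes: `TaskState`, `ALLOWED`, and `advance` are hypothetical names, and because the diagram does not make the branch point explicit, the sketch assumes REJECTED is reached from REVIEW_REQUIRED.

```python
from enum import Enum


class TaskState(Enum):
    PROPOSED = "PROPOSED"
    ASSIGNED = "ASSIGNED"
    IN_PROGRESS = "IN_PROGRESS"
    REVIEW_REQUIRED = "REVIEW_REQUIRED"
    APPROVED = "APPROVED"
    COMPLETED = "COMPLETED"
    REJECTED = "REJECTED"
    REASSIGNED = "REASSIGNED"


# Assumed transition table; REJECTED is taken to branch off REVIEW_REQUIRED.
ALLOWED = {
    TaskState.PROPOSED: {TaskState.ASSIGNED},
    TaskState.ASSIGNED: {TaskState.IN_PROGRESS},
    TaskState.IN_PROGRESS: {TaskState.REVIEW_REQUIRED},
    TaskState.REVIEW_REQUIRED: {TaskState.APPROVED, TaskState.REJECTED},
    TaskState.APPROVED: {TaskState.COMPLETED},
    TaskState.REJECTED: {TaskState.REASSIGNED},
    TaskState.REASSIGNED: set(),
    TaskState.COMPLETED: set(),
}


def advance(current: TaskState, target: TaskState) -> TaskState:
    """Return the target state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Rejecting illegal transitions at one choke point keeps every state change explicit, which is what makes each one loggable.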

Artefacts are versioned content objects. Each artefact includes:

Content hash (SHA-256)
Parent hash (for diffs)
Author (human or agent ID)
Timestamp
Optional signature
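
As a rough sketch of how the artefact fields might fit together, the following uses Python's standard `hashlib` to compute the SHA-256 content hash and links each version to its parent. The `Artefact` class, its field names, and `new_version` are illustrative assumptions, not a published CHAP schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def sha256_hex(content: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Artefact:
    content: str
    author: str                      # human or agent ID
    parent_hash: Optional[str]       # None for the first version
    timestamp: str                   # ISO 8601, UTC
    content_hash: str
    signature: Optional[str] = None  # optional; left unset in this sketch


def new_version(content: str, author: str,
                parent: Optional[Artefact] = None) -> Artefact:
    """Create a version whose parent_hash points at the previous version."""
    return Artefact(
        content=content,
        author=author,
        parent_hash=parent.content_hash if parent else None,
        timestamp=datetime.now(timezone.utc).isoformat(),
        content_hash=sha256_hex(content),
    )


draft = new_version("Draft clause 3", "agent-x")
edited = new_version("Draft clause 3 (revised)", "human-alice", parent=draft)
```

Here `edited.parent_hash` equals `draft.content_hash`, which is the link a diff between the two versions would rely on.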

The evidence log is an append-only sequence of events. Every state transition, approval, override, and handoff is logged with:

Event type
Actor ID
Timestamp
Payload (structured JSON)
Optional cryptographic signature
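
A minimal in-memory sketch of such a log is shown below, assuming a sequence number per event and copy-on-read so that callers cannot alter what was logged. `EvidenceLog` and its methods are hypothetical names; a real deployment would persist events in a durable store.

```python
import copy
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


class EvidenceLog:
    """In-memory append-only log; there is no update or delete method."""

    def __init__(self) -> None:
        self._events: List[Dict[str, Any]] = []

    def append(self, event_type: str, actor_id: str, payload: Dict[str, Any],
               signature: Optional[str] = None) -> Dict[str, Any]:
        event = {
            "seq": len(self._events) + 1,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event_type": event_type,
            "actor_id": actor_id,
            "payload": copy.deepcopy(payload),
        }
        if signature is not None:
            event["signature"] = signature
        self._events.append(event)
        return copy.deepcopy(event)

    def events(self) -> Tuple[Dict[str, Any], ...]:
        """Return deep copies so callers cannot mutate logged events."""
        return tuple(copy.deepcopy(e) for e in self._events)


log = EvidenceLog()
log.append("TASK_CREATED", "agent-x", {})
log.append("APPROVAL", "human-alice", {"decision": "APPROVED"}, signature="...")
```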

State Machine for Handoffs

CHAP defines explicit triggers for escalation:

Trigger	Condition	Action
Confidence threshold	Agent confidence < 0.7	Escalate to human review
Policy boundary	Action requires approval per policy	Block until approval received
Timeout	No human response within SLA	Escalate to next tier or roll back
Conflict	Multiple agents propose contradictory actions	Escalate to human arbitration
Override	Human edits agent output	Log diff and rationale
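
The agent-side triggers in this table could be evaluated with a function along these lines. It is a sketch under stated assumptions: `StepContext` and its fields are invented for illustration, and the Override trigger is left out because it originates from a human edit rather than from the agent's own checks.

```python
from dataclasses import dataclass
from typing import List

CONFIDENCE_THRESHOLD = 0.7  # value taken from the trigger table


@dataclass
class StepContext:
    confidence: float
    requires_approval: bool      # the action crosses a policy boundary
    seconds_waiting: float       # time already spent waiting on a human
    sla_seconds: float
    conflicting_proposals: bool  # other agents proposed contradictory actions


def triggered(ctx: StepContext) -> List[str]:
    """Return the names of all triggers that fire for this step."""
    fired = []
    if ctx.confidence < CONFIDENCE_THRESHOLD:
        fired.append("confidence_threshold")
    if ctx.requires_approval:
        fired.append("policy_boundary")
    if ctx.seconds_waiting > ctx.sla_seconds:
        fired.append("timeout")
    if ctx.conflicting_proposals:
        fired.append("conflict")
    return fired
```

Returning every trigger that fires, rather than only the first, lets the log record all the reasons a step was escalated.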

When an agent escalates, it emits a REVIEW_REQUIRED event containing:

Task ID
Proposed action
Confidence score
Supporting evidence
Suggested reviewers

The task state transitions to REVIEW_REQUIRED. The agent blocks or continues other work depending on the task dependency graph.
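
One plausible shape for that event is sketched below. The field names mirror the list above but are assumptions, as is the convention that confidence lies between 0 and 1.

```python
import json
from typing import Any, Dict, List


def build_review_required(task_id: str, proposed_action: str, confidence: float,
                          evidence: List[str],
                          suggested_reviewers: List[str]) -> Dict[str, Any]:
    """Assemble a REVIEW_REQUIRED event as a JSON-serializable dict."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return {
        "event_type": "REVIEW_REQUIRED",
        "task_id": task_id,
        "proposed_action": proposed_action,
        "confidence": confidence,
        "supporting_evidence": list(evidence),
        "suggested_reviewers": list(suggested_reviewers),
    }


event = build_review_required(
    task_id="task-1234",
    proposed_action="Send contract draft to counterparty",
    confidence=0.62,
    evidence=["Ambiguous clause in section 3"],
    suggested_reviewers=["human-alice"],
)
print(json.dumps(event, indent=2))
```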

Context Preservation Across Boundaries

When a human reviews an agent’s work, CHAP preserves:

Full conversation history
Tool calls made
Data accessed
Intermediate reasoning steps
Confidence scores at each step

This context is serialized into the task payload. The human sees the same state the agent saw when it requested review.

When the human approves or overrides, the response flows back as a structured event:

{
  "event_type": "APPROVAL",
  "task_id": "task-1234",
  "reviewer_id": "human-alice",
  "decision": "APPROVED_WITH_CHANGES",
  "diff": {
    "before_hash": "abc123",
    "after_hash": "def456",
    "changes": []
  },
  "rationale": "Updated claim amount based on policy section 4.2",
  "signature": "..."
}

The agent receives this event, applies the changes, and logs the override in the evidence log.
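
Before applying changes, an agent could check the event's hashes against the content it actually holds. The sketch below assumes the hashes are SHA-256 digests of the artefact text and that a plain "APPROVED" decision exists alongside "APPROVED_WITH_CHANGES"; `apply_approval` is a hypothetical helper.

```python
import hashlib
from typing import Any, Dict


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def apply_approval(current_content: str, updated_content: str,
                   event: Dict[str, Any]) -> str:
    """Return the content to ship after checking it against the event's hashes."""
    decision = event["decision"]
    if decision == "APPROVED":  # assumed decision value: approved unchanged
        return current_content
    if decision != "APPROVED_WITH_CHANGES":
        raise ValueError(f"unhandled decision: {decision}")
    diff = event["diff"]
    if diff["before_hash"] != sha256_hex(current_content):
        raise ValueError("reviewer edited a different version than the agent holds")
    if diff["after_hash"] != sha256_hex(updated_content):
        raise ValueError("updated content does not match the approved hash")
    return updated_content
```

The first check catches a review made against a stale draft; the second catches edited content that was altered after the reviewer signed off.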

Multi-Agent Coordination

CHAP handles multi-agent coordination through workspace-level locking and dependency graphs.

When one agent needs human approval but others are blocked waiting, CHAP defines:

Queuing semantics: Tasks waiting for review enter a priority queue. Humans see a unified review dashboard across all agents.

Timeout behavior: If no human responds within the configured SLA, the task can:

Escalate to a higher tier
Roll back to the last approved state
Proceed with a fallback action (if policy allows)

Rollback semantics: If a human rejects a task, CHAP can:

Revert the workspace to the last approved artefact
Notify dependent agents to pause or roll back
Log the rejection reason for future training
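
The timeout options could be resolved by a small policy function such as the one below. The order of preference (escalate first, then fall back, then roll back) is an assumption made for this sketch, as are the `TimeoutPolicy` fields and the returned action strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeoutPolicy:
    sla_seconds: float
    next_tier: Optional[str] = None  # reviewer group to escalate to, if any
    allow_fallback: bool = False     # whether policy permits a fallback action


def on_review_wait(waited_seconds: float, policy: TimeoutPolicy) -> str:
    """Decide what to do with a task that is still waiting for review."""
    if waited_seconds <= policy.sla_seconds:
        return "WAIT"
    if policy.next_tier is not None:
        return f"ESCALATE:{policy.next_tier}"
    if policy.allow_fallback:
        return "FALLBACK"
    return "ROLLBACK"
```

Rolling back is the default here because it is the only option that needs no further human or policy input.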

Dependency Graph Example

Agent A: Draft contract → REVIEW_REQUIRED
Agent B: Schedule signing (blocked on Agent A)
Agent C: Notify stakeholders (blocked on Agent B)

If the human rejects Agent A’s draft, CHAP automatically cancels the dependent tasks held by Agent B and Agent C and logs the cascade.
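
The cascade can be sketched as a breadth-first walk over the dependency graph. The task names and the `blocked_by` mapping are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List


def cascade_cancel(rejected: str, blocked_by: Dict[str, List[str]]) -> List[str]:
    """Return the tasks to cancel, in breadth-first order, when `rejected` is rejected.

    `blocked_by` maps each task to the tasks it waits on.
    """
    # Invert the mapping: task -> tasks that depend on it.
    dependents: Dict[str, List[str]] = {}
    for task, prerequisites in blocked_by.items():
        for prerequisite in prerequisites:
            dependents.setdefault(prerequisite, []).append(task)

    cancelled: List[str] = []
    seen = {rejected}
    queue = deque([rejected])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                cancelled.append(dependent)
                queue.append(dependent)
    return cancelled


graph = {
    "draft_contract": [],
    "schedule_signing": ["draft_contract"],
    "notify_stakeholders": ["schedule_signing"],
}
print(cascade_cancel("draft_contract", graph))
# prints ['schedule_signing', 'notify_stakeholders']
```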

Responsibility Tracking and Audit

CHAP specifies metadata that agents must emit at each step:

Decision attribution: Every action is attributed to a participant ID. If an agent acts autonomously, the agent ID is logged. If a human approves, the human ID is logged.

Audit logging structure: The evidence log is structured as a sequence of events:

{
  "log_id": "log-5678",
  "workspace_id": "workspace-abc",
  "events": [
    {
      "seq": 1,
      "timestamp": "2026-06-09T10:00:00Z",
      "event_type": "TASK_CREATED",
      "actor_id": "agent-x",
      "payload": {}
    },
    {
      "seq": 2,
      "timestamp": "2026-06-09T10:05:00Z",
      "event_type": "REVIEW_REQUIRED",
      "actor_id": "agent-x",
      "payload": {}
    },
    {
      "seq": 3,
      "timestamp": "2026-06-09T10:15:00Z",
      "event_type": "APPROVAL",
      "actor_id": "human-alice",
      "payload": {},
      "signature": "..."
    }
  ]
}

Metadata requirements: Each event carries the following fields, all of them required except the signature:

Actor ID (human or agent)
Timestamp (ISO 8601)
Event type (from a fixed enum)
Payload (structured JSON matching the event schema)
Optional signature (for non-repudiation)

This structure supports:

Compliance audits (who approved what and when)
Incident investigation (what led to a bad decision)
Training data (human overrides as ground truth)
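
A validator for these metadata requirements might look like the following sketch. The `EVENT_TYPES` set contains only the event types that appear in this article, since the full enum is not given here, and `validate_event` is a hypothetical helper.

```python
from datetime import datetime
from typing import Any, Dict, List

# Only the event types shown in this article; the real enum is assumed to be larger.
EVENT_TYPES = {"TASK_CREATED", "REVIEW_REQUIRED", "APPROVAL"}
REQUIRED = ("actor_id", "timestamp", "event_type", "payload")


def validate_event(event: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the event passed."""
    errors = [f"missing field: {name}" for name in REQUIRED if name not in event]
    if errors:
        return errors
    if event["event_type"] not in EVENT_TYPES:
        errors.append(f"unknown event_type: {event['event_type']}")
    if not isinstance(event["payload"], dict):
        errors.append("payload must be a JSON object")
    try:
        # fromisoformat() rejects a trailing "Z" before Python 3.11 and
        # accepts only a subset of ISO 8601, so this is an approximate check.
        datetime.fromisoformat(str(event["timestamp"]).replace("Z", "+00:00"))
    except ValueError:
        errors.append("timestamp is not ISO 8601")
    return errors
```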

Composable Profiles

CHAP’s core is minimal. Profiles add domain-specific behavior:

Review profile adds:

Multi-stage approval workflows
Reviewer assignment rules
Escalation policies

Modes profile adds:

Autonomous mode (agent proceeds without approval)
Supervised mode (agent requests approval for all actions)
Audit mode (agent logs all actions but does not block)

Security profile adds:

Cryptographic signatures on approvals
Role-based access control
Data access logging

Profiles compose. A production deployment might use Review + Security + Modes together.

Implementation Shape

A CHAP-compliant system needs:

Workspace service: Manages workspaces, participants, and tasks. Exposes a REST or gRPC API.

Evidence store: Append-only database (e.g., PostgreSQL with insert-only tables, or a dedicated event store like EventStoreDB).

Agent runtime: Integrates a CHAP client library. Emits events at state transitions.

Human interface: Dashboard for reviewing tasks, approving actions, and viewing evidence logs.

Policy engine: Evaluates rules to determine when escalation is required.

Code Snippet: Agent Requesting Review

from chap import Client, Task, ReviewRequest

CONFIDENCE_THRESHOLD = 0.7  # matches the confidence trigger in the escalation table

client = Client(workspace_id="workspace-abc")

# Agent completes draft
task = Task(
    id="task-1234",
    type="CONTRACT_DRAFT",
    artefact={"content": "...", "hash": "abc123"}
)

# Confidence score the agent produced for this draft (example value)
confidence = 0.62

# Agent confidence is low, request review
if confidence < CONFIDENCE_THRESHOLD:
    review_request = ReviewRequest(
        task_id=task.id,
        confidence=confidence,
        rationale="Ambiguous clause in section 3",
        suggested_reviewers=["human-alice"]
    )
    client.request_review(review_request)

    # Block until approval (timeout presumably in seconds, i.e. one hour)
    approval = client.wait_for_approval(task.id, timeout=3600)

    if approval.decision == "APPROVED_WITH_CHANGES":
        # Apply human edits and record the override in the evidence log
        task.artefact = approval.updated_artefact
        client.log_override(task.id, approval.diff, approval.rationale)

Observability and Failure Modes

Observability: CHAP’s evidence log is the primary observability surface. Metrics to track:

Review latency (time from REVIEW_REQUIRED to APPROVAL)
Override rate (percentage of tasks with human edits)
Escalation rate (percentage of tasks requiring review)
Timeout rate (percentage of tasks that time out waiting for review)
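
Two of these metrics can be derived directly from the evidence log, as the sketch below suggests. It assumes each event's payload carries a `task_id` and, for approvals, a `decision` field; it also approximates the override rate as the share of approval events that include changes.

```python
from datetime import datetime
from typing import Any, Dict, List


def parse_ts(value: str) -> datetime:
    # Replace the trailing "Z" so fromisoformat() works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def review_latencies(events: List[Dict[str, Any]]) -> List[float]:
    """Seconds between each REVIEW_REQUIRED and the next APPROVAL for the same task."""
    pending: Dict[str, datetime] = {}
    latencies: List[float] = []
    for event in sorted(events, key=lambda e: e["seq"]):
        task_id = event["payload"].get("task_id")
        if event["event_type"] == "REVIEW_REQUIRED":
            pending[task_id] = parse_ts(event["timestamp"])
        elif event["event_type"] == "APPROVAL" and task_id in pending:
            started = pending.pop(task_id)
            latencies.append((parse_ts(event["timestamp"]) - started).total_seconds())
    return latencies


def override_rate(events: List[Dict[str, Any]]) -> float:
    """Fraction of approval events whose decision includes human changes."""
    approvals = [e for e in events if e["event_type"] == "APPROVAL"]
    if not approvals:
        return 0.0
    edited = [e for e in approvals
              if e["payload"].get("decision") == "APPROVED_WITH_CHANGES"]
    return len(edited) / len(approvals)


sample = [
    {"seq": 2, "timestamp": "2026-06-09T10:05:00Z", "event_type": "REVIEW_REQUIRED",
     "actor_id": "agent-x", "payload": {"task_id": "task-1234"}},
    {"seq": 3, "timestamp": "2026-06-09T10:15:00Z", "event_type": "APPROVAL",
     "actor_id": "human-alice",
     "payload": {"task_id": "task-1234", "decision": "APPROVED_WITH_CHANGES"}},
]
print(review_latencies(sample))  # prints [600.0]
```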

Failure modes:

Failure	Impact	Mitigation
Human never responds	Task stuck in REVIEW_REQUIRED	Timeout policy with escalation or rollback
Evidence log corruption	Audit trail lost	Replicate log to multiple stores, use content hashes
Agent ignores approval	Human override not applied	Runtime enforcement: block agent actions until approval applied
Workspace state divergence	Agents see inconsistent state	Use distributed locking or optimistic concurrency control
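
One way to apply content hashes to the log-corruption case is hash chaining, where each entry's hash covers the previous entry's hash plus the event body. This is a common technique offered here as an illustration, not something the text above attributes to CHAP; the function names and the all-zero genesis value are assumptions.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed starting value for the chain


def entry_hash(prev_hash: str, event: Dict[str, Any]) -> str:
    # Serialize with sorted keys so the same event always hashes the same way.
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], event: Dict[str, Any]) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "hash": entry_hash(prev, event)})


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


chain: List[Dict[str, Any]] = []
append_entry(chain, {"seq": 1, "event_type": "TASK_CREATED", "actor_id": "agent-x"})
append_entry(chain, {"seq": 2, "event_type": "APPROVAL", "actor_id": "human-alice"})
```

Verification detects corruption but does not repair it, which is why the table pairs hashing with replication.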

Deployment Considerations

Single-tenant vs. multi-tenant: CHAP workspaces can be scoped to a single team or shared across an organization. Multi-tenant deployments need stronger isolation and access control.

Synchronous vs. asynchronous: Agents can block waiting for approval (synchronous) or continue other work and poll for approval (asynchronous). Asynchronous is more scalable but requires careful dependency management.

On-premise vs. cloud: CHAP’s evidence log can be stored on-premise for compliance or in a managed cloud service for scalability.

Technical Verdict

Use CHAP when:

Agents make decisions that require human approval (claims processing, contract review, clinical triage)
You need a structured audit trail for compliance or incident investigation
Multiple agents and humans collaborate on shared work across shifts or time zones
You want to capture human overrides as training data for future models

Avoid CHAP when:

Agents operate in fully autonomous mode with no human oversight (e.g., batch data processing)
Latency is critical and human review would block real-time operations
Your deployment is a single agent with a single human supervisor (the overhead is not justified)
You need real-time collaboration (CHAP is designed for asynchronous handoffs, not live co-editing)

CHAP is infrastructure for production systems where responsibility matters. It turns tribal knowledge about who approved what into a durable, queryable protocol.