mech.app
AI Agents

CHAP: The Protocol That Lets Humans and Agents Negotiate Responsibility Boundaries in Production

How CHAP defines structured handoff points, approval gates, and escalation paths when agents move from chat into operational roles affecting real work.

Source: arxiv.org

Foundation models are moving from response generation into operational roles. They plan across steps, call tools, request human input, coordinate with other agents, and increasingly carry responsibility for work that affects customers, claims, code, contracts, and clinical decisions.

Production deployments are no longer one human supervising one model. They are multi-human, multi-agent collaborations that cross teams, time zones, and trust boundaries. The technical surface for this collaboration remains weakly specified.

When an agent drafts a response and a human edits it before it ships, the moment of human judgement is the most valuable signal in the system. In current practice it is recorded, if at all, in application code, chat threads, ticket comments, and tribal memory.

CHAP (Collaborative Human-Agent Protocol) addresses this gap. It defines structured handoff points, approval gates, and escalation paths for agents that do real work.

The Problem CHAP Solves

Two protocol standards address adjacent concerns:

  • MCP standardizes agent access to tools and data
  • A2A standardizes agent-to-agent interoperability

Neither defines the shared workspace in which humans and agents perform accountable work together.

CHAP fills that gap. Under CHAP:

  • The override that used to vanish into a chat thread becomes a structured event carrying a diff, a rationale, and a content hash
  • The handoff between shifts becomes a portable envelope rather than a pinned message
  • The human approval of an agent’s draft becomes a non-repudiable signed decision that can be replayed years later

Core Protocol Components

CHAP achieves this through a small core and composable profiles.

Core Primitives

Workspaces are the boundary of shared context. A workspace contains:

  • Participants (humans and agents)
  • Tasks (units of work with state)
  • Artefacts (versioned outputs)
  • An append-only evidence log

Participants have roles and capabilities. The protocol distinguishes:

  • Humans with approval authority
  • Agents with execution authority
  • Observers with read-only access

Tasks move through a state machine:

PROPOSED → ASSIGNED → IN_PROGRESS → REVIEW_REQUIRED → APPROVED → COMPLETED
                                    REVIEW_REQUIRED → REJECTED → REASSIGNED
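One way to enforce this state machine is a transition table that rejects any move the diagram does not show. The sketch below is illustrative only: `TaskState`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names, it assumes REJECTED is reached from REVIEW_REQUIRED, and it leaves REASSIGNED without outgoing transitions because the diagram does not say what follows it.

```python
from enum import Enum


class TaskState(Enum):
    PROPOSED = "PROPOSED"
    ASSIGNED = "ASSIGNED"
    IN_PROGRESS = "IN_PROGRESS"
    REVIEW_REQUIRED = "REVIEW_REQUIRED"
    APPROVED = "APPROVED"
    COMPLETED = "COMPLETED"
    REJECTED = "REJECTED"
    REASSIGNED = "REASSIGNED"


S = TaskState

# Each state maps to the set of states it may move to.
# Assumption: the REJECTED branch starts at REVIEW_REQUIRED.
ALLOWED_TRANSITIONS = {
    S.PROPOSED: {S.ASSIGNED},
    S.ASSIGNED: {S.IN_PROGRESS},
    S.IN_PROGRESS: {S.REVIEW_REQUIRED},
    S.REVIEW_REQUIRED: {S.APPROVED, S.REJECTED},
    S.APPROVED: {S.COMPLETED},
    S.REJECTED: {S.REASSIGNED},
    S.REASSIGNED: set(),  # not specified in the diagram
    S.COMPLETED: set(),
}


def transition(current: TaskState, target: TaskState) -> TaskState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table as data rather than scattered `if` statements makes it easy to log every attempted transition, legal or not, in one place.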

Artefacts are versioned content objects. Each artefact includes:

  • Content hash (SHA-256)
  • Parent hash (for diffs)
  • Author (human or agent ID)
  • Timestamp
  • Optional signature
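The artefact fields above map naturally onto an immutable record whose parent hash links each version to the one before it. This is a minimal sketch rather than CHAP's actual schema: `Artefact`, `sha256_hex`, and `new_version` are hypothetical names, and the content is assumed to be UTF-8 text.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Artefact:
    content: str
    content_hash: str               # SHA-256 of the content
    parent_hash: Optional[str]      # hash of the previous version, None for the first
    author: str                     # human or agent ID
    timestamp: str                  # ISO 8601, UTC
    signature: Optional[str] = None


def sha256_hex(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def new_version(content: str, author: str, parent: Optional[Artefact] = None) -> Artefact:
    """Create a version whose parent_hash points at the previous version."""
    return Artefact(
        content=content,
        content_hash=sha256_hex(content),
        parent_hash=parent.content_hash if parent else None,
        author=author,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


draft = new_version("Clause 3: payment due in 30 days.", author="agent-x")
edited = new_version("Clause 3: payment due in 45 days.", author="human-alice", parent=draft)
```

Here `edited.parent_hash` equals `draft.content_hash`, which is what lets a reviewer tool compute a diff between the two versions later.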

The evidence log is an append-only sequence of events. Every state transition, approval, override, and handoff is logged with:

  • Event type
  • Actor ID
  • Timestamp
  • Payload (structured JSON)
  • Optional cryptographic signature
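An in-memory sketch can make the append-only property concrete: callers may add events and read them back, but never edit or delete them. The names `Event` and `EvidenceLog` are hypothetical, and a real deployment would back this with a durable store instead of a Python list.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    seq: int
    event_type: str
    actor_id: str
    timestamp: str
    payload_json: str               # stored as a JSON string so it cannot be mutated
    signature: Optional[str] = None


class EvidenceLog:
    """Append-only: there is deliberately no update or delete method."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event_type: str, actor_id: str, payload: Dict[str, Any],
               signature: Optional[str] = None) -> Event:
        event = Event(
            seq=len(self._events) + 1,
            event_type=event_type,
            actor_id=actor_id,
            timestamp=datetime.now(timezone.utc).isoformat(),
            payload_json=json.dumps(payload, sort_keys=True),
            signature=signature,
        )
        self._events.append(event)
        return event

    def events(self) -> Tuple[Event, ...]:
        # Return an immutable snapshot so callers cannot reorder the log.
        return tuple(self._events)
```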

State Machine for Handoffs

CHAP defines explicit triggers for escalation:

Trigger | Condition | Action
Confidence threshold | Agent confidence < 0.7 | Escalate to human review
Policy boundary | Action requires approval per policy | Block until approval received
Timeout | No human response within SLA | Escalate to next tier or roll back
Conflict | Multiple agents propose contradictory actions | Escalate to human arbitration
Override | Human edits agent output | Log diff and rationale
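The triggers can be read as a set of independent checks run against the agent's current step. The following sketch assumes a hypothetical `StepContext` record and hypothetical action names; only the 0.7 threshold comes from the table.

```python
from dataclasses import dataclass
from typing import List

CONFIDENCE_THRESHOLD = 0.7  # value taken from the trigger table


@dataclass
class StepContext:
    confidence: float
    requires_policy_approval: bool = False
    seconds_waiting_for_human: float = 0.0
    sla_seconds: float = 3600.0          # assumed SLA, not specified in the article
    conflicting_proposals: bool = False
    human_edited_output: bool = False


def evaluate_triggers(ctx: StepContext) -> List[str]:
    """Return every action whose trigger condition holds for this step."""
    actions: List[str] = []
    if ctx.confidence < CONFIDENCE_THRESHOLD:
        actions.append("ESCALATE_TO_HUMAN_REVIEW")
    if ctx.requires_policy_approval:
        actions.append("BLOCK_UNTIL_APPROVAL")
    if ctx.seconds_waiting_for_human > ctx.sla_seconds:
        actions.append("ESCALATE_OR_ROLL_BACK")
    if ctx.conflicting_proposals:
        actions.append("ESCALATE_TO_HUMAN_ARBITRATION")
    if ctx.human_edited_output:
        actions.append("LOG_DIFF_AND_RATIONALE")
    return actions
```

Returning a list rather than the first match lets a policy engine see that, for example, a low-confidence step also crosses a policy boundary.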

When an agent escalates, it emits a REVIEW_REQUIRED event containing:

  • Task ID
  • Proposed action
  • Confidence score
  • Supporting evidence
  • Suggested reviewers

The task state transitions to REVIEW_REQUIRED. The agent blocks or continues other work depending on the task dependency graph.
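As an illustration, the escalation event could be assembled as a plain dictionary carrying the five fields listed above. The function name and key names are assumptions, as is the convention that confidence lies between 0 and 1.

```python
from typing import Any, Dict, List


def build_review_required_event(
    task_id: str,
    proposed_action: str,
    confidence: float,
    supporting_evidence: List[str],
    suggested_reviewers: List[str],
) -> Dict[str, Any]:
    """Assemble a REVIEW_REQUIRED event; key names are illustrative."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence is assumed to lie in [0, 1]")
    return {
        "event_type": "REVIEW_REQUIRED",
        "task_id": task_id,
        "proposed_action": proposed_action,
        "confidence": confidence,
        "supporting_evidence": list(supporting_evidence),
        "suggested_reviewers": list(suggested_reviewers),
    }


event = build_review_required_event(
    task_id="task-1234",
    proposed_action="Send contract draft to counterparty",
    confidence=0.62,
    supporting_evidence=["Ambiguous clause in section 3"],
    suggested_reviewers=["human-alice"],
)
```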

Context Preservation Across Boundaries

When a human reviews an agent’s work, CHAP preserves:

  • Full conversation history
  • Tool calls made
  • Data accessed
  • Intermediate reasoning steps
  • Confidence scores at each step

This context is serialized into the task payload. The human sees the same state the agent saw when it requested review.

When the human approves or overrides, the response flows back as a structured event:

{
  "event_type": "APPROVAL",
  "task_id": "task-1234",
  "reviewer_id": "human-alice",
  "decision": "APPROVED_WITH_CHANGES",
  "diff": {
    "before_hash": "abc123",
    "after_hash": "def456",
    "changes": []
  },
  "rationale": "Updated claim amount based on policy section 4.2",
  "signature": "..."
}

The agent receives this event, applies the changes, and logs the override in the evidence log.
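Before applying a human's changes, an agent can use the two hashes in the diff to confirm it is patching the version the reviewer actually saw. The sketch below assumes the edited content arrives alongside the event, which the example event does not show; `apply_approval` is a hypothetical helper.

```python
import hashlib
from typing import Any, Dict


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def apply_approval(current_content: str, event: Dict[str, Any], updated_content: str) -> str:
    """Return the content the agent should continue with after an APPROVAL event."""
    if event["decision"] != "APPROVED_WITH_CHANGES":
        return current_content  # nothing to apply
    diff = event["diff"]
    # The reviewer must have edited the same version the agent holds.
    if sha256_hex(current_content) != diff["before_hash"]:
        raise ValueError("approval refers to a different artefact version")
    # The delivered content must match what the reviewer signed off on.
    if sha256_hex(updated_content) != diff["after_hash"]:
        raise ValueError("updated content does not match after_hash")
    return updated_content
```

Rejecting a mismatched `before_hash` guards against the case where the agent kept working and the reviewer approved a stale draft.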

Multi-Agent Coordination

CHAP handles multi-agent coordination through workspace-level locking and dependency graphs.

When one agent needs human approval but others are blocked waiting, CHAP defines:

Queuing semantics: Tasks waiting for review enter a priority queue. Humans see a unified review dashboard across all agents.

Timeout behavior: If no human responds within the configured SLA, the task can:

  • Escalate to a higher tier
  • Roll back to the last approved state
  • Proceed with a fallback action (if policy allows)
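The three timeout outcomes need an ordering so the system picks exactly one. The sketch below shows one possible ordering, which is an assumption and not something the protocol description prescribes: escalate while higher tiers remain, then fall back if policy allows, otherwise roll back. `TimeoutAction` and `resolve_timeout` are hypothetical names.

```python
from enum import Enum


class TimeoutAction(Enum):
    ESCALATE = "ESCALATE"
    FALLBACK = "FALLBACK"
    ROLL_BACK = "ROLL_BACK"


def resolve_timeout(current_tier: int, max_tier: int, fallback_allowed: bool) -> TimeoutAction:
    """Pick one action when the review SLA expires (assumed precedence)."""
    if current_tier < max_tier:
        return TimeoutAction.ESCALATE      # a higher reviewer tier still exists
    if fallback_allowed:
        return TimeoutAction.FALLBACK      # policy permits a safe default action
    return TimeoutAction.ROLL_BACK         # last resort: restore approved state
```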

Rollback semantics: If a human rejects a task, CHAP can:

  • Revert the workspace to the last approved artefact
  • Notify dependent agents to pause or roll back
  • Log the rejection reason for future training

Dependency Graph Example

Agent A: Draft contract → REVIEW_REQUIRED
Agent B: Schedule signing (blocked on Agent A)
Agent C: Notify stakeholders (blocked on Agent B)

If the human rejects Agent A’s draft, CHAP automatically cancels Agent B’s and Agent C’s tasks and logs the cascade.
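The cascade amounts to a traversal of the dependency graph starting at the rejected task. Here is a minimal sketch using breadth-first search; the task names and the `cascade_cancel` helper are illustrative, and the graph is expressed as "task is blocked by these tasks".

```python
from collections import deque
from typing import Dict, List


def cascade_cancel(blocked_by: Dict[str, List[str]], rejected: str) -> List[str]:
    """Return every task that transitively depends on the rejected task."""
    # Invert the graph: for each task, which tasks are waiting on it?
    dependents: Dict[str, List[str]] = {}
    for task, deps in blocked_by.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(task)

    cancelled: List[str] = []
    seen = {rejected}
    queue = deque([rejected])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                cancelled.append(child)
                queue.append(child)
    return cancelled


blocked_by = {
    "draft_contract": [],                        # Agent A
    "schedule_signing": ["draft_contract"],      # Agent B
    "notify_stakeholders": ["schedule_signing"], # Agent C
}
cancelled = cascade_cancel(blocked_by, "draft_contract")
# cancelled == ["schedule_signing", "notify_stakeholders"]
```

The returned order is the order in which cancellations would be logged, nearest dependents first.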

Responsibility Tracking and Audit

CHAP specifies metadata that agents must emit at each step:

Decision attribution: Every action is attributed to a participant ID. If an agent acts autonomously, the agent ID is logged. If a human approves, the human ID is logged.

Audit logging structure: The evidence log is structured as a sequence of events:

{
  "log_id": "log-5678",
  "workspace_id": "workspace-abc",
  "events": [
    {
      "seq": 1,
      "timestamp": "2026-06-09T10:00:00Z",
      "event_type": "TASK_CREATED",
      "actor_id": "agent-x",
      "payload": {}
    },
    {
      "seq": 2,
      "timestamp": "2026-06-09T10:05:00Z",
      "event_type": "REVIEW_REQUIRED",
      "actor_id": "agent-x",
      "payload": {}
    },
    {
      "seq": 3,
      "timestamp": "2026-06-09T10:15:00Z",
      "event_type": "APPROVAL",
      "actor_id": "human-alice",
      "payload": {},
      "signature": "..."
    }
  ]
}

Metadata requirements: Each event must include:

  • Actor ID (human or agent)
  • Timestamp (ISO 8601)
  • Event type (from a fixed enum)
  • Payload (structured JSON matching the event schema)
  • Optional signature (for non-repudiation)
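A store can check these requirements before accepting an event. The validator below is a sketch: `KNOWN_EVENT_TYPES` contains only the three types that appear in the example log, since the full enum is not given here, and `validate_event` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Dict, List

# Only the event types shown in the example log; the real enum is not listed here.
KNOWN_EVENT_TYPES = {"TASK_CREATED", "REVIEW_REQUIRED", "APPROVAL"}
REQUIRED_FIELDS = ("actor_id", "timestamp", "event_type", "payload")


def validate_event(event: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the event is acceptable."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in event]
    if errors:
        return errors
    try:
        # fromisoformat only accepts a trailing "Z" from Python 3.11, so normalise it.
        datetime.fromisoformat(str(event["timestamp"]).replace("Z", "+00:00"))
    except ValueError:
        errors.append("timestamp is not ISO 8601")
    if event["event_type"] not in KNOWN_EVENT_TYPES:
        errors.append(f"unknown event_type: {event['event_type']}")
    if not isinstance(event["payload"], dict):
        errors.append("payload must be a JSON object")
    return errors
```

The signature is left unchecked because it is optional; a Security profile would presumably add that check.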

This structure supports:

  • Compliance audits (who approved what and when)
  • Incident investigation (what led to a bad decision)
  • Training data (human overrides as ground truth)

Composable Profiles

CHAP’s core is minimal. Profiles add domain-specific behavior:

Review profile adds:

  • Multi-stage approval workflows
  • Reviewer assignment rules
  • Escalation policies

Modes profile adds:

  • Autonomous mode (agent proceeds without approval)
  • Supervised mode (agent requests approval for all actions)
  • Audit mode (agent logs all actions but does not block)

Security profile adds:

  • Cryptographic signatures on approvals
  • Role-based access control
  • Data access logging

Profiles compose. A production deployment might use Review + Security + Modes together.

Implementation Shape

A CHAP-compliant system needs:

Workspace service: Manages workspaces, participants, and tasks. Exposes a REST or gRPC API.

Evidence store: Append-only database (e.g., PostgreSQL with insert-only tables, or a dedicated event store like EventStoreDB).

Agent runtime: Integrates a CHAP client library. Emits events at state transitions.

Human interface: Dashboard for reviewing tasks, approving actions, and viewing evidence logs.

Policy engine: Evaluates rules to determine when escalation is required.

Code Snippet: Agent Requesting Review

from chap import Client, Task, ReviewRequest

client = Client(workspace_id="workspace-abc")

# Confidence score from the agent's own evaluation step (example value)
confidence = 0.62

# Agent completes draft
task = Task(
    id="task-1234",
    type="CONTRACT_DRAFT",
    artefact={"content": "...", "hash": "abc123"}
)

# Agent confidence is below the 0.7 threshold, so request review
if confidence < 0.7:
    review_request = ReviewRequest(
        task_id=task.id,
        confidence=confidence,
        rationale="Ambiguous clause in section 3",
        suggested_reviewers=["human-alice"]
    )
    client.request_review(review_request)

    # Block until approval
    approval = client.wait_for_approval(task.id, timeout=3600)

    if approval.decision == "APPROVED_WITH_CHANGES":
        # Apply human edits and record the override in the evidence log
        task.artefact = approval.updated_artefact
        client.log_override(task.id, approval.diff, approval.rationale)

Observability and Failure Modes

Observability: CHAP’s evidence log is the primary observability surface. Metrics to track:

  • Review latency (time from REVIEW_REQUIRED to APPROVAL)
  • Override rate (percentage of tasks with human edits)
  • Escalation rate (percentage of tasks requiring review)
  • Timeout rate (percentage of tasks that time out waiting for review)
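Two of these metrics can be derived directly from the evidence log. The sketch below assumes each event's payload carries a `task_id`, which the example log leaves empty, and the helper names are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List

Event = Dict[str, Any]


def parse_ts(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def review_latencies(events: List[Event]) -> Dict[str, float]:
    """Seconds between REVIEW_REQUIRED and the following APPROVAL, per task."""
    requested: Dict[str, datetime] = {}
    latencies: Dict[str, float] = {}
    for e in sorted(events, key=lambda ev: ev["seq"]):
        task_id = e["payload"].get("task_id")
        if e["event_type"] == "REVIEW_REQUIRED":
            requested[task_id] = parse_ts(e["timestamp"])
        elif e["event_type"] == "APPROVAL" and task_id in requested:
            delta = parse_ts(e["timestamp"]) - requested.pop(task_id)
            latencies[task_id] = delta.total_seconds()
    return latencies


def escalation_rate(events: List[Event]) -> float:
    """Fraction of created tasks that ever required review."""
    created = {e["payload"]["task_id"] for e in events if e["event_type"] == "TASK_CREATED"}
    reviewed = {e["payload"]["task_id"] for e in events if e["event_type"] == "REVIEW_REQUIRED"}
    return len(created & reviewed) / len(created) if created else 0.0


sample_events = [
    {"seq": 1, "timestamp": "2026-06-09T10:00:00Z", "event_type": "TASK_CREATED",
     "actor_id": "agent-x", "payload": {"task_id": "task-1234"}},
    {"seq": 2, "timestamp": "2026-06-09T10:05:00Z", "event_type": "REVIEW_REQUIRED",
     "actor_id": "agent-x", "payload": {"task_id": "task-1234"}},
    {"seq": 3, "timestamp": "2026-06-09T10:15:00Z", "event_type": "APPROVAL",
     "actor_id": "human-alice", "payload": {"task_id": "task-1234"}},
    {"seq": 4, "timestamp": "2026-06-09T10:20:00Z", "event_type": "TASK_CREATED",
     "actor_id": "agent-x", "payload": {"task_id": "task-5678"}},
]
```

With this sample, the review latency for `task-1234` is 600 seconds and the escalation rate is 0.5.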

Failure modes:

Failure | Impact | Mitigation
Human never responds | Task stuck in REVIEW_REQUIRED | Timeout policy with escalation or rollback
Evidence log corruption | Audit trail lost | Replicate log to multiple stores, use content hashes
Agent ignores approval | Human override not applied | Runtime enforcement: block agent actions until approval applied
Workspace state divergence | Agents see inconsistent state | Use distributed locking or optimistic concurrency control
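Optimistic concurrency control, the last mitigation listed, can be sketched with a version counter: a write succeeds only if the writer read the latest version. This single-process example is illustrative; `WorkspaceState` and `VersionConflict` are hypothetical names, and a real store would perform the compare-and-update atomically.

```python
from typing import Any, Dict


class VersionConflict(Exception):
    """Raised when a writer's view of the workspace is stale."""


class WorkspaceState:
    def __init__(self) -> None:
        self.version = 0
        self.data: Dict[str, Any] = {}

    def update(self, expected_version: int, changes: Dict[str, Any]) -> int:
        """Apply changes only if the caller saw the current version."""
        if expected_version != self.version:
            raise VersionConflict(
                f"expected version {expected_version}, workspace is at {self.version}"
            )
        self.data.update(changes)
        self.version += 1
        return self.version


state = WorkspaceState()
seen_by_a = state.version          # Agent A reads version 0
seen_by_b = state.version          # Agent B reads version 0
state.update(seen_by_a, {"contract_status": "DRAFTED"})   # A wins, version becomes 1
# Agent B's update with seen_by_b would now raise VersionConflict,
# forcing it to re-read the workspace before retrying.
```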

Deployment Considerations

Single-tenant vs. multi-tenant: CHAP workspaces can be scoped to a single team or shared across an organization. Multi-tenant deployments need stronger isolation and access control.

Synchronous vs. asynchronous: Agents can block waiting for approval (synchronous) or continue other work and poll for approval (asynchronous). Asynchronous is more scalable but requires careful dependency management.
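The asynchronous variant might look like the following `asyncio` sketch, where the agent polls for a decision without blocking other coroutines. The `check` callable stands in for whatever call a real client would make to fetch the decision; it and `poll_for_approval` are assumptions made for illustration.

```python
import asyncio
from typing import Callable, Optional


async def poll_for_approval(
    check: Callable[[], Optional[str]],
    interval: float,
    timeout: float,
) -> Optional[str]:
    """Poll until check() returns a decision or the timeout elapses.

    Returns the decision string, or None on timeout.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        decision = check()
        if decision is not None:
            return decision
        if loop.time() >= deadline:
            return None
        # Yield control so other tasks in the dependency graph can progress.
        await asyncio.sleep(interval)
```

A `None` result is the point at which the timeout policy (escalate, roll back, or fall back) would take over.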

On-premise vs. cloud: CHAP’s evidence log can be stored on-premise for compliance or in a managed cloud service for scalability.

Technical Verdict

Use CHAP when:

  • Agents make decisions that require human approval (claims processing, contract review, clinical triage)
  • You need a structured audit trail for compliance or incident investigation
  • Multiple agents and humans collaborate on shared work across shifts or time zones
  • You want to capture human overrides as training data for future models

Avoid CHAP when:

  • Agents operate in fully autonomous mode with no human oversight (e.g., batch data processing)
  • Latency is critical and human review would block real-time operations
  • Your deployment is a single agent with a single human supervisor (the overhead is not justified)
  • You need real-time collaboration (CHAP is designed for asynchronous handoffs, not live co-editing)

CHAP is infrastructure for production systems where responsibility matters. It turns tribal knowledge about who approved what into a durable, queryable protocol.
