Security Model

CKP applies a defense-in-depth approach to agent security. Seven layers work together so that no single compromise leads to full agent takeover. Each layer narrows what the agent can do, and permissions can only flow downward.

The 7 layers

Layer 1  Channel     ── Access control (who can talk to the agent)
Layer 2  Policy      ── Rule engine (what is allowed / denied / requires approval)
Layer 3  Sandbox     ── Isolation (where tools execute)
Layer 4  Provider    ── Token governance (how much the agent can spend)
Layer 5  Memory      ── Data scoping (what the agent remembers and where)
Layer 6  Swarm       ── Trust boundaries (which agents can collaborate)
Layer 7  Identity    ── Autonomy level (how much freedom the agent has)

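Each layer grants only the permissions that its parent allows. Tool permissions cannot exceed Sandbox permissions, Sandbox permissions cannot exceed Policy permissions, Policy permissions cannot exceed what Identity allows, and so on up the trust hierarchy. The parent relationship follows that hierarchy, not the layer numbers above.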

Trust hierarchy

Human (highest trust)
  └── Channel (authenticated session)
        └── Identity (declared autonomy level)
              └── Policy (behavioral rules)
                    └── Sandbox (execution constraints)
                          └── Tool (lowest trust, most constrained)

A Tool can never do more than its Sandbox allows. A Sandbox can never permit more than its Policy allows. This ensures that adding a new tool cannot accidentally bypass security boundaries.
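
The narrowing rule can be pictured as a running set intersection. The following is a minimal sketch, not CKP's actual implementation: the permission names (`fs:read`, `net:fetch`, and so on), the `HIERARCHY` list, and the `effective_permissions` helper are all hypothetical, and it assumes permissions can be modelled as plain sets of strings.

```python
# Hypothetical permission sets, ordered from most trusted to least trusted.
# The permission names are invented for illustration only.
HIERARCHY = [
    ("identity", {"fs:read", "fs:write", "net:fetch", "exec:shell"}),
    ("policy",   {"fs:read", "fs:write", "net:fetch"}),
    ("sandbox",  {"fs:read", "net:fetch"}),
    ("tool",     {"fs:read", "fs:write"}),  # asks for more than its sandbox grants
]


def effective_permissions(hierarchy):
    """Intersect each level with everything above it, so grants can only narrow."""
    result = {}
    allowed = None
    for name, requested in hierarchy:
        # The first level is taken as-is; every later level is clipped to its parent.
        allowed = set(requested) if allowed is None else allowed & requested
        result[name] = allowed
    return result


perms = effective_permissions(HIERARCHY)
print(sorted(perms["tool"]))  # ['fs:read']
```

In this sketch the tool requests `fs:write`, but the sandbox never granted it, so the request is dropped rather than widening the sandbox.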

Threat model

Threat	Layer	Mitigation
Unauthorized human access	Channel	Allowlists, authentication, rate limiting
Prompt injection	Policy	Pattern-based detection + LLM-based detection
Secret exfiltration	Sandbox + Policy	Leak scanning, network restrictions
SSRF attacks	Sandbox	DNS pinning, IP blocking, allowlisted domains
Destructive execution	Policy	Approval gates (`require-approval` action)
Cost runaway	Provider	Token limits, cost caps, daily quotas
Cross-agent contamination	Swarm + Memory	Memory scoping per agent, trust boundaries
Malicious imported skills	Skill + Sandbox	Skill permission declarations, sandboxed execution
Privilege escalation	Sandbox	Container/WASM isolation, resource limits

Key mechanisms

Access control (Channel)

Channels define who can interact with the agent. Each channel supports an access_control block:

channels:
  - name: slack-team
    kind: slack
    access_control:
      mode: allowlist
      identifiers: ["U12345", "U67890"]

Modes: allowlist (only the listed identifiers may interact), denylist (the listed identifiers are blocked, everyone else may interact), or an implementation-defined default.
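
As a rough illustration of how a runtime might evaluate an access_control block, here is a minimal sketch. The `is_allowed` function is hypothetical, and because the default behaviour is implementation-defined, the choice to fail closed for any other mode is an assumption of this example only.

```python
def is_allowed(access_control, sender_id):
    """Decide whether sender_id may talk to the agent on this channel."""
    mode = access_control.get("mode")
    identifiers = set(access_control.get("identifiers", []))
    if mode == "allowlist":
        # Only explicitly listed identifiers get through.
        return sender_id in identifiers
    if mode == "denylist":
        # Everyone gets through except explicitly listed identifiers.
        return sender_id not in identifiers
    # Assumption: any other or missing mode fails closed. A real runtime
    # may define a different default.
    return False


slack_team = {"mode": "allowlist", "identifiers": ["U12345", "U67890"]}
print(is_allowed(slack_team, "U12345"))  # True
print(is_allowed(slack_team, "U99999"))  # False
```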

Rule engine (Policy)

Policies define what the agent can do. Rules match patterns and apply actions:

policies:
  - name: security
    rules:
      - pattern: "tool:file-delete"
        action: deny
      - pattern: "tool:web-fetch"
        action: require-approval
      - pattern: "tool:calculator"
        action: allow
    prompt_injection:
      detection: pattern
      action: deny

Actions: allow, deny, require-approval, audit-only.
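
Rule evaluation could look roughly like the sketch below. The matching semantics are not specified above, so this example makes three assumptions: patterns are shell-style globs, the first matching rule wins, and a target that matches no rule is denied. The `evaluate` function and the `tool:shell-exec` target are hypothetical.

```python
from fnmatch import fnmatchcase

# The rules from the example policy above, as plain dictionaries.
RULES = [
    {"pattern": "tool:file-delete", "action": "deny"},
    {"pattern": "tool:web-fetch", "action": "require-approval"},
    {"pattern": "tool:calculator", "action": "allow"},
]


def evaluate(rules, target, default="deny"):
    """Return the action of the first rule whose pattern matches target."""
    for rule in rules:
        # Assumption: glob-style, case-sensitive matching.
        if fnmatchcase(target, rule["pattern"]):
            return rule["action"]
    # Assumption: unmatched targets fall back to the most restrictive action.
    return default


print(evaluate(RULES, "tool:web-fetch"))   # require-approval
print(evaluate(RULES, "tool:shell-exec"))  # deny
```

Defaulting to deny keeps a newly added tool blocked until a rule explicitly covers it, which fits the defense-in-depth goal, but a given runtime may choose otherwise.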

Execution isolation (Sandbox)

Sandboxes define where and how tools execute:

sandbox:
  level: container
  limits:
    memory_mb: 512
    cpu_shares: 256
    timeout_ms: 30000
  network:
    mode: restricted
    allowed_domains: ["api.example.com"]

Levels (ascending isolation): none, process, wasm, container, vm.
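
To show how a restricted network mode might combine the domain allowlist with the IP blocking listed in the threat model, here is a minimal sketch. The `may_connect` and `is_blocked_ip` helpers are hypothetical. The caller is assumed to resolve the hostname once and then connect to exactly the address it validated (the idea behind DNS pinning); resolution itself is left out so the example stays self-contained. Exact hostname matching and failing closed for other modes are also assumptions.

```python
import ipaddress

SANDBOX_NETWORK = {"mode": "restricted", "allowed_domains": ["api.example.com"]}


def is_blocked_ip(address):
    """True for addresses that should never be reachable from a sandboxed tool."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local   # includes cloud metadata endpoints such as 169.254.169.254
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def may_connect(network, hostname, resolved_ip):
    """Check an outbound connection against the sandbox network settings."""
    if network.get("mode") != "restricted":
        # Assumption: other modes are not modelled here, so fail closed.
        return False
    if hostname.lower() not in network.get("allowed_domains", []):
        return False
    # Even an allowlisted name is refused if it resolves to an internal address.
    return not is_blocked_ip(resolved_ip)


print(may_connect(SANDBOX_NETWORK, "api.example.com", "93.184.216.34"))    # True
print(may_connect(SANDBOX_NETWORK, "api.example.com", "169.254.169.254"))  # False
```

Checking the resolved address as well as the name is what stops an allowlisted domain from being pointed at an internal service.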

Token governance (Provider)

Providers enforce how much the agent can spend:

providers:
  - name: claude
    kind: anthropic
    model: claude-sonnet-4-20250514
    limits:
      max_tokens_per_request: 4096
      max_tokens_per_day: 100000
      max_cost_per_day_usd: 10.00
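
One way to picture enforcement of these limits is a small budget tracker consulted before every provider call. This is a sketch under assumptions, not CKP's implementation: the `Limits` and `DailyBudget` classes are hypothetical, the per-request cost is supplied by the caller, and the daily reset is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    # Values copied from the provider example above.
    max_tokens_per_request: int = 4096
    max_tokens_per_day: int = 100000
    max_cost_per_day_usd: float = 10.00


class DailyBudget:
    """Tracks usage for one day; resetting at midnight is left out of this sketch."""

    def __init__(self, limits):
        self.limits = limits
        self.tokens_used = 0
        self.cost_used_usd = 0.0

    def check(self, tokens, cost_usd):
        """Return the name of the limit a request would break, or None if it fits."""
        if tokens > self.limits.max_tokens_per_request:
            return "max_tokens_per_request"
        if self.tokens_used + tokens > self.limits.max_tokens_per_day:
            return "max_tokens_per_day"
        if self.cost_used_usd + cost_usd > self.limits.max_cost_per_day_usd:
            return "max_cost_per_day_usd"
        return None

    def record(self, tokens, cost_usd):
        """Account for a request that was actually sent."""
        self.tokens_used += tokens
        self.cost_used_usd += cost_usd


budget = DailyBudget(Limits())
print(budget.check(5000, 0.10))  # max_tokens_per_request
print(budget.check(4000, 0.10))  # None
```

Checking before the call and recording after it keeps a rejected request from consuming budget.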

Approval flow

When a policy rule specifies require-approval, the runtime pauses tool execution and requests human confirmation:

1. Agent decides to call tool
2. Policy evaluates → require-approval
3. Runtime sends approval request to Channel
4. Human approves (claw.tool.approve) or denies (claw.tool.deny)
5. Runtime proceeds or aborts
6. If no response within timeout → error -32012 (Approval timeout)

This ensures humans remain in the loop for sensitive operations without blocking autonomous execution of safe tools.
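
The flow above can be sketched as a gate in front of tool execution. The method names `claw.tool.approve` and `claw.tool.deny` and the error code -32012 come from the steps above; everything else is an assumption made for illustration, including the `run_tool` and `await_approval` helpers, the `ApprovalTimeout` exception, and the use of an in-process queue to stand in for the Channel.

```python
import queue

APPROVAL_TIMEOUT_CODE = -32012  # "Approval timeout", as listed in step 6


class ApprovalTimeout(Exception):
    code = APPROVAL_TIMEOUT_CODE


def await_approval(responses, timeout_s):
    """Block until the human answers, or raise ApprovalTimeout."""
    try:
        method = responses.get(timeout=timeout_s)
    except queue.Empty:
        raise ApprovalTimeout("Approval timeout") from None
    if method == "claw.tool.approve":
        return True
    if method == "claw.tool.deny":
        return False
    raise ValueError(f"unexpected approval response: {method!r}")


def run_tool(action, responses, execute, timeout_s=1.0):
    """Apply the policy decision, pausing for approval when required."""
    if action == "deny":
        return "aborted"
    if action == "require-approval" and not await_approval(responses, timeout_s):
        return "aborted"
    # Assumption: any other action lets the tool run without a human in the loop.
    return execute()


channel = queue.Queue()  # hypothetical stand-in for the Channel
channel.put("claw.tool.approve")
print(run_tool("require-approval", channel, lambda: "fetched"))  # fetched
```

Only the `require-approval` branch ever waits, which is why tools the policy already allows are not slowed down by the gate.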

Defense composition

The layers compose multiplicatively: each one further shrinks whatever the other layers leave exposed. An agent with:

A Channel allowlist of 3 users
A Policy that denies file deletion
A Sandbox with no network access
A Provider capped at 10K tokens/day

…has a very narrow attack surface. Each layer reduces risk independently of the others, so an attacker must defeat every layer, not just one, to gain full control.