
Security Model

CKP applies a defense-in-depth approach to agent security. Seven layers work together so that no single compromise leads to full agent takeover. Each layer narrows what the agent can do, and permissions can only flow downward.


Layer 1 Channel ── Access control (who can talk to the agent)
Layer 2 Policy ── Rule engine (what is allowed / denied / requires approval)
Layer 3 Sandbox ── Isolation (where tools execute)
Layer 4 Provider ── Token governance (how much the agent can spend)
Layer 5 Memory ── Data scoping (what the agent remembers and where)
Layer 6 Swarm ── Trust boundaries (which agents can collaborate)
Layer 7 Identity ── Autonomy level (how much freedom the agent has)

Each layer can grant only the permissions its parent allows: Tool permissions cannot exceed Sandbox permissions, Sandbox cannot exceed Policy, Policy cannot exceed Identity, and so on.


Human (highest trust)
└── Channel (authenticated session)
    └── Identity (declared autonomy level)
        └── Policy (behavioral rules)
            └── Sandbox (execution constraints)
                └── Tool (lowest trust, most constrained)

A Tool can never do more than its Sandbox allows. A Sandbox can never permit more than its Policy allows. This ensures that adding a new tool cannot accidentally bypass security boundaries.
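One way to picture this narrowing is as a set intersection applied from the most trusted level down to the least trusted one. The sketch below is only an illustration under that assumption: the `Scope` class, the `effective_permissions` function, and permission names such as `fs.read` are hypothetical and are not part of CKP.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """One level of the hierarchy and the permissions it asks for (hypothetical model)."""
    name: str
    requested: frozenset[str]


def effective_permissions(chain: list[Scope]) -> frozenset[str]:
    """Intersect permissions from the most trusted level down to the least trusted."""
    if not chain:
        return frozenset()
    allowed = chain[0].requested
    for scope in chain[1:]:
        # A child keeps only what every level above it already allows.
        allowed = allowed & scope.requested
    return allowed


chain = [
    Scope("identity", frozenset({"fs.read", "fs.write", "net.fetch"})),
    Scope("policy", frozenset({"fs.read", "net.fetch"})),
    Scope("sandbox", frozenset({"fs.read"})),
    # The tool asks for more than its sandbox grants; the extra is dropped.
    Scope("tool", frozenset({"fs.read", "fs.write"})),
]
print(sorted(effective_permissions(chain)))  # ['fs.read']
```

Because the result is an intersection, adding a new tool to the end of the chain can only keep or shrink the permission set, never widen it.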


| Threat | Layer | Mitigation |
|---|---|---|
| Unauthorized human access | Channel | Allowlists, authentication, rate limiting |
| Prompt injection | Policy | Pattern-based detection + LLM-based detection |
| Secret exfiltration | Sandbox + Policy | Leak scanning, network restrictions |
| SSRF attacks | Sandbox | DNS pinning, IP blocking, allowlisted domains |
| Destructive execution | Policy | Approval gates (require-approval action) |
| Cost runaway | Provider | Token limits, cost caps, daily quotas |
| Cross-agent contamination | Swarm + Memory | Memory scoping per agent, trust boundaries |
| Malicious imported skills | Skill + Sandbox | Skill permissions declaration, sandboxed execution |
| Privilege escalation | Sandbox | Container/WASM isolation, resource limits |

Channels define who can interact with the agent. Each channel supports an access_control block:

channels:
  - name: slack-team
    kind: slack
    access_control:
      mode: allowlist
      identifiers: ["U12345", "U67890"]

Modes: allowlist (only the listed identifiers may interact) or denylist (the listed identifiers are blocked). When no mode is set, the default is implementation-defined.
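The check a runtime performs against an access_control block might look like the following. This is a minimal sketch, not CKP's implementation: the function name `is_sender_allowed` is hypothetical, and failing closed on a missing or unknown mode is an assumption, since the source leaves that default to the implementation.

```python
def is_sender_allowed(access_control: dict, sender_id: str) -> bool:
    """Decide whether a sender may talk to the agent on this channel."""
    mode = access_control.get("mode")
    identifiers = set(access_control.get("identifiers", []))
    if mode == "allowlist":
        # Only explicitly listed identifiers get through.
        return sender_id in identifiers
    if mode == "denylist":
        # Everyone gets through except explicitly listed identifiers.
        return sender_id not in identifiers
    # Assumption: an unknown or missing mode fails closed in this sketch.
    return False


slack_team = {"mode": "allowlist", "identifiers": ["U12345", "U67890"]}
print(is_sender_allowed(slack_team, "U12345"))  # True
print(is_sender_allowed(slack_team, "U99999"))  # False
```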

Policies define what the agent can do. Rules match patterns and apply actions:

policies:
  - name: security
    rules:
      - pattern: "tool:file-delete"
        action: deny
      - pattern: "tool:web-fetch"
        action: require-approval
      - pattern: "tool:calculator"
        action: allow
    prompt_injection:
      detection: pattern
      action: deny

Actions: allow, deny, require-approval, audit-only.
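To make the rule matching concrete, here is a small sketch of how the rules above could be evaluated. Several details are assumptions rather than documented CKP behavior: patterns are treated as shell-style globs, the first matching rule wins, and a request that matches no rule falls back to deny. The `evaluate` function is a hypothetical name.

```python
from fnmatch import fnmatchcase

ACTIONS = {"allow", "deny", "require-approval", "audit-only"}


def evaluate(rules: list[dict], subject: str, default: str = "deny") -> str:
    """Return the action of the first rule whose pattern matches the subject."""
    for rule in rules:
        # Assumption: patterns are case-sensitive globs, e.g. "tool:*".
        if fnmatchcase(subject, rule["pattern"]):
            action = rule["action"]
            if action not in ACTIONS:
                raise ValueError(f"unknown action: {action!r}")
            return action
    # Assumption: anything not covered by a rule is denied.
    return default


rules = [
    {"pattern": "tool:file-delete", "action": "deny"},
    {"pattern": "tool:web-fetch", "action": "require-approval"},
    {"pattern": "tool:calculator", "action": "allow"},
]
print(evaluate(rules, "tool:web-fetch"))  # require-approval
print(evaluate(rules, "tool:shell"))      # deny
```

With first-match semantics the order of rules matters, so specific deny rules would normally be listed before broader allow patterns.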

Sandboxes define where and how tools execute:

sandbox:
  level: container
  limits:
    memory_mb: 512
    cpu_shares: 256
    timeout_ms: 30000
  network:
    mode: restricted
    allowed_domains: ["api.example.com"]

Levels (ascending isolation): none, process, wasm, container, vm.
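Two of the checks implied by this configuration can be sketched in a few lines: comparing isolation levels, and testing an outbound URL against allowed_domains. The function names are hypothetical, exact host matching (no subdomain wildcards) is an assumption, and modes other than restricted are simply rejected here because the source does not describe them. This covers only the domain allowlist; DNS pinning and IP blocking are not shown.

```python
from urllib.parse import urlsplit

# Ascending isolation, in the order given in the text.
LEVELS = ["none", "process", "wasm", "container", "vm"]


def meets_minimum(level: str, minimum: str) -> bool:
    """True if `level` isolates at least as strongly as `minimum`."""
    return LEVELS.index(level) >= LEVELS.index(minimum)


def is_host_allowed(network: dict, url: str) -> bool:
    """Check an outbound URL against the sandbox's allowed_domains."""
    if network.get("mode") != "restricted":
        # Assumption: other modes are out of scope here, so fail closed.
        return False
    try:
        host = urlsplit(url).hostname  # lowercased by urlsplit, or None
    except ValueError:
        return False
    if host is None:
        return False
    allowed = {d.lower() for d in network.get("allowed_domains", [])}
    # Assumption: exact match only, so look-alike hosts do not pass.
    return host in allowed


network = {"mode": "restricted", "allowed_domains": ["api.example.com"]}
print(meets_minimum("container", "wasm"))                            # True
print(is_host_allowed(network, "https://api.example.com/v1/items"))  # True
print(is_host_allowed(network, "https://api.example.com.evil.net"))  # False
```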

Providers enforce how much the agent can spend:

providers:
  - name: claude
    kind: anthropic
    model: claude-sonnet-4-20250514
    limits:
      max_tokens_per_request: 4096
      max_tokens_per_day: 100000
      max_cost_per_day_usd: 10.00
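The three limits can be enforced with a small running tally. The sketch below is illustrative only: the `DailyBudget` class and its methods are hypothetical, the order in which limits are checked is an assumption, and resetting the counters at the start of each day is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DailyBudget:
    """Tracks usage against the provider limits (hypothetical helper)."""
    max_tokens_per_request: int
    max_tokens_per_day: int
    max_cost_per_day_usd: float
    tokens_used: int = 0
    cost_used_usd: float = 0.0

    def violation(self, tokens: int, cost_usd: float) -> Optional[str]:
        """Name the first limit this request would exceed, or None if it fits."""
        if tokens > self.max_tokens_per_request:
            return "max_tokens_per_request"
        if self.tokens_used + tokens > self.max_tokens_per_day:
            return "max_tokens_per_day"
        if self.cost_used_usd + cost_usd > self.max_cost_per_day_usd:
            return "max_cost_per_day_usd"
        return None

    def spend(self, tokens: int, cost_usd: float) -> bool:
        """Record the request if it fits within every limit; refuse otherwise."""
        if self.violation(tokens, cost_usd) is not None:
            return False
        self.tokens_used += tokens
        self.cost_used_usd += cost_usd
        return True


budget = DailyBudget(4096, 100_000, 10.00)
print(budget.violation(5000, 0.05))  # max_tokens_per_request
print(budget.spend(4000, 0.05))      # True
```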

When a policy rule specifies require-approval, the runtime pauses tool execution and requests human confirmation:

1. Agent decides to call tool
2. Policy evaluates → require-approval
3. Runtime sends approval request to Channel
4. Human approves (claw.tool.approve) or denies (claw.tool.deny)
5. Runtime proceeds or aborts
6. If no response within timeout → error -32012 (Approval timeout)

This ensures humans remain in the loop for sensitive operations without blocking autonomous execution of safe tools.
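The six steps above can be sketched with a queue standing in for the Channel. Only the method names claw.tool.approve and claw.tool.deny and the error code -32012 come from the text; the `run_with_approval` function, the `ApprovalTimeout` exception, and the use of `queue.Queue` as a transport are assumptions made for illustration.

```python
import queue

APPROVAL_TIMEOUT_CODE = -32012  # error code named in the approval flow


class ApprovalTimeout(Exception):
    """Raised when no human decision arrives in time (hypothetical type)."""
    code = APPROVAL_TIMEOUT_CODE


def run_with_approval(tool, args: tuple, decisions: queue.Queue, timeout_s: float):
    """Pause a require-approval tool call until a human decides or time runs out."""
    # Step 3: a real runtime would send the approval request to the Channel here.
    try:
        decision = decisions.get(timeout=timeout_s)  # Step 4: wait for the human
    except queue.Empty:
        raise ApprovalTimeout("Approval timeout") from None  # Step 6
    if decision == "claw.tool.approve":
        return tool(*args)  # Step 5: proceed
    return None  # Step 5: abort on deny (or any unrecognized reply)


channel = queue.Queue()
channel.put("claw.tool.approve")
print(run_with_approval(lambda a, b: a + b, (2, 3), channel, timeout_s=0.1))  # 5
```

Tools whose policy action is allow would skip this function entirely, which is how safe tools keep running without waiting on a human.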


The layers compose multiplicatively. An agent with:

  • A Channel allowlist of 3 users
  • A Policy that denies file deletion
  • A Sandbox with no network access
  • A Provider capped at 10K tokens/day

…has a very narrow attack surface. Each layer independently reduces risk, and an attacker must compromise all layers simultaneously to gain full control.