Policy

The Policy primitive defines behavioral rules: what the agent is allowed to do, what requires approval, and what is always denied. Policies bind to tools, skills, categories, or the entire agent, and are evaluated by a firewall-style first-match-wins rule engine.

URI pattern: claw://local/policy/{name}

Schema

claw: "0.3.0"
kind: Policy
metadata:
  name: "standard-policy"
  version: "1.0.0"
spec:
  rules:
    - id: "deny-destructive"
      action: "deny"
      scope: "tool"
      match:
        annotations:
          destructiveHint: true
      reason: "Destructive tools are blocked by default"

    - id: "approve-network"
      action: "require-approval"
      scope: "category"
      match:
        category: "network"
      reason: "Network access requires human confirmation"
      approval:
        timeout_seconds: 300
        default_if_timeout: "deny"

    - id: "allow-readonly"
      action: "allow"
      scope: "tool"
      match:
        annotations:
          readOnlyHint: true

    - id: "allow-workspace-fs"
      action: "allow"
      scope: "category"
      match:
        category: "filesystem"
      conditions:
        path_within: "/workspace"

    - id: "default-deny"
      action: "deny"
      scope: "all"
      reason: "Default deny policy"

  prompt_injection:
    detection: "hybrid"           # "pattern" | "llm-based" | "hybrid" | "none"
    pattern_engine: "aho-corasick"
    pattern_count: 50
    action: "block-and-log"       # "block-and-log" | "warn" | "log-only" | "ignore"

  secret_scanning:
    enabled: true
    scope: "output"               # "input" | "output" | "both"
    patterns: 22
    action: "redact"              # "redact" | "block" | "warn"

  input_validation:
    max_size_bytes: 102400
    null_byte_detection: true
    whitespace_analysis: true
    encoding: "utf-8"

  rate_limits:
    tool_calls_per_minute: 30
    tokens_per_hour: 100000
    cost_per_day_usd: 10.00

  audit:
    log_inputs: true
    log_outputs: true
    log_approvals: true
    retention: "90d"
    destination: "file"           # "file" | "sqlite" | "webhook" | "syslog"

Key Fields

Field	Required	Description
`rules`	Yes	Ordered list of rules. First match wins.
`rules[].id`	Yes	Unique rule identifier.
`rules[].action`	Yes	One of `allow`, `deny`, `require-approval`, `audit-only`.
`rules[].scope`	Yes	What the rule applies to: `tool`, `category`, `skill`, or `all`.
`rules[].match`	Yes	Matching criteria (annotations, category, tool name). The schema example omits `match` on its `scope: "all"` rule, which applies to every request.
`prompt_injection`	No	Prompt injection defense configuration.
`secret_scanning`	No	Secret leak prevention in inputs/outputs.
`input_validation`	No	Input size and encoding validation.
`rate_limits`	No	Rate limiting and spending caps.
`audit`	No	Audit logging configuration.

Rule Actions

Action	Behavior
`allow`	Execute without further checks.
`deny`	Block execution and log the attempt.
`require-approval`	Pause execution, present to human via Channel, wait for approval.
`audit-only`	Allow execution but emit a detailed audit event.
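
The four actions can be read as a small dispatch over a pending tool call. The sketch below is illustrative only: `apply_action`, `run_tool`, `ask_human`, and the list-based `audit_log` are hypothetical names, and logging a refused approval is an assumption rather than something this section specifies.

```python
from typing import Callable, List


def apply_action(action: str,
                 run_tool: Callable[[], str],
                 ask_human: Callable[[], bool],
                 audit_log: List[str]) -> str:
    """Apply one of the four rule actions to a pending tool call."""
    if action == "allow":
        return run_tool()                    # execute without further checks
    if action == "deny":
        audit_log.append("denied")           # block and log the attempt
        return "blocked"
    if action == "require-approval":
        if ask_human():                      # pause until a human answers
            return run_tool()
        audit_log.append("approval refused")  # assumption: refusals are logged
        return "blocked"
    if action == "audit-only":
        result = run_tool()                  # execution is allowed...
        audit_log.append(f"audit: {result}")  # ...but an audit event is emitted
        return result
    raise ValueError(f"unknown action: {action}")
```

In a real runtime `ask_human` would go through a Channel and the audit event would carry far more detail; both are collapsed to simple callables here.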

Rule Evaluation

Rules are evaluated using first-match-wins semantics:

Rules are checked in array order.
The first rule whose match criteria are satisfied is applied.
Subsequent rules are NOT evaluated for that request.
If no rule matches, the runtime MUST deny the action (implicit default-deny).
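
The evaluation order above can be sketched as a short loop. This is a minimal illustration, not a reference implementation: the `rule_matches` and `evaluate` helpers and the dictionary shape of a tool call are assumptions, `conditions` such as `path_within` are ignored, and treating `scope: "all"` as matching every call is inferred from the schema example.

```python
from typing import Any, Dict, List

Rule = Dict[str, Any]


def rule_matches(rule: Rule, call: Dict[str, Any]) -> bool:
    """Return True if the rule's match criteria are satisfied by the call."""
    if rule.get("scope") == "all":
        return True  # assumption: a catch-all rule needs no match block
    match = rule.get("match", {})
    # Every annotation named in the rule must have the same value on the call.
    for key, expected in match.get("annotations", {}).items():
        if call.get("annotations", {}).get(key) != expected:
            return False
    if "category" in match and call.get("category") != match["category"]:
        return False
    # Assumption: a scoped rule with no criteria at all matches nothing.
    return bool(match)


def evaluate(rules: List[Rule], call: Dict[str, Any]) -> str:
    """First-match-wins: return the action of the first matching rule."""
    for rule in rules:
        if rule_matches(rule, call):
            return rule["action"]  # later rules are not evaluated
    return "deny"                  # implicit default-deny


# The first three rules of the schema example, in the same order.
RULES: List[Rule] = [
    {"id": "deny-destructive", "action": "deny", "scope": "tool",
     "match": {"annotations": {"destructiveHint": True}}},
    {"id": "approve-network", "action": "require-approval", "scope": "category",
     "match": {"category": "network"}},
    {"id": "allow-readonly", "action": "allow", "scope": "tool",
     "match": {"annotations": {"readOnlyHint": True}}},
]
```

With these rules, a call annotated with both `destructiveHint: true` and `readOnlyHint: true` is denied, because `deny-destructive` comes first. A call that matches none of the three rules falls through to the implicit default-deny.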

When multiple Policy primitives are referenced in a manifest, the runtime MUST evaluate them as a single concatenated rule list in the order they appear in the policies array.
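
Concatenation means that rules from an earlier policy always take precedence over rules from a later one. A minimal sketch, assuming each policy has already been loaded into a dictionary shaped like the schema above (`concatenate_rules` and the two sample policies are hypothetical):

```python
from itertools import chain
from typing import Any, Dict, List


def concatenate_rules(policies: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten the rules of several policies, preserving manifest order."""
    return list(chain.from_iterable(p["spec"]["rules"] for p in policies))


# Two illustrative policies, listed in the order a manifest might reference them.
strict = {"metadata": {"name": "strict"}, "spec": {"rules": [
    {"id": "deny-destructive", "action": "deny", "scope": "tool",
     "match": {"annotations": {"destructiveHint": True}}},
]}}
standard = {"metadata": {"name": "standard"}, "spec": {"rules": [
    {"id": "allow-readonly", "action": "allow", "scope": "tool",
     "match": {"annotations": {"readOnlyHint": True}}},
    {"id": "default-deny", "action": "deny", "scope": "all"},
]}}

combined = concatenate_rules([strict, standard])
rule_ids = [rule["id"] for rule in combined]
```

One consequence of this design is that a catch-all rule such as `default-deny` in an early policy shadows every rule in the policies listed after it, so ordering in the manifest matters.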

Validation Rules

The rules array MUST contain at least one entry.
Rules MUST be evaluated in array order. The first matching rule MUST be applied; subsequent rules MUST NOT be evaluated for that request.
If no rule matches a given tool call, the runtime MUST deny the action (implicit default-deny).
When multiple Policy primitives are referenced, the runtime MUST evaluate them as a single concatenated rule list in manifest order.
For `require-approval`, if the approval timeout expires and `default_if_timeout` is `deny`, the runtime MUST deny the action with error code `-32012`.
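
The timeout rule can be sketched with a blocking queue that stands in for the human's reply. Everything here apart from the error code and the two `approval` field names is an assumption: `PolicyError`, `await_approval`, and the queue-based hand-off are hypothetical, and the behavior for any `default_if_timeout` value other than `deny` is deliberately left unimplemented because this section does not define it.

```python
import queue

APPROVAL_TIMEOUT_DENIED = -32012  # error code from the validation rules


class PolicyError(Exception):
    """Hypothetical error type carrying a numeric error code."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code


def await_approval(responses: queue.Queue, approval: dict) -> bool:
    """Block until a human answers or the approval timeout expires."""
    try:
        # True means approved, False means refused.
        return responses.get(timeout=approval["timeout_seconds"])
    except queue.Empty:
        if approval["default_if_timeout"] == "deny":
            raise PolicyError(APPROVAL_TIMEOUT_DENIED,
                              "approval timed out") from None
        # Other default_if_timeout values are not specified in this section.
        raise NotImplementedError("unspecified default_if_timeout value")
```

A caller would put the human's decision on `responses` from whatever thread handles the Channel; if nothing arrives within `timeout_seconds`, the call fails with code `-32012`.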

Design Rationale

The Policy primitive uses a firewall-style first-match-wins rule engine with four actions. It integrates prompt injection detection, secret scanning, rate limiting, and spending caps into a single auditable declaration. This makes an agent’s behavioral constraints explicit, versionable, and portable across implementations.