
Testing

CKP includes a conformance test harness (ckp-test) with 31 core wire vectors across three levels. Use it to validate your manifests and prove your agent’s conformance level.

```sh
git clone https://github.com/angelgalvisc/ckp-test.git
cd ckp-test
npm install
npm run build
```

The harness validates CKP 0.2.0 and 0.3.0 manifests. CKP 0.3.0 adds schema-level validation for optional world_models declarations and skills[].world_model_ref.


Check that a claw.yaml file is structurally valid before running any agent code:

```sh
node dist/cli.js validate path/to/claw.yaml
```

The validator checks:

  • Required top-level fields (claw, kind, metadata, spec)
  • kind: Claw for root manifests
  • spec.identity is present
  • spec.providers contains at least one entry
  • Inline primitives have their required fields
  • skills[].world_model_ref resolves when present
  • world_models[].memory_ref and constraints.policy_ref point to declared primitives when present
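A manifest that satisfies these structural checks might look like the following sketch. Only the field names called out above (claw, kind, metadata, spec, spec.identity, spec.providers) come from the validator's rules; the concrete values and nested keys are illustrative assumptions, not normative examples.

```yaml
# Hypothetical minimal manifest -- values and nested keys are illustrative only
claw: "0.3.0"
kind: Claw
metadata:
  name: example-agent
spec:
  identity:
    id: example-agent
  providers:
    - id: primary        # at least one provider entry is required
```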

Point the harness at your agent's launch command (stdio transport) and specify the target level:

```sh
node dist/cli.js run \
  --target "node path/to/your/agent.js" \
  --manifest path/to/claw.yaml \
  --level 1
```

The --target argument is the command that launches your agent. The harness starts it as a child process, communicates over stdio using JSON-RPC 2.0, and sends each test vector as a request.

| Flag | Description |
| --- | --- |
| --target | Command to launch the agent (e.g., "node dist/agent.js") |
| --manifest | Path to the agent's claw.yaml manifest |
| --level | Conformance level to test: 1, 2, or 3 |
```sh
# Build SDK examples
cd clawkernel/sdk && npm run build:dev

# Run L1 vectors
node ../../ckp-test/dist/cli.js run \
  --target "node dist/examples/l1-agent.js" \
  --manifest examples/l2.claw.yaml \
  --level 1

# Run all levels (L3 includes L1 + L2)
node ../../ckp-test/dist/cli.js run \
  --target "node dist/examples/l3-agent.js" \
  --manifest examples/l3.claw.yaml \
  --level 3
```

The 31 vectors are organized by conformance level. Each vector specifies an input, the expected outcome, and a reference to the normative spec section.

L1 tests cover identity, provider, lifecycle, and transport fundamentals.

| Vector | Description |
| --- | --- |
| TV-L1-01 | Valid minimal manifest (accept) |
| TV-L1-02 | Manifest missing identity (reject) |
| TV-L1-03 | Manifest missing providers (reject) |
| TV-L1-04 | Initialize happy path |
| TV-L1-05 | Initialize with version mismatch (error -32001) |
| TV-L1-06 | Status query (returns lifecycle state + uptime) |
| TV-L1-07 | Graceful shutdown |
| TV-L1-08 | Initialized notification |
| TV-L1-09 | Manifest with empty providers (reject) |
| TV-L1-10 | Unknown method (error -32601) |
| TV-L1-11 | Invalid request — missing method (error -32600) |
| TV-L1-12 | Parse error — malformed JSON (error -32700) |
| TV-L1-13 | Heartbeat notification validation |
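The version-mismatch behavior probed by TV-L1-04 and TV-L1-05 can be sketched as a small check. The params/result field name (protocol_version) and the supported-version list are assumptions for illustration; only the -32001 error code comes from the vector table.

```javascript
// Illustrative version negotiation for initialize (TV-L1-04 / TV-L1-05).
// Field names below are assumptions; the spec defines the real shapes.
const SUPPORTED = ["0.2.0", "0.3.0"];

function initialize(params) {
  if (!SUPPORTED.includes(params.protocol_version)) {
    // Version mismatch: CKP-specific error code from the L1 vectors.
    return { error: { code: -32001, message: "Version mismatch", data: { supported: SUPPORTED } } };
  }
  return { result: { protocol_version: params.protocol_version } };
}
```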

L2 tests exercise the tool execution pipeline, gates, and approval flow.

| Vector | Description |
| --- | --- |
| TV-L2-01 | Valid L2 manifest (accept) |
| TV-L2-02 | Tool call with valid arguments |
| TV-L2-03 | Tool call with invalid arguments (error -32602) |
| TV-L2-04 | Policy-denied tool call (error -32011) |
| TV-L2-05 | Tool execution timeout (error -32014) |
| TV-L2-06 | Approval flow — happy path |
| TV-L2-07 | Approval flow — timeout (error -32012) |
| TV-L2-08 | Approval flow — explicit deny (error -32013) |
| TV-L2-09 | Tool call blocked by sandbox (error -32010) |
| TV-L2-10 | Provider quota exceeded (error -32021) |

L3 tests cover memory operations, swarm coordination, and access control validation.

| Vector | Description |
| --- | --- |
| TV-L3-01 | Valid L3 manifest with all 9 core primitives (WorldModel and Telemetry optional) (accept) |
| TV-L3-02 | Swarm delegate + report round-trip |
| TV-L3-03 | Memory store + query round-trip |
| TV-L3-04 | Allowlist mode with roles field (reject — invalid) |
| TV-L3-05 | Role-based mode with allowed_ids (reject — invalid) |
| TV-L3-06 | Swarm broadcast notification |
| TV-L3-07 | Swarm discover peers |
| TV-L3-08 | Memory compact |

Two details are important:

  • TV-L3-02 covers the full claw.swarm.delegate -> claw.swarm.report round-trip. claw.swarm.report is not a standalone conformance vector.
  • TV-L3-04 and TV-L3-05 are manifest-validation vectors. They verify invalid Channel.access_control combinations and do not require a live wire exchange.
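The invalid combinations probed by TV-L3-04 and TV-L3-05 might look like the following sketch. The field names allowed_ids and roles come from the vectors above; the mode key and its values are assumptions, so check the spec for exact spellings.

```yaml
# Illustrative only -- exact field names and values come from the spec.

# Invalid (TV-L3-04): allowlist mode must not carry a roles field
access_control:
  mode: allowlist
  allowed_ids: [agent-a, agent-b]
  roles: [operator]

# Invalid (TV-L3-05): role-based mode must not carry allowed_ids
access_control:
  mode: role-based
  roles: [operator]
  allowed_ids: [agent-a]
```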

Beyond conformance tests, CKP provides a coherence auditor that validates consistency across the specification documents themselves:

```sh
./tools/coherence-audit.sh spec/ reports/
```

The auditor runs 10 rules:

  1. Error code coherence across spec and test vectors
  2. Method contract consistency
  3. JSON/YAML/ABNF syntax validation
  4. Normative boundary enforcement
  5. Cross-reference validation
  6. ABNF grammar conformance
  7. Conformance level correctness
  8. MUST-level requirement coverage
  9. Field name consistency
  10. Editorial consistency

Requirements: bash, jq, python3 (with PyYAML), perl.


The harness assigns one of three verdicts per level:

| Verdict | Meaning |
| --- | --- |
| CONFORMANT | All vectors pass (0 skips, 0 fails, 0 errors) |
| PARTIAL | Some vectors pass, but at least one is skipped. No failures. |
| NON-CONFORMANT | At least one vector fails |

  • A skip means the vector was not executed (e.g., the agent lacks the required transport or feature). Skipped vectors prevent a CONFORMANT verdict — the best possible result with any skips is PARTIAL.
  • A fail means the agent responded incorrectly (wrong error code, missing fields, invalid structure). Any failure results in NON-CONFORMANT.
  • An error means the harness itself encountered an unexpected problem.
A fully conformant run reports, for example:

```
Level 1: 13/13 PASS, 0 SKIP, 0 FAIL -> L1 CONFORMANT
Level 2: 10/10 PASS, 0 SKIP, 0 FAIL -> L2 CONFORMANT
Level 3: 8/8 PASS, 0 SKIP, 0 FAIL -> L3 CONFORMANT
```

The overall result is the lowest verdict across all tested levels.
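The verdict rules reduce to a small function. A sketch under the documented rules (harness error counts are omitted for brevity; per the table, any error would also rule out CONFORMANT):

```javascript
// Per-level verdict per the documented rules:
// any fail => NON-CONFORMANT; else any skip => PARTIAL; else CONFORMANT.
const ORDER = ["NON-CONFORMANT", "PARTIAL", "CONFORMANT"]; // lowest first

function levelVerdict({ skip, fail }) {
  if (fail > 0) return "NON-CONFORMANT";
  if (skip > 0) return "PARTIAL";
  return "CONFORMANT";
}

// The overall result is the lowest verdict across all tested levels.
function overallVerdict(levels) {
  return levels
    .map(levelVerdict)
    .reduce((a, b) => (ORDER.indexOf(a) <= ORDER.indexOf(b) ? a : b));
}
```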


When a vector is skipped, the report includes a justification. Common reasons:

  • No transport configured — the agent does not expose a stdio JSON-RPC listener (e.g., WhatsApp-only agents)
  • Feature not implemented — the agent lacks a specific capability (e.g., no approval workflow)
  • Protocol not negotiated — the agent does not support version negotiation

Skip justifications are documented in the conformance report and in per-agent compatibility profiles.