Testing
CKP includes a conformance test harness (ckp-test) with 31 core wire vectors across three levels. Use it to validate your manifests and prove your agent’s conformance level.
Installing ckp-test
Clone the repository and build the harness:

    git clone https://github.com/angelgalvisc/ckp-test.git
    cd ckp-test
    npm install
    npm run build

The harness validates CKP 0.2.0 and 0.3.0 manifests. CKP 0.3.0 adds schema-level validation for optional world_models declarations and skills[].world_model_ref.
Validating a manifest
Check that a claw.yaml file is structurally valid before running any agent code:

    node dist/cli.js validate path/to/claw.yaml

The validator checks:
- Required top-level fields (claw, kind, metadata, spec)
- kind: Claw for root manifests
- spec.identity is present
- spec.providers contains at least one entry
- Inline primitives have their required fields
- skills[].world_model_ref resolves when present
- world_models[].memory_ref and constraints.policy_ref point to declared primitives when present
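
As a rough illustration of the first few checks, here is a minimal sketch that inspects a manifest already parsed from YAML into a plain object. It is not the ckp-test implementation. The object shape, the placeholder field values, and the assumption that spec.providers is an array are inferred from the checklist above only.

```javascript
// Sketch of the basic structural checks from the list above.
// Assumption: the manifest has already been parsed from YAML into an object,
// and spec.providers is an array (the real schema may differ).
function checkManifest(manifest) {
  const errors = [];

  // Required top-level fields.
  for (const field of ["claw", "kind", "metadata", "spec"]) {
    if (manifest == null || manifest[field] === undefined) {
      errors.push(`missing top-level field: ${field}`);
    }
  }
  if (errors.length > 0) return errors;

  // Root manifests must declare kind: Claw.
  if (manifest.kind !== "Claw") {
    errors.push(`kind must be "Claw", got ${JSON.stringify(manifest.kind)}`);
  }

  const spec = manifest.spec ?? {};
  if (spec.identity == null) {
    errors.push("spec.identity is required");
  }
  if (!Array.isArray(spec.providers) || spec.providers.length === 0) {
    errors.push("spec.providers must contain at least one entry");
  }
  return errors;
}

// Placeholder values: only the presence of each field matters to this sketch.
const minimal = {
  claw: "0.3.0",
  kind: "Claw",
  metadata: { name: "example" },
  spec: { identity: { name: "example" }, providers: [{ name: "p" }] },
};

console.log(checkManifest(minimal)); // []
console.log(checkManifest({ ...minimal, spec: { identity: {}, providers: [] } }));
```

The second call reports a single error for the empty providers list, which is the situation TV-L1-09 describes.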
Running conformance tests
Point the harness at your agent (via stdio) and specify the target level:

    node dist/cli.js run \
      --target "node path/to/your/agent.js" \
      --manifest path/to/claw.yaml \
      --level 1

The --target argument is the command that launches your agent. The harness starts it as a child process, communicates over stdio using JSON-RPC 2.0, and sends each test vector as a request.
Options
| Flag | Description |
|---|---|
| --target | Command to launch the agent (e.g., "node dist/agent.js") |
| --manifest | Path to the agent’s claw.yaml manifest |
| --level | Conformance level to test: 1, 2, or 3 |
Example: testing the SDK
The relative paths below assume clawkernel and ckp-test are checked out side by side:

    # Build SDK examples
    cd clawkernel/sdk && npm run build:dev

    # Run L1 vectors
    node ../../ckp-test/dist/cli.js run \
      --target "node dist/examples/l1-agent.js" \
      --manifest examples/l2.claw.yaml \
      --level 1

    # Run all levels (L3 includes L1 + L2)
    node ../../ckp-test/dist/cli.js run \
      --target "node dist/examples/l3-agent.js" \
      --manifest examples/l3.claw.yaml \
      --level 3

Test vector overview
The 31 vectors are organized by conformance level. Each vector specifies an input, the expected outcome, and a reference to the normative spec section.
Level 1: Core (13 vectors)
L1 tests cover identity, provider, lifecycle, and transport fundamentals.
| Vector | Description |
|---|---|
| TV-L1-01 | Valid minimal manifest (accept) |
| TV-L1-02 | Manifest missing identity (reject) |
| TV-L1-03 | Manifest missing providers (reject) |
| TV-L1-04 | Initialize happy path |
| TV-L1-05 | Initialize with version mismatch (error -32001) |
| TV-L1-06 | Status query (returns lifecycle state + uptime) |
| TV-L1-07 | Graceful shutdown |
| TV-L1-08 | Initialized notification |
| TV-L1-09 | Manifest with empty providers (reject) |
| TV-L1-10 | Unknown method (error -32601) |
| TV-L1-11 | Invalid request — missing method (error -32600) |
| TV-L1-12 | Parse error — malformed JSON (error -32700) |
| TV-L1-13 | Heartbeat notification validation |
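
Vectors TV-L1-10 through TV-L1-12 use the standard JSON-RPC 2.0 error codes. The sketch below shows one way an agent might produce them for a single incoming message. It assumes one JSON message per line on stdin, and example.ping is a made-up method name standing in for the agent's real methods. Batch requests are not handled.

```javascript
// Sketch of JSON-RPC 2.0 error handling for one incoming message.
// Assumption: "example.ping" is a hypothetical method, not a CKP method.
const handlers = new Map([
  ["example.ping", () => ({ ok: true })],
]);

function errorResponse(id, code, message) {
  return { jsonrpc: "2.0", id, error: { code, message } };
}

function handleLine(line) {
  let msg;
  try {
    msg = JSON.parse(line);
  } catch {
    // Malformed JSON: the id cannot be known, so it is null.
    return errorResponse(null, -32700, "Parse error");
  }

  const isObject = msg !== null && typeof msg === "object";
  if (!isObject || msg.jsonrpc !== "2.0" || typeof msg.method !== "string") {
    const id = isObject && msg.id !== undefined ? msg.id : null;
    return errorResponse(id, -32600, "Invalid Request");
  }

  // A request without an id is a notification and never gets a response.
  const isNotification = msg.id === undefined;
  const handler = handlers.get(msg.method);
  if (!handler) {
    return isNotification ? null : errorResponse(msg.id, -32601, "Method not found");
  }

  const result = handler(msg.params);
  return isNotification ? null : { jsonrpc: "2.0", id: msg.id, result };
}

console.log(handleLine("{not json"));
console.log(handleLine('{"jsonrpc":"2.0","id":1}'));
console.log(handleLine('{"jsonrpc":"2.0","id":2,"method":"nope"}'));
```

Keeping the handler a pure function of the input line makes it easy to unit test before wiring it to stdin and stdout.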
Level 2: Standard (10 vectors)
L2 tests exercise the tool execution pipeline, gates, and approval flow.
| Vector | Description |
|---|---|
| TV-L2-01 | Valid L2 manifest (accept) |
| TV-L2-02 | Tool call with valid arguments |
| TV-L2-03 | Tool call with invalid arguments (error -32602) |
| TV-L2-04 | Policy-denied tool call (error -32011) |
| TV-L2-05 | Tool execution timeout (error -32014) |
| TV-L2-06 | Approval flow — happy path |
| TV-L2-07 | Approval flow — timeout (error -32012) |
| TV-L2-08 | Approval flow — explicit deny (error -32013) |
| TV-L2-09 | Tool call blocked by sandbox (error -32010) |
| TV-L2-10 | Provider quota exceeded (error -32021) |
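
An agent needs to return these codes consistently, so it can help to keep them in one place. The following sketch collects the codes from the table above. The symbolic names and the toolError helper are hypothetical labels chosen for illustration, not identifiers from the spec or the SDK.

```javascript
// Error codes copied from the Level 2 table. The names on the left are
// hypothetical labels for this sketch, not spec identifiers.
const L2_ERRORS = Object.freeze({
  INVALID_ARGUMENTS: -32602,
  SANDBOX_BLOCKED: -32010,
  POLICY_DENIED: -32011,
  APPROVAL_TIMEOUT: -32012,
  APPROVAL_DENIED: -32013,
  TOOL_TIMEOUT: -32014,
  PROVIDER_QUOTA_EXCEEDED: -32021,
});

// Build a JSON-RPC 2.0 error response for a failed tool call.
function toolError(id, name, detail) {
  const code = L2_ERRORS[name];
  if (typeof code !== "number") {
    throw new Error(`unknown error name: ${name}`);
  }
  return { jsonrpc: "2.0", id, error: { code, message: detail } };
}

console.log(toolError(7, "POLICY_DENIED", "tool blocked by policy"));
```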
Level 3: Full (8 vectors)
L3 tests cover memory operations, swarm coordination, and access control validation.
| Vector | Description |
|---|---|
| TV-L3-01 | Valid L3 manifest with all 9 core primitives (WorldModel and Telemetry optional) (accept) |
| TV-L3-02 | Swarm delegate + report round-trip |
| TV-L3-03 | Memory store + query round-trip |
| TV-L3-04 | Allowlist mode with roles field (reject — invalid) |
| TV-L3-05 | Role-based mode with allowed_ids (reject — invalid) |
| TV-L3-06 | Swarm broadcast notification |
| TV-L3-07 | Swarm discover peers |
| TV-L3-08 | Memory compact |
Two details are important:
- TV-L3-02 covers the full claw.swarm.delegate -> claw.swarm.report round-trip. claw.swarm.report is not a standalone conformance vector.
- TV-L3-04 and TV-L3-05 are manifest-validation vectors. They verify invalid Channel.access_control combinations and do not require a live wire exchange.
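
The two invalid combinations behind TV-L3-04 and TV-L3-05 can be sketched as a small check. The mode strings "allowlist" and "role-based" and the flat object shape are assumptions made for illustration. Only the field names roles and allowed_ids come from the vector descriptions.

```javascript
// Sketch of the access_control combinations rejected by TV-L3-04 and TV-L3-05.
// Assumption: the mode values and object layout are illustrative only.
function validateAccessControl(ac) {
  const errors = [];
  if (ac.mode === "allowlist" && ac.roles !== undefined) {
    errors.push("allowlist mode must not set roles");
  }
  if (ac.mode === "role-based" && ac.allowed_ids !== undefined) {
    errors.push("role-based mode must not set allowed_ids");
  }
  return errors;
}

console.log(validateAccessControl({ mode: "allowlist", allowed_ids: ["u1"] })); // []
console.log(validateAccessControl({ mode: "allowlist", roles: ["admin"] }));
```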
Coherence gate
Beyond conformance tests, CKP provides a coherence auditor that validates consistency across the specification documents themselves:

    ./tools/coherence-audit.sh spec/ reports/

The auditor runs 10 rules:
- Error code coherence across spec and test vectors
- Method contract consistency
- JSON/YAML/ABNF syntax validation
- Normative boundary enforcement
- Cross-reference validation
- ABNF grammar conformance
- Conformance level correctness
- MUST-level requirement coverage
- Field name consistency
- Editorial consistency
Requirements: bash, jq, python3 (with PyYAML), perl.
Interpreting results
The harness assigns one of three verdicts per level:
| Verdict | Meaning |
|---|---|
| CONFORMANT | All vectors pass (0 skips, 0 fails, 0 errors) |
| PARTIAL | Some vectors pass, but at least one is skipped. No failures. |
| NON-CONFORMANT | At least one vector fails |
Important rules
- A skip means the vector was not executed (e.g., the agent lacks the required transport or feature). Skipped vectors prevent a CONFORMANT verdict; the best possible result with any skips is PARTIAL.
- A fail means the agent responded incorrectly (wrong error code, missing fields, invalid structure). Any failure results in NON-CONFORMANT.
- An error means the harness itself encountered an unexpected problem.
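
The verdict rules above can be sketched as a small function. This is an illustration, not the harness's own code. The text does not say which verdict a harness error produces, only that CONFORMANT requires zero errors, so this sketch conservatively treats errors like failures. That choice is an assumption.

```javascript
// Sketch of the per-level verdict rules.
// Assumption: harness errors are treated like failures (not stated in the docs).
function levelVerdict({ skip = 0, fail = 0, error = 0 }) {
  if (fail > 0 || error > 0) return "NON-CONFORMANT";
  if (skip > 0) return "PARTIAL";
  return "CONFORMANT";
}

console.log(levelVerdict({ pass: 13, skip: 0, fail: 0 })); // CONFORMANT
console.log(levelVerdict({ pass: 8, skip: 2, fail: 0 }));  // PARTIAL
console.log(levelVerdict({ pass: 7, skip: 0, fail: 1 }));  // NON-CONFORMANT
```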
Example output
A run in which every vector passes looks like this:

    Level 1: 13/13 PASS, 0 SKIP, 0 FAIL -> L1 CONFORMANT
    Level 2: 10/10 PASS, 0 SKIP, 0 FAIL -> L2 CONFORMANT
    Level 3: 8/8 PASS, 0 SKIP, 0 FAIL -> L3 CONFORMANT

The overall result is the lowest verdict across all tested levels.
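
Taking the lowest verdict can be sketched as follows, assuming the ordering NON-CONFORMANT < PARTIAL < CONFORMANT implied by the verdict table. The function name is hypothetical and this is not the harness's own code.

```javascript
// Verdicts ordered from lowest to highest (ordering inferred from the table).
const RANK = ["NON-CONFORMANT", "PARTIAL", "CONFORMANT"];

// Return the lowest verdict among the tested levels.
function overallVerdict(verdicts) {
  if (verdicts.length === 0) throw new Error("no levels tested");
  let lowest = RANK.length - 1;
  for (const v of verdicts) {
    const rank = RANK.indexOf(v);
    if (rank === -1) throw new Error(`unknown verdict: ${v}`);
    lowest = Math.min(lowest, rank);
  }
  return RANK[lowest];
}

console.log(overallVerdict(["CONFORMANT", "CONFORMANT", "PARTIAL"])); // PARTIAL
```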
Skip justifications
When a vector is skipped, the report includes a justification. Common reasons:
- No transport configured — the agent does not expose a stdio JSON-RPC listener (e.g., WhatsApp-only agents)
- Feature not implemented — the agent lacks a specific capability (e.g., no approval workflow)
- Protocol not negotiated — the agent does not support version negotiation
Skip justifications are documented in the conformance report and in per-agent compatibility profiles.