
Testing

CKP includes a conformance test harness (ckp-test) with 31 core wire vectors across three levels. Use it to validate your manifests and prove your agent’s conformance level.

```sh
git clone https://github.com/angelgalvisc/ckp-test.git
cd ckp-test
npm install
npm run build
```

The harness validates CKP 0.2.0 and 0.3.0 manifests. CKP 0.3.0 adds schema-level validation for optional world_models declarations and skills[].world_model_ref.


Check that a claw.yaml file is structurally valid before running any agent code:

```sh
node dist/cli.js validate path/to/claw.yaml
```

The validator checks:

  • Required top-level fields (claw, kind, metadata, spec)
  • kind: Claw for root manifests
  • spec.identity is present
  • spec.providers contains at least one entry
  • Inline primitives have their required fields
  • skills[].world_model_ref resolves when present
  • world_models[].memory_ref and constraints.policy_ref point to declared primitives when present
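A manifest that satisfies these structural checks might look like the following sketch. Only the field names called out above (claw, kind, metadata, spec, spec.identity, spec.providers) come from the validator's rules; the concrete values and nested keys are illustrative assumptions, not normative examples.

```yaml
# Hypothetical minimal manifest -- values and nested keys are illustrative only
claw: "0.3.0"
kind: Claw
metadata:
  name: example-agent
spec:
  identity:
    id: example-agent
  providers:
    - id: primary        # at least one provider entry is required
```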

Point the harness at your agent's launch command (stdio transport) and specify the target level:

```sh
node dist/cli.js run \
  --target "node path/to/your/agent.js" \
  --manifest path/to/claw.yaml \
  --level 1
```

The --target argument is the command that launches your agent. The harness starts it as a child process, communicates over stdio using JSON-RPC 2.0, and sends each test vector as a request.

| Flag | Description |
| --- | --- |
| --target | Command to launch the agent (e.g., "node dist/agent.js") |
| --manifest | Path to the agent's claw.yaml manifest |
| --level | Conformance level to test: 1, 2, or 3 |
```sh
# Build SDK examples
cd clawkernel/sdk && npm run build:dev

# Run L1 vectors
node ../../ckp-test/dist/cli.js run \
  --target "node dist/examples/l1-agent.js" \
  --manifest examples/l2.claw.yaml \
  --level 1

# Run all levels (L3 includes L1 + L2)
node ../../ckp-test/dist/cli.js run \
  --target "node dist/examples/l3-agent.js" \
  --manifest examples/l3.claw.yaml \
  --level 3
```

The 31 vectors are organized by conformance level. Each vector specifies an input, the expected outcome, and a reference to the normative spec section.

L1 tests cover identity, provider, lifecycle, and transport fundamentals.

| Vector | Description |
| --- | --- |
| TV-L1-01 | Valid minimal manifest (accept) |
| TV-L1-02 | Manifest missing identity (reject) |
| TV-L1-03 | Manifest missing providers (reject) |
| TV-L1-04 | Initialize happy path |
| TV-L1-05 | Initialize with version mismatch (error -32001) |
| TV-L1-06 | Status query (returns lifecycle state + uptime) |
| TV-L1-07 | Graceful shutdown |
| TV-L1-08 | Initialized notification |
| TV-L1-09 | Manifest with empty providers (reject) |
| TV-L1-10 | Unknown method (error -32601) |
| TV-L1-11 | Invalid request — missing method (error -32600) |
| TV-L1-12 | Parse error — malformed JSON (error -32700) |
| TV-L1-13 | Heartbeat notification validation |
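The version-mismatch behavior probed by TV-L1-04 and TV-L1-05 can be sketched as a small check. The params/result field name (protocol_version) and the supported-version list are assumptions for illustration; only the -32001 error code comes from the vector table.

```javascript
// Illustrative version negotiation for initialize (TV-L1-04 / TV-L1-05).
// Field names below are assumptions; the spec defines the real shapes.
const SUPPORTED = ["0.2.0", "0.3.0"];

function initialize(params) {
  if (!SUPPORTED.includes(params.protocol_version)) {
    // Version mismatch: CKP-specific error code from the L1 vectors.
    return { error: { code: -32001, message: "Version mismatch", data: { supported: SUPPORTED } } };
  }
  return { result: { protocol_version: params.protocol_version } };
}
```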

L2 tests exercise the tool execution pipeline, gates, and approval flow.

| Vector | Description |
| --- | --- |
| TV-L2-01 | Valid L2 manifest (accept) |
| TV-L2-02 | Tool call with valid arguments |
| TV-L2-03 | Tool call with invalid arguments (error -32602) |
| TV-L2-04 | Policy-denied tool call (error -32011) |
| TV-L2-05 | Tool execution timeout (error -32014) |
| TV-L2-06 | Approval flow — happy path |
| TV-L2-07 | Approval flow — timeout (error -32012) |
| TV-L2-08 | Approval flow — explicit deny (error -32013) |
| TV-L2-09 | Tool call blocked by sandbox (error -32010) |
| TV-L2-10 | Provider quota exceeded (error -32021) |

L3 tests cover memory operations, swarm coordination, and access control validation.

| Vector | Description |
| --- | --- |
| TV-L3-01 | Valid L3 manifest with all 9 core primitives (WorldModel and Telemetry optional) (accept) |
| TV-L3-02 | Swarm delegate + report round-trip |
| TV-L3-03 | Memory store + query round-trip |
| TV-L3-04 | Allowlist mode with roles field (reject — invalid) |
| TV-L3-05 | Role-based mode with allowed_ids (reject — invalid) |
| TV-L3-06 | Swarm broadcast notification |
| TV-L3-07 | Swarm discover peers |
| TV-L3-08 | Memory compact |

Two details are important:

  • TV-L3-02 covers the full claw.swarm.delegate -> claw.swarm.report round-trip. claw.swarm.report is not a standalone conformance vector.
  • TV-L3-04 and TV-L3-05 are manifest-validation vectors. They verify invalid Channel.access_control combinations and do not require a live wire exchange.
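The invalid combinations probed by TV-L3-04 and TV-L3-05 might look like the following sketch. The field names allowed_ids and roles come from the vectors above; the mode key and its values are assumptions, so check the spec for exact spellings.

```yaml
# Illustrative only -- exact field names and values come from the spec.

# Invalid (TV-L3-04): allowlist mode must not carry a roles field
access_control:
  mode: allowlist
  allowed_ids: [agent-a, agent-b]
  roles: [operator]

# Invalid (TV-L3-05): role-based mode must not carry allowed_ids
access_control:
  mode: role-based
  roles: [operator]
  allowed_ids: [agent-a]
```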

Beyond conformance tests, CKP provides a coherence auditor that validates consistency across the specification documents themselves:

```sh
./tools/coherence-audit.sh spec/ reports/
```

The auditor runs 10 rules:

  1. Error code coherence across spec and test vectors
  2. Method contract consistency
  3. JSON/YAML/ABNF syntax validation
  4. Normative boundary enforcement
  5. Cross-reference validation
  6. ABNF grammar conformance
  7. Conformance level correctness
  8. MUST-level requirement coverage
  9. Field name consistency
  10. Editorial consistency

Requirements: bash, jq, python3 (with PyYAML), perl.


The harness assigns one of three verdicts per level:

| Verdict | Meaning |
| --- | --- |
| CONFORMANT | All vectors pass (0 skips, 0 fails, 0 errors) |
| PARTIAL | Some vectors pass, but at least one is skipped. No failures. |
| NON-CONFORMANT | At least one vector fails |

  • A skip means the vector was not executed (e.g., the agent lacks the required transport or feature). Skipped vectors prevent a CONFORMANT verdict — the best possible result with any skips is PARTIAL.
  • A fail means the agent responded incorrectly (wrong error code, missing fields, invalid structure). Any failure results in NON-CONFORMANT.
  • An error means the harness itself encountered an unexpected problem.
A fully conformant run reports, for example:

```
Level 1: 13/13 PASS, 0 SKIP, 0 FAIL -> L1 CONFORMANT
Level 2: 10/10 PASS, 0 SKIP, 0 FAIL -> L2 CONFORMANT
Level 3: 8/8 PASS, 0 SKIP, 0 FAIL -> L3 CONFORMANT
```

The overall result is the lowest verdict across all tested levels.
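The verdict rules reduce to a small function. A sketch under the documented rules (harness error counts are omitted for brevity; per the table, any error would also rule out CONFORMANT):

```javascript
// Per-level verdict per the documented rules:
// any fail => NON-CONFORMANT; else any skip => PARTIAL; else CONFORMANT.
const ORDER = ["NON-CONFORMANT", "PARTIAL", "CONFORMANT"]; // lowest first

function levelVerdict({ skip, fail }) {
  if (fail > 0) return "NON-CONFORMANT";
  if (skip > 0) return "PARTIAL";
  return "CONFORMANT";
}

// The overall result is the lowest verdict across all tested levels.
function overallVerdict(levels) {
  return levels
    .map(levelVerdict)
    .reduce((a, b) => (ORDER.indexOf(a) <= ORDER.indexOf(b) ? a : b));
}
```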


When a vector is skipped, the report includes a justification. Common reasons:

  • No transport configured — the agent does not expose a stdio JSON-RPC listener (e.g., WhatsApp-only agents)
  • Feature not implemented — the agent lacks a specific capability (e.g., no approval workflow)
  • Protocol not negotiated — the agent does not support version negotiation

Skip justifications are documented in the conformance report and in per-agent compatibility profiles.