Agent Lifecycle
Every CKP agent follows a deterministic lifecycle with five active states (INIT, STARTING, READY, STOPPING, and ERROR) and one terminal state (STOPPED). The lifecycle ensures agents start cleanly, run predictably, and shut down gracefully.
State diagram
                ┌─────────────────────────────────────────┐
                │                                         │
    ┌───────▼──────┐    ┌──────────┐    ┌───────┐    ┌▼────────┐
    │     INIT     ├───►│ STARTING ├───►│ READY ├───►│STOPPING │
    └──────────────┘    └─────┬────┘    └───┬───┘    └────┬────┘
                              │             │             │
                              ▼             │             ▼
                         ┌────────┐         │        ┌─────────┐
                         │ ERROR  │◄────────┘        │ STOPPED │
                         └────┬───┘                  └─────────┘
                              │
                              └──────► STOPPING ──► STOPPED
INIT
The agent parses and validates its manifest before any resources are allocated.
Steps:
- Parse claw.yaml (or JSON equivalent)
- Resolve file references and registry URIs (claw://registry/...)
- Validate each primitive against its JSON Schema
- Check primitive compatibility (e.g., tools referenced by skills must exist)
- Infer conformance level from primitives present
Transitions:
- On success → STARTING
- On validation failure → ERROR (code -32060: Manifest invalid)
The claw.initialize message triggers this phase. The runtime MUST NOT allocate connections, start processes, or consume tokens during INIT.
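The compatibility check in the steps above can be sketched as a pure function that touches no external resources. The manifest layout used here (top-level `tools` and `skills` lists, with each skill naming the tools it needs) and the function names are assumptions made for illustration, not the CKP schema.

```python
from __future__ import annotations

from typing import Any

MANIFEST_INVALID = -32060  # error code quoted in the text above


def check_manifest(manifest: dict[str, Any]) -> list[str]:
    """Return a list of compatibility problems; an empty list means none found.

    Assumed layout: manifest["tools"] and manifest["skills"] are lists of
    dicts, and each skill lists the tool names it depends on under "tools".
    """
    problems: list[str] = []
    tool_names = {tool["name"] for tool in manifest.get("tools", [])}
    for skill in manifest.get("skills", []):
        for ref in skill.get("tools", []):
            if ref not in tool_names:
                problems.append(
                    f"skill {skill['name']!r} references unknown tool {ref!r}"
                )
    return problems


def init_phase(manifest: dict[str, Any]) -> tuple[str, int | None]:
    """Decide the next state from validation alone: no I/O, no allocation."""
    if check_manifest(manifest):
        return ("ERROR", MANIFEST_INVALID)
    return ("STARTING", None)
```

Keeping the check free of side effects is one way to honor the rule that INIT must not allocate connections, start processes, or consume tokens.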
STARTING
The agent initializes its runtime resources in dependency order.
Steps:
- Initialize Memory stores (connect to backends, load state)
- Connect to Provider endpoints (verify API keys, check availability)
- Start Sandbox runtimes (spawn processes, containers, or WASM modules)
- Open Channel connections (authenticate with Telegram, Slack, etc.)
- Load Policy rules into the rule engine
- Discover peer agents (if Swarm is configured)
Transitions:
- On success → READY
- On failure → ERROR (with specific error code per subsystem)
Initialization order follows the dependency graph. Memory and Provider initialize first because other primitives may depend on them.
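Dependency-ordered initialization can be illustrated with a topological sort. The dependency map below is hypothetical (the source only states that Memory and Provider come first); a real runtime would derive it from the manifest.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each primitive lists what must be ready first.
DEPENDS_ON = {
    "memory": set(),
    "provider": set(),
    "sandbox": {"provider"},
    "channel": {"memory"},
    "policy": {"memory"},
    "swarm": {"channel", "provider"},
}


def startup_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an initialization order in which dependencies come first.

    TopologicalSorter raises graphlib.CycleError if the map contains a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Release resources in reverse initialization order."""
    return list(reversed(startup_order(depends_on)))
```

With this map, `memory` and `provider` always appear before anything that depends on them; the relative order of independent primitives is not significant.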
READY
The agent loop is running. The agent processes messages, reasons with LLMs, executes tools, and coordinates with peers.
Behavior:
- Messages from Channels are routed to Identity for processing
- Provider handles LLM inference requests
- Tool and Skill executions run within Sandbox and Policy constraints
- Memory is continuously read and written
- Swarm coordination is active (if configured)
- Heartbeat notifications are sent to the Operator (default: every 30s)
Transitions:
- On claw.shutdown → STOPPING
- On unrecoverable error → ERROR
The claw.status method returns "ready" during this phase.
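The periodic heartbeat mentioned in the behavior list could be driven by a loop like the following. This is a minimal sketch: the `send` callback, the payload fields, and the use of an `asyncio.Event` to stop the loop are all assumptions, since the source specifies only the 30-second default interval.

```python
import asyncio
from typing import Awaitable, Callable


async def heartbeat_loop(
    send: Callable[[dict], Awaitable[None]],
    stop: asyncio.Event,
    interval_s: float = 30.0,  # default interval from the text above
) -> None:
    """Send a heartbeat every `interval_s` seconds until `stop` is set."""
    while not stop.is_set():
        # Hypothetical payload; the real notification format is not shown here.
        await send({"type": "heartbeat", "status": "ready"})
        try:
            # Sleep for one interval, but wake immediately if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass
```

Waiting on the event rather than calling `asyncio.sleep` lets the loop exit promptly when the agent leaves READY.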
STOPPING
The agent drains in-flight operations and releases resources in reverse initialization order.
Steps:
- Stop accepting new messages from Channels
- Drain in-flight tool executions (wait up to configured timeout)
- Close Channel connections gracefully
- Flush Memory (persist pending writes)
- Stop Sandbox runtimes
- Disconnect from Providers
Transitions:
- On success → STOPPED (terminal)
- On timeout or failure → STOPPED (with warnings logged)
The claw.shutdown message accepts a drain_timeout_ms parameter. In-flight operations that exceed this timeout are terminated.
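A drain step with a timeout might look like the sketch below, assuming in-flight operations are tracked as `asyncio` tasks. The `drain` function and its return value are illustrative; only the `drain_timeout_ms` parameter name comes from the text.

```python
import asyncio


async def drain(tasks: set[asyncio.Task], drain_timeout_ms: int) -> tuple[int, int]:
    """Wait for in-flight tasks, then cancel whatever is still running.

    Returns (completed, terminated) counts.
    """
    if not tasks:
        return 0, 0  # asyncio.wait rejects an empty set
    done, pending = await asyncio.wait(tasks, timeout=drain_timeout_ms / 1000)
    for task in pending:
        task.cancel()
    # Let cancelled tasks unwind so their cleanup handlers run.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)
```

A nonzero terminated count corresponds to the "STOPPED (with warnings logged)" path: shutdown still completes, but some operations did not finish.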
ERROR
An error occurred that prevents normal operation.
Behavior:
- Log error details with full context (primitive, method, parameters)
- Attempt recovery if the error is retryable:
  - Provider unavailable (-32020) → retry with exponential backoff
  - Memory backend error (-32030) → retry
  - Peer unreachable (-32040) → retry
- If recovery succeeds → return to READY
- If unrecoverable → transition to STOPPING
Unrecoverable errors:
- Manifest invalid (-32060)
- All providers exhausted (no fallback available)
- Sandbox runtime crashed without restart capability
- Policy configuration contradiction
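The recovery behavior can be sketched as a retry loop that decides the next lifecycle state. The `AgentError` class, the `recover` function, and the attempt and delay defaults are hypothetical; for simplicity the sketch applies exponential backoff to all three retryable codes, although the text requires it only for Provider unavailable.

```python
import time
from typing import Callable

RETRYABLE = {-32020, -32030, -32040}  # codes listed as retryable above


class AgentError(Exception):
    """Hypothetical error type carrying a CKP error code."""

    def __init__(self, code: int, message: str = "") -> None:
        super().__init__(message or f"error {code}")
        self.code = code


def recover(
    operation: Callable[[], None],
    *,
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry `operation` and return the next lifecycle state."""
    for attempt in range(max_attempts):
        try:
            operation()
            return "READY"
        except AgentError as err:
            if err.code not in RETRYABLE:
                return "STOPPING"  # unrecoverable: escalate immediately
            if attempt < max_attempts - 1:
                sleep(base_delay_s * 2**attempt)  # 0.5s, 1s, 2s, ...
    return "STOPPING"  # retries exhausted
```

Passing `sleep` as a parameter keeps the function testable without real delays.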
Lifecycle events
Each state transition emits a structured event for Telemetry:
    {
      "event": "lifecycle.transition",
      "timestamp": "2026-02-23T13:00:00.000Z",
      "data": {
        "from": "STARTING",
        "to": "READY",
        "duration_ms": 1250,
        "primitives_initialized": [
          "identity",
          "provider",
          "channel",
          "tool",
          "sandbox",
          "policy"
        ]
      }
    }
These events enable operators to monitor startup times, detect degradation, and alert on repeated ERROR transitions.
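As an illustration, an event with the shape shown above could be assembled as follows. The `lifecycle_event` helper is hypothetical; the field names are taken from the example event.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def lifecycle_event(
    from_state: str,
    to_state: str,
    duration_ms: int,
    primitives: list[str],
    now: Optional[datetime] = None,
) -> dict:
    """Build a lifecycle.transition event with a UTC millisecond timestamp."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return {
        "event": "lifecycle.transition",
        "timestamp": timestamp,
        "data": {
            "from": from_state,
            "to": to_state,
            "duration_ms": duration_ms,
            "primitives_initialized": primitives,
        },
    }


if __name__ == "__main__":
    event = lifecycle_event("STARTING", "READY", 1250, ["identity", "provider"])
    print(json.dumps(event, indent=2))
```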
Summary
| State | Entry Condition | Key Activity | Exit |
|---|---|---|---|
| INIT | claw.initialize received | Validate manifest | STARTING or ERROR |
| STARTING | Manifest valid | Allocate resources | READY or ERROR |
| READY | Resources initialized | Agent loop running | STOPPING or ERROR |
| STOPPING | Shutdown requested or error | Drain and release | STOPPED |
| ERROR | Any failure | Retry or escalate | READY or STOPPING |
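
The exits in the table above map directly onto a transition table. The following is a minimal sketch of a guard that rejects any transition the table does not list; the `State` enum and `Lifecycle` class are illustrative names, not part of a CKP API.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    STARTING = "STARTING"
    READY = "READY"
    STOPPING = "STOPPING"
    STOPPED = "STOPPED"
    ERROR = "ERROR"


# Allowed exits per state, copied from the summary table; STOPPED is terminal.
TRANSITIONS: dict[State, set[State]] = {
    State.INIT: {State.STARTING, State.ERROR},
    State.STARTING: {State.READY, State.ERROR},
    State.READY: {State.STOPPING, State.ERROR},
    State.STOPPING: {State.STOPPED},
    State.ERROR: {State.READY, State.STOPPING},
    State.STOPPED: set(),
}


class Lifecycle:
    """Tracks the current state and refuses transitions not in the table."""

    def __init__(self) -> None:
        self.state = State.INIT

    def transition(self, target: State) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        self.state = target
```

Encoding the table as data keeps the rules in one place, so a change to the lifecycle needs an edit to `TRANSITIONS` only.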