Agent Lifecycle

Every CKP agent follows a deterministic lifecycle with five active states (INIT, STARTING, READY, STOPPING, ERROR) and a terminal STOPPED state. The lifecycle ensures agents start cleanly, run predictably, and shut down gracefully.

State diagram

           ┌──────────────────────────────────────────┐
           │                                          │
   ┌───────▼──────┐    ┌──────────┐    ┌───────┐    ┌▼────────┐
   │     INIT     ├───►│ STARTING ├───►│ READY ├───►│STOPPING │
   └──────────────┘    └─────┬────┘    └───┬───┘    └────┬────┘
                             │             │             │
                             ▼             │             ▼
                        ┌────────┐         │       ┌─────────┐
                        │ ERROR  │◄────────┘       │ STOPPED │
                        └────┬───┘                 └─────────┘
                             │
                             └──────► STOPPING ──► STOPPED
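
The transitions can also be written down as a lookup table. The sketch below is illustrative only: it collects the edges listed in the per-state "Transitions" sections of this page, including INIT → ERROR and ERROR → READY, which the diagram does not appear to draw. The names State, TRANSITIONS, and can_transition are hypothetical and not part of CKP.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    STARTING = "STARTING"
    READY = "READY"
    STOPPING = "STOPPING"
    STOPPED = "STOPPED"
    ERROR = "ERROR"


# Allowed transitions, as described in the per-state sections of this page.
TRANSITIONS = {
    State.INIT: {State.STARTING, State.ERROR},
    State.STARTING: {State.READY, State.ERROR},
    State.READY: {State.STOPPING, State.ERROR},
    State.ERROR: {State.READY, State.STOPPING},
    State.STOPPING: {State.STOPPED},
    State.STOPPED: set(),  # terminal: no outgoing transitions
}


def can_transition(src: State, dst: State) -> bool:
    """Return True if the lifecycle allows moving from src to dst."""
    return dst in TRANSITIONS[src]
```

A runtime could consult such a table before every state change and reject anything not listed, which keeps the lifecycle deterministic.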

INIT

The agent parses and validates its manifest before any resources are allocated.

Steps:

Parse claw.yaml (or JSON equivalent)
Resolve file references and registry URIs (claw://registry/...)
Validate each primitive against its JSON Schema
Check primitive compatibility (e.g., tools referenced by skills must exist)
Infer conformance level from primitives present

Transitions:

On success → STARTING
On validation failure → ERROR (code -32060: Manifest invalid)

The claw.initialize message triggers this phase. The runtime MUST NOT allocate connections, start processes, or consume tokens during INIT.
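
As a rough illustration of the compatibility step, the sketch below checks that every tool referenced by a skill is declared in the manifest. The manifest layout (top-level "tools" and "skills" lists with "name" and "tools" keys) and the function names are assumptions made for this example, not the actual claw.yaml schema. Only the error code -32060 comes from this page.

```python
MANIFEST_INVALID = -32060  # "Manifest invalid", as listed above


def find_missing_tools(manifest: dict) -> list:
    """Collect a message for each skill that references an undeclared tool."""
    declared = {tool["name"] for tool in manifest.get("tools", [])}
    problems = []
    for skill in manifest.get("skills", []):
        for ref in skill.get("tools", []):
            if ref not in declared:
                problems.append(
                    f"skill '{skill['name']}' references unknown tool '{ref}'"
                )
    return problems


def validate_compatibility(manifest: dict):
    """Return an error object if the manifest is inconsistent, else None."""
    problems = find_missing_tools(manifest)
    if problems:
        return {
            "code": MANIFEST_INVALID,
            "message": "Manifest invalid",
            "data": problems,
        }
    return None
```

The check works purely on the parsed manifest and touches no network, process, or model, in keeping with the rule that INIT allocates nothing.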

STARTING

The agent initializes its runtime resources in dependency order.

Steps:

Initialize Memory stores (connect to backends, load state)
Connect to Provider endpoints (verify API keys, check availability)
Start Sandbox runtimes (spawn processes, containers, or WASM modules)
Open Channel connections (authenticate with Telegram, Slack, etc.)
Load Policy rules into the rule engine
Discover peer agents (if Swarm is configured)

Transitions:

On success → READY
On failure → ERROR (with specific error code per subsystem)

Initialization order follows the dependency graph. Memory and Provider initialize first because other primitives may depend on them.
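
One way to derive such an order is a topological sort. The sketch below uses Python's standard graphlib module. The dependency map is a made-up example, since the real edges depend on the manifest. It shows only that Memory and Provider, having no dependencies, are always placed before anything that relies on them.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each primitive lists what it needs first.
DEPENDS_ON = {
    "memory": set(),
    "provider": set(),
    "sandbox": {"memory"},
    "channel": {"provider"},
    "policy": {"memory"},
    "swarm": {"channel", "policy"},
}


def startup_order(depends_on: dict) -> list:
    """Return an initialization order in which dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(startup_order(DEPENDS_ON))
```

The relative order of primitives that do not depend on each other is not fixed by the sort. A dependency cycle makes TopologicalSorter raise graphlib.CycleError, which a runtime could report as a startup failure.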

READY

The agent loop is running. The agent processes messages, reasons with LLMs, executes tools, and coordinates with peers.

Behavior:

Messages from Channels are routed to Identity for processing
Provider handles LLM inference requests
Tool and Skill executions run within Sandbox and Policy constraints
Memory is continuously read and written
Swarm coordination is active (if configured)
Heartbeat notifications are sent to the Operator (default: every 30s)

Transitions:

On claw.shutdown → STOPPING
On unrecoverable error → ERROR

The claw.status method returns "ready" during this phase.
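
A status handler might look like the following sketch. It assumes a JSON-RPC 2.0 style envelope, which the -32xxx error codes hint at but this page does not confirm, and a hypothetical result shape of {"state": ...}. Only the method name claw.status and the value "ready" come from the text above.

```python
def handle_status(request: dict, current_state: str) -> dict:
    """Build a response to a claw.status request (assumed JSON-RPC 2.0 shape)."""
    if request.get("method") != "claw.status":
        raise ValueError("not a claw.status request")
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        # Hypothetical result shape: the lifecycle state in lower case.
        "result": {"state": current_state.lower()},
    }
```

For example, handle_status({"jsonrpc": "2.0", "id": 7, "method": "claw.status"}, "READY") yields a response whose result state is "ready".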

STOPPING

The agent drains in-flight operations and releases resources in roughly the reverse of the initialization order.

Steps:

Stop accepting new messages from Channels
Drain in-flight tool executions (wait up to configured timeout)
Close Channel connections gracefully
Flush Memory (persist pending writes)
Stop Sandbox runtimes
Disconnect from Providers

Transitions:

On success → STOPPED (terminal)
On timeout or failure → STOPPED (with warnings logged)

The claw.shutdown message accepts a drain_timeout_ms parameter. In-flight operations still running when this timeout expires are terminated.
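
The drain step can be sketched with asyncio, assuming in-flight operations are tracked as asyncio tasks. The function name drain and its return value are illustrative. Only the drain_timeout_ms parameter is taken from this page.

```python
import asyncio


async def drain(tasks: set, drain_timeout_ms: int):
    """Wait for in-flight tasks up to the timeout, then cancel the rest.

    Returns (finished, terminated) as two sets of tasks.
    """
    if not tasks:
        return set(), set()
    finished, pending = await asyncio.wait(
        tasks, timeout=drain_timeout_ms / 1000
    )
    for task in pending:
        task.cancel()  # operation exceeded the drain timeout
    # Let cancelled tasks unwind before resources are released.
    await asyncio.gather(*pending, return_exceptions=True)
    return finished, pending
```

Awaiting the cancelled tasks before moving on matters here, because the later steps (closing Channels, stopping Sandbox runtimes) remove resources those tasks may still be holding.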

ERROR

The agent enters ERROR when a failure prevents normal operation.

Behavior:

Log error details with full context (primitive, method, parameters)
Attempt recovery if the error is retryable:
- Provider unavailable (-32020) → retry with exponential backoff
- Memory backend error (-32030) → retry
- Peer unreachable (-32040) → retry
If recovery succeeds → return to READY
If unrecoverable → transition to STOPPING

Unrecoverable errors:

Manifest invalid (-32060)
All providers exhausted (no fallback available)
Sandbox runtime crashed without restart capability
Policy configuration contradiction
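
The recovery decision can be sketched as a small retry loop. This is an illustration under assumptions: AgentError, recover, and the attempt and delay defaults are hypothetical, and for brevity the sketch applies exponential backoff to every retryable code, although the list above names backoff only for Provider errors. The error codes themselves are the ones listed above.

```python
import time

# Retryable codes from the list above: provider, memory backend, peer.
RETRYABLE = {-32020, -32030, -32040}


class AgentError(Exception):
    """Hypothetical exception carrying a CKP error code."""

    def __init__(self, code: int, message: str = ""):
        super().__init__(message)
        self.code = code


def recover(operation, max_attempts=5, base_delay_s=0.5, sleep=time.sleep):
    """Retry operation; return the next state, 'READY' or 'STOPPING'."""
    for attempt in range(max_attempts):
        try:
            operation()
            return "READY"
        except AgentError as err:
            if err.code not in RETRYABLE:
                return "STOPPING"  # unrecoverable, e.g. -32060
            if attempt < max_attempts - 1:
                # Exponential backoff: 0.5 s, 1 s, 2 s, ...
                sleep(base_delay_s * 2 ** attempt)
    return "STOPPING"  # retries exhausted
```

Passing sleep as a parameter keeps the loop testable without real delays. A production version would likely add jitter and a cap on the delay.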

Lifecycle events

Each state transition emits a structured event for Telemetry:

{
  "event": "lifecycle.transition",
  "timestamp": "2026-02-23T13:00:00.000Z",
  "data": {
    "from": "STARTING",
    "to": "READY",
    "duration_ms": 1250,
    "primitives_initialized": [
      "identity", "provider", "channel",
      "tool", "sandbox", "policy"
    ]
  }
}

These events enable operators to monitor startup times, detect degradation, and alert on repeated ERROR transitions.
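
As a sketch of how such events might be produced and consumed, the helpers below build an event with the same fields as the example above and count transitions into ERROR. The function names are hypothetical, and any alerting threshold is left to the operator.

```python
from datetime import datetime, timezone


def transition_event(src, dst, duration_ms, primitives=None):
    """Build a lifecycle.transition event shaped like the example above."""
    now = datetime.now(timezone.utc)
    data = {"from": src, "to": dst, "duration_ms": duration_ms}
    if primitives is not None:
        data["primitives_initialized"] = list(primitives)
    return {
        "event": "lifecycle.transition",
        # ISO 8601 with millisecond precision and a trailing "Z" for UTC.
        "timestamp": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "data": data,
    }


def count_error_transitions(events):
    """Count how many events record a transition into ERROR."""
    return sum(
        1
        for e in events
        if e["event"] == "lifecycle.transition" and e["data"]["to"] == "ERROR"
    )
```

An operator-side monitor could call count_error_transitions over a sliding window of recent events and raise an alert once the count crosses a chosen limit.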

Summary

State	Entry Condition	Key Activity	Exit
INIT	`claw.initialize` received	Validate manifest	STARTING or ERROR
STARTING	Manifest valid	Allocate resources	READY or ERROR
READY	Resources initialized	Agent loop running	STOPPING or ERROR
STOPPING	Shutdown requested or unrecoverable error	Drain and release	STOPPED
ERROR	Any failure	Retry or escalate	READY or STOPPING