Agent Lifecycle

Every CKP agent follows a deterministic lifecycle with five active states (INIT, STARTING, READY, STOPPING, ERROR) and a terminal STOPPED state. The lifecycle ensures agents start cleanly, run predictably, and shut down gracefully.


        ┌─────────────────────────────────────────┐
        │                                         │
┌───────▼──────┐    ┌──────────┐    ┌───────┐    ┌▼────────┐
│     INIT     ├───►│ STARTING ├───►│ READY ├───►│STOPPING │
└──────────────┘    └─────┬────┘    └───┬───┘    └────┬────┘
                          │             │             │
                          ▼             │             ▼
                     ┌────────┐         │        ┌─────────┐
                     │ ERROR  │◄────────┘        │ STOPPED │
                     └────┬───┘                  └─────────┘
                          └──────► STOPPING ──► STOPPED

In INIT, the agent parses and validates its manifest before any resources are allocated.

Steps:

  1. Parse claw.yaml (or JSON equivalent)
  2. Resolve file references and registry URIs (claw://registry/...)
  3. Validate each primitive against its JSON Schema
  4. Check primitive compatibility (e.g., tools referenced by skills must exist)
  5. Infer conformance level from primitives present

Transitions:

  • On success → STARTING
  • On validation failure → ERROR (code -32060: Manifest invalid)

The claw.initialize message triggers this phase. The runtime MUST NOT allocate connections, start processes, or consume tokens during INIT.
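The compatibility check in step 4 can be sketched as follows. This is only an illustration: the manifest shape (`tools` and `skills` lists with `name` and `tools` keys) and the names `ManifestError`, `check_tool_references`, and `init_phase` are assumptions, not part of CKP. Only the error code -32060 and the INIT transitions come from the text above.

```python
MANIFEST_INVALID = -32060  # "Manifest invalid" code from the transitions above


class ManifestError(Exception):
    """Hypothetical exception carrying the protocol error code."""

    def __init__(self, message: str, code: int = MANIFEST_INVALID):
        super().__init__(message)
        self.code = code


def check_tool_references(manifest: dict) -> None:
    """Verify that every tool a skill references is declared in the manifest.

    The manifest layout used here is assumed for illustration only.
    """
    declared = {tool["name"] for tool in manifest.get("tools", [])}
    for skill in manifest.get("skills", []):
        missing = [t for t in skill.get("tools", []) if t not in declared]
        if missing:
            raise ManifestError(
                f"skill {skill['name']!r} references undeclared tools: {missing}"
            )


def init_phase(manifest: dict) -> str:
    """Return the next lifecycle state. Pure validation: nothing is allocated."""
    try:
        check_tool_references(manifest)
    except ManifestError:
        return "ERROR"
    return "STARTING"
```

Because the function only inspects an in-memory dictionary, it respects the rule that INIT must not open connections, start processes, or consume tokens.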


In STARTING, the agent initializes its runtime resources in dependency order.

Steps:

  1. Initialize Memory stores (connect to backends, load state)
  2. Connect to Provider endpoints (verify API keys, check availability)
  3. Start Sandbox runtimes (spawn processes, containers, or WASM modules)
  4. Open Channel connections (authenticate with Telegram, Slack, etc.)
  5. Load Policy rules into the rule engine
  6. Discover peer agents (if Swarm is configured)

Transitions:

  • On success → READY
  • On failure → ERROR (with specific error code per subsystem)

Initialization order follows the dependency graph. Memory and Provider initialize first because other primitives may depend on them.
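One way to derive such an order is a topological sort over the dependency graph. The sketch below uses Python's standard `graphlib` module. The specific edges in `DEPENDS_ON` are hypothetical and chosen only to illustrate the idea; a real runtime would presumably derive them from the manifest.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key maps to the primitives it depends on.
DEPENDS_ON = {
    "memory": set(),
    "provider": set(),
    "sandbox": {"provider"},
    "channel": {"memory"},
    "policy": {"memory"},
    "swarm": {"channel", "provider"},
}


def startup_order(depends_on: dict) -> list:
    """Return an initialization order in which dependencies always come first."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on: dict) -> list:
    """Release resources in the reverse of the startup order."""
    return list(reversed(startup_order(depends_on)))
```

`TopologicalSorter` raises `graphlib.CycleError` if the graph contains a cycle, which a runtime could surface as a startup failure.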


In READY, the agent loop is running. The agent processes messages, reasons with LLMs, executes tools, and coordinates with peers.

Behavior:

  • Messages from Channels are routed to Identity for processing
  • Provider handles LLM inference requests
  • Tool and Skill executions run within Sandbox and Policy constraints
  • Memory is continuously read and written
  • Swarm coordination is active (if configured)
  • Heartbeat notifications are sent to the Operator (default: every 30s)

Transitions:

  • On claw.shutdown → STOPPING
  • On unrecoverable error → ERROR

The claw.status method returns "ready" during this phase.


In STOPPING, the agent drains in-flight operations and releases resources in reverse initialization order.

Steps:

  1. Stop accepting new messages from Channels
  2. Drain in-flight tool executions (wait up to configured timeout)
  3. Close Channel connections gracefully
  4. Flush Memory (persist pending writes)
  5. Stop Sandbox runtimes
  6. Disconnect from Providers

Transitions:

  • On success → STOPPED (terminal)
  • On timeout or failure → STOPPED (with warnings logged)

The claw.shutdown message accepts a drain_timeout_ms parameter. In-flight operations that exceed this timeout are terminated.
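The drain behavior can be sketched with `asyncio`, assuming in-flight operations are represented as tasks. The function name `drain` and its return value are illustrative assumptions. Only the `drain_timeout_ms` parameter and the terminate-on-timeout rule come from the text above.

```python
import asyncio


async def drain(tasks: set, drain_timeout_ms: int) -> tuple:
    """Wait for in-flight tasks, then cancel whatever is still running.

    Returns a (completed, terminated) pair of counts.
    """
    if not tasks:
        return 0, 0  # asyncio.wait() rejects an empty collection
    done, pending = await asyncio.wait(tasks, timeout=drain_timeout_ms / 1000)
    for task in pending:
        task.cancel()
    # Let cancelled tasks run their cleanup before resources are released.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


async def demo() -> tuple:
    """One operation finishes within the timeout, one does not."""
    fast = asyncio.create_task(asyncio.sleep(0.01))
    slow = asyncio.create_task(asyncio.sleep(10))
    return await drain({fast, slow}, drain_timeout_ms=200)
```

Running `asyncio.run(demo())` returns `(1, 1)`: the short task completes and the long one is terminated when the timeout expires.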


In ERROR, an error has occurred that prevents normal operation.

Behavior:

  1. Log error details with full context (primitive, method, parameters)
  2. Attempt recovery if the error is retryable:
    • Provider unavailable (-32020) → retry with exponential backoff
    • Memory backend error (-32030) → retry
    • Peer unreachable (-32040) → retry
  3. If recovery succeeds → return to READY
  4. If unrecoverable → transition to STOPPING

Unrecoverable errors:

  • Manifest invalid (-32060)
  • All providers exhausted (no fallback available)
  • Sandbox runtime crashed without restart capability
  • Policy configuration contradiction
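The recovery logic can be sketched as a retry loop with exponential backoff. As a simplification, the sketch applies backoff to all three retryable codes, although the text only specifies it for -32020. `AgentError`, `recover`, `max_attempts`, and `base_delay_s` are hypothetical names and defaults, not values defined by CKP.

```python
import time

RETRYABLE = {-32020, -32030, -32040}  # codes listed as retryable above


class AgentError(Exception):
    """Hypothetical exception carrying a protocol error code."""

    def __init__(self, code: int, message: str = ""):
        super().__init__(message)
        self.code = code


def recover(operation, max_attempts=5, base_delay_s=0.5, sleep=time.sleep) -> str:
    """Retry `operation` and return the next lifecycle state.

    Returns "READY" once the operation succeeds, or "STOPPING" if the error
    is not retryable or all attempts are used up.
    """
    for attempt in range(max_attempts):
        try:
            operation()
            return "READY"
        except AgentError as err:
            if err.code not in RETRYABLE:
                return "STOPPING"
            if attempt < max_attempts - 1:
                # Delay doubles on each retry: 0.5s, 1s, 2s, ...
                sleep(base_delay_s * 2 ** attempt)
    return "STOPPING"
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A production version would likely add jitter and a cap on the maximum delay.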

Each state transition emits a structured event for Telemetry:

{
  "event": "lifecycle.transition",
  "timestamp": "2026-02-23T13:00:00.000Z",
  "data": {
    "from": "STARTING",
    "to": "READY",
    "duration_ms": 1250,
    "primitives_initialized": [
      "identity", "provider", "channel",
      "tool", "sandbox", "policy"
    ]
  }
}

These events enable operators to monitor startup times, detect degradation, and alert on repeated ERROR transitions.
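A minimal sketch of building such an event is shown below. The field names follow the example above. The helper name `lifecycle_event` and its signature are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def lifecycle_event(from_state, to_state, duration_ms, primitives, now=None) -> str:
    """Serialize a lifecycle.transition event as a JSON string.

    `now` must be a timezone-aware datetime; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    # Millisecond precision with a trailing "Z", matching the example format.
    timestamp = (
        now.astimezone(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )
    return json.dumps(
        {
            "event": "lifecycle.transition",
            "timestamp": timestamp,
            "data": {
                "from": from_state,
                "to": to_state,
                "duration_ms": duration_ms,
                "primitives_initialized": list(primitives),
            },
        }
    )
```

An operator-side consumer could parse these events and, for example, alert when `duration_ms` for STARTING to READY exceeds a chosen threshold.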


| State | Entry Condition | Key Activity | Exit |
|---|---|---|---|
| INIT | claw.initialize received | Validate manifest | STARTING or ERROR |
| STARTING | Manifest valid | Allocate resources | READY or ERROR |
| READY | Resources initialized | Agent loop running | STOPPING or ERROR |
| STOPPING | Shutdown requested or error | Drain and release | STOPPED |
| ERROR | Any failure | Retry or escalate | READY or STOPPING |
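The exit column of this summary can be encoded as a transition table that rejects any move the lifecycle does not allow. This is a sketch, not the reference implementation: the `Lifecycle` class and its `transition` method are hypothetical names, and STOPPED is modeled as a terminal state with no exits.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    STARTING = "STARTING"
    READY = "READY"
    STOPPING = "STOPPING"
    ERROR = "ERROR"
    STOPPED = "STOPPED"


# Allowed exits per state, taken from the summary of states.
TRANSITIONS = {
    State.INIT: {State.STARTING, State.ERROR},
    State.STARTING: {State.READY, State.ERROR},
    State.READY: {State.STOPPING, State.ERROR},
    State.STOPPING: {State.STOPPED},
    State.ERROR: {State.READY, State.STOPPING},
    State.STOPPED: set(),  # terminal
}


class Lifecycle:
    """Tracks the current state and enforces the allowed transitions."""

    def __init__(self) -> None:
        self.state = State.INIT

    def transition(self, target: State) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {target.value}"
            )
        self.state = target
```

Keeping the table as data rather than scattered conditionals makes it easy to check against the specification and to emit one telemetry event per successful call to `transition`.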