The request path
Authenticate
A client calls the API with an
Authorization: Bearer credential - a
kt_live_… API key from your backend, or a short-lived
session token from an end user’s browser.
Check the budget
Karta validates the credential and checks your org’s
budget. If a cap is already exhausted, it
returns 402 Payment Required before running anything - no surprise bills.
Resolve the session & release
Karta resolves the session (creating
one if needed), confirms it belongs to your org, and pins the turn to the
project’s currently active release.
Run the harness in isolation
Karta hands the turn to the harness running in an isolated, per-session
sandbox. The harness runs its agentic loop - tools, MCP, memory - and emits
typed events.
Stream back
Karta relays those events to the caller as they happen (SSE), pausing for
approval prompts when the agent needs permission to act.
Meter
When the turn completes, Karta records token and cost
usage against your budget.
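The order of these steps can be sketched in a few lines of Python. This is only an illustration of the sequence described above, not Karta's implementation: every name here (Platform, Org, Session, handle_turn, fake_harness, the event fields) is hypothetical, and the 401 and 403 status codes are assumptions, since only 402 Payment Required is named above.

```python
from dataclasses import dataclass, field
from typing import Iterator

# Hypothetical types: they model the steps above, not Karta's real internals.


@dataclass
class Org:
    org_id: str
    budget_cap_usd: float
    spent_usd: float = 0.0


@dataclass
class Session:
    session_id: str
    org_id: str
    release: str  # release the current turn is pinned to


@dataclass
class Platform:
    orgs_by_credential: dict[str, Org]
    active_release: str
    sessions: dict[str, Session] = field(default_factory=dict)


class HttpError(Exception):
    def __init__(self, status: int, reason: str):
        super().__init__(f"{status} {reason}")
        self.status = status


def fake_harness(prompt: str) -> Iterator[dict]:
    """Stand-in for the sandboxed harness: yields typed events."""
    yield {"type": "text", "text": f"echo: {prompt}"}
    yield {"type": "usage", "cost_usd": 0.25}


def handle_turn(p: Platform, authorization: str,
                session_id: str, prompt: str) -> Iterator[dict]:
    # 1. Authenticate the Bearer credential (401 is an assumed status).
    scheme, _, credential = authorization.partition(" ")
    org = p.orgs_by_credential.get(credential) if scheme == "Bearer" else None
    if org is None:
        raise HttpError(401, "Unauthorized")

    # 2. Check the budget before running anything.
    if org.spent_usd >= org.budget_cap_usd:
        raise HttpError(402, "Payment Required")

    # 3. Resolve (or create) the session, confirm ownership, pin the release.
    session = p.sessions.get(session_id)
    if session is None:
        session = Session(session_id, org.org_id, p.active_release)
        p.sessions[session_id] = session
    if session.org_id != org.org_id:
        raise HttpError(403, "Forbidden")  # assumed status
    session.release = p.active_release

    # 4-5. Run the harness and relay its events as they are produced.
    cost = 0.0
    for event in fake_harness(prompt):
        if event["type"] == "usage":
            cost += event["cost_usd"]
        else:
            yield event

    # 6. Meter: record usage against the budget once the turn completes.
    org.spent_usd += cost
```

Because handle_turn is a generator, the checks run when the caller starts consuming events. With a cap of 0.25, a first turn streams its text event and records 0.25 of spend; a second turn on the same org then fails with 402 before the harness is invoked.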
Per-session isolation
Every session runs in its own microVM sandbox - a hardware-isolated boundary, not a shared container. One tenant’s agent cannot see another’s filesystem, processes, or memory, and a misbehaving or prompt-injected agent is confined to its own short-lived environment. The sandbox is created for the session and torn down after it, so nothing leaks between users or between runs.
Embedding Karta in your own process instead? See the two isolation models in Multi-tenancy.
Running agents vs. managing them
Karta is split into two planes, and the separation is deliberate - it’s exactly the boundary security and platform teams should look for in anything that runs agent code:
Data plane - runs your agents
The request path: sessions, harness execution in isolated sandboxes, release
serving, streaming, and request-time budget enforcement. This is where
agent code actually runs.
Control plane - manages your account
The system of record: identity and team roles, API keys, usage metering and
budgets, billing, BYOK provider keys, outbound webhooks, and the audit log.
Karta delegates; it doesn’t duplicate
The single most important design choice: the harness is the source of truth for conversation history, persistence, resumption, and tool/MCP integration. Karta keeps no second copy. A session is a lightweight handle - metadata, participants, the current agent, pending approvals - not a message store. That’s why there are no sync races, why resuming a session just continues where it left off, and why an example from Claude Code’s or OpenCode’s own docs runs unchanged on Karta.
Streaming is the primitive
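One way to picture the streaming model is as a fold: a non-streaming response is what you get by accumulating the typed events of a stream into a single object. The sketch below is illustrative only. The function name fold_events and the event field names ("type", "text") are assumptions, since only the event kinds (text, tool use, reasoning, approvals, errors) are named here, not their wire format.

```python
from typing import Iterable


def fold_events(events: Iterable[dict]) -> dict:
    """Accumulate a stream of typed events into one response object.

    Event shapes are hypothetical; only the kinds are taken from the docs.
    """
    response = {
        "text": "",
        "reasoning": "",
        "tool_uses": [],
        "approvals": [],
        "errors": [],
    }
    for event in events:
        kind = event["type"]
        if kind == "text":
            response["text"] += event["text"]       # concatenate text deltas
        elif kind == "reasoning":
            response["reasoning"] += event["text"]  # concatenate reasoning deltas
        elif kind == "tool_use":
            response["tool_uses"].append(event)
        elif kind == "approval":
            response["approvals"].append(event)
        elif kind == "error":
            response["errors"].append(event)
        # Unknown event kinds are ignored in this sketch.
    return response
```

A real-time UI would render each event as it arrives. A caller that only wants the final result can pass the same iterator through a fold like this one.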
Every entry point is event-streamed. A non-streaming request is just an accumulated stream - the same typed events (text, tool use, reasoning, approvals, errors) folded into one response. Build real-time UIs directly, or collect the final result; it’s the same underlying model.
Read next
Harness applications
What you actually ship, and how a harness is detected.
Releases
Immutable snapshots, atomic activation, and instant rollback.
Multi-tenancy
Isolation models for embedded and hosted deployments.
Streaming events
The typed event model every surface is built on.