Karta runs your harness application and exposes it through one uniform API. You push a project; Karta builds an immutable release, serves it, runs each agent turn in an isolated sandbox, streams the result back as typed events, and meters what it costs. This page is the mental model for everything in between.

The request path

1. Authenticate

A client calls the API with an Authorization: Bearer credential - a kt_live_… API key from your backend, or a short-lived session token from an end user’s browser.

2. Check the budget

Karta validates the credential and checks your org’s budget. If a cap is already exhausted it returns 402 Payment Required before running anything - no surprise bills.

3. Resolve the session & release

Karta resolves the session (creating one if needed), confirms it belongs to your org, and pins the turn to the project’s currently active release.

4. Run the harness in isolation

Karta hands the turn to the harness running in an isolated, per-session sandbox. The harness runs its agentic loop - tools, MCP, memory - and emits typed events.

5. Stream back

Karta relays those events to the caller as they happen (SSE), pausing for approval prompts when the agent needs permission to act.

6. Meter

When the turn completes, Karta records token and cost usage against your budget.
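
To make the ordering of the six steps concrete, here is a minimal in-memory sketch. It is an illustration, not Karta's implementation: `Platform`, `Org`, `Session`, `ApiError`, the `cost_usd` event field, and the 401 and 403 status codes are all hypothetical. Only the 402 response on an exhausted budget comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator

Event = Dict[str, Any]
# Hypothetical harness signature: (release, prompt) -> stream of typed events.
Harness = Callable[[str, str], Iterator[Event]]


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP-style status code."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


@dataclass
class Org:
    org_id: str
    budget_cap_usd: float
    spent_usd: float = 0.0


@dataclass
class Session:
    session_id: str
    org_id: str


@dataclass
class Platform:
    orgs_by_credential: Dict[str, Org]
    active_release: str
    harness: Harness
    sessions: Dict[str, Session] = field(default_factory=dict)

    def handle_turn(self, authorization: str, session_id: str, prompt: str) -> Iterator[Event]:
        # Steps 1-2: validate the Bearer credential, then check the budget.
        scheme, _, credential = authorization.partition(" ")
        org = self.orgs_by_credential.get(credential) if scheme == "Bearer" else None
        if org is None:
            raise ApiError(401, "invalid credential")  # status code is an assumption
        if org.spent_usd >= org.budget_cap_usd:
            raise ApiError(402, "budget cap exhausted")  # raised before anything runs

        # Step 3: resolve (or create) the session, check ownership, pin the release.
        session = self.sessions.setdefault(session_id, Session(session_id, org.org_id))
        if session.org_id != org.org_id:
            raise ApiError(403, "session belongs to another org")  # assumption
        release = self.active_release  # pinned for the whole turn

        return self._run(org, release, prompt)

    def _run(self, org: Org, release: str, prompt: str) -> Iterator[Event]:
        # Steps 4-5: run the harness and relay each event as it is produced.
        cost = 0.0
        for event in self.harness(release, prompt):
            cost += float(event.get("cost_usd", 0.0))
            yield event
        # Step 6: meter once the turn has completed.
        org.spent_usd += cost
```

In this sketch the credential, budget, and session checks run eagerly inside `handle_turn`, so a rejected request never reaches the harness. Usage is recorded only after the caller has consumed the whole stream.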

Per-session isolation

Every session runs in its own microVM sandbox - a hardware-isolated boundary, not a shared container. One tenant’s agent cannot see another’s filesystem, processes, or memory, and a misbehaving or prompt-injected agent is confined to its own short-lived environment. The sandbox is created for the session and torn down after it, so nothing leaks between users or between runs.

Embedding Karta in your own process instead? See the two isolation models in Multi-tenancy.

Running agents vs. managing them

Karta is split into two planes, and the separation is deliberate: it is the boundary that security and platform teams should look for in anything that runs agent code.

Data plane - runs your agents

The request path: sessions, harness execution in isolated sandboxes, release serving, streaming, and request-time budget enforcement. This is where agent code actually runs.

Control plane - manages your account

The system of record: identity and team roles, API keys, usage metering and budgets, billing, BYOK provider keys, outbound webhooks, and the audit log.

The two are separated by a trust boundary, so the plane that runs agent code holds none of your money-and-identity state. A compromise of a running agent - the highest-risk surface on any agent platform - cannot by itself reach your billing, your team’s credentials, or another tenant’s data. Isolation between what runs and what’s valuable is built in.

Karta delegates; it doesn’t duplicate

The single most important design choice: the harness is the source of truth for conversation history, persistence, resumption, and tool/MCP integration. Karta keeps no second copy. A session is a lightweight handle - metadata, participants, the current agent, pending approvals - not a message store. That’s why there are no sync races, why resuming a session just continues where it left off, and why an example from Claude Code’s or OpenCode’s own docs runs unchanged on Karta.
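
The shape of such a handle can be sketched as a small data structure. The class and field names below are hypothetical, chosen to mirror the list above (metadata, participants, the current agent, pending approvals), and `InMemoryHarness` is only a stand-in for a real harness that owns history and persistence.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SessionHandle:
    """Hypothetical session handle: it points at a conversation, it does not store one."""

    session_id: str
    metadata: Dict[str, str] = field(default_factory=dict)
    participants: List[str] = field(default_factory=list)
    current_agent: str = ""
    pending_approvals: List[str] = field(default_factory=list)
    # Deliberately no `messages` field: history lives in the harness.


class InMemoryHarness:
    """Stand-in harness that is the single owner of conversation history."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def send(self, session_id: str, message: str) -> List[str]:
        history = self._history.setdefault(session_id, [])
        history.append(message)
        return list(history)  # copy, so callers cannot mutate the source of truth


def resume(handle: SessionHandle, harness: InMemoryHarness, message: str) -> List[str]:
    # Resuming needs only the handle's id; the harness supplies the prior turns.
    return harness.send(handle.session_id, message)
```

Because only one component holds the messages, there is no second copy to keep in sync, which is the property the paragraph above describes.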

Streaming is the primitive

Every entry point is event-streamed. A non-streaming request is just an accumulated stream - the same typed events (text, tool use, reasoning, approvals, errors) folded into one response. Build real-time UIs directly, or collect the final result; it’s the same underlying model.
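
As a rough sketch of "a non-streaming request is just an accumulated stream", the function below folds a sequence of typed events into one result object. The event shape (a `type` key plus a `text` payload) and the names `TurnResult` and `accumulate` are assumptions made for illustration. The actual schema is described under Streaming events.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]  # assumed shape: {"type": "...", ...payload}


@dataclass
class TurnResult:
    text: str = ""
    reasoning: str = ""
    tool_uses: List[Event] = field(default_factory=list)
    approvals: List[Event] = field(default_factory=list)
    errors: List[Event] = field(default_factory=list)


def accumulate(events: Iterable[Event]) -> TurnResult:
    """Fold a stream of typed events into a single response object."""
    result = TurnResult()
    for event in events:
        kind = event.get("type")
        if kind == "text":
            result.text += event["text"]
        elif kind == "reasoning":
            result.reasoning += event["text"]
        elif kind == "tool_use":
            result.tool_uses.append(event)
        elif kind == "approval":
            result.approvals.append(event)
        elif kind == "error":
            result.errors.append(event)
        # Unknown event types are ignored in this sketch.
    return result
```

A real-time UI would render each event as it arrives, while a batch caller would pass the same iterator to `accumulate` and read the final `TurnResult`.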

Harness applications

What you actually ship, and how a harness is detected.

Releases

Immutable snapshots, atomic activation, and instant rollback.

Multi-tenancy

Isolation models for embedded and hosted deployments.

Streaming events

The typed event model every surface is built on.