Architecture

Built on proven distributed primitives.

PlexRun is not a new category of infrastructure. It is a careful composition of battle-tested distributed systems patterns — event queues, durable state, idempotent execution — applied specifically to AI workflow orchestration.

Event-driven · Durable state · Idempotent steps · Serverless workers · Full observability

System data flow

Layer-by-layer breakdown

SDK & API Layer

Your code defines workflows using decorators or YAML. The SDK validates the step graph locally before sending it to the API. Authentication is token-based, using project-scoped API keys.

Python SDK · TypeScript SDK · REST API · CLI (plexrun)
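To make the local validation step concrete, here is a minimal sketch of a decorator-based workflow definition with a graph check. The `Workflow` class, its `step` decorator, and `validate` are hypothetical names chosen for illustration and are not the PlexRun SDK API; the check covers only unknown dependencies and cycles.

```python
from graphlib import CycleError, TopologicalSorter


class Workflow:
    """Hypothetical stand-in for an SDK workflow object (not the PlexRun API)."""

    def __init__(self, name):
        self.name = name
        self.steps = {}  # step name -> callable
        self.deps = {}   # step name -> names of upstream steps

    def step(self, depends_on=()):
        """Decorator that registers a function as a workflow step."""
        def register(fn):
            self.steps[fn.__name__] = fn
            self.deps[fn.__name__] = tuple(depends_on)
            return fn
        return register

    def validate(self):
        """Return a list of problems in the step graph; empty means valid."""
        problems = []
        for name, upstream in self.deps.items():
            for dep in upstream:
                if dep not in self.steps:
                    problems.append(f"{name}: unknown dependency {dep!r}")
        if not problems:
            try:
                TopologicalSorter(self.deps).prepare()  # raises on cycles
            except CycleError:
                problems.append("step graph contains a cycle")
        return problems


wf = Workflow("summarize")


@wf.step()
def fetch():
    return "raw text"


@wf.step(depends_on=["fetch"])
def summarize():
    return "summary"


@wf.step(depends_on=["translate"])  # 'translate' is never defined
def publish():
    return "done"


errors = wf.validate()
```

Running the check before any network call means a mistyped dependency name surfaces on the developer's machine rather than as a failed execution.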

Orchestration Engine

The orchestration engine parses the workflow DAG, resolves step dependencies, and dispatches steps to the worker queue in topological order. Each step transition is written to durable state before dispatch — so a crashed worker never loses progress.

DAG scheduler · State machine · Step router · Retry manager
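The dispatch order described above can be sketched with Python's standard `graphlib` module. This is an illustrative, synchronous simplification: the `state` dict stands in for the durable state store, `dispatch` stands in for enqueueing a step, and both names are assumptions rather than PlexRun internals.

```python
from graphlib import TopologicalSorter


def run_dag(deps, state, dispatch):
    """Dispatch steps in dependency order, recording each transition first.

    deps:     step name -> set of upstream step names
    state:    dict standing in for the durable state store (assumption)
    dispatch: callable standing in for enqueueing a step (assumption)
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    while sorter.is_active():
        for step in sorter.get_ready():
            if state.get(step) == "completed":
                sorter.done(step)       # finished in an earlier run: skip it
                continue
            state[step] = "dispatched"  # record the transition before dispatch
            dispatch(step)
            state[step] = "completed"   # a real engine would set this on a completion event
            sorter.done(step)


deps = {"load": set(), "chunk": {"load"}, "embed": {"chunk"}, "answer": {"embed"}}
state = {"load": "completed"}  # pretend 'load' finished before a crash
order = []
run_dag(deps, state, order.append)
```

Because `load` is already marked completed in the recorded state, the run resumes at `chunk` instead of starting over.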

Message Queue

Steps are dispatched as messages to a durable FIFO queue. While a worker holds a message, a visibility timeout hides it from other workers, which guards against double execution. Steps that still fail after N retries are routed to a dead-letter queue (DLQ) for inspection. Fan-out creates N parallel messages from a single step output.

FIFO queue · Visibility timeout · Dead-letter queue · Fan-out routing
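The queue semantics above can be illustrated with a small in-memory model. Everything here is a sketch under stated assumptions: `SketchQueue` and its methods are hypothetical, the clock is passed in explicitly so the behaviour is easy to follow, and a real durable queue would persist messages rather than hold them in memory.

```python
from collections import deque


class SketchQueue:
    """In-memory stand-in for a durable FIFO queue (all names hypothetical)."""

    def __init__(self, visibility_timeout=30.0, max_receives=3):
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self.ready = deque()    # messages visible to workers, oldest first
        self.in_flight = {}     # message id -> (message, visibility deadline)
        self.dead_letters = []  # messages that exhausted their retries
        self.next_id = 0

    def send(self, body):
        self.ready.append({"id": self.next_id, "body": body, "receives": 0})
        self.next_id += 1

    def fan_out(self, items):
        """Create one message per item of a step's output."""
        for item in items:
            self.send(item)

    def receive(self, now):
        """Return the oldest visible message, or None if nothing is visible."""
        self._expire(now)
        if not self.ready:
            return None
        msg = self.ready.popleft()
        msg["receives"] += 1
        self.in_flight[msg["id"]] = (msg, now + self.visibility_timeout)
        return msg

    def ack(self, msg_id):
        """Delete a message once its step has succeeded."""
        self.in_flight.pop(msg_id, None)

    def _expire(self, now):
        """Make timed-out messages visible again, or dead-letter them."""
        for msg_id, (msg, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[msg_id]
                if msg["receives"] >= self.max_receives:
                    self.dead_letters.append(msg)
                else:
                    self.ready.append(msg)  # re-queued at the back for simplicity


q = SketchQueue(visibility_timeout=30.0, max_receives=2)
q.fan_out(["doc-1", "doc-2"])
first = q.receive(now=0.0)    # doc-1 is hidden from other workers until t=30
second = q.receive(now=1.0)   # doc-2
q.ack(second["id"])           # doc-2 succeeded and is deleted
idle = q.receive(now=10.0)    # nothing visible: doc-1 is still in flight
retry = q.receive(now=31.0)   # doc-1 was never acked, so it is redelivered
last = q.receive(now=62.0)    # second timeout: doc-1 moves to the DLQ
```

In this model a message that is never acknowledged reappears only after its timeout, and moves to `dead_letters` once it has been received `max_receives` times.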

Worker Pool

Workers are ephemeral, single-purpose processes that pull from the queue, execute one step, and terminate. Each worker runs in isolation with no shared memory. LLM calls are made from within the worker with automatic token tracking.

Serverless workers · Auto-scaling · Isolated execution · LLM client
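As a rough illustration of a single-purpose worker with token tracking, the sketch below runs exactly one step and reports the usage it accumulated. `LLMResult`, `TrackingClient`, `run_one_step`, and the message shape are all assumptions made for this example; the model call is a local fake, not a real provider client.

```python
from dataclasses import dataclass


@dataclass
class LLMResult:
    text: str
    tokens_in: int
    tokens_out: int


class TrackingClient:
    """Wraps a hypothetical LLM callable and accumulates token usage."""

    def __init__(self, call_model):
        self.call_model = call_model
        self.tokens_in = 0
        self.tokens_out = 0

    def complete(self, prompt):
        result = self.call_model(prompt)
        self.tokens_in += result.tokens_in
        self.tokens_out += result.tokens_out
        return result.text


def run_one_step(message, handlers, call_model):
    """Execute exactly one step message and return its output plus usage."""
    client = TrackingClient(call_model)  # fresh per step: nothing is shared
    output = handlers[message["step"]](message["input"], client)
    return {
        "output": output,
        "tokens_in": client.tokens_in,
        "tokens_out": client.tokens_out,
    }


def fake_model(prompt):
    """Local fake that counts whitespace-separated words as tokens."""
    words = len(prompt.split())
    return LLMResult(text=prompt.upper(), tokens_in=words, tokens_out=words)


def shout(text, llm):
    return llm.complete(text)


result = run_one_step(
    {"step": "shout", "input": "hello durable world"},
    {"shout": shout},
    fake_model,
)
```

Creating the tracking client inside `run_one_step` keeps usage counts scoped to one step, which mirrors the no-shared-memory property described above.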

State & Checkpoint Store

After each step completes, its output is written to the checkpoint store with a conditional write, so a duplicate completion cannot overwrite an existing result. If a step is retried after it has already succeeded, the stored output is returned directly without re-executing the step. Stored outputs then become the inputs to downstream steps.

Durable KV store · Idempotency keys · Conditional writes · Step outputs
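The conditional-write behaviour can be sketched with a put-if-absent operation. `CheckpointStore` and `run_step` are hypothetical names, and the dict is an in-memory stand-in for a durable key-value store; a real store would perform the check and the write atomically on the server side.

```python
_MISSING = object()  # sentinel so that a stored None still counts as a result


class CheckpointStore:
    """In-memory stand-in for a durable KV store (hypothetical API)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key, _MISSING)

    def put_if_absent(self, key, value):
        """Conditional write: keep the first value written for a key."""
        return self._data.setdefault(key, value)


def run_step(store, key, fn, *args):
    """Return the stored output if present; otherwise execute and store it."""
    cached = store.get(key)
    if cached is not _MISSING:
        return cached  # retry of a finished step: no re-execution
    return store.put_if_absent(key, fn(*args))


calls = []


def expensive(x):
    calls.append(x)  # record each real execution
    return x * 2


store = CheckpointStore()
first = run_step(store, "exec-1/double", expensive, 21)
second = run_step(store, "exec-1/double", expensive, 21)  # served from the store
```

Here `expensive` runs once even though the step is requested twice, which is the property that avoids duplicate LLM calls on retry.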

Observability Pipeline

Every step emits a trace event: start time, end time, model used, tokens in/out, cost, status, and error (if any). Traces are stored per-execution and queryable via the dashboard or API. OpenTelemetry export allows routing to your existing observability stack.

Execution traces · Token counters · Cost attribution · OpenTelemetry export
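One way to picture a trace event is as a small record carrying the fields listed above. The field names, the `execution_id` field, and the `cost_by_step` helper are assumptions for illustration, not the actual PlexRun schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TraceEvent:
    """Hypothetical shape of one per-step trace event."""
    execution_id: str
    step: str
    start_time: float  # seconds since the epoch
    end_time: float
    model: str
    tokens_in: int
    tokens_out: int
    cost_usd: float
    status: str        # e.g. "ok" or "error"
    error: Optional[str] = None

    @property
    def duration(self):
        return self.end_time - self.start_time


def cost_by_step(events):
    """Attribute total cost to each step name."""
    totals = {}
    for event in events:
        totals[event.step] = totals.get(event.step, 0.0) + event.cost_usd
    return totals


events = [
    TraceEvent("exec-1", "summarize", 10.0, 12.5, "model-a", 900, 120, 0.25, "ok"),
    TraceEvent("exec-1", "summarize", 13.0, 14.0, "model-a", 400, 80, 0.5, "ok"),
    TraceEvent("exec-1", "translate", 14.0, 15.0, "model-b", 120, 130, 0.125,
               "error", error="timeout"),
]
totals = cost_by_step(events)
record = asdict(events[2])  # plain dict, ready to serialize or export
```

Keeping the event a flat record makes it straightforward to serialize per execution and to map onto an external tracing format.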

Design principles

Event-driven execution

Every step emits and consumes events. This decouples execution from orchestration — the orchestrator doesn't wait for steps; it reacts to their completion events.
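A minimal sketch of this reactive style, assuming a hypothetical `Orchestrator` class: it never blocks on a step, and only dispatches newly unblocked steps when a completion event arrives. The `dispatch` callable stands in for publishing a message to the queue.

```python
class Orchestrator:
    """Reacts to completion events instead of waiting on steps (illustrative)."""

    def __init__(self, deps, dispatch):
        self.deps = {step: set(upstream) for step, upstream in deps.items()}
        self.dispatch = dispatch  # stand-in for publishing to the queue
        self.completed = set()
        self.dispatched = set()

    def start(self):
        self._dispatch_ready()

    def on_step_completed(self, step):
        """Handle a completion event by dispatching any unblocked steps."""
        self.completed.add(step)
        self._dispatch_ready()

    def _dispatch_ready(self):
        for step, upstream in self.deps.items():
            if step not in self.dispatched and upstream <= self.completed:
                self.dispatched.add(step)
                self.dispatch(step)


sent = []
orch = Orchestrator(
    {"load": [], "chunk": ["load"], "embed": ["chunk"], "tag": ["chunk"]},
    sent.append,
)
orch.start()                     # only 'load' has no upstream steps
orch.on_step_completed("load")   # unblocks 'chunk'
orch.on_step_completed("chunk")  # unblocks 'embed' and 'tag' together
```

Between events the orchestrator holds no thread or timer per step, so many executions can be in flight without any of them occupying it.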

Checkpoint-based durability

Workflow state is written to durable storage after every step. After a worker crash, network failure, or LLM timeout, execution resumes automatically from the last checkpoint.

Idempotent by design

Every step has a content-addressed idempotency key. Retrying a step that already succeeded returns the stored output — no duplicate LLM calls, no side-effect replay.
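A content-addressed key can be derived by hashing a canonical serialization of what identifies the step. The sketch below assumes the key covers the workflow ID, the step name, and the step inputs; which fields PlexRun actually hashes is not stated here, so treat the choice as illustrative.

```python
import hashlib
import json


def idempotency_key(workflow_id, step_name, inputs):
    """Hash a canonical JSON encoding of the step identity and its inputs."""
    payload = json.dumps(
        {"workflow": workflow_id, "step": step_name, "inputs": inputs},
        sort_keys=True,             # key order must not change the hash
        separators=(",", ":"),      # fixed separators for a stable encoding
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_a = idempotency_key("wf-1", "summarize", {"doc": "a.txt", "lang": "en"})
key_b = idempotency_key("wf-1", "summarize", {"lang": "en", "doc": "a.txt"})
key_c = idempotency_key("wf-1", "summarize", {"doc": "b.txt", "lang": "en"})
```

Identical inputs map to the same key regardless of dictionary ordering (`key_a == key_b`), while any change to the inputs yields a different key (`key_c`), so a retry finds the stored output and a genuinely new request does not.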

Serverless-first compute

Workers are ephemeral and auto-scaled. You pay for execution time only. No always-on fleet to manage. Concurrency scales with queue depth automatically.

Want to go deeper?

Read the technical docs or request a live architecture walkthrough.

