Architecture
Built on proven distributed primitives.
PlexRun is not a new category of infrastructure. It is a careful composition of battle-tested distributed systems patterns — event queues, durable state, idempotent execution — applied specifically to AI workflow orchestration.
System data flow
Layer-by-layer breakdown
SDK & API Layer
Your code defines workflows using decorators or YAML. The SDK validates the step graph locally before sending it to the API. Authentication is token-based, using project-scoped API keys.
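A minimal sketch of what local graph validation could look like, assuming a decorator-based Python SDK. The `Workflow` class, its `step(depends_on=...)` decorator, and `validate()` are hypothetical names for illustration, not the PlexRun SDK's API. It checks two things before anything would be sent to the API: every dependency refers to a registered step, and the step graph contains no cycles.

```python
from typing import Callable, Dict, List, Optional


class Workflow:
    """Hypothetical decorator-based workflow builder (not the actual PlexRun SDK)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps: Dict[str, Callable] = {}
        self.deps: Dict[str, List[str]] = {}

    def step(self, depends_on: Optional[List[str]] = None) -> Callable[[Callable], Callable]:
        # Register the decorated function as a step, keyed by its function name.
        def register(fn: Callable) -> Callable:
            self.steps[fn.__name__] = fn
            self.deps[fn.__name__] = list(depends_on or [])
            return fn
        return register

    def validate(self) -> None:
        # Check 1: every dependency must name a registered step.
        for name, upstream in self.deps.items():
            for dep in upstream:
                if dep not in self.steps:
                    raise ValueError(f"step {name!r} depends on unknown step {dep!r}")
        # Check 2: no cycles, using depth-first search with three colors.
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {name: WHITE for name in self.steps}

        def visit(name: str) -> None:
            color[name] = GRAY  # on the current DFS path
            for dep in self.deps[name]:
                if color[dep] == GRAY:
                    raise ValueError(f"cycle detected involving step {dep!r}")
                if color[dep] == WHITE:
                    visit(dep)
            color[name] = BLACK  # fully explored

        for name in self.steps:
            if color[name] == WHITE:
                visit(name)


# Usage: a two-step workflow that passes validation.
wf = Workflow("summarize-docs")

@wf.step()
def fetch(): ...

@wf.step(depends_on=["fetch"])
def summarize(): ...

wf.validate()  # a typo such as depends_on=["fetsh"] would raise ValueError here
```

Running these checks client-side means a misspelled dependency fails immediately on the developer's machine instead of after a round trip to the API.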
Orchestration Engine
The orchestration engine parses the workflow DAG, resolves step dependencies, and dispatches steps to the worker queue in topological order. Each step transition is written to durable state before dispatch, so a crashed worker never loses the workflow's recorded progress.
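The dispatch order can be illustrated with Kahn's algorithm for topological sorting. In this sketch, `record_transition` and `enqueue` are hypothetical callbacks standing in for the durable state write and the queue publish, and the `"DISPATCHED"` state name is an assumption; the point is the ordering, with the state write always happening before the step is handed to the queue. For brevity the loop treats each step as completing immediately; in the engine described above, dependents would be released when the step's completion event arrives.

```python
from collections import deque
from typing import Callable, Dict, List


def dispatch_in_topological_order(
    deps: Dict[str, List[str]],
    record_transition: Callable[[str, str], None],
    enqueue: Callable[[str], None],
) -> List[str]:
    """Kahn's algorithm: a step is dispatched only after all its dependencies.

    Assumes the graph was already validated (no unknown steps, no cycles).
    """
    # Count unmet dependencies and build reverse edges (dependency -> dependents).
    remaining = {step: len(upstream) for step, upstream in deps.items()}
    dependents: Dict[str, List[str]] = {step: [] for step in deps}
    for step, upstream in deps.items():
        for dep in upstream:
            dependents[dep].append(step)

    ready = deque(sorted(s for s, n in remaining.items() if n == 0))
    order: List[str] = []
    while ready:
        step = ready.popleft()
        record_transition(step, "DISPATCHED")  # durable write first...
        enqueue(step)                          # ...then publish to the queue
        order.append(step)
        # Simplification: treat the step as completed and release its dependents.
        for child in dependents[step]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return order


deps = {
    "fetch": [],
    "clean": ["fetch"],
    "embed": ["clean"],
    "summarize": ["clean"],
    "report": ["embed", "summarize"],
}
events: list = []
order = dispatch_in_topological_order(
    deps,
    record_transition=lambda s, state: events.append(("state", s, state)),
    enqueue=lambda s: events.append(("queue", s)),
)
```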
Message Queue
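Steps are dispatched as messages to a durable FIFO queue. A visibility timeout hides a message from other workers while one worker is processing it, so two workers never execute the same step at once. Steps that still fail after N retries are routed to a dead-letter queue (DLQ) for inspection. Fan-out creates N parallel messages from a single step output.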
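The queue semantics can be modeled in memory to show how visibility timeouts, retries, the DLQ, and fan-out interact. This is a simplified sketch, not a production queue: `StepQueue`, its methods, and the explicit `now` timestamp (used instead of a real clock so the behavior is deterministic) are illustrative assumptions. A received message stays invisible until it is acknowledged or its timeout expires; an unacknowledged message reappears for retry, and once the first delivery plus `max_retries` redeliveries have all timed out, it moves to the DLQ.

```python
import itertools
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict, List, Optional, Tuple


@dataclass
class Message:
    id: int
    body: Any
    receive_count: int = 0


class StepQueue:
    """In-memory model of a FIFO step queue with a visibility timeout and a DLQ."""

    def __init__(self, visibility_timeout: float, max_retries: int) -> None:
        self.visibility_timeout = visibility_timeout
        self.max_retries = max_retries
        self.ready: Deque[Message] = deque()
        self.in_flight: Dict[int, Tuple[Message, float]] = {}  # id -> (msg, hidden until)
        self.dlq: List[Message] = []
        self._ids = itertools.count()

    def send(self, body: Any) -> None:
        self.ready.append(Message(next(self._ids), body))

    def fan_out(self, items: List[Any]) -> None:
        # One step output becomes N independent messages consumed in parallel.
        for item in items:
            self.send(item)

    def receive(self, now: float) -> Optional[Message]:
        self._expire(now)
        if not self.ready:
            return None
        msg = self.ready.popleft()
        msg.receive_count += 1
        # Hidden from other workers until acked or the timeout expires.
        self.in_flight[msg.id] = (msg, now + self.visibility_timeout)
        return msg

    def ack(self, msg: Message) -> None:
        # The step succeeded: delete the message permanently.
        self.in_flight.pop(msg.id, None)

    def _expire(self, now: float) -> None:
        retry: List[Message] = []
        for msg_id, (msg, deadline) in list(self.in_flight.items()):
            if now < deadline:
                continue
            del self.in_flight[msg_id]
            if msg.receive_count > self.max_retries:
                self.dlq.append(msg)  # retries exhausted: park for inspection
            else:
                retry.append(msg)
        # Put timed-out messages back at the front so FIFO order is preserved.
        self.ready.extendleft(reversed(retry))
```

Note that the timeout only stops two workers from holding a message at the same time; a worker that is slow rather than dead can still see its message redelivered, which is one reason the checkpoint store's idempotency matters.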
Worker Pool
Workers are ephemeral, single-purpose processes that pull from the queue, execute one step, and terminate. Each worker runs in isolation with no shared memory. LLM calls are made from within the worker with automatic token tracking.
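A sketch of how a worker might wrap LLM calls so token usage is tracked without any effort from step code. It assumes a hypothetical `llm` callable returning `(text, tokens_in, tokens_out)` and a step function that receives the wrapped client; pulling the message from the queue and process exit are omitted. The usage counter is created fresh for each invocation, mirroring the no-shared-memory isolation described above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical LLM client signature: prompt -> (text, tokens_in, tokens_out).
LLMClient = Callable[[str], Tuple[str, int, int]]


@dataclass
class TokenUsage:
    tokens_in: int = 0
    tokens_out: int = 0


def run_worker(
    step_fn: Callable[[Callable[[str], str], dict], dict],
    step_input: dict,
    llm: LLMClient,
) -> Tuple[dict, TokenUsage]:
    """Execute exactly one step, returning its output and the tokens it consumed."""
    usage = TokenUsage()  # local to this invocation: nothing shared across workers

    def tracked_llm(prompt: str) -> str:
        # Every model call made by the step passes through here and is counted.
        text, tokens_in, tokens_out = llm(prompt)
        usage.tokens_in += tokens_in
        usage.tokens_out += tokens_out
        return text

    output = step_fn(tracked_llm, step_input)
    return output, usage
```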
State & Checkpoint Store
After each step completes, its output is written to the checkpoint store with a conditional write, which makes the write idempotent. If a step is retried, the stored output is returned directly instead of re-executing the step. Each step's output becomes the input to its downstream steps.
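Idempotency through a conditional write can be sketched with an in-memory put-if-absent store. `CheckpointStore` and `run_step_once` are hypothetical names; a real deployment would rely on the database's own conditional-write primitive rather than a Python lock. A retry first checks for a stored output and returns it without re-executing; if two attempts race, only the first write succeeds and both callers return the same stored value.

```python
import threading
from typing import Any, Callable, Dict


class CheckpointStore:
    """In-memory stand-in for a store with put-if-absent (conditional) writes."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key: str, value: Any) -> bool:
        # Atomic conditional write: succeeds only if no output exists for the key.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def contains(self, key: str) -> bool:
        with self._lock:
            return key in self._data

    def get(self, key: str) -> Any:
        with self._lock:
            return self._data[key]


def run_step_once(store: CheckpointStore, key: str, execute: Callable[[], Any]) -> Any:
    """Return the checkpointed output if present; otherwise execute and checkpoint."""
    if store.contains(key):
        return store.get(key)  # retry of a completed step: no re-execution
    result = execute()
    # If a concurrent attempt already wrote an output, keep that one.
    store.put_if_absent(key, result)
    return store.get(key)
```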
Observability Pipeline
Every step emits a trace event: start time, end time, model used, tokens in/out, cost, status, and error (if any). Traces are stored per-execution and queryable via the dashboard or API. OpenTelemetry export allows routing to your existing observability stack.
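A possible shape for the per-step trace event, with cost derived from token counts. The `TraceEvent` class, its field names, and the per-1,000-token pricing parameters are assumptions for illustration; real model prices vary by provider. JSON serialization is shown as one simple way such events could be stored; this sketch does not implement OpenTelemetry export.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class TraceEvent:
    execution_id: str
    step: str
    start_time: float   # Unix timestamp, seconds
    end_time: float
    model: str
    tokens_in: int
    tokens_out: int
    cost_usd: float
    status: str         # e.g. "succeeded" or "failed"
    error: Optional[str] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def step_cost(tokens_in: int, tokens_out: int,
              usd_per_1k_in: float, usd_per_1k_out: float) -> float:
    # Input and output tokens are typically priced separately, per 1,000 tokens.
    return tokens_in / 1000 * usd_per_1k_in + tokens_out / 1000 * usd_per_1k_out


def traces_for(events: List[TraceEvent], execution_id: str) -> List[TraceEvent]:
    # Mimics a per-execution trace query.
    return [e for e in events if e.execution_id == execution_id]
```

For example, 2,000 input tokens at a hypothetical $0.50 per 1,000 plus 500 output tokens at $1.50 per 1,000 gives $1.00 + $0.75 = $1.75 for the step.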
Design principles
Event-driven execution
Every step emits and consumes events. This decouples execution from orchestration — the orchestrator doesn't wait for steps; it reacts to their completion events.
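The reactive model can be sketched as an event handler that never blocks on a running step. `Orchestrator`, `start`, and `on_step_completed` are hypothetical names: each completion event marks a step as done and returns the steps it has just unblocked, which would then be published to the queue as new messages.

```python
from typing import Dict, List, Set


class Orchestrator:
    """Reacts to step-completion events instead of waiting on running steps."""

    def __init__(self, deps: Dict[str, List[str]]) -> None:
        self.deps = deps
        self.completed: Set[str] = set()
        self.dispatched: Set[str] = set()

    def start(self) -> List[str]:
        # Steps with no dependencies can be dispatched right away.
        return self._dispatch_ready()

    def on_step_completed(self, step: str) -> List[str]:
        # Event handler: record the completion, then emit newly unblocked steps.
        self.completed.add(step)
        return self._dispatch_ready()

    def _dispatch_ready(self) -> List[str]:
        ready = [
            s for s, upstream in self.deps.items()
            if s not in self.dispatched and all(d in self.completed for d in upstream)
        ]
        self.dispatched.update(ready)
        return ready  # these would be published as step messages


# Usage: a diamond-shaped workflow.
o = Orchestrator({"fetch": [], "a": ["fetch"], "b": ["fetch"], "merge": ["a", "b"]})
first = o.start()                         # ["fetch"]
after_fetch = o.on_step_completed("fetch")  # ["a", "b"] run in parallel
```

Between events the orchestrator holds only bookkeeping state, so it can handle completions from many concurrently running steps in whatever order they arrive.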
Checkpoint-based durability
Workflow state is written to durable storage after every step. Worker crashes, network failures, and LLM timeouts all recover automatically from the last checkpoint.
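One common way to make per-step checkpoints durable is to write the state to a temporary file, `fsync` it, and atomically swap it into place with `os.replace`, so a crash mid-write never leaves a corrupt checkpoint. The sketch below assumes a local JSON file and a linear list of steps purely for illustration; the storage backend PlexRun uses is not specified here. On restart, the hypothetical `run_workflow` reloads the checkpoint and skips every step that already has a stored output.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]


def save_checkpoint(path: str, state: State) -> None:
    # Write to a temp file in the same directory, force it to disk,
    # then atomically replace the previous checkpoint.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path: str) -> State:
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def run_workflow(path: str, steps: List[Tuple[str, Callable[[State], object]]]) -> State:
    """Run steps in order, checkpointing after each; on restart, skip finished steps."""
    state = load_checkpoint(path)
    for name, fn in steps:
        if name in state:
            continue  # completed before the crash: reuse the checkpointed output
        state[name] = fn(state)  # each step can read earlier outputs
        save_checkpoint(path, state)
    return state
```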
Idempotent by design
Every step has a content-addressed idempotency key. Retrying a step that already succeeded returns the stored output — no duplicate LLM calls, no side-effect replay.
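A content-addressed key can be derived by hashing a canonical serialization of the step's identity and inputs. This sketch uses SHA-256 over sorted-key JSON; the exact fields hashed (here a hypothetical `workflow_run_id`, the step name, and the inputs) are assumptions. Including the run ID scopes deduplication to a single run; leaving it out would let identical steps in different runs share stored outputs.

```python
import hashlib
import json
from typing import Any, Dict


def idempotency_key(workflow_run_id: str, step_name: str, inputs: Dict[str, Any]) -> str:
    """Hash the step's identity and inputs into a deterministic hex key.

    Canonical JSON (sorted keys, fixed separators) makes logically equal
    inputs serialize to identical bytes, so they always yield the same key.
    """
    canonical = json.dumps(
        {"run": workflow_run_id, "step": step_name, "inputs": inputs},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```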
Serverless-first compute
Workers are ephemeral and auto-scaled. You pay for execution time only. No always-on fleet to manage. Concurrency scales with queue depth automatically.
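Scaling concurrency with queue depth often reduces to a simple target: one worker per fixed number of queued messages, rounded up and clamped to configured bounds. The function and its parameters (`messages_per_worker`, `min_workers`, `max_workers`) are hypothetical; a minimum of zero lets the pool scale to nothing when the queue is empty, consistent with having no always-on fleet.

```python
import math


def desired_workers(queue_depth: int, messages_per_worker: int,
                    min_workers: int = 0, max_workers: int = 1000) -> int:
    """Target worker count: ceil(depth / messages_per_worker), clamped to bounds."""
    if messages_per_worker <= 0:
        raise ValueError("messages_per_worker must be positive")
    target = math.ceil(queue_depth / messages_per_worker)
    return max(min_workers, min(max_workers, target))
```

For example, 25 queued steps at 10 messages per worker yields a target of 3 workers, and an empty queue yields 0.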