Design and test prompts before they hit production. You can work with single prompts or multi-step chains, test branches, and replay versions against your own real traffic.
Sessions and versions
A session is a prompt (or chain) you’re iterating on. Each save creates a version, so you always have a history to compare and roll back to. Start from scratch or copy an existing project’s configuration.
Builder
Build a chain of steps, where each step is an LLM call or a tool call:
- Pick a model per step (or auto), set parameters, and write the prompt.
- Reference earlier output with {{output}} (previous step) or {{step_N_output}} (by index).
- Add A/B branches that split traffic by percentage, and fallback branches that retry on failure.
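As an illustration of how the {{output}} and {{step_N_output}} references might resolve, here is a minimal sketch. It is not the product's implementation: `render_prompt` and `PLACEHOLDER` are hypothetical names, and the sketch assumes step indices are 1-based, which the text above does not specify.

```python
import re

# Matches {{output}} or {{step_N_output}}, where N is a step index.
PLACEHOLDER = re.compile(r"\{\{\s*(output|step_(\d+)_output)\s*\}\}")


def render_prompt(template: str, previous_outputs: list[str]) -> str:
    """Fill placeholders using outputs from steps that have already run.

    Assumption: indices are 1-based, so {{step_1_output}} is the first step.
    """

    def substitute(match: re.Match) -> str:
        if match.group(1) == "output":
            if not previous_outputs:
                raise ValueError("{{output}} used before any step has run")
            return previous_outputs[-1]
        index = int(match.group(2))
        if not 1 <= index <= len(previous_outputs):
            raise ValueError(f"step {index} has not produced output yet")
        return previous_outputs[index - 1]

    return PLACEHOLDER.sub(substitute, template)


outputs = ["Paris is the capital of France.", "La capitale est Paris."]
prompt = render_prompt(
    "Original: {{step_1_output}}\nTranslation: {{output}}\nCheck the translation.",
    outputs,
)
print(prompt)
```

Raising on an unknown index, rather than leaving the placeholder in place, makes a mis-numbered reference fail loudly during testing instead of leaking literal braces into a model call.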
Run options:
| Action | What it does |
|---|---|
| Run once | Execute the chain’s main path once. |
| Run with traffic | Sample the A/B branches and apply fallbacks for a single realistic run. |
| Generate dataset | Run many times (e.g. 10 / 50 / 100) in parallel to collect a results dataset. |
Each step’s result card shows content, model, latency, cost, tokens, and any error.
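The "Run with traffic" behavior can be pictured as weighted sampling over the A/B branches, followed by a fallback when the chosen branch fails. The sketch below is only an illustration under those assumptions: `Branch`, `pick_branch`, and `run_with_traffic` are hypothetical names, and the lambdas stand in for real LLM or tool calls.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Branch:
    name: str
    weight: float                                   # share of traffic, e.g. 80 for 80%
    run: Callable[[str], str]                       # stand-in for an LLM or tool call
    fallback: Optional[Callable[[str], str]] = None


def pick_branch(branches: list[Branch], rng: random.Random) -> Branch:
    """Choose one branch with probability proportional to its weight."""
    weights = [b.weight for b in branches]
    return rng.choices(branches, weights=weights, k=1)[0]


def run_with_traffic(branches: list[Branch], prompt: str, rng: random.Random):
    """Sample a branch, run it, and use its fallback if the call raises."""
    branch = pick_branch(branches, rng)
    try:
        return branch.name, branch.run(prompt)
    except Exception:
        if branch.fallback is None:
            raise
        return f"{branch.name} (fallback)", branch.fallback(prompt)


def flaky(prompt: str) -> str:
    raise RuntimeError("model unavailable")


branches = [
    Branch("A", 80, run=lambda p: f"A answered: {p}"),
    Branch("B", 20, run=flaky, fallback=lambda p: f"backup answered: {p}"),
]

rng = random.Random(0)  # seeded so repeated runs sample the same sequence
counts: dict[str, int] = {}
for _ in range(1000):
    name, _output = run_with_traffic(branches, "hello", rng)
    counts[name] = counts.get(name, 0) + 1
print(counts)
```

Repeating the sampled run many times, as in the loop above, is roughly what a generated dataset would aggregate, although the product runs these in parallel.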
Compare versions
Pick two versions and diff them:
- Prompt — a line-by-line diff with add/remove counts.
- Config — model, audience, and parameter changes.
- Per-sample outputs — paired results side by side (images render inline).
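For a sense of what the prompt diff computes, the following sketch uses Python's standard `difflib` to produce a line-by-line diff with add/remove counts. It is an approximation of the idea, not the product's diff engine; `diff_prompts` and the version labels `v1` and `v2` are made up for the example.

```python
import difflib


def diff_prompts(old: str, new: str) -> tuple[list[str], int, int]:
    """Return unified-diff lines plus counts of added and removed lines."""
    lines = list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="v1",
            tofile="v2",
            lineterm="",
        )
    )
    # The first two lines are the '--- v1' / '+++ v2' headers, not changes.
    body = lines[2:]
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return lines, added, removed


old_prompt = "You are a helpful assistant.\nAnswer briefly.\n"
new_prompt = "You are a helpful assistant.\nAnswer in one sentence.\nCite sources.\n"

diff_lines, added, removed = diff_prompts(old_prompt, new_prompt)
print("\n".join(diff_lines))
print(f"+{added} -{removed}")  # prints: +2 -1
```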
Simulation (pre-flight)
Switch to Simulation mode to run a version against samples of your organization's real traffic. Before you deploy, you can see pass/fail results, cost compared with the baseline, and the latency distribution.
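The three simulation signals (pass/fail, cost vs. baseline, latency distribution) can be summarized along the following lines. This is a sketch under assumptions: `SampleResult` and `summarize` are hypothetical, each sample is taken to carry an already-decided pass/fail flag (how a sample is judged is not described here), and the choice of p50 and p95 is illustrative.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SampleResult:
    passed: bool
    cost_usd: float
    latency_ms: float


def summarize(candidate: list[SampleResult], baseline: list[SampleResult]) -> dict:
    """Summarize a candidate version against a baseline run on the same samples."""
    latencies = sorted(r.latency_ms for r in candidate)
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    candidate_cost = sum(r.cost_usd for r in candidate)
    baseline_cost = sum(r.cost_usd for r in baseline)
    return {
        "pass_rate": sum(r.passed for r in candidate) / len(candidate),
        "cost_delta_pct": 100 * (candidate_cost - baseline_cost) / baseline_cost,
        "latency_p50_ms": cuts[49],
        "latency_p95_ms": cuts[94],
    }


# Made-up numbers purely to exercise the function.
candidate = [
    SampleResult(True, 0.002, 400),
    SampleResult(True, 0.002, 450),
    SampleResult(True, 0.002, 500),
    SampleResult(False, 0.002, 900),
]
baseline = [SampleResult(True, 0.004, 600) for _ in range(4)]

summary = summarize(candidate, baseline)
print(summary)
```

A negative `cost_delta_pct` means the candidate is cheaper than the baseline. Reporting percentiles rather than a mean keeps one slow sample from hiding the typical latency.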
When a version wins, promote it. Promotion writes the prompt to the registry and updates the project configuration so production picks it up.
Promoted prompts can be invoked by id at request time using the x-orbitrage-prompt-id header, so your application code doesn’t have to hold the prompt text.
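A request that selects a promoted prompt by id might be built as follows. Only the x-orbitrage-prompt-id header name comes from the text above. The endpoint URL, the payload shape, the bearer-token auth, and the `prompt_123` id are placeholders assumed for the sketch, so check them against the actual API reference.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the real API URL.
API_URL = "https://api.example.com/v1/chat"


def build_request(prompt_id: str, user_input: str, api_key: str) -> urllib.request.Request:
    """Build a POST request that selects a promoted prompt by id."""
    body = json.dumps({"input": user_input}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "x-orbitrage-prompt-id": prompt_id,    # header named in the docs
        },
    )


req = build_request("prompt_123", "Summarize this ticket.", api_key="YOUR_API_KEY")
# urllib.request.urlopen(req) would send it; omitted because the URL is a placeholder.
print(req.get_method(), req.full_url)
```

Because the prompt text lives in the registry, the application only carries the id, and promoting a new version changes behavior without a code deploy.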
Settings
For each session, you can set a name, notes, and the active modality (text, image, audio, or video; only one non-text modality can be active at a time). You can also delete the session.