Multimodal workflow

Every modality goes through the same OpenAI-compatible client and the same orb_ key. Calls made under one run are grouped into a single trajectory in your workflow's dashboard. A multi-step agent that thinks, searches, generates an image, and speaks therefore shows up as one graph, with a node (and a cost) per step.
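The grouping can be modelled locally in Python. This is an illustrative sketch, not Orbitrage's implementation. The `Call` record and the `group_by_run` helper are hypothetical, and the costs are made-up numbers. Calls that carry the same run id land in the same trajectory, one node per call, and the run's cost is the sum of its nodes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Call:
    run_id: str      # value of the x-orbitrage-run-id header (hypothetical record)
    modality: str    # e.g. "text", "image", "text-to-speech"
    cost_usd: float


def group_by_run(calls: list[Call]) -> dict[str, list[Call]]:
    """Group calls into runs, keeping each run's calls in the order they were made."""
    runs: dict[str, list[Call]] = defaultdict(list)
    for call in calls:
        runs[call.run_id].append(call)
    return dict(runs)


calls = [
    Call("run-a", "text", 0.002),
    Call("run-a", "image", 0.04),
    Call("run-b", "text", 0.001),
    Call("run-a", "text-to-speech", 0.003),
]
runs = group_by_run(calls)
for run_id, nodes in runs.items():
    # one trajectory per run: its nodes and its total cost
    print(run_id, [c.modality for c in nodes], round(sum(c.cost_usd for c in nodes), 6))
```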

Each API key is bound to a workflow: create the workflow in the dashboard, then use a key attached to it. Every call made with that key is traced under that workflow. You don’t pass a workflow id, and a client can’t reassign it.

Connect

pip install -U orbitrage openai      # Python
npm install orbitrage@latest openai  # Node
export ORBITRAGE_API_KEY=orb_...      # a key bound to your workflow
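A cheap guard is to check the key at startup, before calling `orbitrage.init`. The sketch below assumes only what this page states: the key lives in `ORBITRAGE_API_KEY` and starts with `orb_`. The `load_orbitrage_key` helper is hypothetical and is not part of the orbitrage SDK.

```python
import os
from typing import Mapping, Optional


def load_orbitrage_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return ORBITRAGE_API_KEY, failing fast if it is missing or lacks the orb_ prefix."""
    env = os.environ if env is None else env
    key = env.get("ORBITRAGE_API_KEY", "")
    if not key.startswith("orb_"):  # hypothetical check based on the documented orb_ prefix
        raise RuntimeError("ORBITRAGE_API_KEY is unset or does not look like an orb_ key")
    return key
```

For example, `orbitrage.init(load_orbitrage_key(), user_id="customer_42")` fails with a clear message instead of failing on the first request.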

A full multimodal run

The snippet below runs five steps in one run, first in Python and then in Node: text (reasoning) → tools (web search) → image (generation) → speech-to-text → text-to-speech. Each version sends one shared x-orbitrage-run-id header on every call, so the steps group into one trajectory.

import os, orbitrage, uuid
orbitrage.init(os.environ["ORBITRAGE_API_KEY"], user_id="customer_42")

from openai import OpenAI
run = uuid.uuid4().hex
client = OpenAI(default_headers={"x-orbitrage-run-id": run})

# 1) text — let the router pick the cheapest capable model
plan = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Plan a 3-step launch tweet."}],
    max_tokens=512,
)

# 2) tools — managed web search, executed server-side (pin a model, not "auto")
research = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Find one recent LLM pricing fact."}],
    tools=["tavily_orbitrage"],
)

# 3) image
img = client.images.generate(model="gpt-image-2", prompt="a minimalist orange orbit ring")

# 4) speech-to-text
with open("clip.wav", "rb") as f:
    tx = client.audio.transcriptions.create(model="nova-3", file=f)

# 5) text-to-speech
speech = client.audio.speech.create(
    model="aura-2-thalia-en",
    voice="aura-2-thalia-en",
    input=plan.choices[0].message.content or "Launch ready.",  # content can be None
)
speech.write_to_file("out.mp3")  # stream_to_file on this response is deprecated

The same run in Node:

import { orbitrage } from "orbitrage";
import OpenAI from "openai";
import fs from "node:fs";
import { randomUUID } from "node:crypto";

await orbitrage.init({ apiKey: process.env.ORBITRAGE_API_KEY, userId: "customer_42" });

const run = randomUUID();
const client = new OpenAI({ defaultHeaders: { "x-orbitrage-run-id": run } });

const plan = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Plan a 3-step launch tweet." }],
  max_tokens: 512,
});

const research = await client.chat.completions.create({
  model: "gpt-oss-20b",
  messages: [{ role: "user", content: "Find one recent LLM pricing fact." }],
  tools: ["tavily_orbitrage"],
});

const img = await client.images.generate({ model: "gpt-image-2", prompt: "a minimalist orange orbit ring" });

const tx = await client.audio.transcriptions.create({ model: "nova-3", file: fs.createReadStream("clip.wav") });

const speech = await client.audio.speech.create({
  model: "aura-2-thalia-en",
  voice: "aura-2-thalia-en",
  input: plan.choices[0].message.content ?? "Launch ready.",
});
await fs.promises.writeFile("out.mp3", Buffer.from(await speech.arrayBuffer()));
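Both snippets fix the run id when the client is created, so every call through that client joins the same run. A long-lived service that handles many agent invocations would likely want a fresh id per invocation instead. The sketch below assumes that Orbitrage reads `x-orbitrage-run-id` from per-request headers the same way it reads it from client defaults. `new_run_headers` is a hypothetical helper. `extra_headers` is the openai-python per-request option for sending additional headers.

```python
import uuid

RUN_HEADER = "x-orbitrage-run-id"


def new_run_headers() -> dict[str, str]:
    """Headers for one agent run: call once per run and reuse them for every step in it."""
    return {RUN_HEADER: uuid.uuid4().hex}


# Usage (assumes a client created after orbitrage.init, without a default run header):
# headers = new_run_headers()
# plan = client.chat.completions.create(model="auto", messages=[...], extra_headers=headers)
# img = client.images.generate(model="gpt-image-2", prompt="...", extra_headers=headers)
```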

What you see

Open the workflow at app.orbitrage.ai/workflows. Runs are listed newest first, and each run renders as one graph with a node per call, labelled with its provider and modality:

Node	Tier	Provider	Billing
text / plan	`basic`–`high`	routed (open-weight)	per token (+2.5%)
text / plan	`frontier`	your own key (BYOK)	$0 (your provider bills you)
tools / search	routed	Orbitrage-managed	tokens + tool price (+2.5%)
image	`image`	Azure `gpt-image-2`	per image token (+2.5%)
speech-to-text	`audio`	Deepgram `nova-3`	per minute (+2.5%)
text-to-speech	`audio`	Deepgram `aura-2`	per 1,000 characters (+2.5%)

Every node records cost_usd, split into provider cost plus the 2.5% infra margin. The org’s credit balance is debited by exactly the sum of those values, so the dashboard’s per-run total matches what you were charged, to the cent. BYOK nodes record cost_usd = 0 and drop out of that sum. Any managed tools they invoke still count, because those run on Orbitrage’s pooled tool keys. See Observability and Models.
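The billing arithmetic can be modelled locally in Python. This is a sketch under stated assumptions, not Orbitrage's billing code. The `Node` record and helpers are hypothetical, and the dollar amounts are made up. The sketch assumes the 2.5% margin applies to managed tool prices on BYOK nodes just as it does on routed ones. It also assumes rounding to cents happens once, on the run total.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal

MARGIN = Decimal("0.025")  # the 2.5% infra margin from the table above


@dataclass
class Node:
    provider_cost: Decimal  # upstream charge for the model call
    byok: bool = False  # model call billed to your own provider key
    managed_tool_cost: Decimal = Decimal("0")  # tools run on pooled keys


def node_cost_usd(node: Node) -> Decimal:
    """Model-call cost_usd: provider cost plus 2.5%, or $0 for a BYOK node."""
    if node.byok:
        return Decimal("0")
    return node.provider_cost * (1 + MARGIN)


def tool_cost_usd(node: Node) -> Decimal:
    """Managed tools always count, even on BYOK nodes (margin here is an assumption)."""
    return node.managed_tool_cost * (1 + MARGIN)


def run_total(nodes: list[Node]) -> Decimal:
    """Per-run total debited from credits, rounded to the cent once at the end."""
    total = sum((node_cost_usd(n) + tool_cost_usd(n) for n in nodes), Decimal("0"))
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


nodes = [
    Node(Decimal("0.40")),  # routed text call
    Node(Decimal("2.00")),  # image call
    Node(Decimal("5.00"), byok=True, managed_tool_cost=Decimal("0.20")),  # BYOK call with a managed tool
]
print(run_total(nodes))
```

`Decimal` is used instead of `float` so that sums of small per-call charges do not drift away from the cent-exact total.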
