Orbitrage is the convenience layer for LLM apps: point your existing OpenAI (or Anthropic) client at our gateway, and every call is routed (to the model you name, or to the cheapest capable model with model="auto") and traced in your dashboard with cost, tokens, latency, tools, and the full run graph. No SDK rewrite, no OpenTelemetry.
1. Install the latest SDK

The SDK is a thin header injector; openai is the only peer dependency you need, because Orbitrage speaks the OpenAI format.
pip install -U orbitrage openai
2. Initialize with your key and a user id

Call init() once, at the top of your program. Always pass a user_id so every call is attributed to the end-user who triggered it — this is what powers per-user cost, usage, and analytics in the dashboard.
import os, orbitrage

orbitrage.init(
    os.environ["ORBITRAGE_API_KEY"],
    user_id="customer_42",      # attribute calls to THIS end-user
)
Get your orb_ key from app.orbitrage.ai under API Keys. The SDK points your client at https://api.orbitrage.ai/v1 and injects the key for you, even if you already have OPENAI_API_KEY set.
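To make "points your client at the gateway" concrete, here is a minimal sketch of the kind of HTTP request an OpenAI-format client would send. It uses only the standard library and builds the request without sending it. The /chat/completions path and the Bearer authorization header are assumptions based on the OpenAI wire format, not confirmed Orbitrage behavior, and build_chat_request is a hypothetical helper name.

```python
import json
import urllib.request

GATEWAY_BASE_URL = "https://api.orbitrage.ai/v1"  # base URL quoted in the text above


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat request aimed at the gateway."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{GATEWAY_BASE_URL}/chat/completions",  # assumed OpenAI-style path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "orb_example_key" is a placeholder, not a real key.
req = build_chat_request("orb_example_key", "gpt-4o-mini", "ping")
print(req.get_method(), req.full_url)
```

In normal use the SDK and the OpenAI client handle all of this for you; the sketch only shows where the base URL and key end up.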
3. Make a call: pick a model, or let Orbitrage route

Use the OpenAI client exactly as you always have. Name a direct model (recommended while you build — predictable behavior), or use model="auto" to let Orbitrage route to the cheapest capable model.
from openai import OpenAI
client = OpenAI()                       # base_url + key set for you

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",          # direct model — try "grok-4-fast" for speed/cost
    messages=[{"role": "user", "content": "Write a haiku about routing."}],
)
print(resp.choices[0].message.content)
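Because the model is just a string, one way to move between a pinned model and model="auto" without touching code is to read the name from configuration. The sketch below assumes a hypothetical environment variable, APP_LLM_MODEL, which is an application-level convention and not an Orbitrage setting.

```python
import os

DEFAULT_MODEL = "claude-sonnet-4-6"  # pinned model used in the example above


def pick_model(env=None) -> str:
    """Return the model name to send, falling back to a pinned model."""
    env = os.environ if env is None else env
    # APP_LLM_MODEL is a hypothetical variable name chosen for this sketch.
    name = env.get("APP_LLM_MODEL", "").strip()
    return name or DEFAULT_MODEL


print(pick_model({}))                         # falls back to the pinned default
print(pick_model({"APP_LLM_MODEL": "auto"}))  # opts in to auto routing
```

The result can be passed straight through as model=pick_model() in the create() call.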
4. See it in the dashboard

Open app.orbitrage.ai/workflows — your call appears with the model, provider, tokens, cost, latency, and (for multi-step agents) the full run graph, all attributed to customer_42.
[Figure: Orbitrage workflow run graph]

Picking a model

| You write | What happens |
| --- | --- |
| model="claude-sonnet-4-6" | Pinned: always Claude Sonnet 4.6. Predictable; great for tools + quality. |
| model="grok-4-fast" | Pinned: fast and cheap. A great default for high volume. |
| model="gpt-4o-mini" | Pinned: small, cheap, reliable tool-calling. |
| model="auto" | Auto routing: Orbitrage scores the prompt and picks the cheapest capable model. See Routing. |
Start with a direct model while you build, then switch to model="auto" once you want Orbitrage to optimize cost for you. With auto, give reasoning models room — set max_tokens ≥ 512 so the answer isn’t truncated by the model’s internal reasoning budget.
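The max_tokens advice for auto routing can be enforced in one place rather than remembered at every call site. The following is a minimal sketch under that assumption; completion_kwargs and AUTO_MIN_MAX_TOKENS are hypothetical names, and the floor of 512 is simply the value suggested above.

```python
AUTO_MIN_MAX_TOKENS = 512  # floor suggested above for model="auto"


def completion_kwargs(model, messages, max_tokens=None):
    """Build keyword arguments for chat.completions.create().

    For model="auto", max_tokens is raised to at least AUTO_MIN_MAX_TOKENS.
    For pinned models, max_tokens is passed through only if it was given.
    """
    kwargs = {"model": model, "messages": messages}
    if model == "auto":
        requested = AUTO_MIN_MAX_TOKENS if max_tokens is None else max_tokens
        kwargs["max_tokens"] = max(requested, AUTO_MIN_MAX_TOKENS)
    elif max_tokens is not None:
        kwargs["max_tokens"] = max_tokens
    return kwargs


msgs = [{"role": "user", "content": "Write a haiku about routing."}]
print(completion_kwargs("auto", msgs, max_tokens=100)["max_tokens"])
```

With a client in hand, the call would then read client.chat.completions.create(**completion_kwargs("auto", msgs)).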

Attributing every call to a user

user_id is the single most useful thing to get right — it unlocks per-user analytics. Set it once for a script, or switch it per request in a server.
# One user for the whole process:
orbitrage.init(api_key, user_id=current_user.id)

# Or switch per request in a long-running server — then build a NEW client
# (already-constructed clients have copied their headers):
orbitrage.set_user(current_user.id)
client = OpenAI()
See Per-user attribution for the server pattern.
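The "set the user, then build a new client" rule can be wrapped in a small helper so the two steps always happen together. The sketch below takes the setter and the client factory as parameters, so it runs without the SDK installed; in an application you would pass orbitrage.set_user and OpenAI. The lock reflects an assumption that set_user changes process-wide state, which the text above suggests but does not state outright. client_for_user and the stand-in functions are hypothetical names.

```python
import threading

_user_lock = threading.Lock()


def client_for_user(user_id, set_user, make_client):
    """Set the active user, then build a fresh client that captures it.

    The lock keeps the two steps together in case set_user mutates
    shared state (an assumption, see the lead-in).
    """
    with _user_lock:
        set_user(user_id)
        return make_client()


# Stand-ins so the sketch runs on its own. They imitate a client that
# copies the current user into its headers at construction time.
_current = {}


def fake_set_user(uid):
    _current["user_id"] = uid


def fake_make_client():
    return dict(_current)


first = client_for_user("customer_42", fake_set_user, fake_make_client)
second = client_for_user("customer_7", fake_set_user, fake_make_client)
print(first, second)  # each client keeps the user it was built with
```

In a request handler this would look like client = client_for_user(current_user.id, orbitrage.set_user, OpenAI), with the client discarded at the end of the request.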

Already using a framework?

LangChain, LangGraph, CrewAI, Agno, LlamaIndex, and the Vercel AI SDK all use an OpenAI-compatible client under the hood — point them at the gateway and you get the same routing + tracing. Copy-paste setups:

LangChain

CrewAI

Agno

LlamaIndex

Vercel AI SDK

OpenAI / Anthropic SDK
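For frameworks that build their own OpenAI client, one generic way to point them at the gateway is through environment variables. The official openai Python client reads OPENAI_BASE_URL and OPENAI_API_KEY when no explicit values are passed; whether a particular framework honors them, and whether the gateway accepts calls made this way, are assumptions to verify against the framework-specific setups listed above. Note that this sketch does not set a user id. gateway_env and apply_gateway_env are hypothetical helper names.

```python
import os

GATEWAY_BASE_URL = "https://api.orbitrage.ai/v1"  # base URL quoted earlier on this page


def gateway_env(orbitrage_key: str) -> dict:
    """Return the environment variables an OpenAI-compatible client may read."""
    return {
        "OPENAI_BASE_URL": GATEWAY_BASE_URL,
        "OPENAI_API_KEY": orbitrage_key,
    }


def apply_gateway_env(orbitrage_key: str, environ=None) -> None:
    """Write the gateway settings into environ (os.environ by default)."""
    target = os.environ if environ is None else environ
    target.update(gateway_env(orbitrage_key))


# Demonstrated on a plain dict so the real environment is left untouched.
demo_env = {"OPENAI_API_KEY": "sk-old"}
apply_gateway_env("orb_example_key", demo_env)
print(demo_env["OPENAI_BASE_URL"])
```

Call apply_gateway_env before the framework constructs its client, since clients typically read these variables once at construction.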

Next steps

Tool calling

Client tools + hosted managed tools (web search, scrape, calculator) — no keys to wire.

Streaming

Token-by-token streaming, fully traced.

Routing

How auto scores prompts and picks models.

Models

The full catalog of direct model names.