Introduction

Orbitrage is a convenience layer between your code and the model providers. Point your existing OpenAI-compatible client at it and every call is routed to a cost-appropriate model and traced in your dashboard. No new SDK to learn, no telemetry pipeline to run. It speaks the OpenAI format across every modality: chat, image generation, and managed audio (speech-to-text and text-to-speech on Deepgram, included). Name the model; Orbitrage routes, translates, bills, and traces behind the scenes.

Two access modes, one API:

Open-weight models (Qwen, DeepSeek, GLM, Kimi, MiniMax, Mistral, Nemotron, Gemma, gpt-oss-*) run on our infrastructure and bill to your Orbitrage credits.
Frontier models (Claude, GPT, Gemini, Grok) run on your own provider key at your negotiated rate — Orbitrage adds $0. See BYOK.
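The split between the two access modes can be sketched as a small prefix check. This is only an illustration, assuming the id prefixes listed in the integration prompt on this page (claude-*, gpt-* except gpt-oss-*, gemini-*, grok-*); `access_mode` is a hypothetical helper, not part of the Orbitrage SDK, and the rule may not cover image or audio model ids.

```python
# Sketch only: classify a model id by the prefix rules stated on this page.
# access_mode is a hypothetical helper, not an Orbitrage SDK function.
BYOK_PREFIXES = ("claude-", "gpt-", "gemini-", "grok-")  # closed frontier lines
OPEN_EXCEPTIONS = ("gpt-oss-",)                           # open-weight despite the gpt- prefix


def access_mode(model_id: str) -> str:
    """Return "auto", "byok", or "credits" for a chat model id."""
    if model_id == "auto":
        return "auto"                      # Orbitrage picks the model
    if model_id.startswith(OPEN_EXCEPTIONS):
        return "credits"                   # open-weight, billed to credits
    if model_id.startswith(BYOK_PREFIXES):
        return "byok"                      # needs an enabled provider key
    return "credits"                       # everything else is treated as open-weight


print(access_mode("gpt-oss-20b"))          # credits
print(access_mode("claude-sonnet-4-6"))    # byok
```

A check like this could run at startup to warn when a default model would need a provider key that has not been enabled.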

Integrate with one prompt

Don’t want to wire it up by hand? Expand the prompt below, copy it, and paste it into your AI coding agent (Cursor, Claude Code, Copilot, Windsurf, …) with “do this to my codebase.” The agent detects your language and framework, installs the SDK, and wires every LLM call, minimally and idiomatically.

Paste into your coding agent

Integrate Orbitrage (LLM router + full observability) into THIS codebase. Orbitrage is
OpenAI-API-compatible: existing OpenAI/Anthropic-style clients just point at
https://api.orbitrage.ai/v1 with an `orb_` key and every call is auto-routed + traced.
Make MINIMAL changes — only the integration, never the app logic. Docs: https://docs.orbitrage.ai

1) INSTALL the latest SDK (match the project language):
   • Python:  pip install -U orbitrage openai
   • Node/TS: npm install orbitrage@latest openai

2) INITIALIZE ONCE at the program entrypoint, before any LLM client is constructed. Read the
   key from env ORBITRAGE_API_KEY (never hardcode). Pass user_id = the end-user/customer id
   (use the real per-request id where available, else a sensible constant):
   • Python: import orbitrage; orbitrage.init(os.environ["ORBITRAGE_API_KEY"], user_id="<end_user_id>")
   • Node:   import { orbitrage } from "orbitrage";
             await orbitrage.init({ apiKey: process.env.ORBITRAGE_API_KEY, userId: "<end_user_id>" });

3) MODELS: prefer a DIRECT open-weight model for predictable behavior — "glm-5.2" (quality) or
   "minimax-m3" (fast/cheap). Use "auto" only to let Orbitrage pick the cheapest model, and
   then keep max_tokens >= 512 (auto may pick a reasoning model that truncates short answers).
   Orbitrage serves open-weight models directly and bills them to credits. The closed frontier
   lines — claude-*, gpt-* (except the open gpt-oss-*), gemini-*, grok-* — are BYOK-ONLY: they
   need an enabled provider key for that vendor on https://app.orbitrage.ai/models, run on the
   real provider endpoint with that key, and cost $0 in Orbitrage credits. Without an enabled
   key they return 403 byok_key_required (never a silent pooled fallback), so do NOT default a
   codebase to them — use an open-weight id or "auto" unless the user says they have a key.
   For Claude, do NOT use the Anthropic SDK — call model="claude-sonnet-4-6" via the OpenAI client.

4) WIRE every LLM client found in the repo:
   • Raw OpenAI SDK (OpenAI() / new OpenAI()): after init(), do NOT set base_url — init points it
     at the gateway and forces the orb_ key automatically (even if OPENAI_API_KEY is set). Only
     change the model id.
   • LangChain (Py):   ChatOpenAI(model="minimax-m3", base_url="https://api.orbitrage.ai/v1",
       api_key=os.environ["ORBITRAGE_API_KEY"], default_headers={"x-orbitrage-end-user-id":"<id>"})
   • LangChain.js:     new ChatOpenAI({ model:"minimax-m3", apiKey:process.env.ORBITRAGE_API_KEY,
       configuration:{ baseURL:"https://api.orbitrage.ai/v1",
                       defaultHeaders:{"x-orbitrage-end-user-id":"<id>"} } })
   • CrewAI:           LLM(model="openai/gpt-oss-20b", base_url="https://api.orbitrage.ai/v1",
       api_key=os.environ["ORBITRAGE_API_KEY"])   # LiteLLM validates names: prefix with
       openai/ so it proxies the id through, or litellm.register_model({...}) to allow any id.
   • Agno:             OpenAIChat(id="minimax-m3", api_key=os.environ["ORBITRAGE_API_KEY"],
       base_url="https://api.orbitrage.ai/v1", default_headers={"x-orbitrage-end-user-id":"<id>"})
   • LlamaIndex:       from llama_index.llms.openai_like import OpenAILike
       OpenAILike(model="minimax-m3", api_base="https://api.orbitrage.ai/v1",
                  api_key=os.environ["ORBITRAGE_API_KEY"], is_chat_model=True)  # stock OpenAI class
                  # rejects non-OpenAI model names — use OpenAILike for open-weight ids/auto.
   • Vercel AI SDK:    createOpenAI({ baseURL:"https://api.orbitrage.ai/v1",
       apiKey:process.env.ORBITRAGE_API_KEY, headers:{"x-orbitrage-end-user-id":"<id>"} });
       then provider.chat("minimax-m3")

5) MANAGED TOOLS (optional — no tool API keys to wire): add reserved names to the request
   tools array and Orbitrage runs them server-side, loops the result back, and returns the
   final answer in ONE call: tavily_orbitrage (web search), serper_orbitrage,
   firecrawl_orbitrage, jina_orbitrage, weather_orbitrage, calculator_orbitrage,
   datetime_orbitrage. Pin a model (not "auto") for tool calls. Mix managed names with your
   own function tools in the same array. Example:
     client.chat.completions.create(model="gpt-oss-20b",
       messages=[{"role":"user","content":"What is 77*77? Use the calculator."}],
       tools=["calculator_orbitrage"])   # one call in, final answer out: "77 times 77 is 5929."

6) MULTIMODAL (same client, same key — all routed, billed, traced):
   • Images:  client.images.generate(model="gpt-image-2", prompt="...")
     gpt-image-2 returns the image as BASE64 in data[0].b64_json (data[0].url is None,
     like OpenAI's gpt-image family) — decode b64_json and save/serve it; do NOT read .url.
   • Speech-to-text:  client.audio.transcriptions.create(model="nova-3", file=open("a.wav","rb"))  # -> .text
   • Text-to-speech:  client.audio.speech.create(model="aura-2-thalia-en", voice="aura-2-thalia-en", input="...")  # voice arg is required by the SDK (Orbitrage picks the voice from the model id) -> audio bytes, .stream_to_file("out.mp3")
   Audio is managed on Deepgram (no extra account); STT billed per minute, TTS per 1k chars.

7) PER-USER (multi-tenant server): switch per request with orbitrage.set_user(id) (Python) /
   orbitrage.setUser(id) (Node), then construct a NEW client so it picks up the new id.

8) VERIFY: run the app; confirm calls succeed and show up at https://app.orbitrage.ai/workflows
   attributed to the user_id. For anything unclear or any edge case, consult https://docs.orbitrage.ai.

Get your orb_ key at app.orbitrage.ai → API Keys, then set ORBITRAGE_API_KEY in your environment before running the agent. Each key is bound to a workflow — every call it makes is traced under that workflow, so create one per app/service.
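Before handing the key to the SDK, it can help to fail fast when the variable is missing or is not an Orbitrage key. The following is a minimal sketch under the assumption that every valid key starts with `orb_`, as described above; `load_orbitrage_key` is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def load_orbitrage_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from ORBITRAGE_API_KEY, raising if it is absent or malformed."""
    env = os.environ if env is None else env
    key = env.get("ORBITRAGE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ORBITRAGE_API_KEY is not set")
    if not key.startswith("orb_"):
        # Assumption: Orbitrage keys always carry the orb_ prefix.
        raise RuntimeError("ORBITRAGE_API_KEY does not look like an orb_ key")
    return key
```

The returned value can then be passed to `orbitrage.init(...)` at the program entrypoint instead of a hardcoded string.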

…or one line by hand

Python:

import orbitrage
orbitrage.init("orb_xxx", user_id=current_user.id)   # ← one line

from openai import OpenAI
OpenAI().chat.completions.create(
    model="minimax-m3",                                          # or "auto" for automatic routing
    messages=[{"role": "user", "content": "hi"}],
)

Node / TypeScript:

import { orbitrage } from "orbitrage";
await orbitrage.init({ apiKey: "orb_xxx", userId: currentUser.id });   // ← one line

import OpenAI from "openai";
await new OpenAI().chat.completions.create({
  model: "minimax-m3",                                                 // or "auto" for automatic routing
  messages: [{ role: "user", content: "hi" }],
});

Pass user_id to see usage and cost per customer; it is your own data about your own users.
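Where the SDK's `init()` is not used, the framework snippets in the integration prompt attribute calls with a base URL plus an `x-orbitrage-end-user-id` header. A rough sketch of the same settings for a plain OpenAI client follows; it assumes the raw client is treated the same way as those framework wrappers, and `client_kwargs` is a hypothetical helper.

```python
from typing import Any, Dict

ORBITRAGE_BASE_URL = "https://api.orbitrage.ai/v1"
END_USER_HEADER = "x-orbitrage-end-user-id"   # header name taken from the integration prompt


def client_kwargs(api_key: str, user_id: str) -> Dict[str, Any]:
    """Build constructor arguments for an OpenAI-compatible client, one set per end user."""
    if not user_id:
        raise ValueError("user_id must be a non-empty string")
    return {
        "base_url": ORBITRAGE_BASE_URL,
        "api_key": api_key,
        "default_headers": {END_USER_HEADER: str(user_id)},
    }


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs(key, "customer-42"))
```

Building a fresh set of arguments per end user matches the advice in the prompt to construct a new client whenever the user id changes.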

Managed tools, zero keys

Name a reserved tool in your normal tools array and Orbitrage runs it server-side with our key, loops the result back to the model, and returns the final answer — one request, no tool loop, no tool API keys to manage.

Python:

from openai import OpenAI

client = OpenAI()                        # construct after orbitrage.init(...)
resp = client.chat.completions.create(
    model="gpt-oss-20b",                 # pin a model for tool calls (not "auto")
    messages=[{"role": "user", "content": "What is 77 × 77? Use the calculator."}],
    tools=["calculator_orbitrage"],      # ← just the reserved name; we execute it
)
print(resp.choices[0].message.content)   # "77 times 77 is 5929."

Node / TypeScript:

import OpenAI from "openai";

const client = new OpenAI();             // construct after orbitrage.init(...)
const resp = await client.chat.completions.create({
  model: "gpt-oss-20b",                  // pin a model for tool calls (not "auto")
  messages: [{ role: "user", content: "What is 77 × 77? Use the calculator." }],
  tools: ["calculator_orbitrage"],       // ← just the reserved name; we execute it
});
console.log(resp.choices[0].message.content); // "77 times 77 is 5929."

Available: tavily_orbitrage (web search), serper_orbitrage, firecrawl_orbitrage, jina_orbitrage, weather_orbitrage, calculator_orbitrage, datetime_orbitrage. Mix them with your own function tools in the same array.
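Mixing managed names with your own function tools might look like the sketch below. `build_tools` and the `lookup_order` function are hypothetical and exist only for illustration; the function tool uses the standard OpenAI function-calling shape, and the reserved names are the ones listed above.

```python
from typing import Any, Dict, List, Union

# Reserved managed tool names, as listed on this page.
MANAGED_TOOLS = {
    "tavily_orbitrage", "serper_orbitrage", "firecrawl_orbitrage", "jina_orbitrage",
    "weather_orbitrage", "calculator_orbitrage", "datetime_orbitrage",
}

Tool = Union[str, Dict[str, Any]]


def build_tools(managed: List[str], functions: List[Dict[str, Any]]) -> List[Tool]:
    """Combine reserved managed names and regular function tools into one tools array."""
    unknown = [name for name in managed if name not in MANAGED_TOOLS]
    if unknown:
        raise ValueError(f"not a managed tool: {unknown}")
    return [*managed, *functions]


# Hypothetical function tool defined by the application.
lookup_order = {
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Look up an order by id.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}

tools = build_tools(["calculator_orbitrage", "datetime_orbitrage"], [lookup_order])
```

The resulting list would be passed as `tools=tools` in the request. Only the managed names are described here as running server-side, so calls to your own functions presumably still come back as tool calls for your code to handle.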

What you get

Routing

model: "auto" picks the cheapest model that can handle the prompt.
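The integration prompt above advises keeping max_tokens at 512 or more when using "auto", since the router may pick a reasoning model that truncates short answers. One way to enforce that floor is sketched here; `chat_params` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, List, Optional

AUTO_MIN_MAX_TOKENS = 512  # floor suggested in the integration prompt for model="auto"


def chat_params(
    model: str,
    messages: List[Dict[str, str]],
    max_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Build chat.completions arguments, raising max_tokens to the floor for "auto"."""
    params: Dict[str, Any] = {"model": model, "messages": messages}
    if model == "auto":
        params["max_tokens"] = max(max_tokens or 0, AUTO_MIN_MAX_TOKENS)
    elif max_tokens is not None:
        params["max_tokens"] = max_tokens   # pinned models keep the caller's value
    return params


params = chat_params("auto", [{"role": "user", "content": "hi"}], max_tokens=64)
print(params["max_tokens"])  # 512
```

The result can be unpacked into the call, for example `client.chat.completions.create(**params)`.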

Observability

Every call traced — model, tokens, cost, latency — automatically.

Bring your own key

Frontier models run on your provider key — your rate, $0 from Orbitrage.

Run it from Slack

Ask analytics, create projects and keys, get alerts — without the dashboard.

Start here

Quickstart

First call in 5 minutes.

Your framework

OpenAI, Anthropic, LangChain, CrewAI, and more.

Examples

Tool calling, streaming, per-user attribution.
