Chat Completions

Base URL: https://api.orbitrage.ai/v1 · Auth: Authorization: Bearer orb_…

POST /v1/chat/completions

This endpoint follows the OpenAI Chat Completions API. The only Orbitrage-specific behavior is the model field: pass auto to let Orbitrage route the request, or a concrete model id to pin it to that model.

Request

model · string · required

Pass auto (or one of its aliases router, default, orbitrage) to let Orbitrage pick the cheapest capable model, or pass any model id (e.g. glm-5.2, gpt-5.4, DeepSeek-V4-Flash) to pin that model. See Models.

messages · array · required

The conversation, in OpenAI format (role + content). Supports text, image content blocks (for vision models), and tool/assistant messages.

stream · boolean · default: false

Stream the response as server-sent events.

tools · array

Tool/function definitions in OpenAI format. Requests that include tools are routed to a tool-capable model.

temperature, top_p, max_tokens, etc. · number

Standard sampling controls pass through to the provider.
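
As a rough illustration of how the request fields combine, the sketch below assembles a request body in Python. The helper names (build_chat_request, is_routed), the example tool get_weather, and the image URL are hypothetical and not part of any Orbitrage SDK; the image and tool shapes follow the OpenAI format that this page says is accepted. Exact, case-sensitive matching of the routing aliases is also an assumption.

```python
import json

# Aliases this page lists as asking Orbitrage to route instead of pin.
# ASSUMPTION: matching is exact and case-sensitive.
ROUTING_ALIASES = {"auto", "router", "default", "orbitrage"}


def is_routed(model: str) -> bool:
    """Return True if `model` asks Orbitrage to choose the model."""
    return model in ROUTING_ALIASES


def build_chat_request(model, messages, stream=False, tools=None, **sampling):
    """Assemble a Chat Completions request body (hypothetical helper)."""
    if not messages:
        raise ValueError("messages is required and must not be empty")
    body = {"model": model, "messages": messages}
    if stream:
        body["stream"] = True
    if tools:
        body["tools"] = tools
    # Sampling controls (temperature, top_p, max_tokens, ...) are copied as-is.
    body.update(sampling)
    return body


# A user message mixing text and an image content block (OpenAI format).
vision_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this picture?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}

# A tool definition in OpenAI format; `get_weather` is an illustrative name.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = build_chat_request(
    "auto", [vision_message], tools=[weather_tool], temperature=0.2, max_tokens=256
)
print(json.dumps(request_body, indent=2))
```

The printed JSON can be sent as the body of POST /v1/chat/completions, for example with curl -d or any HTTP client.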

Example

curl https://api.orbitrage.ai/v1/chat/completions \
  -H "Authorization: Bearer $ORBITRAGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "system", "content": "You are concise."},
      {"role": "user", "content": "Name three uses for a router."}
    ]
  }'

from openai import OpenAI            # after orbitrage.init(...)
client = OpenAI()
resp = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Name three uses for a router."}],
)
print(resp.choices[0].message.content)

import OpenAI from "openai";                          // after orbitrage.init(...)
const client = new OpenAI();
const resp = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Name three uses for a router." }],
});
console.log(resp.choices[0].message.content);

Response

A standard OpenAI chat.completion object. The model that actually served the request is in the model field; the routing decision, cost, and savings are recorded to your telemetry and visible in the dashboard.

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "DeepSeek-V4-Flash",
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "..." }, "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 38, "total_tokens": 62 }
}
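
To show how the served model and token usage might be read from such a response, here is a minimal sketch that works on the raw JSON rather than on an SDK object. The function name summarize_completion is hypothetical, and the sample payload simply repeats the example response from this page.

```python
import json

# The example response from this page, kept as raw JSON text.
SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "DeepSeek-V4-Flash",
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "..." }, "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 24, "completion_tokens": 38, "total_tokens": 62 }
}
"""


def summarize_completion(resp: dict) -> dict:
    """Pull out the fields most callers want (hypothetical helper)."""
    choices = resp.get("choices") or []
    first = choices[0] if choices else {}
    usage = resp.get("usage") or {}
    return {
        # With model="auto", this is the model the router actually selected.
        "served_model": resp.get("model"),
        "content": (first.get("message") or {}).get("content"),
        "finish_reason": first.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize_completion(json.loads(SAMPLE_RESPONSE))
print(summary["served_model"], summary["total_tokens"])
```

Logging served_model next to the requested model is one simple way to see, on the client side, which requests were routed and where they landed.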

Streaming

With stream: true you receive the usual OpenAI chat.completion.chunk events. Orbitrage transforms the stream so your SDK gets a clean OpenAI feed:

- Internal routing/summary events are captured and stripped, so the SDK never sees unknown event types.
- Provider frames with empty choices (e.g. content-filter pre-frames) are rewritten so SDKs don’t crash on choices[0].
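
For readers who consume the event stream without an SDK, the sketch below shows one way to accumulate text from OpenAI-style chat.completion.chunk events. It assumes the usual "data: <json>" framing terminated by "data: [DONE]"; the function name collect_stream_text and the sample frames are made up for illustration. The guard against an empty choices list is purely defensive, since this page does not specify how such frames are rewritten.

```python
import json


def collect_stream_text(sse_lines):
    """Concatenate delta content from OpenAI-style SSE lines."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # defensive: tolerate a frame without choices
        delta = choices[0].get("delta") or {}
        content = delta.get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Illustrative frames only; real streams carry more fields per chunk.
sample_stream = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": []}',
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": ", world"}, "finish_reason": null}]}',
    "data: [DONE]",
]

print(collect_stream_text(sample_stream))  # prints: Hello, world
```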

Read X-Orbitrage-Overhead-Ms and X-Orbitrage-BYOK from the response headers to see routing overhead and whether your own key was used.
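
A small sketch of how those two headers might be parsed once you have the response headers as a mapping. The value formats are assumptions, since this page does not document them: the overhead is treated as a number of milliseconds and the BYOK flag as a string such as "true" or "false". The function name read_orbitrage_headers and the sample values are hypothetical.

```python
def read_orbitrage_headers(headers):
    """Extract routing overhead and BYOK status from response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw_overhead = lowered.get("x-orbitrage-overhead-ms")
    raw_byok = lowered.get("x-orbitrage-byok")

    # ASSUMPTION: the overhead header holds a plain number (milliseconds).
    try:
        overhead_ms = float(raw_overhead) if raw_overhead is not None else None
    except ValueError:
        overhead_ms = None

    # ASSUMPTION: a truthy flag is spelled "true", "1", or "yes".
    if raw_byok is None:
        byok = None
    else:
        byok = raw_byok.strip().lower() in {"true", "1", "yes"}

    return {"overhead_ms": overhead_ms, "byok": byok}


# Made-up header values for illustration.
sample_headers = {"X-Orbitrage-Overhead-Ms": "12", "x-orbitrage-byok": "false"}
info = read_orbitrage_headers(sample_headers)
print(info)
```

With the openai Python SDK, response headers are typically reachable through client.chat.completions.with_raw_response.create(...), whose result exposes .headers alongside .parse() for the usual completion object.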
