Routing

Send model: "auto" and Orbitrage runs your prompt through a six-step pipeline. The goal: the cheapest model that can actually handle this task.

auto vs. pinning a model

Let Orbitrage route

Pass auto (or router, default, orbitrage). The engine scores the prompt and selects a model for you.

Pin a specific model

Pass any concrete model id (e.g. glm-5.2, DeepSeek-V4-Flash, or a BYOK model like claude-sonnet-4-6). Scoring is skipped — the request goes straight there.

# Routed — cheapest capable model:
client.chat.completions.create(model="auto", messages=[...])

# Pinned — exactly this model, no scoring:
client.chat.completions.create(model="glm-5.2", messages=[...])

# Pinned to a frontier model — requires your own Anthropic key (BYOK), $0 from Orbitrage:
client.chat.completions.create(model="claude-sonnet-4-6", messages=[...])

The six-step pipeline

Normalize

The request (chat, Responses, or legacy completion shape) is normalized to a common internal form, so the rest of the pipeline is uniform.

Score

The prompt gets a difficulty score in [0.05, 0.95]. Two signals compete:

70+ heuristics — fast regex rules (sub-millisecond, no model to load) that lower the score for extraction, formatting, classification, and classic exercises; raise it for reasoning, debugging, strategy, and long/complex prompts.
Explicit annotation — a caller-supplied priority overrides the score.

A web-search signal (asking for live/recent data) bumps the score onto a capable tier.

Capability ceiling

If the call declares a capability type, the score is capped so trivial work can’t escalate to an expensive tier. A formatting task is capped low; reasoning and planning are uncapped.

Dial

A per-deployment dial (0.0–1.0) shifts tier thresholds. Lower = conservative (stay cheap longer); higher = aggressive (escalate sooner).

Select tier + model

The score maps to a tier, then the engine picks a concrete model: a vision-capable model when the prompt has images, a code-biased model for code, the cheapest open model otherwise. Long prompts escalate automatically; trivially simple code de-escalates.

Proxy + fallback

The request is proxied to the provider. On an infrastructure error (5xx, 429, connectivity), a fallback chain of 2–5 models is tried across providers. Client errors (4xx, content filters) do not cascade — they return immediately.

Tiers

Models are grouped by capability and cost. Routing climbs only as high as the prompt needs.

Tier	For	Example models
basic	Formatting, classification, extraction, simple chat	`gpt-oss-20b`, `glm-4.7-flash`, `nemotron-nano-9b-v2`, `gemma-3-4b-it`
mid	Everyday chat and code, moderate reasoning	`DeepSeek-V4-Flash`, `FW-MiniMax-M2.5`, `minimax-m3`, `qwen3-32b`
high	Hard reasoning, serious code, long context	`Kimi-K2.6`, `DeepSeek-V3.2`, `glm-5.2`, `qwen3-235b-a22b-2507`
frontier	The hardest tasks, when the dial or an annotation pushes there	`claude-opus-4-8`, `claude-sonnet-4-6`, `gpt-5.5` — BYOK only
image	Image generation (separate endpoint)	`gpt-image-2`

The frontier tier is BYOK-only. Auto routing considers those models only when your organization has an enabled key for the vendor; with no key it stays on the open-weight models Orbitrage serves and never returns byok_key_required.

See Models for the catalog, pricing, vision support, and context windows.

Reading the routing decision

Every routed call records the model it chose and why. On the dashboard’s Routing page (and each span):

Requested → Routed to — the alias you sent vs. the model used
Tier and priority score — what the prompt scored and where it landed
Signals — the heuristics that fired (e.g. code detected, long prompt)
Fallback chain — the models that would have been tried on failure
Saved — the cost difference vs. a frontier baseline

Orbitrage also returns an X-Orbitrage-Overhead-Ms response header, so you can see exactly how much latency it added on top of the provider.

Forcing behavior

Always use one model

Pin the model id on every call. A concrete id is treated as an explicit pin and skips scoring entirely.

Use a frontier model (GPT, Claude, Gemini, Grok)

Save and enable a key for that vendor on the Models page. The call is forwarded to the real provider with your key, billed by them at your rate, and Orbitrage charges $0. Without an enabled key those models return 403 byok_key_required. See BYOK.

Bias the whole project cheaper or smarter

The operator dial shifts tier thresholds for your deployment. Lower it to keep traffic on cheaper tiers; raise it to escalate sooner.

Get Started

Core Concepts

SDKs

Integrations

Examples

Dashboard

Platform

Account & Billing

auto vs. pinning a model

Let Orbitrage route

Pin a specific model

The six-step pipeline

Tiers

Reading the routing decision

Forcing behavior

​auto vs. pinning a model

Let Orbitrage route

Pin a specific model

​The six-step pipeline

​Tiers

​Reading the routing decision

​Forcing behavior

auto vs. pinning a model

The six-step pipeline

Tiers

Reading the routing decision

Forcing behavior