Orbitrage runs as two planes over one regional data tier:
  • Data plane — carries your production LLM traffic.
  • Control plane — serves the dashboard and analytics.
Keeping them separate means dashboard queries never slow your LLM calls, and the LLM path stays thin.

The pieces

SDK

A ~5-line wrapper (orbitrage) that points your OpenAI-compatible client at the gateway and tags every request with a trace id. No background threads, no span exporters.
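To make the "thin wrapper" idea concrete, here is a minimal sketch of what such a helper might look like. It is not the orbitrage package itself: the function name gateway_client_kwargs and the X-Trace-Id header name are assumptions for illustration, and only the base URL is taken from this page. It also sets one trace id per client, which is a simplification of tagging every request.

```python
import uuid
from typing import Optional

GATEWAY_BASE_URL = "https://api.orbitrage.ai/v1"  # gateway address given on this page


def gateway_client_kwargs(api_key: str, trace_id: Optional[str] = None) -> dict:
    """Build constructor kwargs for an OpenAI-compatible client.

    Hypothetical helper: the real SDK's names and the trace header
    (X-Trace-Id here) are assumptions, not documented behaviour.
    """
    return {
        "api_key": api_key,
        "base_url": GATEWAY_BASE_URL,
        # Generate a fresh id when the caller does not supply one.
        "default_headers": {"X-Trace-Id": trace_id or uuid.uuid4().hex},
    }
```

With the openai Python package, these kwargs could be passed straight to the client constructor, which accepts api_key, base_url, and default_headers. No threads or exporters are involved, which matches the description above.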

Gateway

The public, OpenAI-compatible edge at api.orbitrage.ai/v1. Authenticates your key, gates on credits, optionally swaps in your BYOK provider key, forwards to the engine, and records one telemetry row.

Router engine

Private service that scores the prompt, selects a tier and model, and proxies to the provider with a queue, per-provider concurrency caps, and circuit breakers. Never reachable from the public internet.
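For readers unfamiliar with circuit breakers, the following is a generic sketch of the pattern, assuming a simple consecutive-failure threshold and a fixed cooldown. The class name, thresholds, and method names are illustrative only; the engine's actual breaker logic is not described on this page.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; retry after `cooldown_s`."""

    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def allow(self) -> bool:
        """Return True if a request to this provider may be attempted."""
        if self.opened_at is None:
            return True
        # Half-open: permit a trial request once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

A router would typically keep one breaker per provider and skip any provider whose allow() returns False, moving on to the next candidate instead of waiting on a failing upstream.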

Dashboard + Intelligence

The control plane: multi-level analytics, the Intelligence layer (anomalies + trajectories), the Ask Analytics assistant, and account/billing.

A request, step by step

1. Authenticate

The gateway resolves your orb_ key (SHA-256 prefix lookup, cached ~5 min) to a user, org, and workflow. Invalid or revoked keys get 401.

2. Gate on credits

The org’s balance is checked against a short-TTL cache. If credits are exhausted, the call returns 402 before any provider is touched.

3. Resolve BYOK

If a saved provider key matches the requested model, the gateway decrypts it (AES-256-GCM) and forwards it, so your provider account is billed instead of pooled credits.

4. Route

The engine scores the prompt, applies any capability ceiling, adjusts tier thresholds by the operator dial, picks the cheapest capable model, and proxies — with a fallback chain if the primary fails.

5. Stream + record

The response streams straight back to your SDK. Internal scaleasap.* events (routing decision, token counts, latencies, cost) are captured, stripped, and written as one row to your data store — off the hot path, so your latency isn’t taxed by the write.
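
The steps above can be sketched as a small decision pipeline. This is an illustration under stated assumptions, not the gateway's code: the cache is keyed by the full SHA-256 digest (a simplification of the prefix lookup), key_store, balances, and the model dictionaries are hypothetical in-memory stand-ins, and BYOK decryption, prompt scoring, and streaming are left out.

```python
import hashlib
import time

KEY_CACHE_TTL_S = 300   # "~5 min", per the text
_key_cache = {}         # key digest -> (expires_at, identity)


def authenticate(api_key, key_store, now=time.monotonic):
    """Step 1: resolve an orb_ key to an identity, or None (caller returns 401)."""
    digest = hashlib.sha256(api_key.encode()).hexdigest()
    hit = _key_cache.get(digest)
    if hit and hit[0] > now():
        return hit[1]
    identity = key_store.get(digest)  # hypothetical store keyed by digest
    if identity is not None:
        _key_cache[digest] = (now() + KEY_CACHE_TTL_S, identity)
    return identity


def pick_models(models, required_tier):
    """Step 4: cheapest capable model first; the rest form the fallback chain."""
    capable = [m for m in models if m["tier"] >= required_tier]
    return [m["name"] for m in sorted(capable, key=lambda m: m["cost"])]


def handle(api_key, key_store, balances, models, required_tier):
    """Return an HTTP-like status and the ordered list of models to try."""
    identity = authenticate(api_key, key_store)
    if identity is None:
        return 401, []
    if balances.get(identity["org_id"], 0) <= 0:
        return 402, []  # step 2: rejected before any provider is contacted
    return 200, pick_models(models, required_tier)
```

The ordering matters: both rejections happen before routing, so an invalid key or an empty balance never costs a provider call.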

Two sources of truth, unified

Earlier SDK versions exported OpenTelemetry spans to a separate ingest endpoint. The current architecture is simpler: the proxy is the single source of truth. Every byte of every request crosses the gateway, so there’s nothing extra to export — the gateway writes the canonical routing_steps record itself. (Legacy OTLP span endpoints now return 410 Gone.)
This is why the SDK is so thin and dependency-free: it doesn’t collect or ship telemetry. It only points your client at the gateway and adds trace headers. See Observability.

Infrastructure

The gateway and engine run in the same Azure Container Apps environment in East US 2, so the gateway → engine hop stays inside the VNet (~1–5 ms) with no public-internet round-trip. The engine has internal ingress only and verifies a signed edge header, so it can’t be called directly to bypass auth or billing.
Component | Exposure | Role
Gateway | Public (api.orbitrage.ai) | Auth, credit gate, BYOK, telemetry
Router engine | Private (VNet only) | Scoring, routing, provider fan-out
Dashboard | Public (app.orbitrage.ai) | Analytics, Intelligence, account
Data tier | Private | Postgres + RLS, per-org isolation
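
The signed edge header mentioned above could work along the following lines. This is a sketch assuming an HMAC-SHA256 signature over a request id and timestamp with a shared secret; the header layout, the 30-second freshness window, and both function names are assumptions, since this page does not specify the scheme.

```python
import hashlib
import hmac
import time

MAX_SKEW_S = 30  # assumed freshness window, not a documented value


def sign_edge_header(secret: bytes, request_id: str, ts: int) -> str:
    """Gateway side: produce '<request_id>.<ts>.<hex mac>'."""
    mac = hmac.new(secret, f"{request_id}.{ts}".encode(), hashlib.sha256).hexdigest()
    return f"{request_id}.{ts}.{mac}"


def verify_edge_header(secret: bytes, header: str, now=None) -> bool:
    """Engine side: accept only fresh headers signed with the shared secret."""
    try:
        request_id, ts, mac = header.rsplit(".", 2)
        ts_int = int(ts)
    except ValueError:
        return False  # malformed header
    expected = hmac.new(
        secret, f"{request_id}.{ts_int}".encode(), hashlib.sha256
    ).hexdigest()
    now = time.time() if now is None else now
    # compare_digest avoids leaking the match position through timing.
    fresh = abs(now - ts_int) <= MAX_SKEW_S
    return hmac.compare_digest(mac.encode(), expected.encode()) and fresh
```

Under this kind of scheme, a request that reaches the engine without passing through the gateway has no valid signature, so it is rejected even if network isolation were misconfigured.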

Multi-tenant isolation

Every telemetry row is stamped with the org_id resolved from your API key, and row-level security ensures one org can never read another’s data. The same boundary holds across the dashboard, the Ask Analytics assistant (which pins org_id server-side — the model never sees it), and the MCP server.
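
Server-side pinning of org_id can be illustrated with a small sketch. It uses SQLite from the Python standard library purely so the example runs anywhere; the production data tier is Postgres with row-level security, and the column names and the rows_for_org helper here are hypothetical. The point is that org_id is always bound from the authenticated context and is never accepted from the caller or from a model.

```python
import sqlite3


def rows_for_org(conn, org_id, model=None):
    """Query telemetry with org_id bound from the authenticated context."""
    sql = "SELECT model, cost FROM routing_steps WHERE org_id = ?"
    params = [org_id]
    if model is not None:  # caller-supplied filters never replace the org_id clause
        sql += " AND model = ?"
        params.append(model)
    return conn.execute(sql + " ORDER BY rowid", params).fetchall()


# Illustrative in-memory data; the schema is an assumption for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE routing_steps (org_id TEXT, model TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO routing_steps VALUES (?, ?, ?)",
    [("org_a", "m1", 0.1), ("org_b", "m1", 0.2), ("org_a", "m2", 0.3)],
)
```

Calling rows_for_org(conn, "org_a") returns only org_a's two rows. With row-level security in the database as well, the same boundary holds even if an application query omits the filter.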