Skip to main content
A provider outage shouldn’t be your outage. The Outage Shield detects when a provider (Anthropic, an Azure deployment, xAI, …) starts failing — with cross-org statistical confidence — and reroutes affected traffic to the best healthy model until it recovers. You detect outages faster than anyone and turn a hard-down into a non-event.

How it works

1

Cross-org detection

The engine watches upstream outcomes per provider. It declares an outage only when, inside a rolling window, failures and the number of distinct affected orgs and the failure rate all cross their thresholds — so a 10-second blip or one bad prompt never trips it.
2

Substitute to a healthy model

Affected requests reroute to the best healthy model of the same (or higher) tier from a different provider. The substitute is billed at its own price.
3

Recover automatically

After a cooldown the engine probes the provider; the first success closes the incident and rerouting stops. A Slack alert fires on both trip and recovery.

Policy

Set your policy on the Outage Shield page:
SettingDefaultMeaning
on_outagesubstitutesubstitute reroutes to a healthy model; fail surfaces the provider error (for workloads that must have the exact pinned model).
BYOK → pooledoffRerouting a BYOK call to pooled spends your Orbitrage credits, so it’s opt-in. Off = a BYOK call fails during its provider’s outage.
Slack alertsonNotify your workspace on trip + recovery.
A per-request x-orbitrage-outage-override: fail header forces a hard-fail for one call, regardless of policy.

Transparency

Substitution is never silent. Every rerouted call is flagged:
HeaderValue
X-ScaleASAP-Failover1 when the call was rerouted.
X-ScaleASAP-Failover-FromThe provider we rerouted away from.
In the dashboard, substituted steps record failover_triggered, failover_from_provider, and failover_substituted_model, and active provider incidents show on the Outage Shield page. The originally-requested model is preserved as primary_model_attempted.