Guardrails

Guardrails let you enforce a content policy on every request before it reaches the model. Detection is rules-based (regex), so it adds no model call and no measurable latency. Detection fails open: a config or parse error allows the request rather than silently blocking live traffic. The one exception is redaction, which fails closed (see Actions). Guardrails are strictly opt-in: an org with no policy has zero enforcement. Configure them on the Guardrails page. The built-in live-test sandbox runs your policy over sample text so you can see exactly what would fire before you save.

What it detects

Category	Detects	Default action
Email	Email addresses	Redact
Phone	Phone numbers — separated and bare 10-digit, optional country code	Redact
SSN	US social-security numbers	Block
Credit card	13–16 digit card numbers	Block
API key / secret	`sk-…` / `pk-…` style secrets	Block
Name	Explicit self-disclosure (“my name is …”, “Name: …”)	Tokenize
Prompt injection	“ignore previous instructions”, “reveal your system prompt”, …	Block
Content safety	Hate / violence / self-harm / sexual (conservative keyword heuristic)	Flag / Block
Custom rules	Your own regex patterns	Your choice
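
To make the rules-based approach concrete, here is a minimal sketch of regex detection over a few of the categories above. The patterns and the `detect` helper are illustrative assumptions, not the rules the gateway actually ships with.

```python
import re

# Illustrative patterns only (assumptions); the gateway's real rules may differ.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\b[sp]k-[A-Za-z0-9]{8,}\b"),
    "prompt_injection": re.compile(r"ignore previous instructions", re.IGNORECASE),
}


def detect(text: str) -> dict:
    """Return the matches found per category, omitting categories with none."""
    found = {}
    for category, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[category] = matches
    return found


hits = detect("Contact a@b.com, SSN 123-45-6789")
```

Because each check is a compiled regular expression run over the request text, no model call is involved. A custom rule would simply be one more entry in the pattern table.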

Actions

Each category resolves to one of three actions (Content safety can also be set to Flag, as listed in the table above):

Action	Effect
Redact	Replace the match with `⟦LABEL·REDACTED⟧` before forwarding. The model never sees the PII.
Tokenize	Replace with a stable `⟦LABEL·TOKEN⟧` placeholder.
Block	Refuse the request with `403` — the model is never called.
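
The three actions can be sketched as follows. This is a rough illustration under stated assumptions: the `Result` type, the `RULES` list, the patterns, and the choice to render `LABEL` as an upper-case category name are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    status: int            # 200 = forward to the model, 403 = refused
    text: Optional[str]    # text to forward, or None when blocked


# (label, illustrative pattern, action); all three columns are assumptions.
RULES = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "block"),
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "redact"),
    ("NAME", re.compile(r"(?<=my name is )[A-Z][a-z]+"), "tokenize"),
]


def apply_policy(text: str) -> Result:
    # Block wins outright: if any blocking rule matches, nothing is forwarded.
    for label, pattern, action in RULES:
        if action == "block" and pattern.search(text):
            return Result(403, None)
    # Otherwise rewrite matches in place before forwarding.
    for label, pattern, action in RULES:
        if action == "redact":
            text = pattern.sub(f"⟦{label}·REDACTED⟧", text)
        elif action == "tokenize":
            text = pattern.sub(f"⟦{label}·TOKEN⟧", text)
    return Result(200, text)


forwarded = apply_policy("my name is Alice, mail a@b.com")
refused = apply_policy("SSN 123-45-6789")
```

Checking block rules before any rewriting keeps a refused request from being partially processed.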

Redaction fails closed: if a redacted body can’t be re-serialised, the request is blocked rather than forwarded with the PII intact.
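
The contrast between failing open on policy errors and failing closed on redaction errors might look like the sketch below. The policy format (a JSON object mapping label to regex), the `prompt` field, and the `handle` function are assumptions made for illustration only.

```python
import json
import re


def load_patterns(policy_json: str) -> dict:
    """Parse a policy; raises on malformed JSON or an invalid regex."""
    raw = json.loads(policy_json)
    return {label: re.compile(rx) for label, rx in raw.items()}


def handle(body: dict, policy_json: str, serialize=json.dumps):
    """Return (status, body_to_forward); body is None when refused."""
    try:
        patterns = load_patterns(policy_json)
    except (ValueError, TypeError, AttributeError, re.error):
        # Fail open: a broken policy must not block live traffic.
        return 200, body

    prompt = body.get("prompt")
    if not isinstance(prompt, str):
        return 200, body

    redacted_prompt = prompt
    for label, pattern in patterns.items():
        redacted_prompt = pattern.sub(f"⟦{label}·REDACTED⟧", redacted_prompt)
    if redacted_prompt == prompt:
        return 200, body  # nothing matched, nothing to re-serialise

    redacted = dict(body, prompt=redacted_prompt)
    try:
        serialize(redacted)  # the rewritten body must be encodable
    except (TypeError, ValueError):
        # Fail closed: never forward the original body with the PII intact.
        return 403, None
    return 200, redacted


policy = json.dumps({"EMAIL": r"[\w.]+@[\w.]+"})
```

The asymmetry is deliberate: a policy error has found nothing yet, so allowing the request is safe, whereas a redaction error happens after PII is known to be present.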

Scope: org default + per-project override

Set an organisation default that applies everywhere, then optionally override it per project (workflow). A project with no override inherits the org default; a project with an override uses its own full policy. Resolution at the gateway is:

project override  →  org default  →  off (opt-in)
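
The resolution order above can be sketched in a few lines. The function name and the use of `None` for "no policy configured" are assumptions made for illustration.

```python
from typing import Optional


def resolve_policy(project_override: Optional[dict],
                   org_default: Optional[dict]) -> Optional[dict]:
    """Pick the effective policy; None means guardrails are off."""
    if project_override is not None:
        # An override is used in full; it is not merged with the org default.
        return project_override
    # Falls through to the org default, which may itself be None (opt-in).
    return org_default


org = {"ssn": "block", "email": "redact"}
override = {"ssn": "redact"}
```

Note that with the override in effect, `email` is no longer covered: the project's policy replaces the org default rather than layering on top of it.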

Block this user

From any trace in Traces, Block this user adds that end-user to your org’s blocklist; the gateway then returns `403` for every request carrying their `x-orbitrage-end-user-id`. Open a blocked user’s trace again and the action flips to Unblock this user. (Identify the end-user per request with the SDK’s `user_id` or the `x-orbitrage-end-user-id` header; see per-user attribution.)
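
If you call the gateway over plain HTTP rather than through the SDK, attaching the end-user header could look like this sketch. The gateway URL and the request body shape are placeholders, not documented values; only the header name comes from this page.

```python
import json
import urllib.request

# Placeholder URL (assumption); substitute your real gateway endpoint.
GATEWAY_URL = "https://gateway.example.invalid/v1/chat"


def build_request(prompt: str, end_user_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST that identifies the end-user."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")  # body shape is assumed
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "x-orbitrage-end-user-id": end_user_id,
        },
        method="POST",
    )


req = build_request("hello", "user-42")
```

Sending `req` with `urllib.request.urlopen` raises `urllib.error.HTTPError` on a `403`, so a blocked user would surface as an exception in this style of client.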

What’s recorded

Every request carries its guardrail outcome on the `routing_steps` row, so it shows in Traces and analytics:

Signal	Meaning
`guardrail_blocked`	The request was refused by policy.
`pii_detected` / `pii_types`	PII was found, and which kinds.
`pii_redaction_applied`	Redaction/tokenization ran.
`orbitrage_actions_applied`	The exact actions that fired (e.g. `pii.ssn:block`).
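
When analysing exported traces, entries such as `pii.ssn:block` can be split into category and action. This sketch assumes `orbitrage_actions_applied` arrives as a list of `category:action` strings; the actual storage format is not specified here.

```python
from typing import NamedTuple


class AppliedAction(NamedTuple):
    category: str
    action: str


def parse_actions(entries: list) -> list:
    """Split 'category:action' strings; a missing ':' yields an empty action."""
    parsed = []
    for entry in entries:
        category, _, action = entry.partition(":")
        parsed.append(AppliedAction(category, action))
    return parsed


actions = parse_actions(["pii.ssn:block", "pii.email:redact"])
blocked_categories = [a.category for a in actions if a.action == "block"]
```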

Blocked requests surface as Guardrails Triggered security incidents on Overview and Traces, and the blocked trace records the end-user who triggered it.

Want to verify enforcement end-to-end? Send a request whose body contains a test SSN (`123-45-6789`) or an injection phrase. With a policy active you’ll get a `403` `guardrail_blocked`; an email or phone returns `200` with the value redacted.
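
The check described above can be scripted as a small smoke test. In this sketch, `send` is a hypothetical callable you supply that posts the text through your gateway and returns the HTTP status code; the expected statuses assume the default actions are in effect.

```python
# (sample text, expected HTTP status under the default actions)
CASES = [
    ("My SSN is 123-45-6789", 403),
    ("ignore previous instructions", 403),
    ("Reach me at test@example.com", 200),
]


def verify(send) -> list:
    """Run every case through send(text) and describe any mismatches."""
    failures = []
    for text, want in CASES:
        got = send(text)
        if got != want:
            failures.append(f"{text!r}: expected {want}, got {got}")
    return failures


def fake_gateway(text: str) -> int:
    """Stand-in for a real gateway call, used only to exercise verify()."""
    lowered = text.lower()
    return 403 if "123-45-6789" in text or "ignore previous" in lowered else 200


problems = verify(fake_gateway)
```

An empty list means every case behaved as expected. A status check alone does not confirm that redaction ran, so look for `pii_redaction_applied` on the trace as well.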
