Guardrails let you enforce a content policy on every request before it reaches the model. Detection is rules-based (regex), so it adds no model call and no measurable latency. Policy evaluation fails open: a config or parse error always allows the request and never silently blocks live traffic. (Redaction is the one exception; see Actions.) Guardrails are strictly opt-in: an org with no policy has zero enforcement. Configure them on the Guardrails page. The built-in live-test sandbox runs your policy over sample text so you can see exactly what would fire before you save.

What it detects

| Category | Detects | Default action |
| --- | --- | --- |
| Email | Email addresses | Redact |
| Phone | Phone numbers (separated or bare 10-digit, optional country code) | Redact |
| SSN | US social-security numbers | Block |
| Credit card | 13–16 digit card numbers | Block |
| API key / secret | sk-… / pk-… style secrets | Block |
| Name | Explicit self-disclosure (“my name is …”, “Name: …”) | Tokenize |
| Prompt injection | “ignore previous instructions”, “reveal your system prompt”, … | Block |
| Content safety | Hate / violence / self-harm / sexual (conservative keyword heuristic) | Flag / Block |
| Custom rules | Your own regex patterns | Your choice |
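
As a rough illustration of what rules-based detection looks like, here is a minimal Python sketch. The patterns, the category keys, and the `detect` helper are assumptions made for this example; they are not the gateway's actual rules, which this page does not list.

```python
import re

# Illustrative patterns only (assumed); the real built-in rules may differ.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{16,}\b"),
}


def detect(text):
    """Return the sorted names of every category whose pattern matches text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))


if __name__ == "__main__":
    print(detect("mail me at a@example.com, SSN 123-45-6789"))
```

A custom rule would be one more entry in a table like `PATTERNS`: a name plus a compiled regex. Because matching is a plain regex scan, no model call is involved.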

Actions

Each category resolves to one of three actions:
| Action | Effect |
| --- | --- |
| Redact | Replace the match with ⟦LABEL·REDACTED⟧ before forwarding. The model never sees the PII. |
| Tokenize | Replace with a stable ⟦LABEL·TOKEN⟧ placeholder. |
| Block | Refuse the request with 403; the model is never called. |
Redaction fails closed: if a redacted body can’t be re-serialised, the request is blocked rather than forwarded with the PII intact.
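
The Redact and Block actions, together with the fail-closed rule for redaction, can be sketched as follows. This is a simplified model under stated assumptions: `apply_policy`, the `prompt` field, the patterns, and the `EMAIL` label are hypothetical names for illustration, and Tokenize is left out because this page does not specify how tokens are derived.

```python
import json
import re

# Illustrative patterns (assumed), not the gateway's actual rules.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def apply_policy(body):
    """Return (status, body_to_forward) for a request body dict.

    Hypothetical helper: assumes the user text lives in body["prompt"].
    """
    text = body["prompt"]
    if SSN.search(text):
        # Block: refuse with 403, nothing is forwarded to the model.
        return 403, None
    # Redact: swap each match for a placeholder before forwarding.
    redacted = dict(body, prompt=EMAIL.sub("⟦EMAIL·REDACTED⟧", text))
    try:
        json.dumps(redacted)  # the redacted body must re-serialise
    except (TypeError, ValueError):
        # Fail closed: never forward a body that could not be rebuilt.
        return 403, None
    return 200, redacted


if __name__ == "__main__":
    print(apply_policy({"prompt": "reach me at a@example.com"}))
```

The point of the `try` block is the asymmetry: a failure during redaction blocks the request instead of letting the original text through.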

Scope: org default + per-project override

Set an organisation default that applies everywhere, then optionally override it per project (workflow). A project with no override inherits the org default; a project with an override uses its own full policy. Resolution at the gateway is:
project override  →  org default  →  off (opt-in)
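
The resolution order above can be expressed as a small function. This is a sketch only: `resolve_policy` and the dict-shaped policies are assumptions for illustration, with `None` standing in for "no policy configured".

```python
def resolve_policy(project_override, org_default):
    """Pick the effective policy: project override, else org default, else None.

    An override replaces the org default wholesale; the two are not merged.
    A result of None means guardrails are off for the request.
    """
    if project_override is not None:
        return project_override
    return org_default


if __name__ == "__main__":
    org = {"ssn": "block", "email": "redact"}
    project = {"ssn": "redact"}
    print(resolve_policy(project, org))  # the project's own full policy
    print(resolve_policy(None, org))     # inherits the org default
    print(resolve_policy(None, None))    # no enforcement
```

Note that in the first call the project policy has no `email` entry, and nothing is inherited from the org default to fill it in.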

Block this user

From any trace in Traces, the “Block this user” action adds that end-user to your org’s blocklist; the gateway then returns 403 for every request carrying their x-orbitrage-end-user-id. Open a blocked user’s trace again and the action flips to “Unblock this user”. (Identify the end-user per request with the SDK’s user_id or the x-orbitrage-end-user-id header; see per-user attribution.)
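
Conceptually, the blocklist check amounts to a header lookup against a set of blocked IDs. The sketch below models that behaviour; `check_blocklist` and the in-memory set are assumptions for illustration, and only the header name comes from this page.

```python
HEADER = "x-orbitrage-end-user-id"


def check_blocklist(headers, blocklist):
    """Return 403 if the request's end-user is blocklisted, else None.

    HTTP header names are case-insensitive, so keys are lowercased first.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    user_id = lowered.get(HEADER)
    if user_id is not None and user_id in blocklist:
        return 403
    return None


if __name__ == "__main__":
    blocked = {"user-123"}  # hypothetical end-user ID
    print(check_blocklist({"X-Orbitrage-End-User-Id": "user-123"}, blocked))
    print(check_blocklist({"x-orbitrage-end-user-id": "user-456"}, blocked))
```

In this model a request without the header passes the check, since there is no end-user to match against the blocklist.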

What’s recorded

Every request carries its guardrail outcome on the routing_steps row, so it shows in Traces and analytics:
| Signal | Meaning |
| --- | --- |
| guardrail_blocked | The request was refused by policy. |
| pii_detected / pii_types | PII was found, and which kinds. |
| pii_redaction_applied | Redaction/tokenization ran. |
| orbitrage_actions_applied | The exact actions that fired (e.g. pii.ssn:block). |
Blocked requests surface as Guardrails Triggered security incidents on Overview and Traces, and the blocked trace records the end-user who triggered it.
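
If you export routing_steps rows for your own analysis, the signals above lend themselves to simple aggregation. The sketch below counts which actions fired on blocked requests. The field names come from the table, but the value types (booleans and lists of strings) and the sample rows are assumptions; check them against your actual export.

```python
from collections import Counter

# Hypothetical sample rows; value shapes are assumed, not documented here.
SAMPLE_ROWS = [
    {
        "guardrail_blocked": True,
        "pii_detected": True,
        "pii_types": ["ssn"],
        "pii_redaction_applied": False,
        "orbitrage_actions_applied": ["pii.ssn:block"],
    },
    {
        "guardrail_blocked": False,
        "pii_detected": False,
        "pii_types": [],
        "pii_redaction_applied": False,
        "orbitrage_actions_applied": [],
    },
]


def blocked_action_counts(rows):
    """Count each action string across rows where guardrail_blocked is true."""
    counts = Counter()
    for row in rows:
        if row.get("guardrail_blocked"):
            counts.update(row.get("orbitrage_actions_applied", []))
    return counts


if __name__ == "__main__":
    print(blocked_action_counts(SAMPLE_ROWS))
```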
Want to verify enforcement end-to-end? Send a request whose body contains a test SSN (123-45-6789) or an injection phrase — with a policy active you’ll get a 403 guardrail_blocked; an email or phone returns 200 with the value redacted.