Guardrails let you enforce a content policy on every request before it reaches the
model. Detection is rules-based (regex) so it adds no model call and no measurable
latency, and everything fails open — a config or parse error always allows the
request, never silently blocks live traffic. Guardrails are strictly opt-in: an org
with no policy has zero enforcement.
Configure them on the Guardrails page. The
built-in live-test sandbox runs your policy over sample text so you can see exactly
what would fire before you save.
What it detects
| Category | Detects | Default action |
|---|
| Email | Email addresses | Redact |
| Phone | Phone numbers — separated and bare 10-digit, optional country code | Redact |
| SSN | US social-security numbers | Block |
| Credit card | 13–16 digit card numbers | Block |
| API key / secret | sk-… / pk-… style secrets | Block |
| Name | Explicit self-disclosure (“my name is …”, “Name: …”) | Tokenize |
| Prompt injection | “ignore previous instructions”, “reveal your system prompt”, … | Block |
| Content safety | Hate / violence / self-harm / sexual (conservative keyword heuristic) | Flag / Block |
| Custom rules | Your own regex patterns | Your choice |
Actions
Each category resolves to one of three actions:
| Action | Effect |
|---|
| Redact | Replace the match with ⟦LABEL·REDACTED⟧ before forwarding. The model never sees the PII. |
| Tokenize | Replace with a stable ⟦LABEL·TOKEN⟧ placeholder. |
| Block | Refuse the request with 403 — the model is never called. |
Redaction fails closed: if a redacted body can’t be re-serialised, the request is
blocked rather than forwarded with the PII intact.
Scope: org default + per-project override
Set an organisation default that applies everywhere, then optionally override it
per project (workflow). A project with no override inherits the org default; a project
with an override uses its own full policy. Resolution at the gateway is:
project override → org default → off (opt-in)
Block this user
From any trace in Traces, Block this user adds that
end-user to your org’s blocklist; the gateway then 403s every request carrying their
x-orbitrage-end-user-id. Open a blocked user’s trace again and the action flips to
Unblock this user. (Identify the end-user per request with the SDK’s user_id /
the x-orbitrage-end-user-id header — see per-user attribution.)
What’s recorded
Every request carries its guardrail outcome on the routing_steps row, so it shows in
Traces and analytics:
| Signal | Meaning |
|---|
guardrail_blocked | The request was refused by policy. |
pii_detected / pii_types | PII was found, and which kinds. |
pii_redaction_applied | Redaction/tokenization ran. |
orbitrage_actions_applied | The exact actions that fired (e.g. pii.ssn:block). |
Blocked requests surface as Guardrails Triggered security incidents on Overview and
Traces, and the blocked trace records the end-user who triggered it.
Want to verify enforcement end-to-end? Send a request whose body contains a test SSN
(123-45-6789) or an injection phrase — with a policy active you’ll get a 403
guardrail_blocked; an email or phone returns 200 with the value redacted.