Deepgram audio is included with Orbitrage as a managed service: your
prepaid credits cover it at the provider rate plus the standard 2.5% infra
markup. You can also bring your own audio provider via BYOK.
Speech-to-text (transcription)
Use the OpenAI transcription endpoint. Set model to a Deepgram speech model
(default nova-3). Pass response_format=verbose_json to receive Deepgram’s
full payload (words, timestamps, confidence).
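As a minimal sketch of such a request, the snippet below uses the OpenAI Python SDK. The base URL is a placeholder, and the helper names `transcription_kwargs` and `transcribe` are hypothetical; none of them come from Orbitrage's documentation. Substitute the endpoint and API key from your own account.

```python
from pathlib import Path

# Assumption: placeholder URL, not a real Orbitrage endpoint.
ORBITRAGE_BASE_URL = "https://orbitrage.example/v1"


def transcription_kwargs(model: str = "nova-3", verbose: bool = True) -> dict:
    """Build keyword arguments for an OpenAI-style transcription request."""
    kwargs = {"model": model}
    if verbose:
        # verbose_json requests the full payload (words, timestamps, confidence).
        kwargs["response_format"] = "verbose_json"
    return kwargs


def transcribe(audio_path: str, api_key: str, **options):
    """Send one audio file for transcription (hypothetical helper)."""
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI(base_url=ORBITRAGE_BASE_URL, api_key=api_key)
    with Path(audio_path).open("rb") as audio_file:
        return client.audio.transcriptions.create(
            file=audio_file, **transcription_kwargs(**options)
        )
```

Calling `transcribe("meeting.wav", api_key, model="nova-3-multilingual")` would select a different model from the table below while keeping the verbose payload.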
Text-to-speech
Use the OpenAI speech endpoint. Set model (or voice) to a Deepgram Aura
voice (default aura-2-thalia-en). The audio streams back for low latency.
The response_format parameter maps to a Deepgram container/encoding: mp3
(default), wav, opus, flac, or aac.
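A text-to-speech call might look like the sketch below, again using the OpenAI Python SDK. The base URL is a placeholder and the helpers `speech_kwargs` and `synthesize` are hypothetical. Because the SDK requires both `model` and `voice`, this sketch passes the Aura voice name for both, which is an assumption rather than documented behavior.

```python
# Assumption: placeholder URL, not a real Orbitrage endpoint.
ORBITRAGE_BASE_URL = "https://orbitrage.example/v1"

# Formats listed in the text above.
SUPPORTED_FORMATS = ("mp3", "wav", "opus", "flac", "aac")


def speech_kwargs(
    text: str, voice: str = "aura-2-thalia-en", response_format: str = "mp3"
) -> dict:
    """Build keyword arguments for an OpenAI-style speech request."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    # Assumption: the Aura voice name is sent as both model and voice.
    return {
        "model": voice,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }


def synthesize(text: str, out_path: str, api_key: str, **options) -> None:
    """Stream synthesized audio to a file (hypothetical helper)."""
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI(base_url=ORBITRAGE_BASE_URL, api_key=api_key)
    # The streaming variant writes chunks to disk as they arrive.
    with client.audio.speech.with_streaming_response.create(
        **speech_kwargs(text, **options)
    ) as response:
        response.stream_to_file(out_path)
```

Validating the format before the request keeps a typo such as `"ogg"` from costing a round trip.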
Models
| Model | Type | Use |
|---|---|---|
| nova-3 | Speech-to-text | Fastest accurate transcription (default) |
| nova-3-multilingual | Speech-to-text | 30+ languages |
| nova-3-medical | Speech-to-text | Clinical vocabulary |
| nova-2 | Speech-to-text | Cheaper general-purpose |
| aura-2-thalia-en | Text-to-speech | Natural English voice (default) |
| aura-2-*-en | Text-to-speech | Other Aura-2 voices |
Billing & tracing
Every audio call records a routing_steps row with tier: "audio",
provider: "deepgram", the model, the exact cost, and latency, so it appears
in your dashboard analytics and in the workflow trajectory graph next to your
chat, tool, and image calls. Speech-to-text is billed per minute of processed
audio; text-to-speech is billed per 1,000 characters synthesized. Both include
the 2.5% markup.
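The billing rule can be illustrated with a small estimator. This is a sketch only: the function names are hypothetical, the rates in the usage note are made-up placeholders rather than Deepgram's actual prices, and it assumes usage is billed proportionally with no rounding to whole minutes or whole 1,000-character blocks.

```python
MARKUP = 0.025  # the 2.5% infra markup stated above


def stt_cost(audio_seconds: float, rate_per_minute: float) -> float:
    """Estimate speech-to-text cost: minutes processed x rate, plus markup."""
    minutes = audio_seconds / 60
    return minutes * rate_per_minute * (1 + MARKUP)


def tts_cost(characters: int, rate_per_1k_chars: float) -> float:
    """Estimate text-to-speech cost: thousands of characters x rate, plus markup."""
    return characters / 1000 * rate_per_1k_chars * (1 + MARKUP)
```

For example, with a placeholder rate of 0.004 per minute, `stt_cost(90, 0.004)` covers 1.5 minutes of audio; with a placeholder rate of 0.03 per 1,000 characters, `tts_cost(2500, 0.03)` covers 2.5 blocks. The exact cost recorded for each call in routing_steps remains the authoritative figure.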