Audio — speech-to-text & text-to-speech

Audio works through the standard OpenAI Audio API. Point your OpenAI client at Orbitrage and we route each request to a managed Deepgram model (no BYOK or extra account needed), bill it per minute of audio for speech-to-text and per 1,000 characters for text-to-speech, and trace each call alongside your chat and image calls.

Deepgram audio is included with Orbitrage as a managed service: your prepaid credits cover it at the provider rate plus the standard 2.5% infra fee. Unlike frontier chat models, audio does not require BYOK, so there’s nothing to configure. If you prefer, you can still bring your own audio provider via BYOK.

Speech-to-text (transcription)

Use the OpenAI transcription endpoint. Set model to a Deepgram speech model (default nova-3):

```python
import os
import orbitrage

orbitrage.init(os.environ["ORBITRAGE_API_KEY"], user_id="customer_42")

from openai import OpenAI

client = OpenAI()

# Transcribe a local file with Deepgram nova-3.
with open("call.wav", "rb") as f:
    tx = client.audio.transcriptions.create(model="nova-3", file=f)
print(tx.text)
```

```typescript
import { orbitrage } from "orbitrage";
import OpenAI from "openai";
import fs from "node:fs";

await orbitrage.init({ apiKey: process.env.ORBITRAGE_API_KEY, userId: "customer_42" });
const client = new OpenAI();

// Transcribe a local file with Deepgram nova-3.
const tx = await client.audio.transcriptions.create({
  model: "nova-3",
  file: fs.createReadStream("call.wav"),
});
console.log(tx.text);
```

```bash
# Raw audio body (quote the URL so the shell doesn't treat "?" as a glob)
curl "https://api.orbitrage.ai/v1/audio/transcriptions?model=nova-3" \
  -H "Authorization: Bearer $ORBITRAGE_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @call.wav

# …or a remote URL
curl "https://api.orbitrage.ai/v1/audio/transcriptions?model=nova-3" \
  -H "Authorization: Bearer $ORBITRAGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://dpgr.am/spacewalk.wav"}'
```
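If you call the endpoint without an SDK, the two curl requests above translate directly to Python's standard library. This is a minimal sketch that assumes only what the curl examples show (the `model` query parameter, the bearer token, and either a raw audio body or a JSON `{"url": ...}` body); the helper names `build_transcription_request` and `transcribe` are hypothetical, not part of an Orbitrage SDK.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.orbitrage.ai/v1/audio/transcriptions"


def build_transcription_request(api_key, model="nova-3", audio_path=None,
                                audio_url=None, content_type="audio/wav"):
    """Build (but don't send) a transcription request.

    Pass exactly one of audio_path (raw bytes upload) or audio_url (remote file).
    """
    if (audio_path is None) == (audio_url is None):
        raise ValueError("pass exactly one of audio_path or audio_url")
    # The model travels in the query string, as in the curl examples.
    url = BASE_URL + "?" + urllib.parse.urlencode({"model": model})
    headers = {"Authorization": f"Bearer {api_key}"}
    if audio_path is not None:
        # Raw audio body: send the file bytes with an audio MIME type.
        with open(audio_path, "rb") as f:
            body = f.read()
        headers["Content-Type"] = content_type
    else:
        # Remote file: send a small JSON body that carries the URL.
        body = json.dumps({"url": audio_url}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def transcribe(req):
    """Send the request and return the parsed OpenAI-compatible JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `transcribe(build_transcription_request(os.environ["ORBITRAGE_API_KEY"], audio_url="https://dpgr.am/spacewalk.wav"))["text"]` should return the transcript. Building the request separately from sending it keeps the request shape easy to inspect and test offline.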

The response is OpenAI-compatible:

```json
{ "text": "Yeah, as much as it's worth celebrating…", "duration": 25.93, "model": "nova-3", "provider": "deepgram" }
```

Pass response_format=verbose_json to receive Deepgram’s full payload (words, timestamps, confidence).
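The verbose_json payload follows Deepgram’s layout rather than OpenAI’s. Assuming it matches Deepgram’s pre-recorded transcription response (`results.channels[].alternatives[].words[]`, where each word carries `word`, `start`, `end`, and `confidence`), a minimal sketch for flagging low-confidence words for review could look like this. `low_confidence_words` and its 0.8 default threshold are illustrative choices.

```python
def low_confidence_words(payload, threshold=0.8):
    """Return (word, start, end, confidence) tuples scoring below threshold.

    Assumes Deepgram's pre-recorded layout:
    results.channels[i].alternatives[0].words[j]. Only the first
    (top-ranked) alternative of each channel is inspected.
    """
    flagged = []
    for channel in payload.get("results", {}).get("channels", []):
        alternatives = channel.get("alternatives", [])
        if not alternatives:
            continue
        for w in alternatives[0].get("words", []):
            # Words without a confidence score are treated as certain.
            if w.get("confidence", 1.0) < threshold:
                flagged.append((w["word"], w["start"], w["end"], w["confidence"]))
    return flagged
```

Pass it the parsed JSON body of a verbose_json response. If the expected keys are missing, it returns an empty list instead of raising.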

Text-to-speech

Use the OpenAI speech endpoint. Set model (or voice) to a Deepgram Aura voice (default aura-2-thalia-en). The audio streams back for low latency:

```python
# Stream the synthesized audio to disk as it arrives.
with client.audio.speech.with_streaming_response.create(
    model="aura-2-thalia-en",   # the Deepgram Aura voice
    voice="aura-2-thalia-en",   # the OpenAI SDK requires `voice`; Orbitrage selects the voice from the model id
    input="Hello from Orbitrage. This voice is managed for you.",
    response_format="mp3",
) as resp:
    resp.stream_to_file("hello.mp3")
```

```typescript
const resp = await client.audio.speech.create({
  model: "aura-2-thalia-en",
  voice: "aura-2-thalia-en", // required by the SDK; Orbitrage uses the Aura model id for the voice
  input: "Hello from Orbitrage. This voice is managed for you.",
  response_format: "mp3",
});
const buf = Buffer.from(await resp.arrayBuffer());
await fs.promises.writeFile("hello.mp3", buf);
```

```bash
curl "https://api.orbitrage.ai/v1/audio/speech?model=aura-2-thalia-en" \
  -H "Authorization: Bearer $ORBITRAGE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello from Orbitrage."}' \
  --output hello.mp3
```
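Without an SDK, you can write speech audio to disk as it streams back by copying the HTTP response in fixed-size chunks rather than reading it all into memory. This sketch mirrors the curl request above (model in the query string, JSON body with `input`); `build_speech_request`, `save_stream`, and the 16 KiB chunk size are hypothetical choices, not part of an Orbitrage SDK.

```python
import json
import shutil
import urllib.parse
import urllib.request

SPEECH_URL = "https://api.orbitrage.ai/v1/audio/speech"


def build_speech_request(api_key, text, model="aura-2-thalia-en"):
    """Build (but don't send) a text-to-speech request mirroring the curl example."""
    url = SPEECH_URL + "?" + urllib.parse.urlencode({"model": model})
    body = json.dumps({"input": text}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def save_stream(resp, path, chunk_size=16 * 1024):
    """Write a file-like response to disk chunk by chunk; return bytes written."""
    with open(path, "wb") as out:
        # copyfileobj reads at most chunk_size bytes per call, so audio is
        # written as it arrives instead of being buffered in memory first.
        shutil.copyfileobj(resp, out, chunk_size)
        return out.tell()
```

To use it, open the request with `urllib.request.urlopen(build_speech_request(key, "Hello from Orbitrage."))` inside a `with` block and pass the response to `save_stream(resp, "hello.mp3")`.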

response_format maps to a Deepgram container/encoding: mp3 (default), wav, opus, flac, aac.
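This page doesn’t spell out which Deepgram parameters each `response_format` becomes. As a hedged illustration, assuming the translation uses Deepgram’s text-to-speech `encoding` and `container` options, the mapping could look like the lookup below. Treat `deepgram_output_params` and the specific pairings (for example `wav` as 16-bit linear PCM in a WAV container, `opus` in an Ogg container) as assumptions rather than Orbitrage’s documented behavior.

```python
# Hypothetical: assumes response_format is translated into Deepgram's TTS
# `encoding` / `container` options. Not Orbitrage's documented internals.
DEEPGRAM_OUTPUT = {
    "mp3": {"encoding": "mp3"},                           # default
    "wav": {"encoding": "linear16", "container": "wav"},  # 16-bit PCM in WAV
    "opus": {"encoding": "opus", "container": "ogg"},
    "flac": {"encoding": "flac"},
    "aac": {"encoding": "aac"},
}


def deepgram_output_params(response_format="mp3"):
    """Return the assumed Deepgram parameters for an OpenAI response_format."""
    if response_format not in DEEPGRAM_OUTPUT:
        raise ValueError(
            f"unsupported response_format {response_format!r}; "
            f"expected one of {sorted(DEEPGRAM_OUTPUT)}"
        )
    # Return a copy so callers can't mutate the shared table.
    return dict(DEEPGRAM_OUTPUT[response_format])
```

The same lookup doubles as client-side validation: an unsupported format such as `pcm` fails before any request is sent.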

Models

| Model | Type | Use |
|---|---|---|
| `nova-3` | Speech-to-text | Fast, accurate transcription (default) |
| `nova-3-multilingual` | Speech-to-text | Multilingual transcription (30+ languages) |
| `nova-3-medical` | Speech-to-text | Clinical vocabulary |
| `nova-2` | Speech-to-text | Lower-cost, general-purpose transcription |
| `aura-2-thalia-en` | Text-to-speech | Natural English voice (default) |
| `aura-2-*-en` | Text-to-speech | Other Aura-2 English voices |
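Speech-to-text models use the transcription endpoint and Aura voices use the speech endpoint, so an app that reads a model id from configuration can route it with a small lookup. This sketch encodes only the models in the table above and assumes every Aura-2 English voice matches the `aura-2-<name>-en` pattern. `audio_endpoint` is a hypothetical helper.

```python
import re

# Speech-to-text model ids from the Models table.
STT_MODELS = {"nova-3", "nova-3-multilingual", "nova-3-medical", "nova-2"}
# Assumed naming pattern for Aura-2 English voices, e.g. aura-2-thalia-en.
AURA2_EN = re.compile(r"aura-2-[a-z]+-en")


def audio_endpoint(model):
    """Return the Orbitrage audio endpoint path that serves `model`."""
    if model in STT_MODELS:
        return "/v1/audio/transcriptions"
    if AURA2_EN.fullmatch(model):
        return "/v1/audio/speech"
    raise ValueError(f"unknown audio model {model!r}")
```

Failing fast on an unknown id surfaces configuration typos locally instead of as an API error.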

Billing & tracing

Every audio call records a routing_steps row with tier: "audio", provider: "deepgram", the model, the exact cost, and the latency, so it appears in your dashboard analytics and in the workflow trajectory graph next to your chat, tool, and image calls. Speech-to-text is billed per minute of processed audio and text-to-speech per 1,000 characters synthesized, both at the provider rate plus the standard 2.5% infra fee.
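The provider rates aren’t listed on this page, so the estimate below takes them as parameters. It applies the pricing rule described above (per minute of processed audio, or per 1,000 characters, plus the 2.5% infra fee) and assumes two things this page doesn’t state: usage is pro-rated rather than rounded up to whole minutes or character blocks, and characters are counted the way Python’s `len()` counts them. `stt_cost` and `tts_cost` are hypothetical names.

```python
from decimal import Decimal

INFRA_FEE = Decimal("0.025")  # the standard 2.5% infra fee from this page


def stt_cost(duration_seconds, rate_per_minute):
    """Estimated speech-to-text charge: provider rate per minute plus the fee.

    Assumes pro-rated billing (no rounding up to whole minutes).
    """
    minutes = Decimal(str(duration_seconds)) / 60
    return minutes * Decimal(str(rate_per_minute)) * (1 + INFRA_FEE)


def tts_cost(text, rate_per_1k_chars):
    """Estimated text-to-speech charge: provider rate per 1,000 chars plus the fee.

    Assumes characters are counted like Python's len() and pro-rated.
    """
    thousands = Decimal(len(text)) / 1000
    return thousands * Decimal(str(rate_per_1k_chars)) * (1 + INFRA_FEE)
```

For speech-to-text, `duration_seconds` can come from the `duration` field of the transcription response. With a made-up rate of 0.01 per minute, `stt_cost(60, 0.01)` returns `Decimal('0.01025')`. `Decimal` avoids floating-point drift when summing many small charges.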
