Voice API that doesn't pretend long audio is fast

For product engineers, backend teams, and CTOs building audio features into their products.

The problem

Most transcription APIs fake synchronous responses for long audio: your request hangs for 30 seconds, then either times out or returns garbage. You end up building your own queue, retry logic, and result polling. That can mean three months of infrastructure work just to get transcription working.

How Orpheus helps

Orpheus gives you the infrastructure: sync for short clips (instant), explicit async for long recordings (visible queue + progress), recoverable uploads for mobile clients, and a guaranteed output shape on every completed job. You ship voice features in days, not months.

  • Sync endpoint for clips under 8MB
  • Async jobs with chunk progress and merge
  • Recoverable upload sessions for weak networks
  • Webhooks for real-time notifications
  • Stable output: text + segments + VTT on every job
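As a rough illustration of the sync/async split listed above, the following is a minimal sketch of how a client might choose a path by file size and then poll an async job. It does not call the real Orpheus API: the function names, the `state` and `chunks_done` fields, and the reading of "8MB" as 8 * 1024 * 1024 bytes are all assumptions made for the example.

```python
import time
from typing import Callable, Dict

# Assumption: "8MB" is read as 8 * 1024 * 1024 bytes; the page does not define the unit.
SYNC_LIMIT_BYTES = 8 * 1024 * 1024


def choose_mode(size_bytes: int) -> str:
    """Return "sync" for clips under the limit, "async" otherwise."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "sync" if size_bytes < SYNC_LIMIT_BYTES else "async"


def wait_for_job(
    fetch_status: Callable[[], Dict],
    interval_s: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a status callable until the job reaches a terminal state.

    `fetch_status` is a hypothetical stand-in for whatever HTTP call returns
    the job status; the "state" values used here are illustrative only.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("state") in ("completed", "failed"):
            return status
        sleep(interval_s)
    raise TimeoutError("job did not finish within the polling budget")
```

Passing the status call in as a function keeps the polling loop independent of any particular HTTP client, and a webhook handler could replace the loop entirely where webhooks are available.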

Common questions

How is this different from the OpenAI Whisper API?

The OpenAI Whisper API gives you raw transcription with a 25MB file limit. Orpheus adds upload recovery, explicit async processing, chunk progress, VTT generation, and a stable output contract: the infrastructure you would otherwise build yourself.
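To make the idea of an output contract with VTT concrete, here is a small sketch of how timed segments map onto WebVTT cues. The segment shape (`start` and `end` in seconds, plus `text`) and the `text`, `segments`, and `vtt` key names are assumptions for illustration, not the documented Orpheus response format.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments: List[Dict]) -> str:
    """Render hypothetical {start, end, text} segments as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


def has_expected_shape(result: Dict) -> bool:
    """Check that a result carries all three assumed fields: text, segments, vtt."""
    return all(key in result for key in ("text", "segments", "vtt"))
```

A client that relies on a fixed shape can run a check like `has_expected_shape` once at the boundary and skip defensive handling further down the pipeline.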

What languages does the API support?

100+ languages with automatic detection. Specify a language code for better accuracy on known-language audio.

Is there a free tier for development?

Yes. The free tool works without an API key for testing. The Pro plan ($16/mo) includes API access with unlimited transcription for production use.

Ready to get started?

Try the free tool above, or explore our API for production voice workflows.