Models

Whisper

OpenAI's open-source speech-to-text model, and the usual default choice for transcription.

Whisper is the de facto standard among open-source speech-to-text models. It comes in multiple sizes (tiny → large), is multilingual, and can be used through the OpenAI API or self-hosted. For production voice applications, Deepgram and AssemblyAI are common alternatives that beat Whisper on latency for streaming use cases.
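As a rough illustration of the self-hosted route, here is a minimal sketch that assumes the open-source `openai-whisper` Python package and ffmpeg are installed. The helper name `transcribe_file` and the `WHISPER_SIZES` tuple are hypothetical, introduced only for this example. The tuple lists a subset of the model names the package accepts.

```python
from pathlib import Path

# A subset of the model names the `openai-whisper` package accepts,
# ordered smallest to largest. (Illustrative constant, not part of the library.)
WHISPER_SIZES = ("tiny", "base", "small", "medium", "large")


def transcribe_file(audio_path: str, size: str = "base") -> str:
    """Transcribe one audio file with a locally hosted Whisper model."""
    if size not in WHISPER_SIZES:
        raise ValueError(f"unknown Whisper size {size!r}; expected one of {WHISPER_SIZES}")
    if not Path(audio_path).is_file():
        raise FileNotFoundError(audio_path)

    # Imported lazily so this module still loads where the package is missing.
    import whisper  # pip install openai-whisper (ffmpeg must be on PATH)

    model = whisper.load_model(size)       # downloads the weights on first use
    result = model.transcribe(audio_path)  # spoken language is auto-detected by default
    return result["text"].strip()
```

Calling `transcribe_file("meeting.mp3", size="small")` would return the transcript as a single string. Smaller sizes generally run faster and larger ones are generally more accurate, so the size argument is the main knob to tune. This sketch processes a complete file in one pass and does not stream.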

Related terms

Voice agent
Multi-modal
