Technology
OpenAI unveils three audio models for real-time voice tasks
New GPT-Realtime models enable live translation, speech-to-text, and context-aware voice interactions for developers

The launch of the application programming interface (API) moves the ChatGPT maker beyond transcription and chat toward agents that can listen, translate, and act during live conversations. (Reuters)
OpenAI introduced three new audio models for its developer platform on Thursday, aiming to make voice-based software agents more conversational and capable of completing tasks in real time.
The new models, GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, are now available for testing in the company's developer playground.
GPT-Realtime-2 is designed to handle more complex requests, call external tools, manage interruptions, and maintain context across longer voice sessions.
GPT-Realtime-Translate supports translation from more than 70 input languages into 13 output languages, targeting use cases such as customer support, education, and other real-time communication scenarios.
GPT-Realtime-Whisper provides live speech-to-text capabilities, enabling captions, meeting notes, and workflow updates to be generated as a person speaks.
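Streaming speech-to-text services typically emit small incremental transcript updates that the client assembles into finished captions as the speaker talks. A minimal, purely illustrative sketch of that client-side pattern (the event names and shapes here are assumptions for illustration, not OpenAI's actual Realtime schema):

```python
# Illustrative only: fold a stream of transcript deltas into caption lines.
# Event types ("transcript.delta", "transcript.completed") are hypothetical,
# not taken from OpenAI's actual API.

def build_captions(events):
    """Accumulate partial-text events into finalized caption strings."""
    captions, current = [], []
    for event in events:
        if event["type"] == "transcript.delta":
            current.append(event["text"])      # partial text as it arrives
        elif event["type"] == "transcript.completed":
            captions.append("".join(current))  # finalize the caption line
            current = []
    return captions

# Simulated event stream, as a real one would arrive over a live connection:
stream = [
    {"type": "transcript.delta", "text": "Hello, "},
    {"type": "transcript.delta", "text": "world."},
    {"type": "transcript.completed"},
]
print(build_captions(stream))  # ['Hello, world.']
```

The same accumulation loop would feed meeting notes or workflow triggers instead of captions; only the handler for completed segments changes.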
Companies testing the models include online real estate marketplace Zillow, travel platform Priceline, and European telecommunications firm Deutsche Telekom.
Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens, while GPT-Realtime-Translate costs $0.034 per minute and GPT-Realtime-Whisper is priced at $0.017 per minute.
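The per-minute rates make rough cost comparisons straightforward; the token-based rate for GPT-Realtime-2 depends on how many audio tokens a session consumes, which the company has not broken out per minute. A back-of-the-envelope calculation using only the listed figures (the session sizes below are hypothetical examples, and actual billing terms may differ):

```python
# Cost estimates from the listed prices; rates are from the article,
# while the usage amounts below are hypothetical examples.

TRANSLATE_PER_MIN = 0.034       # GPT-Realtime-Translate, USD per minute
WHISPER_PER_MIN = 0.017         # GPT-Realtime-Whisper, USD per minute
REALTIME2_PER_M_TOKENS = 32.0   # GPT-Realtime-2, USD per million audio input tokens

def minute_cost(rate_per_min: float, minutes: float) -> float:
    """Cost of a per-minute-billed session."""
    return rate_per_min * minutes

def token_cost(rate_per_million: float, tokens: int) -> float:
    """Cost of a token-billed session."""
    return rate_per_million * tokens / 1_000_000

# One hour of live translation vs. one hour of transcription:
print(round(minute_cost(TRANSLATE_PER_MIN, 60), 2))  # 2.04
print(round(minute_cost(WHISPER_PER_MIN, 60), 2))    # 1.02
# A hypothetical 500,000 audio input tokens through GPT-Realtime-2:
print(token_cost(REALTIME2_PER_M_TOKENS, 500_000))   # 16.0
```

At these rates, an hour of transcription costs about half as much as an hour of translation, while GPT-Realtime-2 spending scales with session token usage rather than wall-clock time.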