OpenAI unveiled three new audio models in 2026 aimed at developers building voice-based applications, with a focus on real-time task completion and more interactive voice agents.
Each of the three models serves a distinct function. GPT-Realtime-2 is designed to handle complex requests and manage interruptions during live conversations. GPT-Realtime-Translate provides live translation across multiple languages. GPT-Realtime-Whisper delivers instant speech-to-text conversion for use cases such as live captions and note-taking.
Zillow and Priceline are among the companies already testing the tools, an early sign of enterprise interest in deploying the models in real-world applications.
The release reflects OpenAI’s stated goal of making voice agents more capable of completing tasks in real time, and it may expand how businesses integrate spoken-language interfaces into their products and services.
Source: Tech-Economic Times