OpenAI has presented three new voice models that work in real time, while the user is still speaking, to speed up task completion, translate conversations and speeches, and provide transcriptions.
The GPT-Realtime models are designed for developers to create new “voice applications” that offer real-time audio experiences, instead of merely reacting to user requests.
For the company, this is a step forward in interaction with agents, which must understand the context of their conversation with people at all times in order to adapt to changes as they arise. To that end, it has launched three new voice models belonging to the GPT-Realtime family, as reported in a statement.
GPT-Realtime-2 offers GPT-5-level reasoning to handle more complex requests (such as analyzing a request, calling tools, or handling corrections and interruptions) while keeping the conversation natural.
Alongside this model is GPT-Realtime-Translate, a real-time translation model that translates speech from more than 70 input languages into 13 output languages while maintaining the user’s rhythm. OpenAI has developed it to “create live multilingual voice experiences” in customer service, education, events and media, among other areas.
There is also GPT-Realtime-Whisper, a new real-time, low-latency speech-to-text system that transcribes speech while the user is speaking.
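The while-you-speak behaviour described here follows a general streaming pattern: emit a partial transcript as each audio chunk arrives, instead of waiting for the utterance to end. A minimal Python sketch of that pattern (the recognizer is a hypothetical stub for illustration, not OpenAI’s actual API):

```python
# Illustrative sketch of incremental, low-latency transcription.
# The recognizer here is a stub that maps each audio chunk to a word;
# a real system would run a speech model on each chunk as it arrives.
from typing import Callable, Iterable, Iterator

def stream_transcribe(chunks: Iterable[bytes],
                      recognize: Callable[[bytes], str]) -> Iterator[str]:
    """Yield a growing partial transcript after every audio chunk,
    rather than waiting for the speaker to finish."""
    words = []
    for chunk in chunks:
        words.append(recognize(chunk))  # decode as soon as audio arrives
        yield " ".join(words)           # emit the current partial hypothesis

# Stub: pretend each chunk of "audio" decodes to one word.
fake_audio = [b"hola", b"que", b"tal"]
decode = lambda chunk: chunk.decode()

partials = list(stream_transcribe(fake_audio, decode))
print(partials)  # each element refines the previous one mid-"speech"
```

The point of the pattern is latency: the listener sees text after the first chunk, not after the last one, which is what distinguishes real-time transcription from classic call-and-response systems.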
“The models we are launching transform audio in real time, moving from a simple call-and-response system to voice interfaces that can truly perform tasks: listen, reason, translate, transcribe and act as a conversation unfolds,” the company said.