OpenAI Brings Faster and Smarter AI Voice Models to Its API

OpenAI said Thursday that it has added several new voice intelligence features to its API to help developers build apps that can talk, translate, and transcribe conversations in real time.

The company introduced GPT-Realtime-2, a new voice model designed to create realistic conversational voice interactions. Unlike GPT-Realtime-1.5, the updated model uses GPT-5-class reasoning capabilities to handle more complex user requests.

Translation And Transcription Tools

OpenAI also launched GPT-Realtime-Translate, a real-time translation tool designed to keep pace with live conversations.

The feature supports more than 70 input languages and 13 output languages for translated speech responses.

In addition, the company introduced GPT-Realtime-Whisper, a speech-to-text model that provides live transcription as conversations happen.

OpenAI said the new models are designed to move voice interfaces beyond simple call-and-response interactions by enabling them to listen, reason, translate, transcribe, and take action during conversations.

Enterprise And Safety Focus

The company said the new tools could help businesses expand customer service operations while also supporting industries such as education, media, events, and creator platforms.

OpenAI also acknowledged the potential misuse of AI voice tools and said it has added safeguards to reduce risks linked to spam, fraud, and other harmful activities.

According to the company, the system includes triggers that can stop a conversation if it violates the company's harmful-content guidelines.

All of the new voice models are available through OpenAI’s Realtime API. GPT-Realtime-Translate and GPT-Realtime-Whisper are billed by the minute, while GPT-Realtime-2 pricing is based on token usage.
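
For developers budgeting around the two billing schemes, the difference can be sketched in a few lines of Python. This is only an illustration of per-minute versus per-token arithmetic: the class names and every rate below are hypothetical placeholders, not OpenAI's published prices or any part of its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerMinuteRate:
    """Billing by audio duration (the scheme described for the translate and transcription models)."""
    usd_per_minute: float  # hypothetical rate, not a published price

    def cost(self, audio_seconds: float) -> float:
        # Convert seconds to minutes, then apply the flat per-minute rate.
        return audio_seconds / 60 * self.usd_per_minute


@dataclass(frozen=True)
class PerTokenRate:
    """Billing by token usage (the scheme described for GPT-Realtime-2)."""
    usd_per_million_input: float   # hypothetical rate
    usd_per_million_output: float  # hypothetical rate

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Input and output tokens are priced separately, per million tokens.
        return (input_tokens / 1_000_000 * self.usd_per_million_input
                + output_tokens / 1_000_000 * self.usd_per_million_output)


# Placeholder numbers chosen only to make the arithmetic easy to follow.
minute_billed = PerMinuteRate(usd_per_minute=0.5)
token_billed = PerTokenRate(usd_per_million_input=2.0, usd_per_million_output=8.0)

print(minute_billed.cost(audio_seconds=120))
print(token_billed.cost(input_tokens=500_000, output_tokens=250_000))
```

Under per-minute billing, cost depends only on how long the audio runs. Under token billing, it depends on how much the model reads and generates, so a short but reasoning-heavy exchange can cost more than a long, simple one.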