
OpenAI upgrades its transcription and voice-generating AI models


OpenAI is bringing new transcription and voice-generating AI models to its API that the company claims improve on its previous releases.

For OpenAI, the models fit into its broader "agentic" vision: automated systems that can independently accomplish tasks on behalf of users. The definition of "agent" may be up for debate, but OpenAI Head of Product Olivier Godement described one interpretation as a chatbot that can speak with a business's customers.

"We're going to see more and more agents pop up in the coming months," he said. "And so the general theme is helping customers and developers leverage agents that are useful, available, and accurate."

OpenAI claims that its new text-to-speech model, gpt-4o-mini-tts, not only delivers more nuanced and realistic-sounding speech, but is also more "steerable" than its previous-generation speech-synthesizing models. Developers can instruct gpt-4o-mini-tts in natural language, telling it, for example, to "talk like a mad scientist" or "use a serene voice, like a mindfulness teacher."
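As a rough sketch of how such a steering instruction might be attached to a request (assuming the model is reached through OpenAI's standard `/v1/audio/speech` endpoint and that the field names below, including the preset voice name, match the current API), the request body could be built like this:

```python
import json

def build_tts_request(text: str, style_instruction: str) -> dict:
    """Build a text-to-speech request payload for gpt-4o-mini-tts.

    The ``instructions`` field carries the natural-language steering
    prompt, e.g. "talk like a mad scientist".
    """
    return {
        "model": "gpt-4o-mini-tts",
        "voice": "coral",  # one of the API's built-in preset voices
        "input": text,
        "instructions": style_instruction,
    }

payload = build_tts_request(
    "Thank you for calling. How can I help you today?",
    "Speak in a calm, apologetic customer-support tone.",
)

# This JSON body would be POSTed (with an Authorization header) to
# the speech endpoint; the actual network call is elided here.
body = json.dumps(payload)
```

The point of the sketch is the `instructions` field: the same input text can be rendered with entirely different delivery just by changing that one string.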

Here's a sample of a "true crime-styled," weathered voice:

And here's a sample of a female "professional" voice:

Jeff Harris, a member of the product staff at OpenAI, told TechCrunch that the goal is to let developers tailor both the voice "experience" and "context."

"In different contexts, you don't just want a flat, monotonous voice," Harris said. "If you're in a customer support experience and the voice has made a mistake, you can actually have the voice carry that emotion … Our big belief here is that developers and users want to really control not just what is spoken, but how things are spoken."

As for OpenAI's new speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, they effectively replace the company's long-standing Whisper transcription model. Trained on "diverse, high-quality audio datasets," the new models can better capture accented and varied speech, OpenAI claims, even in chaotic environments.
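Because the new models slot into the API's existing transcription endpoint, migrating off Whisper should amount to little more than a model-name swap. A minimal sketch (assuming the `/v1/audio/transcriptions` endpoint keeps its existing field names; these parameters accompany the uploaded audio file):

```python
def transcription_params(model: str = "gpt-4o-transcribe") -> dict:
    """Request fields sent alongside the audio file upload."""
    return {
        "model": model,             # previously "whisper-1"
        "response_format": "text",  # plain-text transcript
    }

# Old Whisper-based request vs. the new, more accurate model.
old = transcription_params("whisper-1")
new = transcription_params()
```

The audio upload itself (multipart form data plus an API key) is unchanged; only the `model` field differs between the two requests.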

Harris added that the new models are also less likely to hallucinate. Whisper tended to fabricate words, and even whole passages, in conversations, introducing everything from racial commentary to imagined medical treatments into transcripts.

"[T]hese models are much improved versus Whisper on that front," he said. "Making sure the models are accurate is completely essential to getting a reliable voice experience, and accurate [in this context] means that the models are hearing the words precisely [and] aren't filling in details that they didn't hear."

Your mileage may vary depending on the language being transcribed, though.

According to OpenAI's internal benchmarks, gpt-4o-transcribe, the more accurate of the two transcription models, has a "word error rate" approaching 30% for Indic and Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada. That means roughly three out of every 10 words from the model will differ from a human transcription in those languages.
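To make that figure concrete: word error rate is the word-level edit distance (substitutions, insertions, and deletions) between the model's transcript and a human reference, divided by the length of the reference. A small illustration, using invented English phrases in place of real transcripts:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance, computed over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Three wrong words out of ten gives a WER of 0.30, the ballpark
# OpenAI reports for languages like Tamil and Kannada.
ref = "one two three four five six seven eight nine ten"
hyp = "one too three for five six seven ate nine ten"
print(word_error_rate(ref, hyp))  # 0.3
```

Note that WER can exceed 1.0 if the model inserts many words the reference lacks, which is why a hallucination-prone model scores especially badly on this metric.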

OpenAI transcription benchmark results. Image Credits: OpenAI

In a break from tradition, OpenAI doesn't plan to make its new transcription models openly available. The company has historically released new versions of Whisper for commercial use under an MIT license.

Harris said that gpt-4o-transcribe and gpt-4o-mini-transcribe are much bigger than Whisper, and thus not good candidates for an open release.

"[T]hey're not the kind of model that you can just run locally on your laptop, like Whisper," he continued. "[W]e want to make sure that if we're releasing things in open source, we're doing it thoughtfully, and we have a model that's really honed for that specific need. And we think that end-user devices are one of the most interesting cases for open-source models."

Updated March 20, 2025, 11:54 a.m. PT to clarify the language around word error rate and to update the benchmark results chart with a more recent version.



