Move over, Alexa: Amazon launches new realtime voice model Nova Sonic for third-party enterprise development



Amazon is best known as an e-commerce giant, and after that for its Alexa AI voice assistant line, which received a major intelligence upgrade last month thanks in part to Amazon Nova and Amazon’s investment in Anthropic.

Now Alexa will have to make room for a new Amazon voice AI sibling: today the company is launching Amazon Nova Sonic, a new foundation model designed to create real-time, natural-sounding spoken interactions, available through the Amazon Bedrock platform on Amazon Web Services.

It is available now through a two-way (bidirectional) streaming application programming interface (API). And in fact, Amazon is already using parts of it, such as its speech encoder, in Alexa+.
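To make the idea of a two-way streaming API concrete, here is a minimal sketch in Python. It does not call Amazon Bedrock or any real Nova Sonic API: `fake_model`, `send_audio`, `receive` and `run_session` are hypothetical names, and the "model" is a stand-in that simply labels each chunk. The point it illustrates is that sending and receiving run concurrently over the same session, rather than as one request followed by one response.

```python
import asyncio
from typing import List, Optional


async def fake_model(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Hypothetical stand-in for a streaming model: answers each chunk as it arrives."""
    while True:
        chunk: Optional[str] = await inbound.get()
        if chunk is None:  # end-of-stream marker from the client
            await outbound.put(None)
            return
        await outbound.put(f"reply:{chunk}")


async def send_audio(inbound: asyncio.Queue, chunks: List[str]) -> None:
    """Client side: push input chunks without waiting for the replies."""
    for chunk in chunks:
        await inbound.put(chunk)
        await asyncio.sleep(0)  # yield so the other tasks can run
    await inbound.put(None)


async def receive(outbound: asyncio.Queue) -> List[str]:
    """Client side: collect replies while input is still being sent."""
    replies: List[str] = []
    while True:
        item = await outbound.get()
        if item is None:
            return replies
        replies.append(item)


async def run_session(chunks: List[str]) -> List[str]:
    """Run sender, model and receiver concurrently over two queues."""
    inbound: asyncio.Queue = asyncio.Queue()
    outbound: asyncio.Queue = asyncio.Queue()
    _, _, replies = await asyncio.gather(
        fake_model(inbound, outbound),
        send_audio(inbound, chunks),
        receive(outbound),
    )
    return replies
```

Calling `asyncio.run(run_session(["chunk-1", "chunk-2"]))` returns one reply per input chunk, in order. In a real voice application the chunks would be audio frames rather than strings.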

“This allows us to serve different use cases, while continuing to evolve both systems on the basis of customer feedback and technological progress,” Amazon’s Rohit Prasad said.

Clear use cases include customer support and service, management, data retrieval and entertainment.

A unified approach

Nova Sonic addresses a major problem in voice AI: fragmented technologies.

Traditionally, according to Rohit Prasad, SVP and head scientist for artificial general intelligence (AGI) at Amazon, voice AI systems have required separate models for speech recognition, language understanding and speech synthesis.

This complexity often results in robotic, unnatural interactions and higher development overhead.

Nova Sonic tries to improve this situation by combining all three models into one.

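Prasad described the unified model as the main innovation: Nova Sonic brings together three traditionally separate models (speech-to-text, text understanding and text-to-speech) so that a single model handles not only “what” is said but also how it is said.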

By retaining acoustic context such as tone, cadence and style, Nova Sonic helps preserve the nuances of human conversation.

Handling the subtleties of live, two-way voice conversations

One of Nova Sonic’s capabilities is handling live, two-way conversations. It can tell when users pause or hesitate, picking up on subtle cues in human speech, and it can respond to interruptions while preserving context.

“The real progress here is real-time, interactive, low-latency voice interaction, where you can cut the AI off mid-sentence and it will still preserve the context,” said Prasad. This capability is particularly relevant in scenarios such as customer service, where responsiveness and adaptability matter.
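One way to picture what preserving context through an interruption means is the toy sketch below. It says nothing about how Nova Sonic works internally: the `Conversation` class and its methods are hypothetical, and the "speech" is just a list of words. The idea shown is that when a user cuts in, the history records only what the assistant actually got to say, so the next turn is generated against an accurate picture of the conversation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Conversation:
    """Hypothetical conversation log that survives interruptions."""

    history: List[Tuple[str, str]] = field(default_factory=list)

    def user_says(self, text: str) -> None:
        self.history.append(("user", text))

    def assistant_speaks(
        self, words: List[str], interrupted_after: Optional[int] = None
    ) -> str:
        """Speak `words`; if the user cuts in, stop after `interrupted_after` words."""
        spoken = words if interrupted_after is None else words[:interrupted_after]
        text = " ".join(spoken)
        # Store only the part that was actually spoken, not the planned reply.
        self.history.append(("assistant", text))
        return text
```

For example, if the assistant plans to say "Your balance is forty two dollars" and the user interrupts after three words, the log holds "Your balance is" followed by the user's new request, and earlier turns remain available.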

Nova Sonic is also designed to integrate smoothly with other systems. The API automatically generates text transcripts of the speech input, which can be used to trigger or interact with proprietary tools. This allows companies to build AI agents that can perform tasks such as booking meetings, retrieving live data or answering complex customer queries.
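The transcript-to-tool step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Amazon's API: the tool functions, the `ROUTES` table and `dispatch` are all hypothetical, and a production agent would more likely let the model choose a tool than rely on keyword patterns.

```python
import re
from typing import Callable, List, Optional, Tuple

Tool = Callable[[str], str]


def book_meeting(transcript: str) -> str:
    """Hypothetical placeholder for a scheduling integration."""
    return "booking a meeting"


def look_up_order(transcript: str) -> str:
    """Hypothetical placeholder for a live data lookup."""
    return "looking up the order"


# (pattern, tool) pairs, checked in order; the first match wins.
ROUTES: List[Tuple[re.Pattern, Tool]] = [
    (re.compile(r"\b(book|schedule)\b.*\bmeeting\b", re.IGNORECASE), book_meeting),
    (re.compile(r"\border\b", re.IGNORECASE), look_up_order),
]


def dispatch(transcript: str) -> Optional[str]:
    """Run the first tool whose pattern matches the transcript, if any."""
    for pattern, tool in ROUTES:
        if pattern.search(transcript):
            return tool(transcript)
    return None  # no tool applies; the agent would just keep talking
```

With these routes, a transcript such as "Please schedule a meeting for Friday" triggers the scheduling placeholder, while small talk falls through and returns `None`.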

“You can use Nova Sonic via Amazon Bedrock and combine it with any tools or proprietary sources, even visual ones,” he said. That flexibility makes it suitable for a wide range of fields, including enterprise, entertainment, education and travel.

Performance evaluation and industry comparisons

Nova Sonic was evaluated against other real-time voice models, including OpenAI’s GPT-4o and Google’s Gemini Flash 2.0. On the Common Eval data set, using a masculine-sounding voice, Nova Sonic achieved a 69.7% win rate against Gemini Flash 2.0 for American English single-turn conversations. Similar results were seen with feminine-sounding voices and British English.
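For readers unfamiliar with the metric, a win rate in a head-to-head evaluation is simply the share of comparisons in which one model's response was preferred. The sketch below shows that arithmetic only. The function name and the sample data are hypothetical, and the article does not say how ties were scored, so this version assumes anything other than a win counts against the model.

```python
from typing import Iterable


def win_rate(judgments: Iterable[str], model: str) -> float:
    """Fraction of pairwise comparisons in which `model` was preferred.

    Assumption: ties or votes for the other model both count as non-wins.
    """
    votes = list(judgments)
    if not votes:
        raise ValueError("at least one judgment is required")
    wins = sum(1 for vote in votes if vote == model)
    return wins / len(votes)
```

With four made-up judgments in which model "a" is preferred three times, `win_rate(["a", "b", "a", "a"], "a")` gives 0.75, or a 75% win rate.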

Prasad highlighted Nova Sonic’s strong performance in its main language markets: “Nova Sonic is currently best in class for U.S. and British English in real time, in both conversational naturalness and accuracy.” He added that, to the best of Amazon’s knowledge, only two other models, GPT-4o realtime and GPT-4o mini realtime, combine understanding and generation in real time. “This space is still very early and very difficult.”

Multilingual capabilities and noisy environments

In speech recognition, Nova Sonic also excels in multilingual and real-world environments. It achieved a 4.2% word error rate (WER) on the Multilingual LibriSpeech benchmark, averaged across English, French, German, Italian and Spanish, which is more than 36% better than GPT-4o Transcribe. In noisy, multi-speaker environments (measured using the AMI benchmark), Nova Sonic improved on GPT-4o Transcribe by 46.7%.
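Word error rate is the standard speech recognition metric: the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance. The second helper assumes the percentage comparisons quoted above are relative reductions in WER, which the article does not state explicitly, and both function names are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of `new_wer` versus `baseline_wer`."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

As a worked case, the reference "the cat sat on the mat" against the transcript "the cat sat on mat" has one deleted word out of six, a WER of about 16.7%. With purely hypothetical figures, a drop from 10% to 8% WER is a 20% relative improvement.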

Expressive voices and language expansion

Currently, the model supports expressive masculine- and feminine-sounding voices in both American and British English. Amazon noted that additional accents and languages are in development and will be released in future updates.

Low latency and business friendly

Speed and cost are also part of the pitch. Third-party benchmarking shows Nova Sonic with lower latency than competing models, which came in at 1.18 seconds for OpenAI’s GPT-4o and 1.41 seconds for Google’s Gemini Flash 2.0.

On pricing, Amazon is positioning Nova Sonic as an enterprise-ready solution. Prasad said it is about 80% less expensive than GPT-4o realtime, and that this superior price performance resonates with enterprises moving to deployment.

Early adoption across sectors

According to Amazon, companies in various sectors have already started using Nova Sonic.

ASAPP is applying the technology to optimize contact center workflows, and has praised its accuracy and natural dialogue.

Education First (EF) uses the model to support language learners with real-time pronunciation feedback, particularly for non-native speakers.

Sports data provider Stats Perform uses Nova Sonic’s low latency and simple structure for interaction with Opta data on its AI chat platform.

Commitment to responsible AI and safety

In addition to performance and cost, Amazon stresses its commitment to responsible AI development. The model includes the safety measures built into the Nova family and is supported by AWS AI Service Cards, which describe intended use cases, potential limitations and responsible-use guidelines.

Prasad emphasized trust and safety. Whatever customization is open to developers, he said, Amazon has put strong guardrails in place to prevent voice cloning or unwanted mimicry. He added that the company is working very hard to eliminate hallucinations and audio slips, and that the bar it sets for speech is high because the generated output must be reliable.

Amazon Nova Sonic is available now through Amazon Bedrock. Developers and businesses interested in exploring the model can start at https://aws.amazon.com/nova/.


