A three-way partnership among AI phone support company Phonely, inference optimization platform Maitai, and chip manufacturer Groq has produced a breakthrough that addresses one of conversational AI's most persistent problems: the awkward delays that immediately signal to callers that they are talking to a machine.
The collaboration has enabled Phonely to raise its AI agents' accuracy from 81.5% to 99.2% across four model iterations, surpassing GPT-4o's 94.7% benchmark by 4.5 percentage points. The improvements stem from Groq's new ability to switch instantly between multiple specialized AI models, orchestrated through Maitai's optimization platform.
The achievement tackles what industry experts call the “uncanny valley” of voice AI: the subtle cues that make automated conversations feel distinctly non-human. For call centers and customer service operations, the implications could be transformative: one of Phonely's customers is replacing 350 human agents.
Traditional large language models like OpenAI's GPT-4o have long struggled with what seems like a simple challenge: responding quickly enough to maintain a natural conversational flow. While a delay of a few seconds barely registers in text-based interactions, the same pause feels interminable during a live phone call.
“One of the things that most people don't realize is that major LLM providers, such as OpenAI, Claude, and others, have a very high degree of latency variance,” said Will Bodewes, Phonely's founder and CEO. “A four-second delay feels like an eternity if you're talking to a voice AI on the phone. That delay is what makes most voice AI today feel non-human.”
The problem occurs frequently enough that standard conversations inevitably include at least one or two awkward pauses. For AI phone agents, these delays have created a significant barrier to adoption.
“This kind of delay is unacceptable for real-time phone support,” said Bodewes. “Beyond latency, conversational accuracy and humanlike responses are something that legacy LLM providers just haven't cracked in the voice realm.”
The solution emerged from Groq's development of what the company calls “zero-latency LoRA hotswapping”: the ability to switch instantly among multiple specialized AI model variants without any performance penalty. LoRA, or low-rank adaptation, lets developers create small, specialized modifications to an existing model instead of training an entirely new one.
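To make the mechanism concrete, here is a minimal sketch of how a LoRA adapter modifies a frozen weight matrix at inference time. It illustrates the general technique only, not Groq's or Phonely's code; the dimensions, scaling factor, and function names are invented for the example.

```python
import numpy as np

# Minimal LoRA sketch: a frozen base weight W is augmented at inference
# time with a small trained delta B @ A of rank r, where r is far smaller
# than the layer dimensions. All sizes here are illustrative.
d_out, d_in, r = 1024, 1024, 8

W = np.random.randn(d_out, d_in)      # frozen base-model weights
A = np.random.randn(r, d_in) * 0.01   # trained low-rank factor (r x d_in)
B = np.zeros((d_out, r))              # trained low-rank factor (d_out x r)
alpha = 16.0                          # conventional LoRA scaling hyperparameter

def forward(x, adapter=None):
    """Base projection, plus a low-rank adapter delta if one is active."""
    y = W @ x
    if adapter is not None:
        B_a, A_a, scale = adapter
        # The full d_out x d_in delta is never materialized: two small
        # matmuls stand in for it, which is what keeps adapters cheap.
        y = y + scale * (B_a @ (A_a @ x))
    return y

x = np.random.randn(d_in)
y_base  = forward(x)                     # generic base-model behavior
y_tuned = forward(x, (B, A, alpha / r))  # behavior with one specialization
```

Because an adapter stores only the two small factors rather than a full copy of the weights, many adapters can stay resident in fast memory at once, which is what makes the hot-swapping Kantor describes below plausible.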
“Groq's combination of fine-grained software-controlled architecture, high-speed on-chip memory, streaming architecture, and deterministic execution makes it possible to access multiple hot-swapped LoRAs with no latency penalty,” said Chelsey Kantor, Groq's chief marketing officer, in an interview with VentureBeat. “The LoRAs are stored and managed in SRAM alongside the original model weights.”
This infrastructure advance enabled Maitai's “proxy-layer orchestration” system, which continuously optimizes model performance. “Maitai acts as a thin proxy layer between customers and their model providers,” said Christian DalSanto, Maitai's founder. “This allows us to dynamically select and optimize the best model for every request, automatically applying evaluation, optimization, and resiliency strategies such as fallbacks.”
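A rough sketch of what such a proxy layer might look like follows. The route table, model names, and `call_provider` stub are hypothetical stand-ins, not Maitai's actual API; the point is only the pattern of per-request model selection with automatic fallback.

```python
import time

# Hypothetical route table: task label -> ordered (provider, model) preferences.
ROUTES = {
    "appointment_scheduling": [("groq", "scheduler-lora"), ("openai", "gpt-4o")],
    "default":                [("groq", "base-model"),     ("openai", "gpt-4o")],
}

def call_provider(provider: str, model: str, prompt: str, timeout_s: float) -> str:
    """Placeholder to be wired to a real provider SDK; here it just echoes."""
    return f"[{provider}/{model}] reply to: {prompt[:40]}"

def proxy_complete(prompt: str, task: str = "default", timeout_s: float = 1.0) -> str:
    """Try each candidate model in preference order; fall back on failure."""
    last_err = None
    for provider, model in ROUTES.get(task, ROUTES["default"]):
        start = time.monotonic()
        try:
            reply = call_provider(provider, model, prompt, timeout_s)
            latency = time.monotonic() - start
            # A real proxy would log (task, provider, model, latency, outcome)
            # here, feeding the feedback loop described below.
            return reply
        except Exception as err:  # timeout or provider error: try the next route
            last_err = err
    raise RuntimeError("all configured providers failed") from last_err

print(proxy_complete("Can I book an appointment for Tuesday?",
                     task="appointment_scheduling"))
```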
The system collects data from every interaction, identifies weak points, and applies targeted improvements without requiring customer intervention. “We collect strong signals identifying where models underperform,” DalSanto said. “These ‘soft spots’ are clustered, labeled, and incrementally fine-tuned to address specific weaknesses without causing regressions.”
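In spirit, that feedback loop might look something like the sketch below: filter production interactions by a quality signal, cluster the failures, and treat each cluster as a candidate fine-tuning batch. The embedding stub, the `confidence` field, and the thresholds are assumptions made for illustration, not Maitai's internals.

```python
import numpy as np
from sklearn.cluster import KMeans

def embed(text: str) -> np.ndarray:
    """Stand-in embedding; a real system would use a sentence encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def find_soft_spots(interactions, confidence_threshold=0.7, n_clusters=5):
    """Group low-confidence interactions into themed fine-tuning candidates."""
    weak = [i for i in interactions if i["confidence"] < confidence_threshold]
    if len(weak) < n_clusters:
        return []
    vectors = np.stack([embed(i["transcript"]) for i in weak])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
    clusters = {}
    for interaction, label in zip(weak, labels):
        clusters.setdefault(int(label), []).append(interaction)
    # Each cluster is a batch of similar failures to label and fine-tune
    # against, targeting one weakness at a time to avoid broad regressions.
    return list(clusters.values())
```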
The results demonstrate significant improvements across multiple performance metrics. Time to first token dropped 73.4%, from 661 milliseconds to 176 milliseconds at the 90th percentile. Overall completion times fell 74.6%, from 1,446 milliseconds to 339 milliseconds.
Perhaps more significantly, accuracy followed a clear upward trajectory across the four model iterations, climbing from 81.5% to 99.2%, a level that exceeds human performance in many customer service scenarios.
“We've seen about 70% of the people who call into Phonely's AI unable to tell they aren't speaking with a person,” Bodewes told VentureBeat. “Latency was the giveaway. A custom fine-tuned model that talks like a person, combined with super low-latency inference, removes it.”
The performance gains translate directly into business results. “One of our biggest customers saw a 32% increase in qualified leads compared with a previous version using earlier state-of-the-art models,” Bodewes said.
The improvement arrives as call centers face mounting pressure to reduce costs while maintaining service quality. Traditional human agents require training, scheduling coordination, and significant overhead that AI agents can eliminate.
“Call centers are really benefiting from using Phonely to replace human agents,” said Bodewes. “One of the call centers we work with is actually replacing 350 human agents entirely with Phonely this month. From their perspective this is a game changer, because they no longer have to manage human support agents, train them, and match supply with demand.”
The technology shows particular strength in specific use cases. “Phonely really excels in a few areas, including industry-leading performance in appointment scheduling and lead qualification, beyond what legacy providers can offer,” said Bodewes. The company has partnered with major firms handling insurance, legal, and automotive customer interactions.
Groq's specialized AI inference chips, called language processing units (LPUs), provide the hardware foundation that makes the multi-model approach viable. Unlike the general-purpose graphics processors typically used for AI inference, LPUs are optimized specifically for the sequential nature of language processing.
“The LPU architecture is optimized for precisely controlling data movement and computation at a fine-grained level with high speed and predictability, which allows multiple small ‘delta’ weight sets (the LoRAs) to be served on a common base model with no additional latency,” Kantor said.
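The serving-side idea can be sketched in a few lines: one shared base weight stays put, every adapter's small factors are pre-loaded, and “swapping” reduces to indexing a different delta per request. This is a conceptual illustration only; how Groq schedules this on LPU SRAM is proprietary, and the segment names below are invented.

```python
import numpy as np

d_out, d_in, r = 512, 512, 4
W_base = np.random.randn(d_out, d_in)  # one shared, never-copied base weight

# All adapters resident at once; "hotswapping" is a dictionary lookup,
# not a weight reload, which is why it can carry no latency penalty.
adapters = {
    name: (np.random.randn(d_out, r) * 0.01, np.random.randn(r, d_in) * 0.01)
    for name in ("insurance", "legal", "automotive")
}

def serve(x: np.ndarray, adapter_name: str) -> np.ndarray:
    """Route one request through the shared base plus its segment's delta."""
    B, A = adapters[adapter_name]
    return W_base @ x + B @ (A @ x)

x = np.random.randn(d_in)
y = serve(x, "legal")                  # per-request adapter selection
```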
The cloud-based infrastructure also addresses the scalability concerns that have historically limited AI deployment. “The beauty of a cloud-based solution like GroqCloud is that Groq handles orchestration and dynamic scaling for our customers for any AI model we offer, including fine-tuned LoRA models,” Kantor said.
The economics look compelling for enterprises as well. “The simplicity and efficiency of our system design, the low energy consumption, and the high performance of our hardware allow Groq to provide customers the lowest cost per token without sacrificing performance as they scale,” Kantor said.
One of the partnership's most compelling aspects is implementation speed. Unlike traditional AI deployments that can require months of integration work, Maitai's approach enables same-day transitions for companies already using general-purpose models.
“For companies already in production using general-purpose models, we typically transition them to Maitai on the same day, with zero disruption,” DalSanto said.
This rapid deployment capability addresses a common enterprise concern about AI projects: lengthy implementation timelines that delay return on investment. The proxy-layer approach means companies can keep their existing API integrations while gaining access to continuously improving performance.
The collaboration signals a broader shift in enterprise AI architecture, away from monolithic, general-purpose models and toward specialized, task-specific systems. “We're observing growing demand from teams breaking their applications into smaller, highly specialized workloads, each with its own adapter,” DalSanto said.
The trend reflects a maturing understanding of AI deployment challenges. Rather than expecting a single model to excel at every task, enterprises increasingly value purpose-built solutions that are continuously refined against real-world performance data.
“Multi-LoRA hotswapping lets companies deploy faster, more accurate models customized for their applications, removing traditional cost and complexity barriers,” DalSanto said. “This fundamentally shifts how enterprise AI gets built and deployed.”
The technical foundation also enables more sophisticated applications as the technology matures. Groq's infrastructure can host many specialized models simultaneously, potentially allowing enterprises to deploy customized AI for different customer segments or functions.
“Multi-LoRA hotswapping enables low-latency, high-accuracy inference tailored to specific tasks,” DalSanto said. “Our roadmap prioritizes further investment in infrastructure, tools, and optimization to establish fine-grained, application-specific inference as the new standard.”
For the broader voice AI market, the partnership demonstrates that technical limitations once considered intractable can be overcome with specialized infrastructure and careful system design. As more enterprises deploy AI phone agents, the competitive advantages Phonely has demonstrated may set new baseline expectations for performance and responsiveness in automated customer interactions.
The success also validates a model in which AI infrastructure companies work together to solve complex deployment challenges. That collaborative approach may accelerate innovation across the enterprise AI sector as specialized capabilities combine to deliver results no single provider could achieve independently. If this partnership is any indication, the era of obviously artificial phone conversations may be ending faster than anyone expected.