Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Meta today announced a partnership with Cerebras Systems to power its new Llama API, offering developers inference speeds up to 18 times faster than traditional GPU-based solutions.

The announcement, made at Meta's inaugural LlamaCon developer conference in Menlo Park, positions the company to compete directly with OpenAI, Anthropic, and Google in the rapidly growing AI inference service market, where developers purchase billions of tokens to power their applications.
"Meta has selected Cerebras to deliver the ultra-fast inference it needs to serve developers through its new Llama API," said one Cerebras executive. "We are really excited to announce our first CSP hyperscaler partnership to bring ultra-fast inference to all developers."
The partnership marks Meta's official entry into the business of selling AI computation, transforming its popular open-source Llama models into a commercial service. While the Llama models have accumulated more than one billion downloads, until now the company had not offered first-party cloud infrastructure for developers to build applications with them.
"This is very exciting, even setting Cerebras aside," he said. "OpenAI, Anthropic, and Google have built an entirely new business from scratch: the AI inference business."
What sets Meta's offering apart is the dramatic speed increase delivered by Cerebras' specialized AI chips. The Cerebras system delivers over 2,600 tokens per second for Llama 4 Scout, compared with roughly 130 tokens per second for ChatGPT and about 25 tokens per second for DeepSeek, according to benchmarks from Artificial Analysis.
"If you compare API to API, Gemini and GPT are all great models, but they all run at GPU speeds, roughly 100 tokens per second," he said. "And 100 tokens per second is fine for chat, but it is very slow for reasoning. It is very slow for agents. People are struggling with that today."
This speed advantage enables entirely new categories of applications that were previously impractical: real-time agents, conversational low-latency voice systems, interactive code generation, and instant multi-step reasoning, all of which require chaining multiple LLM calls.
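The practical impact of throughput on chained workloads can be illustrated with a rough back-of-the-envelope calculation. The throughput figures below are the benchmark numbers cited above; the chain length and per-step output size are assumptions chosen purely for illustration:

```python
# Rough latency estimate for an agent pipeline that chains several LLM
# calls in sequence. Throughput figures are the Artificial Analysis
# numbers cited in the article; STEPS and TOKENS_PER_STEP are assumed.

STEPS = 10             # assumed number of chained LLM calls in the agent
TOKENS_PER_STEP = 500  # assumed output length per call

throughputs = {
    "Cerebras (Llama 4 Scout)": 2600,  # tokens per second
    "Typical GPU-served model": 130,
}

for name, tps in throughputs.items():
    total_seconds = STEPS * TOKENS_PER_STEP / tps
    print(f"{name}: ~{total_seconds:.1f} s end-to-end")
```

Under these assumptions the same ten-step chain finishes in under two seconds on the Cerebras figure but takes more than half a minute at typical GPU throughput, which is why the quote below singles out agents as the workload that suffers most.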
The Llama API represents a significant shift in Meta's AI strategy, moving the company from being primarily a model provider toward becoming a full-service AI infrastructure company. By offering an API service, Meta creates a revenue stream from its AI investments while maintaining its commitment to open models.
"Meta is now in the business of selling tokens, and that is great for the American AI ecosystem," he said. "They bring a lot to the table."
The API will launch with tools for fine-tuning and evaluation, starting with the Llama 3.3 8B model, allowing developers to generate data, train on it, and test the quality of their custom models. Meta says it will not use customer data to train its own models, and models built using the Llama API can be transferred to other hosts, a clear differentiation from some competitors' more closed approaches.
Cerebras will power Meta's new service through its network of data centers across North America, including facilities in Dallas, Oklahoma, Minnesota, Montreal, and California.
"At the moment, all of our data centers that serve inference are in North America," he said. "We will serve Meta at full capacity. The workload will be balanced across all of these different data centers."
The business arrangement follows the classic compute-provider-to-hyperscaler model, similar to how Nvidia supplies hardware to major cloud providers. "They are reserving blocks of our compute that they can serve to their developer population," he said.
Beyond Cerebras, Meta has also announced a partnership with Groq to provide fast inference options, giving developers another high-performance alternative to traditional GPU-based inference.
Meta's entry into the inference market with superior performance metrics could disrupt the established order dominated by OpenAI, Google, and Anthropic. By combining the popularity of its open-source models with dramatically faster inference, Meta is positioning itself as a formidable competitor in the commercial AI space.
"Meta is in a unique position with 3 billion users, hyper-scale data centers, and a huge developer ecosystem," according to Cerebras, which says the integration of its technology "helps Meta leapfrog OpenAI and Google in performance by approximately 20x."
For Cerebras, the partnership is a major milestone and a validation of its specialized AI hardware approach. "We have been building this wafer-scale engine for years, and we always knew the technology was first-rate, but ultimately it had to end up as part of someone else's hyperscale cloud," he said.
The Llama API is currently available as a limited preview, with a broader rollout planned in the coming weeks and months. Developers interested in ultra-fast Llama 4 inference can request early access by selecting Cerebras from the model options within the Llama API.
"If you imagine a developer who knows nothing about Cerebras, because we are a relatively small company, they can just use Meta's standard software, generate an API key, select the Cerebras flag, and then suddenly their tokens are being processed on a giant wafer-scale engine," he said. "Being on the back end of Meta's whole developer ecosystem is just tremendous for us."
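As a sketch only, the "generate a key, select the Cerebras flag" flow described above might translate into a request payload along these lines. The field names, model identifier, and backend selector here are illustrative assumptions, not Meta's documented interface:

```python
# Hypothetical sketch of the request payload a developer might send to
# the Llama API to route inference through an accelerated backend.
# "model", "provider", and the model name are assumptions for
# illustration, not documented API fields.

def build_llama_request(prompt: str, provider: str = "cerebras") -> dict:
    """Assemble a chat-completion-style payload with a backend selector."""
    return {
        "model": "llama-4-scout",   # assumed model identifier
        "provider": provider,       # assumed flag selecting the fast backend
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_llama_request("Summarize today's LlamaCon announcements.")
print(payload["provider"])  # -> cerebras
```

The point of the design, as the quote describes it, is that switching backends is a single flag for the developer while the routing to specialized hardware happens entirely behind the API.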
Meta's choice of specialized silicon signals something deeper: in the next phase of AI, what matters is not just what your models know, but how quickly they can think. In that future, speed is not just a feature; it is the whole point.