Chinese AI startup DeepSeek, an offshoot of Hong Kong-based High-Flyer Capital Management, made waves a little over a month ago with the release of the latest version of its hit open source model, DeepSeek-R1-0528.
Like its predecessor DeepSeek-R1, which shook the AI and global business communities with how cheaply it was trained and how well it performed on reasoning tasks, R1-0528 is already being adapted and remixed by other AI labs and developers, thanks in large part to its permissive Apache 2.0 license, which makes it freely available to developers and enterprises.
This week, the 24-year-old German firm TNG Technology Consulting GmbH released one such adaptation: DeepSeek-TNG R1T2 Chimera, the latest model in its Chimera large language model (LLM) family. R1T2 delivers a notable boost in efficiency and speed, scoring upwards of 90% of R1-0528's intelligence benchmark scores while generating answers with less than 40% of R1-0528's output token count.
That means it produces shorter responses, which translates directly into faster inference and lower compute costs. On the model card TNG published for R1T2 on the AI code-sharing community Hugging Face, the company states that the new model is about 20% faster than the regular R1 and more than twice as fast as R1-0528 (DeepSeek's official May update).
The response from the AI developer community has been overwhelmingly positive. "DAMN! DeepSeek R1T2 – 200% faster than R1-0528 and 20% faster than R1," wrote Vaibhav (VB) Srivastav, a senior leader at Hugging Face, on X. "Significantly better than R1, built via Assembly of Experts from DS V3, R1 and R1-0528, and it's MIT-licensed and available on Hugging Face."
This gain is made possible by TNG's Assembly-of-Experts (AoE) method, a technique for building LLMs by selectively merging the weight tensors (internal parameters) of multiple pre-trained models, described in a paper TNG published in May on arXiv, the non-peer-reviewed open-access online repository.
A successor to the original R1T Chimera, R1T2 introduces a new "Tri-Mind" configuration that merges three parent models: DeepSeek-R1-0528, DeepSeek-R1 and DeepSeek-V3-0324. The result is a model engineered to maintain high reasoning ability while significantly reducing inference costs.
R1T2 is constructed without any further fine-tuning or retraining. It inherits the reasoning strength of R1-0528, the structured thought patterns of R1 and the concise, instruction-oriented behavior of V3-0324, delivering a more efficient yet capable model for enterprise and research use.
Mixture-of-Experts (MoE) is an architectural design in which different components, or "experts," are activated conditionally per input. In MoE LLMs such as DeepSeek-V3 or Mixtral, only a subset of the model's expert layers (for example, 8 out of 256) is active during any given token's forward pass. This lets very large models achieve higher parameter counts and specialization while keeping inference costs manageable, because only a fraction of the network is evaluated per token.
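To make the routing idea concrete, here is a minimal, illustrative sketch of an MoE layer with top-k expert selection. It is not DeepSeek's or Mixtral's actual implementation; the layer sizes, the simple softmax router and the class name `TinyMoELayer` are assumptions chosen only to keep the example small.

```python
# Minimal sketch of Mixture-of-Experts routing: a router scores all experts per
# token, but only the top-k experts actually run, so compute per token stays small
# even if the total parameter count is large.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)   # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, num_experts)
        weights, chosen = torch.topk(F.softmax(scores, dim=-1), self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                   # only the chosen experts run
            for e in chosen[:, slot].unique().tolist():
                mask = chosen[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(5, 64)                              # 5 tokens, d_model = 64
print(TinyMoELayer()(tokens).shape)                      # torch.Size([5, 64])
```

In a production MoE like DeepSeek-V3 the expert count and routing details differ, but the principle is the same: most of the network's parameters sit idle for any individual token.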
Assembly-of-Experts (AoE), by contrast, is a model merging technique, not an architecture. It creates a new model from multiple pre-trained MoE models by selectively interpolating their weight tensors.
The "experts" in AoE refer to the model components being merged, typically the routed expert tensors within MoE layers, not experts that are dynamically activated at runtime.
TNG's implementation of AoE focuses primarily on merging the routed expert tensors, the part of a model most responsible for specialized reasoning, while often retaining the more efficient shared and attention layers from faster models like V3-0324. This approach allows the resulting Chimera models to inherit reasoning strength without replicating the verbosity or latency of the strongest parent models.
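The following sketch illustrates the merging idea described above: interpolate only the routed-expert tensors of two parent checkpoints and copy everything else from the faster parent. The `".experts."` name filter, the interpolation weight `lam` and the function name `assemble_experts` are assumptions for illustration, not TNG's published recipe.

```python
# Illustrative Assembly-of-Experts-style merge over two state dicts with identical
# keys and shapes: routed expert tensors are interpolated, all other tensors
# (attention, shared layers) are taken unchanged from the faster parent.
import torch

def assemble_experts(reasoning_parent: dict, fast_parent: dict, lam: float = 0.6) -> dict:
    child = {}
    for name, fast_tensor in fast_parent.items():
        if ".experts." in name:                     # routed expert tensors: interpolate
            child[name] = lam * reasoning_parent[name] + (1.0 - lam) * fast_tensor
        else:                                       # everything else: keep the fast parent's weights
            child[name] = fast_tensor.clone()
    return child

# Tiny stand-in checkpoints with matching keys, just to show the mechanics.
keys = ["layers.0.attn.q_proj.weight", "layers.0.mlp.experts.0.w1.weight"]
a = {k: torch.randn(4, 4) for k in keys}            # stands in for a reasoning-heavy parent
b = {k: torch.randn(4, 4) for k in keys}            # stands in for a faster parent
merged = assemble_experts(a, b)
print({k: v.shape for k, v in merged.items()})
```

Because the merge operates purely on existing weights, no gradient updates or training data are involved, which is what allows R1T2 to be built without fine-tuning or retraining.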
According to benchmark comparisons presented by TNG, R1T2 achieves between 90% and 92% of the reasoning performance of its most intelligent parent, DeepSeek-R1-0528, as measured by the AIME-24, AIME-25 and GPQA-Diamond test sets.
However, unlike DeepSeek-R1-0528, which tends to produce long, detailed answers because of its extended chain-of-thought reasoning, R1T2 is designed to be far more concise. It delivers similarly intelligent answers while using significantly fewer words.
Rather than focusing on raw processing time or tokens per second, TNG measures "speed" in terms of the number of output tokens per answer, a practical proxy for both cost and latency. According to benchmarks shared by TNG, R1T2 generates responses using roughly 40% of the tokens required by R1-0528.
That translates to a 60% reduction in output length, which directly cuts inference time and compute load, speeding up responses by roughly 2x, or up to 200%.
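A quick back-of-the-envelope check shows how the output-token metric maps to those figures; the absolute token count below is a placeholder, not a benchmark number, and the exact ratio works out to about 2.5x at a fixed decode rate, in the same ballpark as the roughly 2x figure quoted above.

```python
# If R1T2 emits ~40% of the output tokens R1-0528 needs per answer, the output is
# ~60% shorter and, at the same tokens-per-second decode rate, generation finishes
# roughly 2.5x sooner. The absolute count is hypothetical.
r1_0528_tokens = 10_000                           # placeholder output length for R1-0528
r1t2_tokens = 0.40 * r1_0528_tokens               # ~40% of the parent's output tokens

reduction = 1 - r1t2_tokens / r1_0528_tokens      # 0.60 -> 60% shorter output
speedup = r1_0528_tokens / r1t2_tokens            # 2.5x at a fixed decode rate
print(f"output reduction: {reduction:.0%}, generation speedup: {speedup:.1f}x")
```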
Compared to the original DeepSeek-R1, R1T2 is also around 20% more concise on average, offering meaningful efficiency gains for high-throughput or cost-sensitive deployments.
This efficiency does not come at the cost of intelligence. As shown in the benchmark chart in TNG's technical paper, R1T2 sits in a desirable zone on the intelligence vs. output cost curve. It preserves reasoning quality while minimizing verbosity, an outcome that matters for enterprise applications where inference speed, throughput and cost are all factors.
R1T2 is released under a permissive MIT License and is available now on Hugging Face, meaning it is open source and can be used in and built into commercial applications.
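Since the weights are openly published, a deployment could in principle look like the minimal sketch below, using the Hugging Face transformers library. The repository id is taken from the link later in this article, the loading arguments are assumptions, and the full MoE checkpoint is far too large for a single GPU, so in practice a multi-GPU setup or a hosted endpoint (such as the platforms mentioned below) is more realistic.

```python
# Minimal, hedged sketch of loading the open weights with Hugging Face transformers.
# Requires transformers and accelerate; repo id and generation settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tngtech/DeepSeek-TNG-R1T2-Chimera"      # assumed repo id from the article's link

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the Assembly-of-Experts idea in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```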
TNG notes that while the model is well suited to general reasoning tasks, it is not currently recommended for use cases requiring function calling or tool use, owing to limitations inherited from its DeepSeek-R1 lineage. These may be addressed in future updates.
The company also advises European users to evaluate compliance with the EU AI Act, which enters into force on August 2, 2025.
Enterprises operating in the EU should review the relevant provisions, or consider halting use of the model after that date if those requirements cannot be met.
However, US companies operating domestically and serving users based in the US or in other countries are not subject to the terms of the EU AI Act, which should give them considerable latitude when using and deploying this free, fast, open source reasoning model. If they serve users in the EU, however, some provisions of the Act will still apply.
TNG has already made earlier Chimera variants available through platforms such as OpenRouter and Chutes, where they reportedly process billions of tokens daily. The release of R1T2 represents a further evolution in this public availability effort.
Founded in January 2001, TNG Technology Consulting GmbH is based in Bavaria, Germany, and employs more than 900 people, with a high concentration of PhDs and technical specialists.
The company focuses on software development, artificial intelligence and DevOps/cloud services, serving clients across industries such as telecommunications, insurance, automotive, e-commerce and logistics.
TNG operates as a values-based consulting partnership. Its distinctive structure, grounded in operational research and self-management principles, supports a culture of technical innovation.
It actively contributes to open source communities and research, as demonstrated by public releases such as R1T2 and the publication of its Assembly-of-Experts methodology.
For CTOs, AI platform owners, engineering leaders and IT procurement teams, R1T2 offers tangible benefits and strategic options.
TNG encourages researchers, developers and enterprise users to explore the model, test its behavior and provide feedback. The R1T2 Chimera is available at huggingface.co/tngtech/deepseek-tng-r1t2-chimera, and technical inquiries can be directed to edit@tngtech.com.
TNG's research paper, covering the technical details and benchmark methodology, is available on arXiv: 2506.14794.