Nvidia’s new Llama-3.1 Nemotron Ultra outperforms DeepSeek R1 at half the size


Even as Meta fields questions and criticism of its new Llama 4 model family, graphics processing unit (GPU) giant Nvidia has released a new, fully open source large language model (LLM) based on Meta's older Llama-3.1-405B-Instruct model, and claims it outperforms the vaunted DeepSeek R1 open source reasoning model.

Llama-3.1-Nemotron-Ultra-253B-v1 is a dense 253-billion-parameter model designed to support advanced reasoning, instruction following, and AI assistant workloads. It was first mentioned at Nvidia's annual GPU Technology Conference (GTC) in March.

The release reflects Nvidia's continued focus on performance optimization through architectural innovation and targeted post-training.

Announced last night, April 7, 2025, the model code is now publicly available on Hugging Face, along with open weights and post-training data. It is designed to operate efficiently in both "reasoning on" and "reasoning off" modes, allowing developers to switch between high-complexity reasoning tasks and simpler outputs based on system prompts.
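As a rough illustration of what that toggle can look like in practice, here is a minimal sketch of building a chat prompt with the reasoning switch expressed in the system message. The repository name and the "detailed thinking on/off" control strings are assumptions drawn from Hugging Face model-card conventions, not details confirmed in this article.

```python
# Hypothetical sketch: toggling the model's reasoning mode through the system prompt.
# The repo ID and the "detailed thinking on"/"detailed thinking off" strings are
# assumptions; check the model card on Hugging Face before relying on them.
from transformers import AutoTokenizer

MODEL_ID = "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def build_prompt(user_message: str, reasoning: bool) -> str:
    """Render a chat prompt with reasoning toggled via the system message."""
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]
    # tokenize=False returns the formatted prompt string rather than token IDs
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

print(build_prompt("Prove that the sum of two even numbers is even.", reasoning=True))
```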

Designed for efficient inference

Llama-3.1-Nemotron-Ultra-253B builds on Nvidia's previous work in inference-optimized LLM development. Its architecture, customized through a Neural Architecture Search (NAS) process, introduces structural variations such as skipped attention layers, fused feedforward networks (FFNs), and variable FFN compression ratios.

This architectural overhaul reduces memory footprint and computational demands without severely impacting output quality, enabling deployment on a single 8x H100 GPU node.

The result, according to Nvidia, is a model that delivers strong performance while being more cost-effective to deploy in data center environments. Additional hardware compatibility includes support for Nvidia's B100 and Hopper microarchitectures, with configurations validated in both BF16 and FP8 precision modes.
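For readers who want a concrete picture of that deployment footprint, the following is a hypothetical loading sketch in BF16 with the Transformers library; the repository ID, the trust_remote_code requirement, and the assumption of an 8x H100 (or comparable) node are not confirmed in this article.

```python
# Illustrative loading sketch only: assumes a node with enough aggregate GPU memory
# (on the order of 8x H100). The repo ID and trust_remote_code flag are assumptions
# based on how NAS-customized architectures are typically published.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # BF16, one of the validated precision modes
    device_map="auto",            # shard layers across all visible GPUs
    trust_remote_code=True,       # NAS-modified blocks may ship as custom modeling code
)
```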

Post-trained for reasoning and alignment

Nvidia enhanced the base model with a multi-phase post-training pipeline. This included supervised fine-tuning across domains such as math, code generation, chat, and tool use, followed by reinforcement learning with Group Relative Policy Optimization (GRPO) to further boost instruction-following and reasoning performance.
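The central idea of GRPO is to sample several responses per prompt and normalize each response's reward against the group's statistics, removing the need for a separate critic model. Below is a conceptual sketch of that group-relative advantage calculation, not Nvidia's actual training code.

```python
# Conceptual sketch of the group-relative advantage at the heart of GRPO.
# For each prompt, several responses are sampled and each response's reward is
# normalized against the group's mean and standard deviation.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize per-response rewards within a group sampled from one prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# e.g. rewards scored by a verifier for 4 sampled answers to the same math prompt
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # ~[1.0, -1.0, -1.0, 1.0]
```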

The model also underwent a knowledge distillation phase over 65 billion tokens, followed by continual pretraining on an additional 88 billion tokens.

Training datasets included sources such as FineWeb, Buzz-V1.2, and Dolma. Post-training prompts and responses came from a combination of public corpora and synthetic generation methods, including datasets that taught the model to differentiate between its reasoning modes.

Improved performance across numerous domains and benchmarks

Evaluation results show notable gains when the model operates in reasoning-enabled mode. On the MATH500 benchmark, for example, performance jumped from 80.40% in standard mode to 97.00% with reasoning enabled.

Similarly, AIME25 results rose from 16.67% to 72.50%, and LiveCodeBench scores more than doubled, climbing from 29.03% to 66.31%.

Performance gains were also recorded on tool-based tasks like BFCL V2 and function calling, as well as on general question answering (GPQA), where the model scored 76.01% with reasoning enabled compared to 56.0% without.

These benchmarks were run with a maximum sequence length of 32,000 tokens, and each test was repeated up to 16 times to ensure accuracy.

Compared with DeepSeek R1, a state-of-the-art mixture-of-experts (MoE) model with 671 billion total parameters, Llama-3.1-Nemotron-Ultra-253B shows competitive results, edging it out on instruction following (IFEval: 89.45 vs. 83.3) and on LiveCodeBench coding tasks (66.31 vs. 65.9).

DeepSeek R1, meanwhile, keeps a clear edge on certain math evaluations, particularly AIME25 (79.8 vs. 72.50), and narrowly leads on MATH500 (97.3 vs. 97.00).

These results suggest that, despite being a dense model rather than an MoE, Nvidia's offering matches or exceeds its MoE rival on reasoning and general instruction-following tasks while trailing slightly in math-heavy categories.

Usage and integration

The model is compatible with the Hugging Face Transformers library (version 4.48.3 recommended) and supports input and output sequences of up to 128,000 tokens.

Developers can control reasoning behavior through system prompts and select decoding strategies based on task requirements.

For reasoning tasks, Nvidia recommends temperature sampling (0.6) with a top-p value of 0.95; for deterministic outputs, greedy decoding is preferred.
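Combined with the earlier sketches, a hedged example of those decoding settings might look like this; `model`, `tokenizer`, and `build_prompt` are assumed to come from the snippets above, and the specific prompt is purely illustrative.

```python
# Sketch of the recommended decoding settings: temperature 0.6 and top-p 0.95 for
# reasoning-mode generations, greedy decoding for deterministic outputs.
# Assumes `model`, `tokenizer`, and `build_prompt` from the earlier sketches.
prompt = build_prompt("Write a function that reverses a linked list.", reasoning=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

reasoning_output = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,   # Nvidia's recommended temperature for reasoning tasks
    top_p=0.95,        # recommended nucleus-sampling threshold
)

greedy_output = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False,   # greedy decoding for deterministic outputs
)

print(tokenizer.decode(reasoning_output[0], skip_special_tokens=True))
```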

Llama-3.1-Nemotron-Ultra-253B supports English and several additional languages, including German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

It is also suited to common LLM use cases such as chatbot development, AI agent workflows, retrieval-augmented generation (RAG), and code generation.

Licensed for commercial use

The model is ready for commercial use, released under the Nvidia Open Model License and governed by the Llama 3.1 Community License Agreement.

Nvidia stressed the importance of responsible AI development, encouraging teams to evaluate the model's alignment, safety, and bias profiles for their specific use cases.

Oleksii Kuchaiev, Nvidia's director of AI model post-training, shared the announcement on X, writing that the team was excited to release the 253B model with open weights and data.


