
Now it’s TikTok parent ByteDance’s turn for a reasoning AI: enter Seed-Thinking-v1.5!




The reasoning AI race started with the announcement of OpenAI's o1 model in September 2024, but it really took off with DeepSeek R1's release in January 2025.

Now, most of the major AI model providers and trainers are racing to offer reasoning models of their own: large language models that may take a little longer to respond, but ideally deliver better, more complete, and better "reasoned" answers, which they reach by reflecting on their own conclusions and interrogating them for accuracy before responding.

ByteDance, the Chinese web media giant and parent of TikTok, is the latest to join the party, announcing and publishing the technical paper for Seed-Thinking-v1.5, an upcoming large language model (LLM) designed to improve reasoning performance in both science, technology, engineering, and mathematics (STEM) areas and general-purpose fields.

The model is not yet available for download or use, and it is unclear what the license terms will be: proprietary and closed source, or open source and free for anyone to use and modify at will. However, the technical paper offers some notable details that are worth reviewing now, ahead of whenever the model becomes available.

Like Meta's new Llama 4 and Mistral's Mixtral before it, Seed-Thinking-v1.5 is built using a Mixture-of-Experts (MoE) architecture.

This architecture makes models more efficient by, essentially, combining the capabilities of multiple models into one, with each specializing in a different domain.

In this case, the MoE architecture means Seed-Thinking-v1.5 activates only 20 billion of its 200 billion total parameters at a time.
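
For a sense of how an MoE layer activates only a small slice of its parameters, here is a minimal sketch in PyTorch. It shows generic top-k expert routing only, not ByteDance's actual implementation; the dimensions, expert count, and class name are illustrative.

```python
# Minimal sketch of top-k Mixture-of-Experts routing (illustrative only,
# not ByteDance's implementation; all sizes are made up).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                               # x: (tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)  # renormalize gate weights
        out = torch.zeros_like(x)
        # Each token is processed by only top_k experts; the rest stay idle,
        # which is why active parameters << total parameters.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += top_w[mask, slot:slot + 1] * expert(x[mask])
        return out

moe = TinyMoE()
y = moe(torch.randn(10, 64))   # 10 tokens through the sparse layer
```

Because each token passes through only top_k experts, compute per token scales with the active parameters rather than the full parameter count, which is how a 200-billion-parameter model can run with roughly 20 billion parameters active.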

ByteDance says in the technical paper, published to GitHub, that Seed-Thinking-v1.5 prioritizes structured reasoning and thoughtful response generation.

The results speak for themselves: Seed-Thinking-v1.5 outperforms DeepSeek R1 and approaches Google's newly released Gemini 2.5 Pro and OpenAI's o3-mini-high on many benchmark evaluations, even exceeding both of those on ARC-AGI, a benchmark that measures progress toward artificial general intelligence, seen as the goal or "Holy Grail" of AI: a model that outperforms humans on most economically valuable tasks, per OpenAI's definition.

Positioned as a compact yet capable alternative to the largest state-of-the-art models, Seed-Thinking-v1.5 achieves competitive benchmark results and introduces innovations in reinforcement learning (RL), training data curation, and AI infrastructure.

Performance benchmarks and model focus

Seed-Thinking-v1.5 shows strong performance across a set of challenging tasks, scoring 86.7% on AIME 2024, 55.0% pass@8 on Codeforces, and 77.3% on the GPQA science benchmark. These results place it close to or matching models like OpenAI's o3-mini-high and Google's Gemini 2.5 Pro on specific reasoning metrics.
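
For context, pass@8 counts a problem as solved if at least one of eight sampled attempts is correct. A standard way to estimate pass@k from n samples per problem is the unbiased estimator popularized by OpenAI's HumanEval work; the sketch below is illustrative, and the numbers in it are made up, not Seed-Thinking-v1.5's raw data.

```python
# Unbiased pass@k estimator (HumanEval methodology): the probability that
# at least one of k samples drawn from n is correct, given c observed
# correct samples. Numbers below are illustrative.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = total samples per problem, c = correct samples, k = budget."""
    if n - c < k:          # every size-k draw must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 16 attempts per problem, 3 of which passed the checker:
print(f"pass@8 ~ {pass_at_k(n=16, c=3, k=8):.3f}")
```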

On non-reasoning tasks, the model was evaluated through human preference comparisons and achieved an 8.0% higher win rate over DeepSeek R1, suggesting its strengths generalize beyond logic and math challenges alone.

To address saturation in standard benchmarks such as AIME, ByteDance introduced BeyondAIME, a new, harder math benchmark with problems curated to resist memorization and better discriminate between models. It and the Codeforces evaluation set are expected to be released publicly to support future research.

Data strategy

Training data played a central role in the model's development. For supervised fine-tuning (SFT), the team curated 400,000 samples: 300,000 verifiable problems (STEM, logic, and coding tasks) and 100,000 non-verifiable ones such as creative writing and role-playing.

For RL training, the data was segmented into two categories:

  • Verifiable problems: 100,000 rigorously filtered STEM questions and logic puzzles with known answers, drawn from elite competitions and expert review.
  • Non-verifiable tasks: human-preference datasets focused on open-ended prompts, evaluated using pairwise reward models (a minimal sketch of that pairwise objective follows this list).
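
A pairwise reward model of the kind mentioned above is typically trained with a Bradley-Terry-style objective: the scorer should rank the human-preferred response above the rejected one. Here is a minimal, generic sketch of that loss; it is not ByteDance's code, and the toy scores stand in for outputs of a full LLM-based reward head.

```python
# Bradley-Terry pairwise reward loss: push score(chosen) above score(rejected).
# Generic sketch; a real reward model scores full response texts with an LLM head.
import torch
import torch.nn.functional as F

def pairwise_reward_loss(score_chosen: torch.Tensor,
                         score_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Illustrative scores for a batch of 4 preference pairs:
chosen = torch.tensor([1.2, 0.4, 2.0, -0.1])
rejected = torch.tensor([0.3, 0.9, 1.1, -0.5])
print(pairwise_reward_loss(chosen, rejected))  # lower loss = better ranking
```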

The STEM data leaned heavily toward advanced mathematics, accounting for more than 80% of the problem set. Additional logic data included tasks such as Sudoku and 24-point puzzles, with adjustable difficulty to match the model's progress.
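
Tasks like the 24-point game lend themselves to RL because correctness can be checked mechanically, yielding a clean reward signal. A toy verifier along those lines (my own sketch, not from the paper) might look like this:

```python
# Toy verifier for the 24-point game: the model's answer is an arithmetic
# expression that must use exactly the given numbers and evaluate to 24.
# Sketch only; a production verifier would parse rather than eval().
import ast
import re

def verify_24(numbers: list[int], expression: str) -> bool:
    used = sorted(int(tok) for tok in re.findall(r"\d+", expression))
    if used != sorted(numbers):                        # must use exactly the given numbers
        return False
    if not re.fullmatch(r"[\d+\-*/() ]+", expression):  # arithmetic characters only
        return False
    try:
        value = eval(compile(ast.parse(expression, mode="eval"), "<expr>", "eval"))
    except (SyntaxError, ZeroDivisionError):
        return False
    return abs(value - 24) < 1e-6

print(verify_24([4, 7, 8, 8], "(7 - 8 / 8) * 4"))  # True: (7 - 1) * 4 = 24
```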

Reinforcement learning approach

Reinforcement learning in Seed-Thinking-v1.5 is powered by custom actor-critic (VAPO) and policy-gradient (DAPO) frameworks, developed to address known instabilities in RL training. These techniques focus on reducing reward-signal sparsity and improving training stability, especially in long chain-of-thought (CoT) settings.
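
VAPO and DAPO have not been released as code, but the clipped actor-critic policy-gradient update they build upon is well known from PPO-style methods. The schematic sketch below shows only that general family, not ByteDance's algorithms; all names are illustrative.

```python
# Schematic clipped policy-gradient (actor-critic) loss, the family of
# methods VAPO/DAPO build on. Illustrative only; not ByteDance's algorithm.
import torch

def clipped_pg_loss(logp_new: torch.Tensor,    # log pi_new(a|s) per token
                    logp_old: torch.Tensor,    # log pi_old(a|s), detached
                    advantages: torch.Tensor,  # reward minus critic's value baseline
                    clip_eps: float = 0.2) -> torch.Tensor:
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Taking the elementwise min keeps updates conservative, which helps
    # stability in long chain-of-thought rollouts with sparse, delayed rewards.
    return -torch.min(unclipped, clipped).mean()
```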

Reward models play a critical role in supervising RL outputs. ByteDance introduced two key tools:

  • Seed-Verifier: a rule-based LLM that checks whether generated and reference answers are mathematically equivalent.
  • Seed-Thinking-Verifier: a step-by-step, reasoning-based judge that improves judgment consistency and resists reward hacking.

This two-tiered reward system enables nuanced evaluation of both straightforward and complex tasks.
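
In practice, a two-tiered verifier setup like this can be wired as a fallback chain: run the cheap rule-based equivalence check first and escalate to the reasoning judge only when it cannot decide. The dispatch sketch below is hypothetical; the function names are mine, and the real Seed-Verifier and Seed-Thinking-Verifier are LLM-based.

```python
# Hypothetical dispatch over a two-tiered reward system: a fast rule-based
# check first, a slower reasoning-based judge as fallback. Names are
# illustrative, not ByteDance's API.
from typing import Callable, Optional

def score_answer(generated: str, reference: str,
                 rule_check: Callable[[str, str], Optional[bool]],
                 judge: Callable[[str, str], float]) -> float:
    verdict = rule_check(generated, reference)   # True/False, or None if unsure
    if verdict is not None:
        return 1.0 if verdict else 0.0
    # Escalate ambiguous cases to the step-by-step judge (more expensive).
    return judge(generated, reference)

# Toy stand-ins:
rule = lambda g, r: (g.strip() == r.strip()) or None   # exact match, else "unsure"
judge = lambda g, r: 0.5                               # placeholder LLM judge
print(score_answer("42", "42", rule, judge))   # 1.0 via the rule tier
print(score_answer("6*7", "42", rule, judge))  # 0.5 via the judge tier
```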

Infrastructure and scaling

To support efficient large-scale training, ByteDance built a system atop its HybridFlow framework, with execution handled by Ray clusters and training and inference processes co-located to reduce GPU idle time.

A standout innovation is the Streaming Rollout System (SRS), which decouples model evolution from runtime inference. It accelerates iteration speed by asynchronously managing partially completed generations across model versions. This architecture reportedly delivers up to 3× faster RL cycles.
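
The paper's description suggests rollout generation runs asynchronously from training, with partially completed generations carried across model versions. The bare-bones asyncio sketch below shows only that producer-consumer decoupling pattern; it is entirely schematic and not the SRS codebase.

```python
# Bare-bones sketch of decoupling rollout generation from training via an
# async queue, the general pattern behind a streaming rollout system.
# Entirely schematic; not ByteDance's SRS implementation.
import asyncio
import random

async def rollout_worker(queue: asyncio.Queue, worker_id: int):
    for step in range(3):
        await asyncio.sleep(random.random() * 0.1)   # simulate generation time
        await queue.put(f"trajectory from worker {worker_id}, step {step}")

async def trainer(queue: asyncio.Queue, total: int):
    for _ in range(total):
        traj = await queue.get()                     # consume as rollouts stream in
        print("training on:", traj)                  # stand-in for a gradient step

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    workers = [rollout_worker(queue, i) for i in range(2)]
    await asyncio.gather(*workers, trainer(queue, total=6))

asyncio.run(main())
```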

Additional infrastructure techniques include:

  • Mixed-precision (FP8) training for memory savings
  • Expert parallelism and kernel auto-tuning for MoE efficiency
  • ByteCheckpoint for resilient and flexible checkpointing
  • AutoTuner for optimizing parallelism and memory configurations

Human evaluation and real-world impact

To assess alignment with human-centered preferences, ByteDance conducted human testing across a range of domains, including creative writing, humanities knowledge, and general conversation.

Seed-Thinking-v1.5 consistently outperformed DeepSeek R1, reinforcing its applicability to real-world user needs.

The development team notes that reasoning models trained primarily on verifiable tasks showed strong generalization to creative domains, an outcome they attribute to the structure and rigor embedded in mathematical training workflows.

What it means for technical leaders, data engineers, and enterprise decision-makers

For technical leads who manage the lifecycle of large language models, from data curation to deployment, Seed-Thinking-v1.5 presents an opportunity to rethink how reasoning capabilities are integrated into enterprise AI stacks.

Its modular training process, which includes verifiable reasoning data and multi-stage reinforcement learning, is especially appealing to teams looking to scale LLM development while retaining fine-grained control.

ByteDance's moves to introduce Seed-Verifier and Seed-Thinking-Verifier offer mechanisms for more trustworthy reward modeling, which can be critical when deploying models in customer-facing or regulated environments.

For teams that often operate under tight deadlines and limited bandwidth, the model's stability under reinforcement learning, enabled by innovations such as VAPO and dynamic sampling, could reduce iteration cycles and streamline fine-tuning for specific tasks.

From an orchestration and deployment perspective, the model's hybrid infrastructure approach, including the Streaming Rollout System (SRS) and FP8 support, offers significant gains in training throughput and hardware utilization.

These features would be valuable to engineers responsible for scaling LLM operations across cloud and on-prem systems. That Seed-Thinking-v1.5 was trained with mechanisms to adapt reward feedback based on runtime dynamics also speaks directly to the challenge of managing heterogeneous workloads and maintaining consistency across domains.

For teams tasked with ensuring reliability, reproducibility, and sustainable integration of new tools, the system-level design of Seed-Thinking-v1.5 could serve as a blueprint for building robust, multi-modal orchestration systems.

For data engineering professionals, the structured approach to training data, including rigorous filtering, augmentation, and expert verification, underscores the importance of data quality as a driver of model performance. It could inspire more deliberate approaches to dataset development and validation pipelines.

Future outlook

Seed-Thinking-v1.5 is the result of collaboration within ByteDance's Seed LLM Systems team, led by Yonghui Wu and publicly represented by Haibin Lin, a longtime AI contributor.

The project also draws on earlier efforts, such as Doubao 1.5 Pro, and incorporates shared techniques in RLHF and data curation.

Looking ahead, the team plans to keep refining its reinforcement learning techniques, with a focus on training efficiency and reward modeling for non-verifiable tasks. The public release of internal benchmarks such as BeyondAIME is intended to foster broader progress in reasoning-focused AI research.


