Very small language models (SLMs) can outperform leading large language models (LLMs) on reasoning tasks, according to new research by Shanghai AI Laboratory. The authors show that, with the right tools and test-time scaling techniques, an SLM with 1 billion parameters can outperform a 405B LLM on complicated math benchmarks.
The ability to deploy SLMs on complex reasoning tasks can be very useful as enterprises look for new ways to use these models in different environments and applications.
Test-time scaling (TTS) is the process of giving LLMs extra compute cycles during inference to improve their performance on various tasks. Leading reasoning models, such as OpenAI o1 and DeepSeek-R1, use “internal TTS,” which means they are trained to “think” slowly by generating a long string of chain-of-thought (CoT) tokens.
An alternative approach is “external TTS,” where model performance is enhanced with (as the name implies) outside help. External TTS is suitable for repurposing existing models for reasoning tasks without further fine-tuning them. An external TTS setup is usually composed of a “policy model,” which is the main LLM generating the answer, and a process reward model (PRM), which evaluates the policy model’s answers. These two components are coupled together through a sampling or search method.
The simplest setup is “best-of-N,” where the policy model generates multiple answers and the PRM selects one or more of the best answers to compose the final response. More advanced external TTS methods use search. In “beam search,” the model breaks the answer down into multiple steps. For each step, it samples multiple answers and runs them through the PRM. It then chooses one or more suitable candidates and generates the next step of the answer. And in “diverse verifier tree search” (DVTS), the model generates several branches of answers to create a more diverse set of candidate responses before synthesizing them into a final answer.
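The control flow of these sampling and search methods is easy to sketch. Below is a minimal, runnable illustration in which `policy_step` and `prm_score` are toy stand-ins for the real policy model and PRM (both function names and their behavior are assumptions for demonstration, not the paper’s implementation):

```python
# Sketch of best-of-N and beam search with stub models. The real policy
# model and PRM are neural networks; the stubs below just make the
# control flow of each TTS method runnable.
import random

random.seed(0)

def policy_step(prefix):
    """Stub policy model: propose one more reasoning step."""
    return prefix + [random.randint(0, 9)]

def prm_score(steps):
    """Stub PRM: score a (partial) answer; higher is better."""
    return sum(steps)

def best_of_n(n_samples=8, n_steps=3):
    """Generate N full answers, then let the PRM pick the best one."""
    candidates = []
    for _ in range(n_samples):
        steps = []
        for _ in range(n_steps):
            steps = policy_step(steps)
        candidates.append(steps)
    return max(candidates, key=prm_score)

def beam_search(beam_width=4, expand=4, n_steps=3):
    """Extend answers step by step, keeping only the top-scoring beams."""
    beams = [[]]
    for _ in range(n_steps):
        expanded = [policy_step(b) for b in beams for _ in range(expand)]
        expanded.sort(key=prm_score, reverse=True)
        beams = expanded[:beam_width]
    return beams[0]

print(best_of_n())
print(beam_search())
```

Note the structural difference: best-of-N only scores complete answers, while beam search asks the PRM to verify every intermediate step, which is why it pays off more for weaker policy models.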
Choosing the right TTS strategy depends on many factors. The study’s authors conducted a systematic investigation of how different policy models and PRMs affect the efficiency of TTS methods.
Their findings show that efficiency largely depends on the policy and PRM models. For example, for small policy models, search-based methods outperform best-of-N. However, for large policy models, best-of-N is more effective, because the models have better reasoning capabilities and don’t need a reward model to verify every step of their reasoning.
Their findings also show that the right TTS strategy depends on the difficulty of the problem. For example, for small policy models with fewer than 7B parameters, best-of-N works better for easy problems, while beam search works better for harder problems. For policy models between 7B and 32B parameters, diverse tree search performs well for easy and medium problems, and beam search works best for hard problems. But for large policy models (72B parameters and more), best-of-N is the optimal method across all difficulty levels.
Based on these findings, developers can create compute-optimal TTS strategies that take into account the policy model, the PRM and the problem difficulty to make the best use of their compute budget when solving reasoning problems.
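The rules of thumb above can be condensed into a simple dispatch function. The thresholds and method choices below are a rough encoding of the reported findings, and the function itself is a hypothetical illustration, not an API from the paper:

```python
def choose_tts_method(policy_params_b: float, difficulty: str) -> str:
    """Pick a TTS method from the study's reported rules of thumb.

    policy_params_b: policy model size in billions of parameters.
    difficulty: "easy", "medium" or "hard".
    This is an illustrative simplification of the findings.
    """
    if policy_params_b >= 72:
        # Large models: best-of-N at all difficulty levels.
        return "best-of-N"
    if policy_params_b < 7:
        # Small models: best-of-N only pays off on easy problems.
        return "best-of-N" if difficulty == "easy" else "beam search"
    # Mid-sized models (7B-32B): DVTS for easy/medium, beam search for hard.
    return "beam search" if difficulty == "hard" else "diverse verifier tree search"

print(choose_tts_method(3, "hard"))  # prints "beam search"
```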
For example, the researchers found that a Llama-3.2-3B model with the compute-optimal TTS strategy outperforms Llama-3.1-405B on MATH-500 and AIME24, two complicated math benchmarks. This shows that an SLM can outperform a model that is 135X larger when using the compute-optimal TTS strategy.
In other experiments, they found that a Qwen2.5 model with 500 million parameters can outperform GPT-4o with the right compute-optimal TTS strategy. Using the same strategy, the 1.5B distilled version of DeepSeek-R1 outperformed o1-preview on MATH-500 and AIME24.
Accounting for both training and inference compute budgets, the findings show that with compute-optimal scaling strategies, SLMs can outperform models 100-1,000X larger while using fewer FLOPS.
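A back-of-the-envelope calculation shows why sampling many answers from a small model can still be cheap. The ~2 × parameters × tokens approximation for inference FLOPS and the token counts below are my assumptions for illustration, not figures from the paper:

```python
def inference_flops(params_b: float, tokens: int, samples: int = 1) -> float:
    """Rough inference cost: ~2 * parameters * generated tokens per sample.
    A common approximation, assumed here for illustration."""
    return 2 * params_b * 1e9 * tokens * samples

# One ~1,000-token answer from a 405B model vs. 64 sampled answers of the
# same length from a 3B model (answer length is an assumed round number).
large = inference_flops(405, tokens=1000)
small = inference_flops(3, tokens=1000, samples=64)
print(large / small)  # the 3B best-of-64 setup still uses ~2x fewer FLOPS
```

Even with a generous 64-sample budget, the 3B model's total inference cost stays about half that of a single 405B generation, before any training-compute savings are counted.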
The researchers’ results show that compute-optimal TTS significantly enhances the reasoning capabilities of language models. However, as the policy model grows larger, the improvement from TTS gradually decreases.
“This suggests that the effectiveness of TTS is directly related to the reasoning ability of the policy model,” the researchers write. “Specifically, for models with weak reasoning abilities, scaling test-time compute leads to a substantial improvement, whereas for models with strong reasoning abilities, the gain is limited.”
The research confirms that SLMs can perform better than much larger models when using compute-optimal test-time scaling methods. While this research focuses on math benchmarks, the researchers plan to expand their study to other reasoning tasks, such as coding and chemistry.