QwenLong-L1 solves long-context reasoning challenge that stumps current LLMs




Alibaba Group has introduced QwenLong-L1, a new framework that enables large language models (LLMs) to reason over extremely long inputs. This could unlock a new wave of enterprise applications that require models to understand and draw insights from extensive documents such as detailed corporate filings, lengthy financial statements, or complex legal contracts.

The challenge of long-form reasoning for AI

Recent advances in large reasoning models (LRMs), particularly through reinforcement learning (RL), have significantly improved their problem-solving capabilities. Research shows that when trained with RL fine-tuning, LRMs acquire skills similar to human "slow thinking," where they develop sophisticated strategies to tackle complex tasks.

However, these improvements are primarily seen when models work with relatively short pieces of text, typically around 4,000 tokens. The ability of these models to scale their reasoning to much longer contexts (e.g., 120,000 tokens) remains a major challenge. Such long-form reasoning requires a robust understanding of the entire context and the ability to perform multi-step analysis. "This limitation poses a significant barrier to practical applications requiring interaction with external knowledge, such as deep research, where LRMs must collect and process information from knowledge-intensive environments," the developers of QwenLong-L1 write in their paper.

The researchers formalize these challenges into the concept of "long-context reasoning RL." Unlike short-context reasoning, which often relies on knowledge already stored within the model, long-context reasoning RL requires models to retrieve and accurately ground relevant information from lengthy inputs. Only then can they generate chains of reasoning based on that incorporated information.

Training models to do this through RL is difficult and often results in inefficient learning and unstable optimization processes. Models struggle to converge on good solutions or lose their ability to explore diverse reasoning paths.

QwenLong-L1: A multi-stage approach

QwenLong-L1 is a reinforcement learning framework designed to help LRMs transition from proficiency with short texts to robust generalization across long contexts. The framework enhances existing short-context LRMs through a carefully structured, multi-stage process:

Warm-up supervised fine-tuning (SFT): The model first undergoes an SFT phase in which it is trained on examples of long-context reasoning. This stage builds a solid foundation, enabling the model to ground information accurately in long inputs. It helps develop fundamental capabilities in understanding context, generating logical reasoning chains, and extracting answers.

Curriculum-guided phased RL: At this stage, the model is trained through multiple phases, with the target length of the input documents gradually increasing. This systematic, step-by-step approach helps the model stably adapt its reasoning strategies from shorter to progressively longer contexts, and avoids the instability often seen when models are abruptly trained on very long texts.

Difficulty-aware retrospective sampling: The final training stage incorporates challenging examples from the preceding training phases, ensuring that the model keeps learning from the hardest problems. It prioritizes difficult instances and encourages the model to explore more diverse and complex reasoning paths. (A rough sketch of how these stages could fit together follows below.)
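The article describes this pipeline only at a high level, so the following is a minimal, self-contained sketch of the staging logic rather than the paper's actual implementation. The helper functions (run_sft, run_rl_phase, difficulty), the specific context-length schedule, and the replay batch size are all illustrative assumptions:

```python
# Hypothetical curriculum: each RL phase caps input length at a larger value.
CONTEXT_SCHEDULE = [20_000, 60_000, 120_000]  # tokens; illustrative numbers only


def run_sft(model, long_context_examples):
    """Warm-up SFT on long-context reasoning traces (placeholder)."""
    return model  # stand-in for an actual fine-tuning step


def run_rl_phase(model, examples, max_len):
    """One RL phase restricted to inputs of at most max_len tokens (placeholder)."""
    batch = [ex for ex in examples if ex["length"] <= max_len]
    # ... policy-optimization updates would happen here ...
    return model, batch


def difficulty(example):
    """Higher = harder, e.g. low average reward in earlier phases (placeholder)."""
    return 1.0 - example.get("avg_reward", 0.0)


def train_qwenlong_style(model, sft_data, rl_data):
    # Stage 1: warm-up supervised fine-tuning on long-context examples.
    model = run_sft(model, sft_data)

    seen = []
    for max_len in CONTEXT_SCHEDULE:
        # Stage 2: curriculum-guided phased RL; input length grows each phase.
        model, used = run_rl_phase(model, rl_data, max_len)
        seen.extend(used)

        # Stage 3 (interleaved here for illustration): difficulty-aware
        # retrospective sampling -- replay the hardest examples seen so far.
        hard = sorted(seen, key=difficulty, reverse=True)[:32]
        model, _ = run_rl_phase(model, hard, max_len)
    return model
```

How the retrospective replay interleaves with the curriculum phases is a design choice of this sketch, not a detail confirmed by the article.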

QwenLong-L1 process (source: arXiv)

On top of this structured training, QwenLong-L1 uses a distinct reward system. While training for short-context reasoning often relies on strict rule-based rewards (e.g., a correct answer to a math problem), QwenLong-L1 employs a hybrid reward mechanism. It combines rule-based verification, which ensures precision by checking for strict adherence to correctness criteria, with "LLM-as-a-judge." This judge model compares the semantics of the generated answer with the ground truth, allowing more flexibility and better handling of the diverse ways correct answers can be expressed in long, nuanced documents.
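The exact reward formulation is not spelled out in the article, but the idea of combining a strict check with a semantic judge can be illustrated with a short sketch. The normalization rule, the word-overlap stand-in judge, and the max-style combination below are assumptions for illustration, not the paper's precise formula:

```python
def rule_based_reward(prediction: str, ground_truth: str) -> float:
    """Strict check: full reward only if the normalized answers match exactly."""
    norm = lambda s: " ".join(s.lower().split())
    return 1.0 if norm(prediction) == norm(ground_truth) else 0.0


def hybrid_reward(prediction: str, ground_truth: str, judge) -> float:
    """Combine rule-based verification with an LLM-as-a-judge score.

    `judge` is any callable returning a score in [0, 1] for semantic
    equivalence (in practice a separate LLM; here a toy stand-in).
    Taking the max lets either an exact match or a semantically
    equivalent phrasing earn full reward -- one plausible combination.
    """
    rule_score = rule_based_reward(prediction, ground_truth)
    judge_score = float(judge(prediction, ground_truth))
    return max(rule_score, judge_score)


# Toy stand-in judge: word overlap instead of calling a real LLM.
def toy_judge(prediction: str, ground_truth: str) -> float:
    p, g = set(prediction.lower().split()), set(ground_truth.lower().split())
    return len(p & g) / max(len(g), 1)


print(hybrid_reward("about $4.2 billion in revenue", "roughly $4.2 billion revenue", toy_judge))
```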

Putting QwenLong-L1 to the test

The Alibaba team evaluated QwenLong-L1 using document question-answering (DocQA) as the primary task. This scenario is highly relevant to enterprise needs, where the AI must understand dense documents to answer complex questions.

Experimental results across seven long-context DocQA benchmarks showed QwenLong-L1's capabilities. Notably, the QwenLong-L1-32B model (based on DeepSeek-R1-Distill-Qwen-32B) achieved performance comparable to Anthropic's Claude-3.7 Sonnet Thinking and outperformed models such as OpenAI's o3-mini and Qwen3-235B-A22B. The smaller QwenLong-L1-14B model also outperformed Google's Gemini 2.0 Flash Thinking and Qwen3-32B.

Source: arXiv

An important finding for real-world applications is how RL training leads the model to develop specialized long-context reasoning behaviors. The paper notes that models trained with QwenLong-L1 become better at "grounding" (linking answers to specific parts of a document), "subgoal setting" (breaking down complex questions), "backtracking" (recognizing and correcting mistakes mid-reasoning), and "verification" (double-checking answers).

For instance, while a base model might get sidetracked by irrelevant details in a financial document or get stuck in a loop of over-analyzing unrelated information, the QwenLong-L1-trained model demonstrated effective self-reflection. It could filter out these distractor details, backtrack from incorrect paths, and arrive at the correct answer.

Techniques like QwenLong-L1 could significantly expand the usefulness of AI in the enterprise. Potential applications include legal tech (analyzing thousands of pages of legal documents), finance (deep research on annual reports and financial filings for risk assessment or investment opportunities), and customer service (analyzing long customer interaction histories to deliver better-informed support). The researchers have released the code for the QwenLong-L1 recipe and the weights for the trained models.


