Qwen Team, the division of Chinese e-commerce giant Alibaba that develops its open-source Qwen large language models (LLMs), has introduced QwQ-32B, a new 32-billion-parameter reasoning model designed to improve performance on complex problem-solving tasks through reinforcement learning (RL).
The model is available as open weights on Hugging Face and ModelScope under an Apache 2.0 license, making it accessible for both commercial and research use. Enterprises can therefore employ it immediately in products and applications, including ones they charge customers to use.
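For teams that want to experiment with the open weights directly, the snippet below is a minimal sketch of loading and querying the model with Hugging Face's transformers library. It assumes the repository ID Qwen/QwQ-32B and standard chat-template conventions; check the model card for the authoritative usage example.

```python
# Minimal sketch: load the QwQ-32B open weights with Hugging Face transformers.
# The repository ID "Qwen/QwQ-32B" is assumed; verify it on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # spread layers across available GPUs
)

messages = [{"role": "user", "content": "How many prime numbers are less than 50?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```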
Individual users can also access it through Qwen Chat.
QwQ, short for Qwen-with-Questions, was first introduced by Alibaba in November 2024 as an open-source reasoning model aimed at competing with OpenAI's o1-preview.
At launch, the model was designed to enhance logical reasoning and planning by reviewing and refining its own answers during inference, a technique that made it especially effective on math and coding tasks.
The initial version of QwQ featured 32 billion parameters and a 32,000-token context length, with strengths in mathematical benchmarks such as AIME and MATH, as well as scientific reasoning tasks such as GPQA.
Despite its strengths, early iterations of QwQ lagged on programming benchmarks such as LiveCodeBench, where OpenAI's models maintained an edge. In addition, like many emerging reasoning models, QwQ faced challenges such as language mixing and occasional circular reasoning loops.
However, Alibaba's decision to release the model under an Apache 2.0 license ensured that developers and enterprises could freely adapt and commercialize it, distinguishing it from proprietary alternatives.
Since QwQ's initial release, the AI landscape has evolved rapidly. The limitations of traditional LLMs have become more apparent, with scaling yielding diminishing returns in performance improvements.
This shift has fueled interest in a new category of AI systems: large reasoning models (LRMs), which use inference-time reasoning and self-reflection to improve accuracy. These include OpenAI's o3 series and the massively successful DeepSeek-R1 from rival Chinese lab DeepSeek, an offshoot of Hong Kong quantitative analysis firm High-Flyer Capital Management.
A new report from web traffic analytics and research firm SimilarWeb underscores that rise, finding that DeepSeek has climbed to become one of the most visited AI model provider websites, behind only OpenAI.
QwQ-32B, Alibaba's latest iteration, builds on this progress by integrating RL and structured self-reflection, positioning it as a serious contender in the growing field of reasoning-focused AI.
Traditional instruction-tuned models often struggle with difficult reasoning tasks, but the Qwen Team's research suggests that RL can significantly improve a model's ability to solve complex problems.
QwQ-32B builds on this idea by implementing a multi-stage RL training approach to enhance mathematical reasoning, coding proficiency, and general problem-solving.
The model has been benchmarked against leading alternatives such as DeepSeek-R1, o1-mini, and DeepSeek-R1-Distilled-Qwen-32B, achieving competitive results despite having fewer parameters than some of these models.
For example, while DeepSeek-R1 operates with 671 billion parameters (37 billion of them activated per token), QwQ-32B achieves comparable performance with a much smaller footprint, typically requiring around 24 GB of vRAM on a single GPU (Nvidia's H100 has 80 GB), compared with more than 1,500 GB of vRAM to run the full DeepSeek-R1 (16 Nvidia A100 GPUs), underscoring the efficiency of Qwen's RL approach.
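The arithmetic behind those footprint figures is easy to reproduce. The sketch below estimates weight memory only (no KV cache or activations), and the assumption that the 24 GB figure reflects a roughly 4-bit quantized checkpoint is ours, not the Qwen team's:

```python
# Back-of-the-envelope VRAM needed just to hold model weights.
# Excludes KV cache and activations, so real deployments need headroom.
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 bytes-per-GB

print(weight_vram_gb(32, 2.0))   # QwQ-32B in bf16:        ~64 GB
print(weight_vram_gb(32, 0.5))   # QwQ-32B at 4-bit:       ~16 GB (+ overhead -> ~24 GB)
print(weight_vram_gb(671, 2.0))  # full DeepSeek-R1, bf16: ~1342 GB
```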
QwQ-32B uses a causal language model architecture and includes several optimizations:
- 64 transformer layers with RoPE, SwiGLU, RMSNorm, and attention QKV bias
- Generalized query attention (GQA) with 40 attention heads for queries and 8 for key-value pairs
- An extended context length of 131,072 tokens, allowing better handling of long-sequence inputs
- Multi-stage training that includes pretraining, supervised fine-tuning, and RL
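In a Hugging Face-style config.json, those architectural choices map onto a handful of fields. The reconstruction below is illustrative only; confirm the exact values against the published checkpoint:

```python
# Assumed config fields corresponding to the architecture described above;
# verify against the checkpoint's actual config.json before relying on them.
qwq_32b_arch = {
    "num_hidden_layers": 64,            # 64 transformer layers
    "num_attention_heads": 40,          # GQA: 40 query heads
    "num_key_value_heads": 8,           # GQA: 8 key-value heads
    "max_position_embeddings": 131072,  # 131,072-token context window
}
```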
The RL process for QwQ-32B was executed in two stages:
1. Math and coding focus: the model was trained using an accuracy verifier for mathematical reasoning and a code execution server for coding tasks, ensuring that generated answers were validated for correctness before being reinforced.
2. General capability enhancement: in a second stage, the model received reward-based training using general reward models and rule-based verifiers, improving instruction following, human alignment, and agent reasoning without compromising its math and coding capabilities.
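The Qwen team has not published its training code, so the sketch below is only an illustration of the stage-one idea: rewards come from checking outcomes (an answer verifier, a code execution harness) rather than from a learned reward model. All names and reward values here are illustrative.

```python
import subprocess

# Illustrative outcome-based rewards in the spirit of stage one.
def math_reward(model_answer: str, reference_answer: str) -> float:
    # Accuracy verifier: reward only an exactly matching final answer.
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def code_reward(candidate_code: str, test_script: str) -> float:
    # Stand-in for a code execution server: run the candidate against tests.
    result = subprocess.run(
        ["python", "-c", candidate_code + "\n" + test_script],
        capture_output=True,
        timeout=10,
    )
    return 1.0 if result.returncode == 0 else 0.0
```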
For enterprise leaders, including CEOs, CTOs, IT managers, team leads, and AI application developers, QwQ-32B represents a potential shift in how AI can support business decision-making and technical innovation.
With its RL-driven reasoning capabilities, the model can deliver more accurate, structured, and context-aware insights for tasks such as automated data analysis, strategic planning, software development, and intelligent automation.
Companies looking to deploy AI for complex problem-solving, coding assistance, financial modeling, or customer service automation may find QwQ-32B's efficiency an attractive option. In addition, its open weights allow organizations to fine-tune and customize the model for domain-specific applications without proprietary restrictions, making it a flexible choice for enterprise AI strategies.
The fact that the model comes from a Chinese e-commerce giant may raise security and bias concerns for some non-Chinese users, especially when using the Qwen Chat interface. But as with DeepSeek-R1, the model's availability for download and offline use, fine-tuning, or retraining suggests these concerns can be fairly easily overcome, and it stands as a viable alternative to DeepSeek-R1.
The release of QwQ-32B has already drawn attention from the AI research and development community, with several developers and industry professionals sharing their initial impressions:
QwQ-32B incorporates agentic capabilities, allowing it to dynamically adjust its reasoning process based on environmental feedback.
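Mechanically, environment-aware reasoning of this kind usually takes the form of a generate-act-observe loop. The sketch below is a generic illustration of that pattern, not Qwen's published agent interface; call_model and run_tool are hypothetical helpers.

```python
# Generic agent loop: the model reasons, may request a tool, and folds the
# tool's output back into its context before continuing. `call_model` and
# `run_tool` are hypothetical helpers, not part of any Qwen API.
def agent_loop(task: str, call_model, run_tool, max_steps: int = 8) -> str:
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(transcript)            # reason; maybe ask for a tool
        transcript.append({"role": "assistant", "content": reply.text})
        if reply.tool_call is None:               # no tool requested: done
            return reply.text
        observation = run_tool(reply.tool_call)   # act in the environment
        transcript.append({"role": "tool", "content": observation})
    return transcript[-1]["content"]              # give up after max_steps
```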
For optimal performance, the Qwen team recommends the following inference settings:
- Temperature: 0.6
- TopP: 0.95
- TopK: between 20 and 40
- YaRN scaling: recommended for handling sequences longer than 32,768 tokens
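Applied through the transformers generate() API, those recommendations look roughly like the snippet below (a sketch that reuses the model and inputs from the earlier loading example; the TopK value of 30 is one arbitrary pick from the recommended range):

```python
# Recommended sampling settings applied via transformers' generate().
# Assumes `model`, `tokenizer`, and `inputs` from the earlier snippet.
outputs = model.generate(
    inputs,
    do_sample=True,
    temperature=0.6,      # recommended temperature
    top_p=0.95,           # recommended nucleus-sampling threshold
    top_k=30,             # recommended range is 20-40
    max_new_tokens=4096,  # illustrative budget for long reasoning traces
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```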
The model supports deployment using vLLM, a high-throughput inference framework. However, current vLLM implementations of YaRN only support static scaling, which maintains a constant scaling factor regardless of input length.
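As an illustration, enabling static YaRN for long inputs typically means adding a rope_scaling entry to the checkpoint's config.json before serving. The factor-4.0 values below follow the convention documented for Qwen2.5-series models and should be treated as an assumption here:

```python
import json

# Sketch: patch config.json to enable static YaRN scaling (factor 4.0 over
# the 32,768-token base window). Values follow the Qwen2.5-series convention
# and are assumptions here; the local path is illustrative.
with open("QwQ-32B/config.json") as f:
    config = json.load(f)

config["rope_scaling"] = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

with open("QwQ-32B/config.json", "w") as f:
    json.dump(config, f, indent=2)
```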
Qwen's team sees QwQ-32B as the first step in scaling RL to enhance reasoning capabilities. Looking ahead, the team plans to:
- Further explore scaling RL to improve model intelligence
- Integrate agents with RL for long-horizon reasoning
- Continue developing foundation models optimized for RL
- Move toward artificial general intelligence (AGI) through more advanced training techniques
With QwQ-32B, the Qwen team is positioning scaled RL as a key driver of the next generation of AI models, one capable of producing high-performing and efficient reasoning systems.