DeepCoder delivers top coding performance in efficient 14B open model



Researchers at Together AI and Agentica have released DeepCoder-14B, a new coding model that delivers impressive performance comparable to leading proprietary models such as OpenAI's o3-mini.

Built on top of DeepSeek-R1, the model provides high-performance code generation and reasoning capabilities that can be integrated into real-world applications. Importantly, the teams have fully open-sourced the model along with its training data, code, logs and system optimizations, which can help researchers improve their work and accelerate progress.

Competitive coding capabilities in a small package

The research team's experiments show that DeepCoder-14B performs strongly across several challenging coding benchmarks, including LiveCodeBench (LCB), Codeforces and HumanEval+.

"Our model demonstrates strong performance across all coding benchmarks… comparable to the performance of o3-mini (low) and o1," the researchers write in a blog post describing the model.

Interestingly, despite being trained primarily on coding tasks, the model also shows improved mathematical reasoning, scoring 73.8% on the AIME 2024 benchmark, a 4.1% improvement over its base model (DeepSeek-R1-Distill-Qwen-14B). This suggests that reasoning skills developed through RL on code can generalize effectively to other domains.

DeepCoder-14B performance (Credit: Together AI)

Most surprising is that the model achieves this level of performance with only 14 billion parameters. That makes it significantly smaller and potentially far cheaper to run than many frontier models.

Innovations behind DeepCoder's performance

While developing the model, the researchers solved some of the fundamental challenges of training coding models with reinforcement learning (RL).

The first challenge was curating the training data. Reinforcement learning requires reliable reward signals indicating that the model's output is correct. As the researchers note, coding suffers from a relative scarcity of such data compared to math, where high-quality, verifiable data is readily available on the internet.

To solve this problem, the DeepCoder team implemented a strict pipeline that gathers examples from different datasets and filters them for validity, complexity and duplication. This process yielded 24,000 high-quality problems, providing a solid foundation for effective RL training.
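The blog post describes these checks at a high level rather than as code, but a filtering pass of this kind might look something like the minimal Python sketch below. All field names, the deduplication scheme and the verified() helper are illustrative assumptions, not the released pipeline.

```python
import hashlib

def verified(problem):
    """Hypothetical stand-in for solution verification. In practice this
    would execute the reference solution against every test case in a
    sandbox and confirm the outputs match."""
    return bool(problem.get("solution")) and bool(problem.get("tests"))

def curate(problems, min_tests=5):
    seen = set()
    curated = []
    for p in problems:
        # Validity: keep only problems whose reference solution checks out.
        if not verified(p):
            continue
        # Complexity: drop problems with too few tests to be meaningful.
        if len(p["tests"]) < min_tests:
            continue
        # Duplication: hash the normalized statement and skip repeats.
        digest = hashlib.sha256(p["statement"].strip().lower().encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        curated.append(p)
    return curated
```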

The team also designed a simple reward function that provides a positive signal only if the generated code passes all sampled unit tests for the problem within a specific time limit. Combined with the high-quality training examples, this outcome-focused reward prevents the model from learning shortcuts such as printing memorized answers for public tests or optimizing for simple edge cases without solving the core problem.
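As a rough illustration of such an outcome-only reward, here is a minimal Python sketch: the reward is 1.0 only when the candidate program passes every sampled test within the time limit, and 0.0 otherwise. The subprocess runner, the number of sampled tests and the timeout are assumptions for illustration; a real training stack would use a hardened sandbox.

```python
import random
import subprocess
import sys

def run_case(code, stdin_text, expected, timeout_s):
    """Run candidate code in a subprocess and compare stdout to the
    expected output. Note: a subprocess is not a real sandbox."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text, capture_output=True, text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and result.stdout.strip() == expected.strip()

def sparse_reward(code, tests, n_sampled=15, timeout_s=6.0):
    """1.0 only if the code passes every sampled test in time, else 0.0.
    No partial credit, so printing memorized public-test answers or
    handling only easy edge cases earns nothing."""
    sampled = random.sample(tests, min(n_sampled, len(tests)))
    return 1.0 if all(
        run_case(code, t["input"], t["expected"], timeout_s) for t in sampled
    ) else 0.0
```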

The model's core training algorithm is based on Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that proved very successful in DeepSeek-R1. However, the team made a number of modifications to make the algorithm more stable and to let the model keep improving as training extends over a longer period.

GRPO+ enables DeepCoder-14B to keep training longer without collapsing (Credit: Together AI)
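The post does not spell out GRPO+'s exact modifications, so the sketch below shows only the baseline group-relative advantage that gives GRPO its name: rewards for a group of responses to the same prompt are normalized against that group's own mean and standard deviation, which removes the need for a separate value network.

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-6):
    """Normalize each sampled response's reward against its own group's
    mean and standard deviation (the core of GRPO's advantage estimate)."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Example: 8 completions for one prompt; three passed all unit tests.
print(grpo_advantages([1, 0, 0, 1, 0, 1, 0, 0]))
# Passing samples get positive advantages, failing ones negative.
```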

Finally, the team extended the model's context window iteratively, first training it on shorter reasoning sequences and gradually increasing the length. They also developed a filtering method to avoid penalizing the model when it produced reasoning chains that exceeded the context limit while solving a hard prompt.

Iterative context extension: DeepCoder was trained on a 32K context but could also solve 64K tasks (Credit: Together AI)

The researchers explain the core idea: "To preserve long-context reasoning while enabling efficient training… this technique masks out truncated sequences during training so that models aren't penalized for generating thoughtful but lengthy outputs that exceed the current context limit."

Training gradually scaled from a 16K to a 32K context window, and the resulting model could also solve problems that required up to 64K tokens.
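One way the masking described in that quote could look in code is sketched below: sequences that never emitted an end-of-sequence token (i.e., were cut off by the context limit) are zeroed out of the per-sequence loss. The tensor layout and the EOS convention are assumptions for illustration, not the actual implementation.

```python
import torch

def overlong_mask(token_ids, eos_id):
    """token_ids: (batch, seq) tensor of generated tokens. Returns a
    (batch,) float mask that is 0 for sequences that never emitted EOS,
    i.e. were truncated by the context limit. Padding handling is
    omitted for simplicity."""
    has_eos = (token_ids == eos_id).any(dim=1)
    return has_eos.float()

# Per-sequence policy losses are multiplied by this mask, so truncated
# generations contribute zero gradient instead of a penalty.
losses = torch.tensor([0.8, 1.2, 0.5])
ids = torch.tensor([[5, 7, 2], [9, 4, 6], [3, 2, 0]])  # 2 = assumed EOS id
print((losses * overlong_mask(ids, eos_id=2)).tolist())  # [0.8, 0.0, 0.5]
```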

Optimizing long-context RL training

Training large models with RL on tasks that require long generated sequences, such as coding or complex reasoning, is computationally intensive and slow. A major bottleneck is the "sampling" step, where the model generates potentially thousands of tokens per example in the batch. Because response lengths vary, some responses finish much later than others, leaving GPUs idle and slowing down the entire training loop.

To accelerate this, the team developed verl-pipeline, an optimized extension of the open-source verl library for reinforcement learning from human feedback (RLHF). The key innovation, which they call "one-off pipelining," rearranges response sampling and model updates to reduce the bottlenecks and accelerator idle time.

One-off pipelining (Credit: Together AI)
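The scheduling idea can be illustrated with a toy Python sketch: while the trainer updates on batch N, a background worker is already sampling batch N+1, so neither stage sits idle. This is a conceptual illustration, not the actual verl-pipeline code; note that it also makes the rollouts one step stale relative to the latest policy, a trade-off this style of pipelining accepts for throughput.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def sample_batch(step):
    """Stand-in for rollout generation (the expensive sampling step)."""
    time.sleep(0.5)
    return f"rollouts-{step}"

def train_on(batch):
    """Stand-in for a gradient update on previously sampled rollouts."""
    time.sleep(0.5)
    print(f"trained on {batch}")

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(sample_batch, 0)         # prime the pipeline
    for step in range(1, 4):
        batch = future.result()                   # rollouts from last step
        future = pool.submit(sample_batch, step)  # sample the next batch...
        train_on(batch)                           # ...while training on this one
    train_on(future.result())                     # drain the final batch
```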

Their experiments showed up to a 2x speedup for coding RL tasks compared to baseline implementations. This optimization was crucial for training DeepCoder within a reasonable timeframe (2.5 weeks on 32 H100s) and is now open-sourced as part of verl-pipeline for the community to use and build on.

Enterprise impact

The researchers have made all the artifacts for training and running DeepCoder-14B available on GitHub and Hugging Face under a permissive license.

"By fully sharing our dataset, code, and training recipe, we empower the community to reproduce our work and make RL training accessible to all," the researchers write.

DeepCoder-14B powerfully illustrates a broader, accelerating trend in the AI landscape: the rise of highly capable yet efficient and openly accessible models.

For the enterprise world, this shift means more options and greater accessibility of advanced models. Cutting-edge performance is no longer solely the domain of hyperscalers or those willing to pay premium API fees. Models like DeepCoder can empower organizations of all sizes to use sophisticated code generation and reasoning, customize solutions to their needs, and securely deploy them within their own environments.

This trend can lower the barrier to entry for AI adoption and foster a more competitive and innovative ecosystem, where progress is driven by open-source collaboration.

