Anthropic overtakes OpenAI: Claude Opus 4 codes seven hours nonstop, sets record SWE-Bench score and reshapes enterprise AI

Anthropic released Claude Opus 4 and Claude Sonnet 4 today, sharply raising the bar for what AI can accomplish without human intervention.

The company’s flagship Opus 4 model stayed focused on a complex open-source refactoring project for nearly seven hours during testing at Rakuten. That is a leap that turns AI from a quick-response tool into a genuine collaborator capable of tackling day-long projects.

This marathon performance marks a quantum leap beyond the limited attention spans of previous AI models. The technological implications are profound: AI systems can now carry complex software engineering projects from conception to completion, maintaining context and focus across an entire workday.

Anthropic claims Claude Opus 4 achieved a score of 72.5% on SWE-bench, a rigorous software engineering benchmark, outperforming OpenAI’s GPT-4.1, which scored 54.6% when it launched in April. The result establishes Anthropic as a formidable challenger in an increasingly crowded AI market.

Comparative benchmarks show the Claude 4 models’ results across coding and reasoning tasks. (Credit: Anthropic)

Beyond quick answers: The reasoning revolution transforms AI

The AI industry has pivoted dramatically toward reasoning models in 2025. These systems work through problems methodically before responding, simulating human-like thought processes rather than simply matching patterns.

OpenAI initiated this shift with its “o” series last December, followed by Google’s Gemini 2.5 Pro with its experimental “Deep Think” capability. DeepSeek’s R1 model unexpectedly captured market share by pairing exceptional problem-solving ability with a competitive price point.

This pivot signals a fundamental evolution in how people use AI. According to Poe’s Spring 2025 AI Model Usage Trends report, reasoning model usage grew from 2% to 10% of all AI interactions, a fivefold jump in four months. Users increasingly see AI as a thought partner for complex problems, not a simple question-answering system.

The usage share of reasoning models rose in early 2025 as new AI models captured users’ interest. (Credit: Poe)

Claude’s new models distinguish themselves by integrating tool use directly into the reasoning process. This approach of researching and reasoning at the same time mirrors human cognition more closely than previous systems, which gathered information before beginning their analysis. The ability to pause, search for information and incorporate new findings during reasoning creates a more natural and effective problem-solving experience.

Dual-mode architecture balances speed with depth

Anthropic has addressed a persistent friction point in the AI user experience with its hybrid approach. Both Claude 4 models offer near-instant responses to straightforward queries and extended thinking for complex problems, eliminating the frustrating delays that earlier reasoning models imposed on even simple questions.

This dual-mode functionality preserves the snappy interactions users expect while unlocking deeper analytical capabilities when needed. The system dynamically allocates thinking resources based on the complexity of the task, striking a balance that earlier reasoning models failed to achieve.

Memory persistence stands as another breakthrough. Claude 4 models can extract key information from documents, create summary files and retain this knowledge across sessions when given the appropriate permissions. This capability addresses the “amnesia problem” that has limited AI’s usefulness in long-running projects where context must be maintained over days or weeks.

The technical implementation resembles how human experts develop knowledge management systems, with the AI automatically organizing information into structured formats optimized for future retrieval. This approach allows Claude to build an increasingly refined understanding of complex domains over extended periods of interaction.
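To make the idea of file-backed memory concrete, here is a minimal Python sketch of notes that survive between sessions. It is an illustration only and not Anthropic’s implementation: the `MemoryStore` class, its `remember` and `recall` methods, and the `memory.json` file name are all hypothetical names chosen for this example.

```python
import json
import tempfile
from pathlib import Path


class MemoryStore:
    """Hypothetical file-backed note store; not Anthropic's implementation."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload notes written by an earlier session, if the file exists.
        if self.path.exists():
            self.notes = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.notes = {}

    def remember(self, topic, fact):
        """Record a fact under a topic and persist it immediately."""
        facts = self.notes.setdefault(topic, [])
        if fact not in facts:  # skip exact duplicates
            facts.append(fact)
        self.path.write_text(json.dumps(self.notes, indent=2), encoding="utf-8")

    def recall(self, topic):
        """Return a copy of the facts stored under a topic (empty if none)."""
        return list(self.notes.get(topic, []))


def demo():
    """Simulate two sessions that share one memory file."""
    with tempfile.TemporaryDirectory() as tmp:
        memory_file = Path(tmp) / "memory.json"

        # Session 1: store facts while working.
        first = MemoryStore(memory_file)
        first.remember("build", "tests run with pytest")
        first.remember("build", "tests run with pytest")  # duplicate, ignored
        first.remember("style", "line length is 100")

        # Session 2: a fresh object sees what session 1 wrote.
        second = MemoryStore(memory_file)
        return second.recall("build"), second.recall("style"), second.recall("unknown")
```

Calling `demo()` creates one store, writes a few facts, then opens a second store on the same file to show that the notes outlive the object that wrote them. A real system would also need to decide what is worth remembering and how to retrieve it, which this sketch leaves out.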

Competitive landscape intensifies as AI leaders battle for market share

The timing of Anthropic’s announcement highlights the accelerating pace of competition in advanced AI. Just five weeks after OpenAI launched its GPT-4.1 family, Anthropic has countered with models that match or exceed it on key metrics. Google updated its Gemini 2.5 lineup earlier this month, and Meta recently released its Llama 4 models with multimodal capabilities and a 10-million-token context window.

Each major lab has carved out distinct strengths in this increasingly specialized market. OpenAI leads in general reasoning and tool integration, Google excels in multimodal understanding, and Anthropic now claims the crown for sustained performance and professional coding applications.

The strategic implications for enterprise customers are significant. Organizations now face more complex decisions about which AI systems to deploy for specific use cases, with no single model dominating across all metrics. This fragmentation benefits sophisticated customers who can exploit each AI’s specialized strengths, while it challenges companies seeking simple, unified solutions.

With the general release of Claude Code, Anthropic has expanded Claude’s integration into development workflows. The system now supports background tasks via GitHub Actions and integrates natively with VS Code and JetBrains, displaying proposed edits directly in developers’ files.

GitHub’s decision to adopt Claude Sonnet 4 as the base model for a new coding agent in GitHub Copilot provides significant market validation. This partnership with Microsoft’s development platform suggests that large technology companies are diversifying their AI partnerships rather than relying on a single provider.

Anthropic complemented its model releases with new API capabilities for developers: a code execution tool, an MCP connector, a Files API and prompt caching for up to one hour. These features let developers create more sophisticated AI agents that can persist through complex workflows, which is essential for enterprise adoption.

Transparency challenges emerge as models grow more sophisticated

Anthropic’s April research paper, “Reasoning models don’t always say what they think,” revealed concerning patterns in how these systems communicate their thought processes. The study found that Claude 3.7 Sonnet mentioned the crucial hints it used to solve problems only 25% of the time, raising significant questions about the transparency of AI reasoning.

This research spotlights a growing challenge: as models become more capable, they also become more opaque. The seven-hour autonomous coding session that showcases Claude Opus 4’s endurance also shows how difficult it would be for humans to audit such extended reasoning chains.

The industry now faces a paradox in which growing capability brings shrinking transparency. Resolving this tension will require new approaches to AI oversight that balance performance with explainability, a challenge Anthropic itself has acknowledged but not yet fully resolved.

A future of sustained AI collaboration takes shape

Claude Opus 4’s seven-hour autonomous work session offers a glimpse of AI’s future role in knowledge work. As models develop extended focus and improved memory, they become increasingly capable of sustained, complex work with minimal human supervision.

This progress points to a profound shift in how organizations will structure knowledge work. Tasks that once required continuous human attention can now be delegated to AI systems that stay focused for hours or even days. The economic and organizational effects will be substantial, particularly in domains such as software development, where talent is scarce and labor costs are high.

As Claude 4 blurs the line between human and machine intelligence, we face a new reality in the workplace. The challenge is no longer wondering whether AI can match human skills, but adapting to a future in which our most productive teammates may be digital rather than human.

