close
close

Anthropic Overtakes Openaai: Claude Opus 4 Coded Seven Hours Nonstop, states record SWE-bench-score and Formapes Enterprise AI


Take part in our daily and weekly newsletters to receive the latest updates and exclusive content for reporting on industry leaders. Learn more


Anthropic today published Claude Opus 4 and Claude Sonnet 4 and dramatically increased the bar for what AI can achieve without human intervention.

The company's flagship opus 4 model has concentrated on a complex open source refactoring project for almost seven hours during the tests at Rakuten, which is transformed from a quick reaction tool into a real employee that is able to deal with taglanges.

This marathon performance marks a quantum leap over the minutes of earlier attention span of earlier AI models. The technological effects are profound: AI systems can now cope with complex software engineering projects from the conception to completion, the context and the focus throughout the working day.

Anthropic claims Claude Opus 4 achieved a score of 72.5% at SWE-Bench, a strict software engineering benchmark, exceeds Openai's GPT-4.1, which achieved 54.6% at the start in April. Anthropic, as an impressive challenge, defines the performance on the increasingly overcrowded AI market.

Comparative benchmarks show that Claude 4 models (left) outperform competitors through the coding and argumentation tasks, with Claude Opus 4 achieving a score of 72.5% for the critical SWE-Bench test. (Credit: Anthropic)

Beyond the quick answers: The Revolution argumentation changes the AI

In 2025, the AI ​​industry was dramatically swirling with the argumentation of models. These systems work methodically before simulating human -like thinking processes instead of only sample adjustments against training data.

Openaai initiated this shift last December last December with its “O” series, followed by Google's Gemini 2.5 Pro with its experimental “Deep Think” function. The R1 model from Deepseek unexpectedly recorded the market share with its extraordinary problem-solving skills at a competitive price.

This pivot point signals a fundamental development in the use of people. According to the report on the usage trends for KI model consumption of POE 2025 from Poe, the use of the argumentation model rose from 2% to 10% of all AI interactions on five times in just four months. Users are increasingly viewing AI as a thought partner for complex problems and not as simple questions about the question that answer.

The proportion of argumentation news rose in early 2025 when new AI models recorded the interest of the user. (Credit: Poe)

Claude's new models differ by integrating the tool use directly into your argumentation process. This simultaneous research and rescue approach reflects the human cognition closer than earlier systems that have collected information before the analysis started. The ability to pause, to search for data and to involve new knowledge during the argumentation process creates a more natural and effective experience experience.

Dual-mode architecture compensates for the speed with depth

Anthropic has addressed a persistent friction point in the AI ​​user experience with its hybrid approach. Both Claude 4 models offer almost instant answers to simple queries and expanded thinking for complex problems. The frustrating delays of earlier argumentation models are removed, which were imposed on simple questions.

This dual mode functionality preserves the biting interactions that users expect while unlocking deeper analytical functions if necessary. The system dynamically assigns the resources based on the complexity of the task and is correct that did not achieve earlier argumentation models.

Memory is another breakthrough. Claude 4 models can extract key information from documents, create summary files and be transferred to this knowledge across the sessions in all permissions. This ability solves the “amnesia problem”, which has limited the usefulness of AI in long-term projects, in which the context must be maintained for days or weeks.

Technical implementation works similarly to how human experts develop knowledge management systems, whereby the AI ​​automatically organizes information in structured formats that are optimized for the future access. This approach enables Claude to build an increasingly refined understanding of complex areas about longer interaction periods.

The competitive landscape is intensified when the AI ​​leaders are fighting for the market share

The timing of Anthropic's announcement underlines the accelerating pace of competition in the advanced AI. Only five weeks after Openaai had launched their GPT 4.1 family, Anthropic counted with models that challenge or outperform them in important metrics. Google updated its Gemini 2.5 installation at the beginning of this month, while Meta recently published its Llama 4 models with multimodal functions and a 10 million token context window.

Each main laboratory has worked out pronounced strengths on this increasingly specialized market. Openai generally leads thinking and the integration of tools, Google in multimodal understanding, and Anthropic now claims the crown for continuing performance and professional coding applications.

The strategic effects on corporate customers are considerable. Organizations are now exposed to increasingly complex decisions, which AI systems for certain applications are to be provided without dominating a single model across all metrics. This fragmentation benefits sophisticated customers who can use specialized AI strengths, while companies that are looking for simple, uniform solutions.

Anthropic has expanded the integration of Claude into development workflows with the general publication of Claude Code. The system now supports background tasks via Github actions and integrates native into VS code and jetbrain environments, whereby the proposed code arrangements are displayed directly in development files.

Github's decision to include Claude Sonnet 4 as the basic model for a new coding agent in Github Copilot offers significant market validation. This partnership with Microsoft's development platform indicates that large technology companies diversify their AI partnerships instead of relying exclusively on individual providers.

Anthropic has added its model releases with new API functions for developers: a code design tool, an MCP -Connector, the files -Pi and formulated caching for up to an hour. These characteristics enable the creation of more sophisticated AI agents who can pass through complex workflows – its companies for the introduction of companies.

Transparency challenges arise when models become more demanding

Anthropic's research in April: “Models of argument don't always say what they think”, revealed the patterns in the way these systems convey their thinking processes. Her study showed that Claude 3.7 Sonett mentioned decisive information with which problems were solved only in 25% of cases – which raises significant questions about the transparency of the AI ​​argument.

This research shows a growing challenge: if models are able to become more opaque. The seven -hour autonomous coding session, which shows the endurance of Claude Opus 4, also shows how difficult it would be for humans to completely check such expanded argument chains.

The industry is now facing a paradox in which an increasing ability brings decreasing transparency. The treatment of this voltage requires new approaches to AI supervision, which compensate for the performance with explanation – an anthropic challenge has recognized, but has not yet been completely solved.

A future of persistent AI cooperation takes shape

The seven -hour autonomous working session of Claude Opus 4 offers an insight into the future role of AI in knowledge work. When models develop an expanded focus and improved memory, they are increasingly similar to employees as tools – able to achieve a persistent, complex work with minimal human surveillance.

This progress indicates a profound shift of the way in which companies will structure knowledge work. Tasks that once continuously needed human attention can now be delegated to AI systems that maintain the focus and context for hours or even days. The economic and organizational effects will be significant, especially in areas such as software development in which talent shortages exist and the labor costs remain high.

When Claude 4 blurred the border between human and machine intelligence, we are facing a new reality at the workplace. Our challenge is no longer whether AI can keep up with human skills, but to adapt to a future in which our most productive teammates may be more digital than human.

Leave a Comment