AI enters its new phase: from ChatGPT to Claude and Gemini, which performs tasks better without human help

The race among large language models (LLMs) is entering a quieter but more decisive phase: executing complex tasks without human intervention.

OpenAI (ChatGPT) has released a new version and, according to benchmark measurements, has taken the lead over Anthropic and Google. The key is to understand what sets each model apart and where each holds a real advantage.

Benchmarks such as METR Time Horizons, Chatbot Arena+ and Epoch AI act as radars for this evolution. They measure precision, consistency, capacity for prolonged reasoning, robustness to ambiguity and performance on chained tasks. They combine human evaluations, automated tests and simulated environments in which models must solve real problems, not just answer questions.

Performance metrics come from standardized tests that allow models to be compared objectively. To evaluate text generation speed, we apply 220 instruction combinations in different scenarios and measure output in tokens per second. The analysis integrates precision, consistency and response rate on complex tasks.
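As a rough illustration of how a tokens-per-second figure can be obtained, the sketch below times each generation call and averages the results. It is not the test harness used for the figures in this article: `fake_generate` is a hypothetical stand-in for a real model call, and splitting the 220 combinations into 20 tasks and 11 scenarios is an assumption made only to have something to iterate over.

```python
import time
from statistics import mean

def tokens_per_second(generate, prompt):
    """Time one generation call and return its throughput in tokens per second."""
    start = time.perf_counter()
    tokens = generate(prompt)              # assumed to return a list of tokens
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed if elapsed > 0 else float("inf")

def average_throughput(generate, prompts):
    """Average the per-prompt throughput over a set of instruction prompts."""
    return mean(tokens_per_second(generate, p) for p in prompts)

def fake_generate(prompt):
    # Hypothetical stand-in for a real model call: returns the prompt's words.
    time.sleep(0.001)
    return prompt.split()

# 220 instruction combinations; the 20 x 11 split is an assumption.
prompts = [f"task {t} in scenario {s}" for t in range(20) for s in range(11)]
print(round(average_throughput(fake_generate, prompts)), "tokens/s")
```

Averaging per-prompt throughput, rather than dividing total tokens by total time, keeps one very long response from dominating the result.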

METR introduces a key indicator: how long a model can sustain operational coherence without drifting. Chatbot Arena+ aggregates thousands of blind comparisons in real scenarios, prioritizing human preference. Epoch AI, for its part, analyzes scalability, efficiency and technical progress, detecting structural leaps beyond the marketing.

According to the average of the three benchmarks, OpenAI reaches close to 92%, Anthropic sits at around 89% and Google at around 86%. The differences are not vast, but they mark consistent advantages in complex tasks. These are not technical ties: each extra point translates into fewer errors and greater operational reliability.
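One way to see why a few points matter is to look at the error rate rather than the score. The sketch below assumes, purely for illustration, that each average can be read as an accuracy, so that the error rate is 100 minus the score. The benchmarks do not necessarily define their scores this way.

```python
def relative_error_reduction(higher, lower):
    """Fraction by which the error rate drops when moving from `lower` to `higher`.

    Assumes a score is an accuracy in percent, so error = 100 - score.
    """
    error_high = 100 - higher
    error_low = 100 - lower
    return (error_low - error_high) / error_low

# Approximate averages quoted in the article.
scores = {"OpenAI": 92, "Anthropic": 89, "Google": 86}

print(relative_error_reduction(scores["OpenAI"], scores["Anthropic"]))  # 3/11, about 0.27
print(relative_error_reduction(scores["OpenAI"], scores["Google"]))     # 6/14, about 0.43
```

Under that assumption, a three-point lead at this level corresponds to roughly a quarter fewer errors, which is why gaps that look small on paper can matter in long chains of tasks.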

Improvement cycles are no longer annual: every 6 to 9 months a clear competitive leap emerges in one of these benchmarks. Progress comes not from larger models but from finer architectures, optimized training and better use of external tools. The value no longer lies in accumulated knowledge but in the ability to execute and sustain results.

All three models are sold by subscription, and their base plans cost around 20 dollars a month. The gap with the free versions is clear: those offer lower reasoning capacity, more usage restrictions, limited access to new features and lower precision in complex tasks. Paying doesn’t add convenience: it enables performance.

ChatGPT-5.4

OpenAI marks the latest breakthrough with this long-awaited launch. The focus moves away from conversation and toward direct execution. The model does not simply interpret language: it operates on the system, navigates interfaces and completes complex workflows. The concept of an assistant fades; an operational agent with practical autonomy emerges.

The “Native Computer Use” function sums up that change. GPT-5.4 observes the screen in real time, interprets visual elements and translates instructions into concrete actions. Natural language is converted into commands that run on Windows or macOS, eliminating the friction between intention and result.

The model recognizes buttons, menus and dynamic fields as a human user would. It controls the mouse and keyboard, completes forms, manages files and automates repetitive tasks. The promise is not speed but the direct replacement of manual processes that consume time and attention.

The architecture combines computer vision, pixel mapping and access to system APIs. Each action is planned from the current state of the interface, captured as a sequence of images. A simple request can therefore trigger complex chains: searching for data, processing it and writing it into documents without intervention.
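The idea of planning each action from the current state of the interface can be sketched as an observe, plan, act loop. The example below is a toy under stated assumptions, not OpenAI's implementation: `FakeScreen`, `plan` and `run_agent` are hypothetical names, and a plain dictionary stands in for the screenshot that a real agent would capture.

```python
from dataclasses import dataclass, field

@dataclass
class FakeScreen:
    """Hypothetical stand-in for a desktop UI: a form with named text fields."""
    fields: dict = field(default_factory=lambda: {"name": "", "email": ""})
    submitted: bool = False

    def observe(self):
        # A real agent would capture a screenshot here; we return a plain snapshot.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def act(self, action):
        kind, target, value = action
        if kind == "type":
            self.fields[target] = value
        elif kind == "click" and target == "submit":
            self.submitted = True

def plan(snapshot, goal):
    """Choose the next action from the current state only, or None when done."""
    if snapshot["submitted"]:
        return None
    for name, wanted in goal.items():
        if snapshot["fields"].get(name) != wanted:
            return ("type", name, wanted)
    return ("click", "submit", None)

def run_agent(screen, goal, max_steps=10):
    """Observe, plan and act repeatedly until nothing is left to do."""
    history = []
    for _ in range(max_steps):
        action = plan(screen.observe(), goal)
        if action is None:
            break
        screen.act(action)
        history.append(action)
    return history

screen = FakeScreen()
for step in run_agent(screen, {"name": "Ada", "email": "ada@example.com"}):
    print(step)
```

Because every step starts from a fresh observation, the loop can recover if the interface changes between actions, and the `max_steps` budget keeps it from running indefinitely.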

Claude Opus 4.6

Anthropic responds from another angle. It competes not on operational control but on cognitive depth. The model introduces differentiated modes: instant responses for simple tasks and extended reasoning for complex problems. The latter deploys step-by-step thinking with transparent summaries of the process.

The model prioritizes traceability and auditability. Every decision can be explained and every conclusion is grounded. This positions it as a critical tool in environments where precision matters more than speed: software development, strategic analysis or validation of complex hypotheses.

In addition, Claude maintains consistency over long sessions, even with thousands of steps. It runs tools in parallel, adjusts strategies and validates results without losing alignment. In business automation contexts, this operational persistence makes a tangible difference compared with more reactive models.
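Running tools in parallel and validating the results before continuing is a general pattern that can be sketched with Python's standard `asyncio` library. The tools below (`fetch_sales`, `fetch_inventory`) and the consistency check are hypothetical, and the sketch says nothing about how Claude itself schedules tool calls.

```python
import asyncio

async def fetch_sales(region):
    # Hypothetical tool: pretends to query a sales system.
    await asyncio.sleep(0.01)
    return {"region": region, "sales": 100}

async def fetch_inventory(region):
    # Hypothetical tool: pretends to query an inventory system.
    await asyncio.sleep(0.01)
    return {"region": region, "stock": 40}

def validate(results):
    """Accept the step only if every tool returned data for the same region."""
    regions = {r["region"] for r in results}
    return len(regions) == 1

async def run_step(region):
    # gather() runs both tool calls concurrently and keeps results in call order.
    results = await asyncio.gather(fetch_sales(region), fetch_inventory(region))
    if not validate(results):
        raise ValueError("tool results disagree")
    return results

print(asyncio.run(run_step("north")))
```

The two calls overlap instead of running one after the other, and the validation gate stops the workflow before an inconsistent result is passed on to the next step.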

Anthropic’s constitutional approach adds a distinctive layer. The training incorporates explicit principles based on ethical frameworks and human rights. The goal is not only to avoid errors, but to reduce systemic risks: biases, manipulation or misuse in sensitive contexts.

Gemini-3.1-Pro

Google's model sits somewhere in between. It integrates advanced reasoning with strong multimodal capacity. Its competitive advantage lies in fluid interaction between text, image, video and structured data, which expands the range of tasks it can tackle without relying on external integrations.

In benchmarks, Gemini excels at tasks that combine multiple formats and require fast synthesis. However, its performance in extended reasoning still lags behind Claude, while its direct execution ability does not reach the operational level proposed by GPT-5.4.

The comparison between OpenAI and Anthropic exposes two philosophies. GPT-5.4 relies on autonomous action in real environments; Claude Opus 4.6 prioritizes deep, controlled thinking. One replaces tasks, the other reduces uncertainty. Both advance, but in directions that respond to different needs.

For users with a monthly subscription, the impact is immediate. GPT-5.4 lets you delegate complete workflows: email management, data analysis, report generation. Claude raises the quality of complex decisions: advanced debugging, strategic planning and scenario evaluation with greater rigor.

The real differentiator appears in hybrid tasks. Automating processes while validating each step with sound reasoning reduces errors and accelerates results. Combining models, rather than choosing just one, is emerging as the most efficient strategy in demanding professional environments.
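The hybrid approach described above, one model doing the work and another checking each step, can be sketched as a simple pipeline. Everything here is illustrative: `execute` and `validate` are hypothetical stand-ins for calls to two different models, and the rule that rejects steps containing "delete" is an arbitrary example of a check.

```python
def run_validated(steps, execute, validate):
    """Run steps in order, checking each result before moving on.

    Stops at the first step whose result fails validation and returns it.
    """
    completed = []
    for step in steps:
        result = execute(step)
        if not validate(step, result):
            return completed, step      # hand the failing step back for review
        completed.append((step, result))
    return completed, None

def execute(step):
    # Hypothetical "operational" model: performs the step.
    return step.upper()

def validate(step, result):
    # Hypothetical "analytical" model: checks the result and vetoes risky steps.
    return result.lower() == step and "delete" not in step

steps = ["fetch data", "build report", "delete archive", "send email"]
done, failed = run_validated(steps, execute, validate)
print(len(done), "steps completed; stopped at:", failed)
```

Stopping at the first failed check trades some speed for reliability: an error in one step is caught before later steps build on it.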

The market thus enters a phase of functional specialization. There is no longer a universal “best model”, but systems optimized for different types of intelligence: operational, analytical or multimodal. The competitive advantage is transferred to the user who knows when to use each one.

By Editor

The Observatorial