Artificial intelligence failed at problems that mathematicians already knew how to solve

The news has been full of stories in which artificial intelligence solves every problem, if not better than a human, then at least faster.

In late May, OpenAI announced that it had used artificial intelligence to find a better answer to a math problem that the Hungarian mathematician Paul Erdős solved 80 years ago. After finding his solution, Erdős challenged others to find a better one.

Independent mathematicians confirmed that the artificial intelligence had succeeded.

In a test carried out quietly, however, AI systems could not match top mathematicians.

The AI's intelligence did not seem to be enough when no suitable model solution was available in the training data or on the internet, the science journal Nature reports.

Four AI systems were given ten mathematical problems to solve. The problems came from mathematicians who had solved them themselves but had not published the solutions.

This made it possible to test the ability of artificial intelligence to solve problems on its own.

Only publicly available AI models took part in the test. Google’s Aletheia, which is built specifically to solve mathematical problems, was therefore absent. Anthropic’s as yet unreleased Claude Mythos was also left out.

OpenAI’s ChatGPT 5.5 Pro, on the other hand, did take part. The other three participants were ChatGPT, Google’s Gemini and the publicly available Claude, each with an additional control layer that teams from different universities had built on top of it for the task.

The best result was achieved by the ChatGPT tuned by the Swiss university ETH. Even it managed to solve only six of the ten problems.

After the test, ETH’s team made a preliminary assessment of why three of the problems were left unsolved by every AI system.

In some cases, the systems were missing one or more “critical insights” that humans had used to solve the problem, the group stated.

In other problems, the approach was correct, but the AI failed to work through the details.

The AI systems were again prone to hallucinating. They also failed to cite their sources, even though references were specifically requested.

They might quote an earlier solution verbatim without giving a reference to the source they were quoting.

By Editor