Artificial intelligence has already reached human level in many demanding tests – Science

Artificial intelligence researchers at Stanford University say tests need to be redesigned as artificial intelligence grows ever smarter.

Artificial intelligence skills have risen to the level of an ordinary person in various tests.

Stanford University's HAI research institute reports annually on what different artificial intelligence systems are capable of. In April, HAI published the review for the seventh time.

The 500-page review says that the best artificial intelligences compete with humans in at least three areas, or serve as our assistants in them.

Artificial intelligence can classify images, understand English text, and make everyday inferences based on what it sees.

Artificial intelligence could already do this in the last decade, but only clumsily. In 2022–2023, its level rose dramatically.

The difference shows in the tests. In 2021, artificial intelligence could solve only 6.9 percent of the math problems.

Stanford’s MATH test has about 12,500 math problems.

Last year, for example, OpenAI's artificial intelligence already solved 84.3 percent of the math problems. Human performance on the test is around the 90 percent mark.

In one test, for example, the artificial intelligence evaluates a cat on a table.

The artificial intelligence had to predict whether the cat could safely jump off the table. It also had to evaluate whether the table could bear the cat's weight.

A person learns these things while still in diapers, provided they are allowed to move around and test, for example, the weights of different objects.

Command of facts was tested with the help of TruthfulQA, which has 817 questions related to health, law, economics, and politics in the United States.

Language models are improving all the time. GPT-4, released in early 2023, achieved the highest scores in the comparison, almost three times those of GPT-2, an artificial intelligence tested in 2021.

The winner in image generation was the artificial intelligence Midjourney, which, for example, depicted the Harry Potter character, familiar from books and films, much better than it did in 2022.

Text-to-image models were evaluated on 12 key factors essential in everyday use. Here OpenAI's DALL-E 2 got the best score.

Despite these successes, artificial intelligence still has many problems. Popular large language models (LLMs), for example, remain prone to "hallucinations".

With a few simple tricks, a person can easily get the models confused or even angry.

In demanding mathematics, and in planning in general, artificial intelligence still has work to do.

Stanford University now needs new tests. With them, researchers could not only compare artificial intelligences better but also look for areas where the abilities of humans and artificial intelligences differ from each other.

We should find areas where people still have a clear head start, says the website New Atlas.

A new language model called GPT-5 is expected to be introduced by the summer. Its skills may well influence the new tests.

By Editor
