Artificial Intelligence May Not Be Able to Overtake Humans

New research has revealed weaknesses in one of the most successful artificial intelligence (AI) systems – a bot that plays the board game Go and can beat the world’s best human players – showing that AI’s supposed superiority over humans may be less secure than it looks. The study raises questions about whether more general AI systems will suffer from vulnerabilities that compromise their safety and reliability, and undercut their claims to being “superhuman.”

“The paper leaves a significant question mark over how to achieve the ambitious goal of building robust real-world AI agents that people can trust,” Huan Zhang, a computer scientist at the University of Illinois Urbana-Champaign, told Nature.

The analysis, posted online as a preprint and not yet peer-reviewed, uses what are called adversarial attacks: feeding AI systems inputs designed to trick them into making mistakes, whether for research or for nefarious purposes. For example, certain prompts can “jailbreak” chatbots, coaxing out harmful information that they were trained to suppress. In Go, two players take turns placing black and white stones on a grid, trying to surround and capture the other player’s stones.
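To make the idea concrete, here is a minimal sketch of an adversarial attack in Python, run against a toy logistic classifier rather than a Go engine; everything in it (the model, the data, the step size) is illustrative and not drawn from the study:

```python
import numpy as np

# Minimal sketch of an adversarial attack on a toy logistic classifier.
# The Go attacks in the study are far more elaborate (whole trained
# adversary agents), but the principle is the same: a small, structured
# nudge to the input flips the model's decision.

rng = np.random.default_rng(0)
w = rng.normal(size=16)                  # weights of the toy "model"

def predict(x):
    return int(x @ w > 0)                # predicted class, 0 or 1

x = rng.normal(size=16)                  # an ordinary input
label = predict(x)

# For a linear score x @ w, the gradient with respect to x is just w,
# so stepping along -sign(w) (or +sign(w)) is the fast-gradient-sign
# attack in its simplest form.
step = -np.sign(w) if label == 1 else np.sign(w)
x_adv = x.copy()
while predict(x_adv) == label:           # take small steps until the label flips
    x_adv += 0.05 * step

print("clean prediction:      ", label)
print("adversarial prediction:", predict(x_adv))
print("max per-coordinate change:", float(np.abs(x_adv - x).max()))
```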

In 2022, researchers reported that they had trained adversarial AI bots to defeat KataGo, the best open-source Go-playing AI system, which typically beats the best human players handily. Their bots consistently found ways to beat KataGo, even though they weren’t particularly strong overall: human amateurs could beat them. What’s more, humans could learn the bots’ tricks and use them to beat KataGo themselves.

Was this an isolated case, or did that work highlight a fundamental weakness in KataGo and, by extension, other AI systems with seemingly superhuman capabilities?

To investigate, researchers led by Adam Gleave, CEO of FAR AI, a nonprofit research organization in Berkeley, California, and a co-author of the 2022 paper, used adversarial bots to test three ways of defending Go AIs against such attacks. The first defense was one that KataGo’s developers had already deployed after the 2022 attacks: give KataGo examples of the board positions involved in the attacks and have it play from those positions against itself to learn how to handle them.
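In spirit, that defense resembles the following hedged sketch, again using a toy logistic model as a stand-in for KataGo; the `train` and `attack` functions here are illustrative, not KataGo’s actual self-play pipeline:

```python
import numpy as np

# Hedged sketch of the first defense: generate the inputs an attacker
# exploits, fold them back into the training data, and retrain.

rng = np.random.default_rng(1)
true_w = rng.normal(size=8)                     # toy ground truth
X = rng.normal(size=(400, 8))
y = (X @ true_w > 0).astype(float)

def train(w, X, y, lr=0.5, steps=300):
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))      # logistic predictions
        w = w - lr * X.T @ (p - y) / len(y)     # gradient step
    return w

def attack(w, X, y, eps=1.0):
    # FGSM-style: push each input toward the wrong side of the boundary.
    flip = np.where(y == 1, -1.0, 1.0)[:, None]
    return X + eps * flip * np.sign(w)[None, :]

def fooled(w):
    # Fraction of freshly generated attacks that flip the model's answer.
    return ((attack(w, X, y) @ w > 0) != (y > 0)).mean()

w = train(np.zeros(8), X, y)
print(f"attacks succeed before defense: {fooled(w):.0%}")

# The defense: retrain with the attack examples added to the data.
X_adv = attack(w, X, y)
w = train(w, np.vstack([X, X_adv]), np.concatenate([y, y]))
print(f"attacks succeed after defense:  {fooled(w):.0%}")
```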

But the authors of the latest paper found that an adversarial bot could learn to beat even this updated version of KataGo, winning 91 percent of the time.

The second defense strategy the Gleave team tried was iterative: train one version of KataGo against the opposing bots, then train the attackers against the updated KataGo, and so on, for nine rounds. But this didn’t produce an unbeatable version of KataGo either.
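A hedged sketch of that alternating schedule, reusing the same toy setup as above (the functions are illustrative stand-ins, not the paper’s training code):

```python
import numpy as np

# Iterative defense: each round, the defender retrains on the current
# attacks, then the attacker re-derives its attack against the hardened
# defender. An analogy only, not the paper's pipeline.

rng = np.random.default_rng(2)
true_w = rng.normal(size=8)
X = rng.normal(size=(400, 8))
y = (X @ true_w > 0).astype(float)

def train(w, X, y, lr=0.5, steps=300):
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))      # logistic predictions
        w = w - lr * X.T @ (p - y) / len(y)     # gradient step
    return w

def attack(w, X, y, eps=1.0):
    # FGSM-style: push inputs toward the wrong side of the boundary.
    flip = np.where(y == 1, -1.0, 1.0)[:, None]
    return X + eps * flip * np.sign(w)[None, :]

w = train(np.zeros(8), X, y)
for round_no in range(1, 10):                   # nine rounds, as in the study
    X_adv = attack(w, X, y)                     # attacker adapts to the defender
    w = train(w, np.vstack([X, X_adv]),         # defender retrains on those attacks
              np.concatenate([y, y]))
    rate = ((attack(w, X, y) @ w > 0) != (y > 0)).mean()
    print(f"round {round_no}: fresh attacks still succeed {rate:.0%} of the time")
```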

As a third defense strategy, the researchers trained a new AI system from scratch to play Go. KataGo is based on a computational model known as a convolutional neural network (CNN). The researchers suspected that CNNs might focus too much on local details and lose sight of global patterns, so they created a Go player using an alternative neural network called a vision transformer (ViT).
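The architectural contrast can be sketched in a few lines of PyTorch; the layer sizes below are purely illustrative and match neither KataGo’s network nor the paper’s ViT, but they show the difference in how the two encoders see a board:

```python
import torch
import torch.nn as nn

BOARD = 19

# CNN view: 3x3 kernels see only a local neighborhood per layer, so
# global board structure emerges slowly, over many stacked layers.
cnn = nn.Sequential(
    nn.Conv2d(2, 32, kernel_size=3, padding=1),   # 2 planes: black/white stones
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
)

# ViT view: the board becomes a sequence of tokens that all attend to
# each other, so one layer already mixes information across the board.
to_tokens = nn.Conv2d(2, 32, kernel_size=1)       # one token per board point
encoder = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)

x = torch.randn(1, 2, BOARD, BOARD)               # a single toy position
local_features = cnn(x)                           # (1, 32, 19, 19)
tokens = to_tokens(x).flatten(2).transpose(1, 2)  # (1, 361, 32)
global_features = encoder(tokens)                 # (1, 361, 32)
print(local_features.shape, global_features.shape)
```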

But their adversarial bot found a new attack that let it win 78 percent of the time against the new ViT system. In all of these cases, the adversarial bots, though capable of beating KataGo and other top Go-playing systems, were trained to find hidden vulnerabilities in other AIs, not to be well-rounded strategists. “The bots are pretty weak, we beat them ourselves pretty easily,” Gleave says.

And with humans able to use adversarial bot tactics to beat expert AI systems, does it still make sense to call those systems superhuman? David Wu, a New York City computer scientist who first developed KataGo, says that strong Go AIs are “superhuman on average” but not “superhuman at worst.” Gleave says the findings could have broad implications for AI systems, including the large language models that power chatbots like ChatGPT. “The key takeaway for AI is that these vulnerabilities are going to be hard to fix,” Gleave says. “If we can’t solve the problem in a simple domain like Go, then in the near term there seems little prospect of solving similar problems like jailbreaks in ChatGPT.”
