ChatGPT health feature fails to detect high-risk cases

ChatGPT's health features fail to reliably detect "high-risk emergencies" — cases where people need immediate medical care — according to a new study.

Health questions are among the most common uses of AI chatbots such as ChatGPT, according to its creator, OpenAI. The tool is so popular that earlier this year the company introduced ChatGPT Health, a feature designed specifically to help people manage their well-being; the company says tens of millions of people are already using it.

But a new study suggests the system could miss important emergencies and can’t be trusted to reliably tell someone they need immediate medical attention.

"Large language models have become patients' first choice for medical consultation, but they are least safe at the clinical extremes, where the line runs between missed emergencies and unnecessary alarms," said Isaac S. Kohane of Harvard Medical School, who was not involved in the research. "When millions of people use an AI system to decide whether they need emergency care, the stakes are high. Independent assessment should be routine, not optional."

The urgent need to test whether the system is safe prompted a fast-tracked study by the Icahn School of Medicine at Mount Sinai, published in Nature Medicine.

The work grew out of the recognition that ChatGPT was being used in potentially critical situations even though research into its effectiveness remained relatively limited. That gap motivated the study, according to the researchers.

"We wanted to answer a very basic but crucial question: If someone is experiencing a real medical emergency and comes to ChatGPT Health for help, will it clearly tell them to go to the ER?" said lead author, urologist Ashwin Ramaswamy. The researchers found that it often did not — frequently enough to call its reliability into question.

The researchers found, for example, that the system’s alerts were “inverted”: the higher a person’s risk of self-harm, the less likely an alert would be triggered. This finding was “particularly worrying and surprising,” they said.

In the research, doctors created 60 scenarios covering 21 medical specialties. These ranged from relatively low-risk situations that might only require home care to actual medical emergencies. The researchers used 16 different contextual conditions, such as race and gender.

The researchers found that the tool generally handled obvious emergencies well, but fell short in more than half of the cases where doctors judged that the person needed urgent care. While it performed well on textbook emergencies, it was worse at detecting cases where the danger was less immediate or obvious, they noted.

The paper, titled "Performance of ChatGPT Health in a Structured Test of Triage Recommendations," was fast-tracked for publication in Nature Medicine.

By Editor
