Can AI be trusted in emergencies? Study raises red flags on ChatGPT health

A new study published in the journal Nature Medicine has raised serious safety concerns about ChatGPT Health, a consumer-facing AI tool launched by OpenAI earlier this year.

The research found that the tool under-triaged more than half of simulated emergency medical cases, in some instances recommending routine care when urgent hospital treatment was needed.

The study, titled “ChatGPT Health performance in a structured test of triage recommendations,” was led by Dr Ashwin Ramaswamy and colleagues at the Icahn School of Medicine at Mount Sinai.

It was published online on February 23, 2026, just weeks after ChatGPT Health’s public launch on January 7.

HOW THE STUDY WAS CONDUCTED

Researchers designed 60 detailed, clinician-written medical scenarios, known as vignettes. These cases covered 21 clinical areas, including heart disease, respiratory illness, mental health crises, and metabolic disorders.

Each case was tested under 16 different conditions, resulting in 960 total responses from ChatGPT Health.

The goal was to check whether the AI could correctly recommend the appropriate level of medical urgency — for example, whether a patient should go to the emergency department immediately or could safely seek care within a few days.

AN “INVERTED U” PATTERN OF PERFORMANCE

The findings showed what researchers described as an “inverted U-shaped” performance curve.

The AI handled moderate medical situations reasonably well. It also correctly identified many classic emergencies such as stroke and severe allergic reactions (anaphylaxis).

However, performance dropped significantly at the extremes, particularly in high-risk emergencies.

In gold-standard emergency cases, ChatGPT Health under-triaged 52% of the time, meaning it recommended less urgent care than was medically appropriate.

In several scenarios, it suggested patients seek evaluation within 24–48 hours instead of going immediately to the emergency department.

Examples included:

  • Diabetic ketoacidosis, a life-threatening complication of diabetes
  • Impending respiratory failure, which requires urgent medical intervention

In such cases, delays could potentially result in serious harm.

MENTAL HEALTH CRISIS RESPONSES WERE INCONSISTENT

The study also examined how the AI responded to mental health emergencies, including suicidal thoughts.

Researchers found that crisis intervention messages, such as directing users to call the 988 Suicide and Crisis Lifeline, were triggered inconsistently.

Surprisingly, the AI was more likely to activate crisis messaging when no specific suicide method was described, and less likely when a clear method was mentioned.

This inconsistency raised concerns about reliability in high-stakes mental health situations.

INFLUENCE OF EXTERNAL BIAS

Another important finding involved what researchers called “anchoring bias.” When scenarios included family or friends minimising symptoms, the AI was significantly more likely to recommend non-urgent care.

In these edge cases, recommendations shifted toward less urgent advice with an odds ratio of 11.7. Researchers said this suggests that contextual wording can strongly influence AI output.

The study found no statistically significant differences based on race, gender, or access-to-care barriers. However, the authors noted that the data did not fully rule out the possibility of meaningful disparities.

RESEARCHERS URGE CAUTION

The authors emphasised that their research was based on simulated cases and conducted at a single time point. They stressed the need for prospective, real-world validation before AI tools like ChatGPT Health are widely relied upon for medical triage.

The study's rapid timeline (submitted January 15, accepted February 20, and published shortly after) reflects what experts say is an urgent need to assess the safety of AI systems already being used by millions of people.

OPENAI’S RESPONSE

OpenAI welcomed independent research but highlighted the study’s limitations. A company spokesperson said the findings may not reflect typical real-life usage and added that the model undergoes continuous updates and improvements.

The company also noted that ChatGPT Health is designed to provide general health information and is not intended to replace professional medical advice.

GROWING DEBATE OVER AI IN HEALTHCARE

The findings have added fuel to an ongoing debate about the readiness of large language models for direct consumer health decision-making.

Experts say AI tools can improve access to medical information, especially in areas with limited healthcare services.

However, they warn that blind spots in emergency triage could lead to delayed treatment or unnecessary harm.

As AI systems become more deeply integrated into everyday healthcare decisions, researchers say strong clinical validation, clear safety guardrails, and transparent limitations will be essential.

For now, the study serves as a reminder that while AI may assist with health information, emergency symptoms should always be evaluated by qualified medical professionals without delay.
