Saturday, February 28, 2026

Can AI be trusted in emergencies? Study raises red flags on ChatGPT health

A new study published in the journal Nature Medicine has raised serious safety concerns about ChatGPT Health, a consumer-facing AI tool launched by OpenAI earlier this year.

The research found that the tool under-assessed more than half of simulated emergency medical cases, in some instances recommending routine care when urgent hospital treatment was needed.

The study, titled “ChatGPT Health performance in a structured test of triage recommendations,” was led by Dr Ashwin Ramaswamy and colleagues at the Icahn School of Medicine at Mount Sinai.

It was published online on February 23, 2026, just weeks after ChatGPT Health’s public launch on January 7.

HOW THE STUDY WAS CONDUCTED

Researchers designed 60 detailed, clinician-written medical scenarios, known as vignettes. These cases covered 21 clinical areas, including heart disease, respiratory illness, mental health crises, and metabolic disorders.

Each case was tested under 16 different conditions, resulting in 960 total responses from ChatGPT Health.
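The fully crossed design described above can be sketched in a few lines of Python (a minimal illustration only; the numeric labels stand in for vignettes and conditions, which the article does not enumerate):

```python
# Sketch of the study's factorial design: every clinician-written
# vignette is tested under every condition, so 60 x 16 = 960 runs.
from itertools import product

num_vignettes = 60    # clinician-written cases across 21 clinical areas
num_conditions = 16   # test conditions per case (labels not given here)

runs = list(product(range(num_vignettes), range(num_conditions)))
print(len(runs))  # 960
```

Crossing every case with every condition is what lets researchers separate the effect of the scenario itself from the effect of how it is framed.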

The goal was to check whether the AI could correctly recommend the level of medical urgency: for example, whether a patient should go to the emergency department immediately or could safely seek care within a few days.

AN “INVERTED U” PATTERN OF PERFORMANCE

The findings showed what researchers described as an “inverted U-shaped” performance curve.

The AI handled moderate medical situations reasonably well. It also correctly identified many classic emergencies such as stroke and severe allergic reactions (anaphylaxis).

However, performance dropped significantly at the extremes, particularly in high-risk emergencies.

In gold-standard emergency cases, ChatGPT Health under-triaged 52% of the time, meaning it recommended less urgent care than was medically appropriate.

In several scenarios, it suggested patients seek evaluation within 24–48 hours instead of going immediately to the emergency department.

Examples included:

  • Diabetic ketoacidosis, a life-threatening complication of diabetes
  • Impending respiratory failure, which requires urgent medical intervention

In such cases, delays could potentially result in serious harm.

MENTAL HEALTH CRISIS RESPONSES WERE INCONSISTENT

The study also examined how the AI responded to mental health emergencies, including suicidal thoughts.

Researchers found that crisis intervention messages, such as directing users to call the 988 Suicide and Crisis Lifeline, were triggered inconsistently.

Surprisingly, the AI was more likely to activate crisis messaging when no specific suicide method was described, and less likely when a clear method was mentioned.

This inconsistency raised concerns about reliability in high-stakes mental health situations.

INFLUENCE OF EXTERNAL BIAS

Another important finding involved what researchers called “anchoring bias.” When scenarios included family or friends minimising symptoms, the AI was significantly more likely to recommend non-urgent care.

In these edge cases, recommendations shifted toward less urgent advice with an odds ratio of 11.7. Researchers said this suggests that contextual wording can strongly influence AI output.
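An odds ratio of 11.7 means the odds of a non-urgent recommendation were nearly twelve times higher when the scenario included minimising language. As a rough illustration of how such a figure is derived from a 2x2 table (the counts below are invented for the example, not the study's data):

```python
# Illustration of computing an odds ratio from a 2x2 table.
# The counts are hypothetical, NOT taken from the Nature Medicine study.
def odds_ratio(a, b, c, d):
    """a/b: non-urgent vs urgent advice with minimising context;
    c/d: non-urgent vs urgent advice without it."""
    return (a / b) / (c / d)

# e.g. 54 of 60 runs shifted non-urgent with minimising wording,
# versus 27 of 60 without it (hypothetical numbers):
print(round(odds_ratio(54, 6, 27, 33), 1))  # 11.0
```

A ratio near 1 would mean the wording made no difference; values far above 1, like the study's 11.7, indicate a strong shift toward non-urgent advice.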

The study found no statistically significant differences based on race, gender, or access-to-care barriers. However, the authors noted that the data did not fully rule out the possibility of meaningful disparities.

RESEARCHERS URGE CAUTION

The authors emphasised that their research was based on simulated cases and conducted at a single time point. They stressed the need for prospective, real-world validation before AI tools like ChatGPT Health are widely relied upon for medical triage.

The study's rapid timeline (submitted January 15, accepted February 20, and published days later) reflects what experts say is an urgent need to assess the safety of AI systems already being used by millions of people.

OPENAI’S RESPONSE

OpenAI welcomed independent research but highlighted the study’s limitations. A company spokesperson said the findings may not reflect typical real-life usage and added that the model undergoes continuous updates and improvements.

The company also noted that ChatGPT Health is designed to provide general health information and is not intended to replace professional medical advice.

GROWING DEBATE OVER AI IN HEALTHCARE

The findings have added fuel to an ongoing debate about the readiness of large language models for direct consumer health decision-making.

Experts say AI tools can improve access to medical information, especially in areas with limited healthcare services.

However, they warn that blind spots in emergency triage could lead to delayed treatment or unnecessary harm.

As AI systems become more deeply integrated into everyday healthcare decisions, researchers say strong clinical validation, clear safety guardrails, and transparent limitations will be essential.

For now, the study serves as a reminder that while AI may assist with health information, emergency symptoms should always be evaluated by qualified medical professionals without delay.
