ChatGPT, Gemini and Grok confidently generate dangerous medical advice half the time, study finds

While the use of AI in healthcare has been widely debated, a new study published in the medical journal BMJ Open has found that around half the advice given by popular AI chatbots is problematic. The study, first reported by Bloomberg, evaluated five major AI platforms to highlight the growing health risks associated with generative AI.

What did the study find?

The research published this week tested ChatGPT, Gemini, Meta AI, Grok, and DeepSeek and asked each of the chatbots 10 questions across five health categories. Out of the total responses generated, the researchers found that 50 percent contained problematic medical information. Furthermore, the study noted that nearly 20 percent of the generated answers were classified as highly problematic.

The researchers from the US, Canada, and the UK also found that the AI models performed relatively well when handling closed-ended questions concerning established medical topics, such as cancer and vaccines. However, the models struggled significantly to provide safe answers for open-ended queries or complex health subjects like nutrition and stem cells.

A major concern raised in the report is the authoritative tone these models adopt despite lacking clinical judgment or the licences to issue medical diagnoses. The research noted that the AI chatbots delivered answers to the health questions with confidence and certainty even when they could not provide a complete and accurate list of medical references to support their claims.

Across the chatbots tested and the 10 questions, the researchers say there were only two refusals to answer, both of which came from Meta AI.

The authors of the study point out that deploying these chatbots without proper oversight and public education risks amplifying the spread of misinformation.

“These systems can generate authoritative-sounding but potentially flawed responses,” the researchers explained in the report. They added that the findings “highlight important behavioural limitations and the need to reevaluate how AI chatbots are deployed in public-facing health and medical communication”.

The new study comes at a time when AI companies have been positioning their AI tools to have a bigger say in healthcare. OpenAI launched its ChatGPT Health earlier this year, which allows users to share their personal health data with the popular AI chatbot to receive more grounded results.

Meanwhile, Anthropic launched Claude for Healthcare, which allows its paid users in the US to securely connect their medical records.
