AI chatbots provide poor answers to medical questions half the time, study finds

A study published in BMJ Open suggests that half of the answers provided by five publicly available artificial intelligence (AI)–driven chatbots in response to medically related questions are inaccurate and incomplete.

Led by a researcher from the University of California, Los Angeles, the study audited the chatbots Gemini (Google), DeepSeek (High-Flyer), Meta AI (Meta), ChatGPT (OpenAI), and Grok (xAI).

Rapid adoption despite many flaws

In February 2025, the team asked each chatbot 10 questions in each of five categories: cancer, vaccines, stem cells, nutrition, and athletic performance, for 250 responses in all. The open- and closed-ended questions were designed to resemble common information-seeking medical and health questions and information tropes found online and in academic discussion, and the researchers also prompted the chatbots to produce scientific references.

The probes were also developed to point models toward misinformation or advice counter to medical standards, a method increasingly used to “stress test” AI chatbots and detect behavioral vulnerabilities. Closed-ended questions required pre-defined responses, often with only one correct answer that agreed with scientific consensus, while open-ended questions usually required the chatbots to generate several responses in list form.
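The overall shape of such an audit is simple to sketch in code. The following is a hypothetical reconstruction of the design described above, not the study’s actual harness; ask_chatbot() stands in for whatever web or API interface was used, and the prompt wording is illustrative only.

import itertools

CHATBOTS = ["Gemini", "DeepSeek", "Meta AI", "ChatGPT", "Grok"]
CATEGORIES = ["cancer", "vaccines", "stem cells", "nutrition",
              "athletic performance"]

def run_audit(questions, ask_chatbot):
    """Pose all 10 questions in each category to each chatbot
    (5 x 5 x 10 = 250 responses) and collect answers for expert scoring."""
    responses = []
    for bot, category in itertools.product(CHATBOTS, CATEGORIES):
        for question in questions[category]:  # 10 open- and closed-ended items
            answer = ask_chatbot(bot, question + " Please cite scientific references.")
            responses.append({"chatbot": bot, "category": category,
                              "question": question, "answer": answer})
    return responses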

Two experts in each category rated the chatbot responses as nonproblematic, somewhat problematic, or highly problematic or potentially harmful. Citations were scored for accuracy and completeness, and each response was given a Flesch Reading Ease score.
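The Flesch Reading Ease score itself is a fixed formula over word, sentence, and syllable counts; higher scores mean easier text, and scores of 30 to 50 are conventionally rated “difficult” (college level). A minimal sketch, with the counts assumed to be supplied by the caller, since syllable counting itself varies by implementation:

def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher is easier to read; 30-50 reads as 'difficult' (college level)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)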

The chatbots “have been rapidly adopted across research, education, business, marketing and medicine,” the authors wrote. “Most interactions, however, come from non-experts using chatbots like search engines, including for everyday health and medical queries.”

Reference quality poor, incomplete

About half (49.6%) of responses were problematic, with 30% considered somewhat problematic and 19.6% deemed highly problematic. Response quality didn’t differ significantly by chatbot, but Grok generated significantly more highly problematic responses than would be expected under a random distribution. Gemini, on the other hand, produced the fewest highly problematic responses and the most non-problematic ones.

Chatbot performance was strongest on questions about vaccines (mean z-score, –2.57) and cancer (–2.12) and weakest on stem cells (+1.25), athletic performance (+3.74), and nutrition (+4.35).
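(A z-score standardizes an observed count against its expected value, z = (x − μ) / σ, where μ and σ are the expected mean and standard deviation; as used here, negative values mean fewer problematic responses than expected and positive values mean more.)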

Chatbot responses were consistently given with confidence and certainty, with few caveats or disclaimers; of 250 total questions, only two (0.8%), on anabolic steroids and non-traditional cancer therapies, were met with refusals to answer, both from Meta AI. Reference quality was poor, with a median completeness score of 40%.

Open-ended prompts generated 40 highly problematic responses—significantly more than expected—and 51 non-problematic responses—significantly fewer than expected. The opposite was true of closed-ended prompts. 

Chatbots rely on limited scientific content

Hallucinations and made-up citations prevented every chatbot from providing a fully accurate reference list. Response readability was scored as “difficult,” meaning complex enough that readers would need at least some college education to understand it.

“By default, chatbots do not access real-time data but instead generate outputs by inferring statistical patterns from their training data and predicting likely word sequences,” the authors noted. “They do not reason or weigh evidence, nor are they able to make ethical or value-based judgments.”
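To illustrate that point, here is a generic sketch of next-token sampling, not the internals of any audited system: a language model repeatedly converts raw scores (logits) into a probability distribution and samples the next word piece, so a fluent answer is a chain of statistically likely continuations rather than a weighed judgment.

import math
import random

def sample_next_token(logits, temperature=1.0):
    """Turn a model's raw scores into probabilities (softmax) and sample one
    token index; generation is just this step repeated until a stop token."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]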

Chatbots also base their responses in part on Q&A forums and social media while limiting scientific content to publicly available studies, which make up only 30% to 50% of published research. “While this enhances conversational fluency, it may come at the cost of scientific accuracy,” the researchers wrote.

Study limitations include the audit of only five chatbots, which limits the findings’ generalizability in a rapidly evolving field. Also, real-world chatbot queries aren’t all adversarial, so the study’s approach may have overestimated the prevalence of problematic content.

“The audited chatbots performed poorly when answering questions in misinformation-prone health and medical fields,” the researchers concluded. “Continued deployment without public education and oversight risks amplifying misinformation.”

Creator: Center for Infectious Disease Research and Policy (CIDRAP EU)
