Researchers find 5 popular AI chatbots give wrong medical advice half the time

Hanno Labuschagne

Journalist

Artificial intelligence-driven chatbots are giving users problematic medical advice about half the time, according to a new study, highlighting the health risks of the technology that’s becoming increasingly integral in day-to-day life.

Researchers from the US, Canada and the UK evaluated five popular platforms — ChatGPT, Gemini, Meta AI, Grok and DeepSeek — by asking each of them 10 questions across five health categories.

[Bloomberg]
 
A quick way to Darwin oneself. It's almost like your GP having WebMD open on their phone while trying to diagnose you.
 
 
If you try to diagnose yourself online, you'll find that an itchy nose is the symptom of five different types of cancer, the clap, a heart murmur, and at least two STIs. LLMs are using all that info to incorrectly diagnose you.
 
When will we stop calling it AI? It's just a learning module. All it does is collate the info it finds, incorrect or not, and it doesn't even know what it is doing.
+1
It basically comes down to how it’s trained and how it weighs information. AI doesn’t “know” anything—it just predicts the most likely answer based on patterns and confidence. So if it’s highly confident in something, it might argue that it’s correct, even if it’s wrong. But when you push back with better evidence, it can adjust its response because that shifts the weighting of what it considers more reliable.
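To make the "most likely answer" idea concrete, here is a toy sketch in Python. It is not how any of the chatbots in the study actually work internally; the candidate answers and scores are invented for illustration. It only shows that a softmax turns raw scores into confidences, that the top-scoring answer wins whether or not it is true, and that changing the scores (for example, after pushback with better evidence) can change the winner.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def most_likely(candidates, scores):
    """Return the highest-probability candidate and its probability."""
    probs = softmax(scores)
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best], probs[best]

# Hypothetical candidates and scores, made up for this example.
candidates = ["answer A", "answer B", "answer C"]
scores_before = [3.0, 1.0, 0.5]  # the model strongly favours A
scores_after = [3.0, 4.5, 0.5]   # new context raises the score for B

print(most_likely(candidates, scores_before))
print(most_likely(candidates, scores_after))
```

Nothing in this picks the true answer; it only picks the highest-scoring one, which is the point being made above.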


On the bright side, at least it’s not like Googling your symptoms and immediately being told you’ve got two weeks to live 😅. Mine’s apparently been overdue by about 7 years now.
 
Yeah now you've got 50/50 chance :-P
 
 
I've used ChatGPT for basic health-related questions, like help interpreting very standard medical test results. I obviously discussed them with my GP as well, but the additional context was handy.

Also, found this breakdown interesting:
Problematic responses also varied considerably by category. Chatbots performed comparatively better in vaccines (mean z-score –2.57) and cancer (–2.12) but underperformed in stem cells (+1.25), athletic performance (+3.74) and nutrition (+4.35). Vaccines and cancer are domains rife with misinformation, but the associated research is characterised by well-structured arguments, high-quality studies and frequent reinforcement of foundational concepts. These features may facilitate a model’s ability to reproduce content more accurately.
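For anyone unsure how to read those signs: a z-score says how far a value sits from the mean, in units of standard deviation, so here negative means fewer problematic responses than average and positive means more. The Python sketch below uses the plain textbook definition on made-up rates. The numbers are hypothetical, and the study's own procedure is not described in this thread and is probably different, so this will not reproduce the figures quoted above.

```python
from statistics import mean, pstdev

def z_scores(rates):
    """Textbook z-score per category: (value - mean) / population std dev."""
    values = list(rates.values())
    mu = mean(values)
    sigma = pstdev(values)
    return {name: (value - mu) / sigma for name, value in rates.items()}

# Hypothetical share of problematic responses per category (not from the study).
rates = {
    "vaccines": 0.30,
    "cancer": 0.35,
    "stem cells": 0.50,
    "athletic performance": 0.60,
    "nutrition": 0.75,
}

for name, z in z_scores(rates).items():
    # Negative: better than the average category. Positive: worse.
    print(f"{name}: {z:+.2f}")
```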
 
And what are the hit rates for GPs and other first-line medical providers?
 
My GP always jokes about using AI or Google, lol. He said there are some good use cases, because AI can reason over a lot of data and look for patterns, so it's good at identifying things, but it doesn't have human logic or experience.
 
Consumer-optimised generative AI-driven chatbots were selected for inclusion: Gemini (2.0, Google; version available December 2024), DeepSeek (V3, High-Flyer; version available December 2024), Meta AI (Llama 3.3, Meta; version available December 2024), ChatGPT (3.5, OpenAI; version available November 2022) and Grok (2, xAI; version available August 2024). Models were treated as closed/proprietary deployments, since underlying model weights and training data were not available for evaluation. We selected these models because, at the time of our analysis, they were the most accessible and popular public-facing platforms [35], and though subscription versions may yield greater accuracy [12, 19], we opted to use the free (unpaid) versions that are most often accessed by the general public [36]. Specialist medical chatbots (eg, ChatDoctor) were excluded from the analysis as they are either inaccessible to non-professionals or rarely used for public health queries.

Let's see:
Gemini 2.0
DeepSeek V3
Llama 3.3
ChatGPT 3.5

That is about 1.5 years out of date.
 
When will we stop calling it AI? It's just a learning module. All it does is collate the info it finds, incorrect or not, and it doesn't even know what it is doing.

When the venture capital funds dry up and the companies actually have to start making a profit.
 