AI Chatbots Are Even Riskier for Your Health Than Imagined

Thinking AI can diagnose your symptoms? A new study reveals that AI chatbots are even worse at giving medical advice than expected, posing serious risks for patients and doctors alike.

By Noah Patel · 6 min read

Nearly 60% of adults now use online sources for health information, and a growing number are turning to AI chatbots (Pew Research, 2023). It's tempting to type your mysterious symptom into a chatbot, hoping for a quick answer, especially when a doctor's appointment feels weeks away. But here's the stark truth: AI chatbots are even less reliable for medical advice than we imagined, posing significant risks to your health and well-being. A recent study underscores just how easily AI's supposed medical prowess breaks down, revealing a dangerous gap between perceived expertise and real-world utility.

The Dangerous Illusion of AI Medical Expertise

You might assume that because large language models (LLMs) have been trained on vast amounts of medical data, they can offer sound health guidance. And in carefully controlled tests, some chatbots like ChatGPT-4o and Llama 3 have impressively diagnosed medical scenarios correctly 94% of the time. But here's the catch: they recommended the right treatment only a meager 56% of the time. This isn't just about getting a diagnosis; it's about what you do with that information. And that is where AI chatbots are even more prone to misleading you.

The critical difference lies between an AI operating alone in a controlled environment and a human user trying to navigate a real-world health concern. Researchers gave medical scenarios to 1,298 people, asking them to use an LLM for diagnosis and action steps – like whether to call an ambulance or simply monitor symptoms. The results were startling. A control group, told to research on their own without AI, performed significantly better at identifying medical conditions, especially serious "red flag" scenarios (Journal of Medical Internet Research, 2024).

“Strong performance from the LLMs operating alone is not sufficient for strong performance with users.”

So, why does this breakdown happen? The researchers analyzed chat logs and identified several critical issues:

  • Incomplete Information: As non-experts, users often don't know which details are medically crucial. For example, a user asking about a persistent headache might fail to mention a recent minor head bump because it seemed insignificant. Unlike a human doctor who would pepper you with follow-up questions, bots don't always probe for missing pieces.
  • Misleading or Incorrect Output: Bots frequently generate false information. Sometimes they fixate on minor details while ignoring major ones. In one alarming instance, a chatbot recommended calling an emergency number but provided an Australian one for a user in the U.K. Another example involved a chatbot suggesting a specific over-the-counter supplement for fatigue that, unbeknownst to the user, could negatively interact with a common heart medication, without ever asking about other prescriptions.
  • Inconsistent Responses: The same prompt can yield wildly different advice. Imagine two users describing nearly identical symptoms of a subarachnoid hemorrhage: one chatbot advised immediate emergency care; the other suggested lying down in a dark room. This inconsistency means AI chatbots are often more likely to confuse than to clarify.
  • Varying User Interaction: How people talk to chatbots matters. Some users ask very specific, constrained questions, while others let the bot lead the conversation. Both approaches can introduce unreliability into the LLM's output, making it a gamble every time.
  • Too Many Choices: On average, chatbots offered 2.21 possible answers. Faced with multiple options, people understandably didn't always pick the correct one. It's like being given a multiple-choice test without knowing which answer is right.

Ultimately, people who didn't use LLMs were 1.76 times more likely to get the right diagnosis. While both groups struggled with identifying the correct course of action (only about 43% accuracy), the control group's diagnostic superiority is a stark warning. This study focused on common conditions; imagine the risks with rare or complex scenarios. The conclusion is clear: despite impressive isolated performance, AI's medical expertise remains insufficient for effective patient care.

When Doctors Trust AI: A Silent Threat in Healthcare

You might think that while patients struggle, medical professionals, with their training, would fare better using these tools. Unfortunately, the risks of AI chatbots run even deeper: people in the medical field are also using them in ways that create significant risks to patient care. ECRI, a leading medical safety nonprofit, recently ranked the misuse of AI chatbots as the number one patient safety concern for 2026 (ECRI, 2023).

The problem stems from a fundamental misunderstanding of how these models work. While generative AI produces human-like responses, it does so by predicting the next word based on massive datasets, not through genuine comprehension or reasoning. ECRI correctly points out that it’s wrong to think of these chatbots as having human personalities or cognition. And yet, physicians are increasingly using them for tasks related to patient care.

Research has already highlighted the serious dangers. Consider Google’s Med-Gemini model, specifically designed for medical use, which once hallucinated a body part whose name was a mashup of two unrelated real body parts. Google dismissed it as a “typo,” but in healthcare, a “typo” can have dire consequences. ECRI argues that because LLM responses often sound authoritative, "the risk exists that clinicians may subconsciously factor AI-generated suggestions into their judgments without critical review." A busy resident, for instance, might use an AI to draft patient discharge instructions, and the AI could hallucinate, omitting a critical follow-up appointment detail, leading to a missed diagnosis or worsened condition (Mayo Clinic, 2023).

Even seemingly minor situations can lead to harm. ECRI tested four LLMs, asking them to recommend sterile gel brands for an ultrasound device used on a patient with an indwelling catheter. Only one bot identified the need for sterile gel; the others suggested regular, non-sterile gels, posing a significant infection risk. In other tests, chatbots gave unsafe advice on electrode placement and isolation gowns. These aren't just theoretical concerns; they are real-world vulnerabilities showing that, even now, AI chatbots are not equipped for the nuances of patient safety.

It's clear that LLM chatbots are not ready to be trusted with medical care, whether you're seeking advice for yourself, treating a patient, or even ordering supplies for a clinic. The services are already widely used and aggressively promoted. While there's no way to guarantee AI won't be involved in your care, the best defense remains critical thinking and traditional, verified sources. When it comes to your health, the human element — and human expertise — is irreplaceable.

