AI Health Chatbots Are Not Helping Patients Make Better Decisions

Asking artificial intelligence tools about medical symptoms does not help people make better health decisions than traditional methods such as standard internet searches, according to a new study published in Nature Medicine.

The authors said the findings are significant because a growing number of people turn to AI chatbots for medical guidance, despite limited evidence that these tools offer safer or more effective advice.

How The Study Was Conducted

Researchers led by the Oxford Internet Institute at the University of Oxford worked with a group of doctors to design 10 medical scenarios. These ranged from mild conditions such as a common cold to severe emergencies, including a brain haemorrhage.

The scenarios were first tested without human participants using three large language models: OpenAI’s GPT-4o, Meta’s Llama 3, and Cohere’s Command R+. The models correctly identified the medical condition in 94.9% of cases, but selected the correct next step, such as seeking urgent care, in only 56.3% of cases. The companies did not respond to requests for comment.

Human Use of AI Shows No Advantage

The researchers then recruited 1,298 participants in Britain and asked them to assess symptoms using AI tools, their own experience, standard internet searches, or the National Health Service website.

When participants used the AI tools, they identified relevant medical conditions in fewer than 34.5% of cases and chose the correct course of action in fewer than 44.2% of cases, a result no better than that of participants using traditional information sources.

Poor AI to Human Interaction

Adam Mahdi, a co-author of the study and associate professor at Oxford, said the results highlighted a significant gap between AI’s technical ability and its effectiveness when used by people.

He said the knowledge exists within AI systems but does not consistently translate into useful guidance during real-world interactions, indicating the need for further research.

Where AI and Humans Go Wrong

The team reviewed around 30 interactions in detail and found that users often provided incomplete or inaccurate symptom descriptions. At the same time, AI systems sometimes produced misleading or incorrect responses.

In one example, a patient describing symptoms of a subarachnoid haemorrhage was correctly advised to go to the hospital after mentioning a stiff neck, light sensitivity, and the “worst headache ever.” Another patient describing similar symptoms but using the phrase “terrible headache” was instead advised to lie down in a dark room.

Researchers plan to conduct similar studies across different countries and languages to assess whether AI performance changes over time or in different settings.

The study received support from data company Prolific, the Dieter Schwarz Stiftung, and the UK and US governments.

Published by
Afaq Wajdan Malik