
The chatbots eventually came up with the right diagnosis more than 90 percent of the time when provided with all pertinent information, the study found.
Nevertheless, some people using large language models such as ChatGPT and Grok may act on erroneous medical advice spit out in the earlier phase, according to the head of the research team. Those people may seek out biopsies when they don’t have cancer or delay treatment for headaches that are actually signs of a stroke.
“You can’t just trust what the chatbot says,” Dr. Marc Succi, executive director of the MESH Incubator, an innovation center at MGB, said in an interview. “It’s a good start but there needs to be oversight and a human in the loop.”
By human, Succi meant a health care provider who can interview the patient, review the individual’s medical history, run diagnostic tests, narrow down the potential diagnoses to the correct one, and start treatment right away if the problem is urgent.
“Part of being a doctor is forming an initial differential and then narrowing down the possibilities with pointed questions and things like a physical exam, the proper lab test – narrow, narrow, narrow until the final diagnosis,” said Succi, an associate professor of radiology at Harvard Medical School.
The research team compared 21 general-purpose large language models, including the latest versions of ChatGPT, DeepSeek, Claude, Gemini, and Grok. Researchers gradually fed each chatbot information on 29 published medical cases involving bread-and-butter conditions ranging from heart failure to ectopic pregnancies.
The chatbots performed poorly when coming up with differential diagnoses based on age, gender, and symptoms. It was only after researchers provided the results of a physical examination and laboratory tests that the chatbots began to home in on the correct diagnosis.
“These models are great at naming a final diagnosis once the data is complete, but they struggle at the open-ended start of a case, when there isn’t much information,” said Arya Rao, lead author, MESH researcher, and MD-PhD student at Harvard Medical School.
Given that thousands of MGB patients lack primary care physicians amid a shortage of such practitioners, the state’s largest health care system last September launched an AI app called Care Connect that features a chatbot. It takes inquiries 24/7, questions patients, reviews medical records, and sets up a telehealth appointment with a physician in as little as half an hour.
Dr. Rajesh Patel, vice president of digital patient experience at MGB, said the health system’s chatbot is very different from those used by people seeking diagnoses.
“The AI chatbot in Care Connect is used to perform medical intake to help expedite appointment[s] with our clinicians – it does not diagnose,” he said in a statement. “Patients always see a real clinician who diagnoses, treats, and follows up with our patients …. This model actually supports the findings of this study, that physician involvement is still key in medical decision making.”
Jonathan Saltzman can be reached at jonathan.saltzman@globe.com.