Radiology Site

Mass General Brigham study explores medical advice by ChatGPT, other chatbots

April 13, 2026

The chatbots eventually came up with the right diagnosis more than 90 percent of the time when provided with all pertinent information, the study found.

Nevertheless, some people using large language models such as ChatGPT and Grok may act on erroneous medical advice produced in the earlier phase, before the chatbot has all the pertinent information, according to the head of the research team. Those people may seek out biopsies when they don’t have cancer or delay treatment for headaches that are actually signs of a stroke.

“You can’t just trust what the chatbot says,” Dr. Marc Succi, executive director of the MESH Incubator, an innovation center at MGB, said in an interview. “It’s a good start but there needs to be oversight and a human in the loop.”

By human, Succi meant a health care provider who can interview the patient, review the individual’s medical history, run diagnostic tests, narrow down the potential diagnoses to the correct one, and start treatment right away if the problem is urgent.

“Part of being a doctor is forming an initial differential and then narrowing down the possibilities with pointed questions and things like a physical exam, the proper lab test — narrow, narrow, narrow until the final diagnosis,” said Succi, an associate professor of radiology at Harvard Medical School.

The research team compared 21 general-purpose large language models, including the latest versions of ChatGPT, DeepSeek, Claude, Gemini, and Grok. Researchers gradually fed each chatbot information on 29 published medical cases involving bread-and-butter conditions ranging from heart failure to ectopic pregnancies.

The chatbots performed poorly when coming up with differential diagnoses based on age, gender, and symptoms. It was only after researchers provided the results of a physical examination and laboratory tests that the chatbots began to home in on the correct diagnosis.

“These models are great at naming a final diagnosis once the data is complete, but they struggle at the open-ended start of a case, when there isn’t much information,” said Arya Rao, lead author, MESH researcher, and MD-PhD student at Harvard Medical School.

Given that thousands of MGB patients lack primary care physicians amid a shortage of such practitioners, the state’s largest health care system last September launched an AI app called Care Connect that features a chatbot. It takes inquiries 24/7, questions patients, reviews medical records, and sets up a telehealth appointment with a physician in as little as half an hour.

Dr. Rajesh Patel, vice president of digital patient experience at MGB, said the health system’s chatbot is very different from those used by people seeking diagnoses.

“The AI chatbot in Care Connect is used to perform medical intake to help expedite appointment[s] with our clinicians — it does not diagnose,” he said in a statement. “Patients always see a real clinician who diagnoses, treats, and follows up with our patients …. This model actually supports the findings of this study, that physician involvement is still key in medical decision making.”

Jonathan Saltzman can be reached at jonathan.saltzman@globe.com.
