A Mass General Brigham study showed that the chatbot ChatGPT was about 72 percent accurate overall in clinical decision-making, from suggesting possible diagnoses to reaching final diagnoses and making care management decisions. The AI chatbot performed comparably well in primary care and emergency settings, across all medical specialties.
“There are no real benchmarks, but we think this result is at the level of someone fresh out of medical school. It shows us that large language models in general have the potential to augment medical practice and support clinical decision-making with impressive accuracy,” noted corresponding author Marc Succi, MD, of Mass General Brigham.
AI technology is advancing rapidly and transforming many industries, including healthcare. However, the capacity of large language models to assist with the full scope of clinical care has not yet been thoroughly explored.
In this study of how large language models could be used in clinical counseling and decision-making, Succi and his team tested the hypothesis that ChatGPT could work through the entire clinical encounter with a patient and ultimately make a diagnosis.
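The article does not describe the team's exact tooling, but conceptually the test amounts to presenting a clinical vignette to the model stage by stage, mirroring a real encounter from differential diagnosis through management. Below is a minimal illustrative sketch of that kind of workflow, assuming the OpenAI Python client (`openai` package); the vignette text, model name, and prompts are hypothetical and are not the study's actual protocol:

```python
# Illustrative sketch only: the study's actual protocol and tooling are not
# specified in this article. Assumes the `openai` package is installed and an
# API key is set in the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical fragment of a standardized clinical vignette.
vignette = (
    "A 45-year-old man presents with two days of fever, productive cough, "
    "and right-sided pleuritic chest pain. Temperature 38.9 C, RR 24."
)

# Walk the model through the stages of a clinical encounter, mirroring the
# study's progression from possible diagnoses to final diagnosis to management.
stages = [
    "List a differential diagnosis for this presentation.",
    "Given the findings so far, what is the single most likely final diagnosis?",
    "Outline an initial care management plan for that diagnosis.",
]

messages = [
    {"role": "system", "content": "You are assisting with a clinical vignette."},
    {"role": "user", "content": vignette},
]

for question in stages:
    messages.append({"role": "user", "content": question})
    reply = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    answer = reply.choices[0].message.content
    # Keep the model's answer in the conversation so later stages build on it.
    messages.append({"role": "assistant", "content": answer})
    print(f"Q: {question}\nA: {answer}\n")
```

Carrying the full conversation history forward at each stage is what lets the model "work through" the encounter rather than answer each question in isolation.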
Overall, the researchers found that ChatGPT was about 72 percent accurate across all decisions and 77 percent accurate in making a final diagnosis. It scored lowest on differential diagnoses, at 60 percent. In clinical management decisions, such as determining which medications to treat a patient with after arriving at the correct diagnosis, accuracy was 68 percent.
The study also found that ChatGPT's responses showed no gender bias and that its performance was consistent in both primary care and emergency settings.
Source: Science Daily