Despite the initial hype surrounding generative artificial intelligence (AI), recent research findings are casting doubt on its benefits in medical settings. A series of studies published by prominent academic hospitals has highlighted significant limitations of large language models (LLMs) in healthcare applications. Contrary to industry claims of time and cost savings, these studies suggest that LLMs may not deliver on their promises.
One study, conducted at the University of California, San Diego, found that using an LLM to draft responses to patient messages did not reduce clinicians' workload. In another study, researchers at Mount Sinai found that LLMs performed poorly at mapping patient illnesses to diagnostic codes, a crucial task in healthcare. Most concerning, a study at Mass General Brigham revealed that an LLM made safety errors when responding to simulated questions from cancer patients, including one potentially lethal error.
These findings raise significant concerns about the safety and efficacy of LLMs in medical settings. While generative AI holds promise across many industries, its application in healthcare demands a cautious approach and further research to address these limitations before widespread adoption can be considered.