When it comes to artificial intelligence, the hype, hope, and foreboding are suddenly everywhere. But the turbulent tech has long caused waves in health care: from IBM Watson’s failed foray into health care (and the long-held hope that AI tools may one day beat doctors at detecting cancer on medical images) to the realized problems of algorithmic racial biases.
But, behind the public fray of fanfare and failures, there’s a chaotic reality of rollouts that has largely gone untold. For years, health care systems and hospitals have grappled with inefficient and, in some cases, doomed attempts to adopt AI tools, according to a new study led by researchers at Duke University. The study, posted online as a pre-print, pulls back the curtain on these messy implementations while also mining for lessons learned. Amid the eye-opening revelations from 89 professionals involved in the rollouts at 11 health care organizations—including Duke Health, Mayo Clinic, and Kaiser Permanente—the authors assemble a practical framework that health systems can follow as they try to roll out new AI tools.
And new AI tools keep coming. Just last week, a study in JAMA Internal Medicine found that ChatGPT (version 3.5) decisively bested doctors at providing high-quality, empathetic answers to medical questions people posted on the subreddit r/AskDocs. The superior responses—as subjectively judged by a panel of three physicians with relevant medical expertise—suggest an AI chatbot such as ChatGPT could one day help doctors tackle the growing burden of responding to medical messages sent through online patient portals.
This is no small feat. The rise of patient messages is linked to high rates of physician burnout. According to the study authors, an effective AI chat tool could not only reduce this exhausting burden—offering relief to doctors and freeing them to direct their efforts elsewhere—but it could also reduce unnecessary office visits, boost patient adherence and compliance with medical guidance, and improve patient health outcomes overall. Moreover, better messaging responsiveness could improve patient equity by providing more online support for patients who are less likely to schedule appointments, such as those with mobility issues, work limitations, or fears of medical bills.
AI in reality
That all sounds great, like much of the promise of AI tools for health care. But the study has some big limitations and caveats that make the real potential of this application harder to realize than it seems. For starters, the types of questions that people ask on a Reddit forum are not necessarily representative of the ones they would ask a doctor they know and (hopefully) trust. And the quality and types of answers volunteer physicians offer to random people on the Internet may not match those they give their own patients, with whom they have an established relationship.
But, even if the core results of the study held up in real doctor-patient interactions through real patient portal message systems, there are many other steps to take before a chatbot could reach its lofty goals, according to the revelations from the Duke-led preprint study.
To save time, the AI tool must be well-integrated into a health system’s clinical applications and each doctor’s established workflow. Clinicians would likely need reliable, potentially around-the-clock technical support in case of glitches. And doctors would need to establish a balance of trust in the tool—a balance such that they don’t blindly pass along AI-generated responses to patients without review but know they won’t need to spend so much time editing responses that it nullifies the tool’s usefulness.
And after managing all of that, a health system would have to establish an evidence base that the tool is working as hoped in its own setting. That means it would have to develop systems and metrics to follow outcomes, like physicians’ time management and patient equity, adherence, and health outcomes.
These are heavy asks in an already complicated and cumbersome health system. As the researchers of the preprint note in their introduction:
Drawing on the Swiss Cheese Model of Pandemic Defense, every layer of the healthcare AI ecosystem currently contains large holes that make the broad diffusion of poorly performing products inevitable.
The study identified an eight-point framework based on the steps in an implementation at which decisions are made, whether by an executive, an IT leader, or a front-line clinician. The process involves: 1) identifying and prioritizing a problem; 2) identifying how AI could potentially help; 3) developing ways to assess an AI’s outcomes and successes; 4) figuring out how to integrate it into existing workflows; 5) validating the safety, efficacy, and equity of AI in the health care system before clinical use; 6) rolling out the AI tool with communication, training, and trust building; 7) monitoring; and 8) updating or decommissioning the tool as time goes on.