Preparing cancer patients for difficult decisions is an oncologist’s job. They don’t always remember to do it, however. At the University of Pennsylvania Health System, doctors are nudged to talk about a patient’s treatment and end-of-life preferences by an artificially intelligent algorithm that predicts the chances of death.
But this is not a set-it-and-forget-it tool. A routine technical checkup revealed that the algorithm decayed during the COVID-19 pandemic, becoming 7 percentage points worse at predicting who would die, according to a 2022 study.
There were likely real-life consequences as well. The study’s lead author, Ravi Parikh, an oncologist at Emory University, told KFF Health News that the tool is meant to prompt doctors to have important discussions with patients, conversations that can head off unnecessary chemotherapy, and that it failed to do so hundreds of times.
He believes several algorithms designed to enhance medical care weakened during the pandemic, and not just at Penn Medicine. “Many institutions are not routinely monitoring the performance of their products,” Parikh said.
Algorithm glitches are one facet of a dilemma that computer scientists and doctors have long acknowledged, but that hospital executives and researchers are only beginning to grapple with: artificial intelligence systems require consistent monitoring and staffing to put in place and to keep working well.
In short, you need people, and more machines, to make sure the new tools don’t mess up.
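What that routine monitoring can look like in practice is fairly mundane: re-scoring a deployed risk model against each month’s actual outcomes and flagging when its accuracy slips. The Python sketch below is a minimal illustration of that idea under stated assumptions; the monthly data, the 0.05 tolerance, and the use of AUROC as the yardstick are invented for the example and are not details of Penn Medicine’s system or of Parikh’s study.

```python
# Minimal sketch of a routine "tech checkup" for a deployed mortality-risk
# model: compute discrimination (AUROC) month by month and flag months that
# fall well below a pre-pandemic baseline. All data below is made up.
from sklearn.metrics import roc_auc_score

def monthly_auc(records):
    """records: list of (month, predicted_risk, died) tuples."""
    by_month = {}
    for month, risk, died in records:
        by_month.setdefault(month, ([], []))
        by_month[month][0].append(risk)
        by_month[month][1].append(died)
    return {m: roc_auc_score(outcomes, risks)
            for m, (risks, outcomes) in by_month.items()
            if len(set(outcomes)) > 1}  # AUROC needs both classes present

def flag_drift(aucs, baseline, tolerance=0.05):
    """Return months where discrimination fell meaningfully below baseline."""
    return [m for m, auc in sorted(aucs.items()) if baseline - auc > tolerance]

# Hypothetical scores: the model separates well in 2019, poorly in mid-2020.
records = (
    [("2019-%02d" % m, r, d) for m in range(1, 13)
     for r, d in [(0.9, 1), (0.8, 1), (0.3, 0), (0.2, 0), (0.1, 0)]] +
    [("2020-%02d" % m, r, d) for m in range(4, 10)
     for r, d in [(0.6, 1), (0.4, 1), (0.5, 0), (0.2, 1), (0.3, 0)]]
)
aucs = monthly_auc(records)
pre_pandemic = [v for k, v in aucs.items() if k.startswith("2019")]
baseline = sum(pre_pandemic) / len(pre_pandemic)
print("Months flagged for review:", flag_drift(aucs, baseline))
```

The check itself is cheap; the expense Parikh and others describe comes from having staff in place to run it regularly, interpret the results, and decide what to do when a model is flagged.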
“Everybody thinks that AI will help us with our access and capacity, and improve care,” said Nigam Shah, chief data scientist at Stanford Health Care. “That’s all nice and good, but if it increases the cost of care by 20%, is that viable?”
Government officials worry that hospitals lack the resources to put these technologies through their paces. “I have looked far and wide,” FDA Commissioner Robert Califf said at a recent agency panel on AI. “I do not believe there’s a single health system in the United States that’s capable of validating an AI algorithm that’s put into place in a clinical care system.”
AI is already widespread in health care. Algorithms are used to predict patients’ risk of death or deterioration, to suggest diagnoses and triage patients, to record and summarize visits to save doctors work, and to approve insurance claims.
If the technology evangelists are right, the technology will become ubiquitous, and profitable. Investment firm Bessemer Venture Partners has identified about 20 health-focused AI startups each on track to generate $10 million in annual revenue. The FDA has approved approximately 1,000 artificial intelligence products.
Evaluating whether these products work is difficult. Evaluating whether they continue to work, or whether they have developed the software equivalent of a blown gasket or a leaky engine, is even harder.
Take a recent study at Yale Medicine that evaluated six “early warning systems,” which alert clinicians when a patient’s condition is likely to deteriorate rapidly. A supercomputer ran the data for several days, said Dana Edelson, a physician at the University of Chicago and co-founder of the company that provided one of the algorithms for the study. The process was fruitful, revealing huge differences in performance among the six products.
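The basic shape of such a head-to-head comparison is simple, even if running it at scale took a supercomputer: score every product against the same retrospective outcomes and rank them. The sketch below is a hypothetical illustration only; the product names, risk scores, and outcomes are invented, and the Yale study’s actual methodology was far more involved.

```python
# Toy head-to-head comparison of deterioration-alert products on the same
# retrospective encounters. Names and numbers are invented for illustration.
from sklearn.metrics import roc_auc_score

# One entry per patient encounter: did the patient actually deteriorate?
deteriorated = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]

# Each product's risk score for the same ten encounters (hypothetical).
vendor_scores = {
    "product_a": [0.91, 0.20, 0.35, 0.80, 0.15, 0.70, 0.40, 0.10, 0.85, 0.30],
    "product_b": [0.60, 0.55, 0.50, 0.65, 0.45, 0.52, 0.58, 0.40, 0.70, 0.62],
    "product_c": [0.30, 0.80, 0.20, 0.40, 0.75, 0.35, 0.60, 0.85, 0.25, 0.50],
}

# Rank products by how well they separate patients who deteriorated from
# those who did not.
ranking = sorted(
    ((roc_auc_score(deteriorated, scores), name)
     for name, scores in vendor_scores.items()),
    reverse=True,
)
for auc, name in ranking:
    print(f"{name}: AUROC = {auc:.2f}")
```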
Choosing the algorithms that best fit their needs is a challenge for hospitals and health care providers. The average doctor doesn’t have a supercomputer sitting around, and there is no Consumer Reports for AI.
“We have no standards,” said Jesse Ehrenfeld, immediate past president of the American Medical Association. “There is nothing I can point you to today as a standard for how to evaluate, monitor, and observe the performance of an algorithmic model, AI-enabled or not, once it is deployed.”
Perhaps the most common AI product in doctors’ offices is something called ambient documentation, a tech-enabled assistant that listens to and summarizes patient visits. So far this year, Rock Health has tracked $353 million in investment flowing into these documentation companies. But, Ehrenfeld said, “there is no standard right now for comparing the output of these tools.”
And that’s a problem when even a small error can have fatal consequences. A team at Stanford University tried using large language models, the technology underlying popular AI tools such as ChatGPT, to summarize patients’ medical histories. They compared the results with what physicians wrote.
“Even in the best case, the model had a 35% error rate,” said Stanford’s Shah. In medicine, “If you’re writing a summary and forget one word, like ‘fever,’ that’s a problem, right?”
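One way a health system might screen for that kind of omission is a simple check that clinically important terms in the source note survive into the summary. The sketch below is an assumption about how such a screen could look, not Stanford’s evaluation method, and the list of “critical terms” is invented for illustration.

```python
# Toy screen for dropped clinical terms: flag words that appear in the
# source note but not in the generated summary. The term list and the
# example note are hypothetical.
CRITICAL_TERMS = {"fever", "chest pain", "allergy", "anticoagulant", "sepsis"}

def missing_critical_terms(source_note: str, summary: str) -> set[str]:
    """Return critical terms present in the source note but absent from the summary."""
    note, summ = source_note.lower(), summary.lower()
    return {t for t in CRITICAL_TERMS if t in note and t not in summ}

note = "Patient reports three days of fever and a productive cough; penicillin allergy noted."
summary = "Patient with several days of productive cough. Penicillin allergy."
print(missing_critical_terms(note, summary))  # {'fever'}
```

A keyword check like this catches only the crudest failures; it says nothing about summaries that are fluent but subtly wrong, which is part of why evaluating these tools remains so hard.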
In some cases, the reasons why an algorithm fails are quite logical. Its effectiveness can be compromised if the underlying data changes, for example, if a hospital switches testing providers.
But sometimes the pitfalls yawn open for no apparent reason.
Sandy Aronson, a technology executive with Mass General Brigham’s personalized medicine program in Boston, said that when his team tested an application meant to help genetic counselors locate relevant literature about DNA mutations, the product suffered from “nondeterminism”: asked the same question multiple times over a short period, it returned different results.
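A basic repeatability test makes that kind of flaw visible: send the identical query several times and count how many distinct answers come back. The sketch below assumes a placeholder function, ask_model, standing in for whatever actually calls the application; it is not a real API, and the question shown is hypothetical.

```python
# Minimal repeatability check for a nondeterministic tool: ask the same
# question repeatedly and tally the distinct responses. `ask_model` is a
# stand-in for the application's real query function.
from collections import Counter

def repeatability_report(ask_model, question: str, trials: int = 10) -> Counter:
    """Ask the same question `trials` times and count each distinct answer."""
    return Counter(ask_model(question) for _ in range(trials))

# Hypothetical usage: a deterministic tool should produce a single entry.
# report = repeatability_report(ask_model, "Which papers discuss this BRCA1 variant?")
# if len(report) > 1:
#     print("Nondeterministic output detected:", report)
```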
Aronson is excited about the potential for large language models to summarize knowledge for busy genetic counselors, but says “the technology needs to improve.”
What is an institution to do when metrics and standards are sparse and errors can crop up for strange reasons? Invest a lot of resources. At Stanford, Shah said, it took eight to 10 months and 115 person-hours just to audit two models for fairness and reliability.
Experts interviewed by KFF Health News floated the idea of artificial intelligence monitoring the artificial intelligence, with some (human) data experts monitoring both. All acknowledged that this would require organizations to spend even more money, a tough ask given the realities of hospital budgets and the limited supply of AI technology experts.
“It’s great to have a vision where we’re melting icebergs in order to have a model monitoring the model,” Shah said. “But is that really what I want? How many more people are we going to need?”
KFF Health News is a national newsroom that produces in-depth journalism about health issues and is one of the core operating programs at KFF, an independent source of health policy research, polling, and journalism.