News You Need to Know Today

Primary care AI | Healthcare AI newsmakers | Partner news

Tuesday, March 26, 2024



Large language AI put to the test for potential adopters in primary care

ChatGPT is only so-so at letting physicians know whether any given clinical study is relevant to their patient rosters and, as such, deserving of a full, time-consuming read. On the other hand, the popular chatbot’s study summaries are an impressive 70% shorter than human-authored study abstracts, and ChatGPT pulls this off without sacrificing quality or accuracy while keeping bias low.

These are the findings of researchers in family medicine and community health at the University of Kansas. Corresponding author Daniel Parente, MD, PhD, and colleagues tested the large language model’s summarization chops on 140 study abstracts published in 14 peer-reviewed journals.

Along the way, the researchers also developed software—“pyJournalWatch”—to help primary care providers quickly but thoughtfully review new scientific articles that might be germane to their respective practices.
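For readers curious about the mechanics, the sketch below shows what LLM-based abstract summarization can look like in code. It is not the authors’ pyJournalWatch implementation: it assumes the OpenAI Python client is installed with an API key configured, and the prompt wording, word budget and summarize_abstract() helper are illustrative choices rather than details taken from the study.

from openai import OpenAI  # assumes `pip install openai` and OPENAI_API_KEY in the environment

client = OpenAI()

def summarize_abstract(abstract: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask a GPT-3.5-class model for a compressed summary of one study abstract."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You summarize peer-reviewed medical abstracts for busy "
                        "primary care physicians. Be concise, factual and neutral."},
            {"role": "user",
             # The ~100-word budget is an arbitrary illustrative target, not a figure from the study.
             "content": "Summarize the following abstract in about 100 words:\n\n" + abstract},
        ],
        temperature=0,  # favor conservative, reproducible output
    )
    return response.choices[0].message.content

# Example usage:
# print(summarize_abstract(open("abstract.txt").read()))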

The research appears in the March edition of Annals of Family Medicine. Noting that they used ChatGPT-3.5 because ChatGPT-4 was only available in beta at the time of the study, the authors offer several useful observations that hold regardless of version. Here are five.

1. Life-critical medical decisions should, for obvious reasons, remain based on critical and thoughtful evaluation of the full text of articles, in context with available evidence from meta-analyses and professional guidelines.

‘We had hoped to build a digital agent with the goal of consistently surveilling the medical literature, identifying relevant articles of interest to a given specialty, and forwarding them to a user. ChatGPT’s inability to reliably classify the relevance of specific articles limits our ability to construct such an agent. We hope that in future iterations of LLMs, these tools will become more capable of relevance classification.’

2. The present study’s findings support previous evaluations showing ChatGPT performs reasonably well for summarizing general-interest news and other samples of nonscientific literature.

‘Contrary to our expectations that hallucinations would limit the utility of ChatGPT for abstract summarization, this occurred in only 2 of 140 abstracts and was mainly limited to small (but important) methodologic or result details. Serious inaccuracies were likewise uncommon, occurring only in a further 2 of 140 articles.’

3. ChatGPT summaries have rare but important inaccuracies that preclude them from being considered a definitive source of truth.

‘Clinicians are strongly cautioned against relying solely on ChatGPT-based summaries to understand study methods and study results, especially in high-risk situations. Likewise, we noted at least one example in which the summary introduced bias by omitting gender as a significant risk factor in a logistic regression model, whereas all other significant risk factors were reported.’

4. Large language models will continue to improve in quality.

‘We suspect that, as these models improve, summarization performance will be preserved and continue to improve. In addition, because [our] ChatGPT model was trained on pre-2022 data, it is possible that its slightly out-of-date medical knowledge decreased its ability to produce summaries or to self-assess the accuracy of its own summaries.’

5. As large language models evolve, future analyses should determine whether further iterations of the GPT language models have better performance in classifying the relevance of individual articles to various domains of medicine.

‘In our analyses, we did not provide the LLMs with any article metadata such as the journal title or author list. Future analyses might investigate how performance varies when these metadata are provided.’

Parente and co-authors conclude: “We encourage robust discussion within the family medicine research and clinical community on the responsible use of AI large language models in family medicine research and primary care practice.”

The study is available in full for free.

 


The Latest from our Partners

Activeloop, TowardsAI, & Intel Disruptor Introduce Free Generative AI Certification with Focus on Biomedical Use Cases - Learn how to apply Generative AI to biomedical data to build pill recognizers, fine-tune open-source large language models for biomedical use cases and more in the free certification by Activeloop, Towards AI & Intel. Enroll today.


Industry Watcher’s Digest

Buzzworthy developments of the past few days.

  • Who pays the damages when a physician follows a care recommendation from AI and medical harm results? The question continues to swirl around U.S. healthcare like a dust devil closing in on an inflatable bounce house at a backyard birthday party. Yes, quite a lot like that, if you think about it. In Politico, the nerve-rattling medico-legal scenario gets a fresh thinking-through in an article published March 24. In any number of cases, reporter Daniel Payne points out, doctors may be liable if they use AI and liable if they don’t. And it won’t help that diagnostic AI can be positioned as analogous to GPS: “It’s up to the driver to stay on the road, no matter what instructions are given” by the device, Payne writes. “Making the situation especially thorny for doctors: They also could open themselves up to litigation if they eschew AI.” Several subject matter experts weigh in. Read the whole thing.
     
  • Here’s a somewhat related item. This one’s from the ‘You Can’t Keep Everyone Safe From Everything’ file. A growing number of elderly people are using inflatable hip airbags. Many no doubt do so at the suggestion of their doctors. The devices use sensors to trigger inflation upon detecting a fall in progress. They’re pretty good at preventing hip fractures, which is the aim for which they were invented. However, there have also been reports that the bags “have some cultural biases, for example, for people who pray on the ground. The airbags detect that as a fall and keep blowing up.” Reuters includes the example in an article on AI’s potential to “extend healthcare to all” in Europe.
     
  • On AI, Amazon is thinking big, bold and personal. As Steve Jobs and friends prophesied a computer in every home back in the ’80s—and then helped make it happen with Apple—so some lead innovators at Amazon are envisioning (and shooting to deliver) a “personal AGI for everyone.” That’s AGI, as in artificial general intelligence. The kind of AI that learns and, in effect, thinks for itself. Picture a bot like that living in your desktop computer and/or smartphone. In healthcare, for example, “Imagine a patient whose physician is automatically consulted by their AGI based on a change in some vital metrics and then care suggestions are brought,” Vishal Sharma, Amazon’s VP for AGI, tells Axios before acknowledging: “There’s more fundamental work that needs to be done.”
     
  • Of course, Amazon’s personal AGI planners will have their skeptics. First to snicker will be AI watchers who’ve been airing their doubts over ChatGPT and other iterations of generative AI. The U.K. tabloid Daily Mail rounds up musings from members of this gallery in an article posted March 25. Quoted in the piece are Prof. Gary Marcus of New York University (“We are starting to see signs that generative AI might be a dud”), GenAI vendor exec Dom Couldwell of DataStax (“This area has seen so much hype; it is growing up in public”) and others. More here.
     
  • Meanwhile, GenAI for healthcare is quietly working some medical wonders. In one case, a young adult used the chatbot to correctly put his doctors on the trail of a rare form of diabetes with a genetic origin. The diagnosis “basically took me from like one of these people that was like counting every single thing I eat, to just eating whatever I want,” the patient, Cooper Myers of Texas, tells a TV station in Austin.
     
  • This may not portend anything for healthcare AI. Then again, for those thinking about the technology’s future legal wranglings, it may. Tennessee has become the first state to guard musicians against bad actors who would use AI to profit off artists’ faked voices. The statute is called the ELVIS act not only for the Volunteer State’s most famous son but also for Ensuring Likeness, Voice and Image Security. A little creative naming for the old-time rock and roll fans out there. Local coverage here.
     
  • In the Caribbean, a tiny island is making a bundle off AI. The island is Anguilla, which registered .ai as its internet domain years back and now finds tech companies clamoring for a piece of that action. As reported in the Spanish newspaper El País, the British Overseas Territory humors some of these suitors and charges them for the privilege. As for the receipts: “Despite the island’s small size and population of around 16,000, domain registration revenue is significant. Estimates indicate registrations could bring in €72 million ($78.3 million) by 2025.” Get the rest.
     
  • Research news roundup:
     

 


Innovate Healthcare thanks our partners for supporting our newsletters.
Sponsorship has no influence on editorial content.

Interested in reaching our audiences? Contact our team.


You received this email because you signed up for newsletters from Innovate Healthcare.
Change your preferences or unsubscribe here

Contact Us  |  Unsubscribe from all  |  Privacy Policy

© Innovate Healthcare, a TriMed Media brand