Stanford HAI lookback: Best-read blog posts of 2024
The year 2023 saw a shift toward open-source AI models, surging investments in generative AI and increasing AI regulation.
That year also witnessed the release of 149 foundation models—the most ever, up to that point—along with private investments in AI topping $67 billion in the U.S. alone.
Meanwhile, AI reached or surpassed human-level performance on many benchmarks.
Why look back at 2023 now, with 2025 nearly upon us? Because this week Stanford’s Institute for Human-Centered Artificial Intelligence—aka “HAI”—is revisiting its 10 most-read blog posts of 2024. And the 2023 findings went up in April 2024.
Despite the advances noted above, “concerns about job security, AI product safety and the need for regulatory measures are on the rise,” writes Shana Lynch, HAI’s content chief, in introducing the list. “[Y]ounger and more educated demographics [are] particularly attuned to AI’s impact on employment.”
Here are excerpts from five of the 10 items in Lynch’s Dec. 9 listicle.
1. Large language models in healthcare: Close but not yet there.
Despite the promise of LLMs in healthcare, “we have some major challenges to overcome before they can safely be integrated into clinical practice,” Lynch writes, quoting Stanford scholars. Current evaluations of LLMs, for example, “often rely on curated data rather than real-world patient information, and evaluation efforts are uneven across healthcare tasks and specialties.” More:
‘The research team recommends more rigorous, systematic assessments using real patient data and suggests leveraging human-guided AI agents to scale evaluation efforts.’
2. Much research is being written by LLMs. Too much?
Stanford’s James Zou and his team found that nearly 18% of computer science papers and 17% of peer reviews include AI-generated content, Lynch writes. The rapid adoption “underscores both the potential benefits and ethical challenges of LLMs in research,” she adds. More:
‘Zou argues for more transparency in LLM usage, noting that, while AI can enhance clarity and efficiency, researchers must remain accountable for their work to maintain integrity in the scientific process.’
3. GenAI generates erroneous medical references.
Researchers found that even the most advanced LLMs frequently hallucinate unsupported claims or cite irrelevant sources. ChatGPT-4 with retrieval-augmented generation, for example, produced unsupported statements up to 30% of the time, Lynch writes. More:
‘As AI tools become increasingly common in healthcare, experts urge for more rigorous evaluation and regulation to ensure these systems provide reliable, evidence-based information.’
4. NLP helps detect mental health crises.
As mental health needs surge, Stanford medical students Akshay Swaminathan and Ivan Lopez developed a natural language processing tool called Crisis Message Detector 1 (CMD-1) to improve response times for patients in crisis. “Tested on data from mental health provider Cerebral,” Lynch reports, “CMD-1 achieved 97% accuracy in identifying urgent cases and reduced patient wait times from over 10 hours to 10 minutes.”
‘The project highlights the potential of AI to support clinicians by streamlining workflows and enhancing crisis response in healthcare settings, and underscores the importance of collaborative, interdisciplinary development to meet clinical needs effectively.’
5. Privacy in an AI era: How do we protect our personal information?
The potential for misuse, particularly with LLMs, runs the gamut from web data scraped for training to AI-driven threats like voice cloning and identity theft. To address these risks, Stanford HAI’s Jennifer King and Caroline Meinhardt argue that stronger regulatory frameworks are essential, Lynch notes.
‘They advocate for a shift to opt-in data sharing, a supply chain approach to data privacy, and collective solutions like data intermediaries to empower users in an era dominated by AI and vast data collection.’