Why human review is key to the success of AI in health care
New study suggests process for reducing bias in AI
Artificial intelligence (AI) tools are becoming more common in health care. They can read medical images, help predict risks and monitor patient conditions from afar. But AI systems can also make mistakes — especially when the data they learn from is not balanced or does not adequately represent different groups of people.
A new study led by UC Davis Professor Courtney Lyles stresses the importance of keeping a human in the loop to review how AI makes decisions, helping to reduce bias and improve safety. The study was published in Social Science & Medicine.
Lyles is the director of the UC Davis Center for Healthcare Policy and Research. She is also a co-founder and co-director of UC S.O.L.V.E Health Tech, an initiative that brings together researchers from UC Davis, UC Berkeley and UC San Francisco with private digital health companies.
In this Q&A, Lyles answers questions on AI use in health care and ways to detect and prevent bias. She also shares two examples of how UC Davis Health is building fairer and more reliable AI systems to serve patients and physicians.
What is this study about?
The study is a collaboration with Google and researchers at the University of California and Northeastern University. We used a human-centered approach to critically assess an explainable AI model and identify areas of bias. We formed a panel of experts from different fields to find potential factors driving bias in the interpretation of the AI model.
Why can bias be a problem in AI health care systems?
Interpretation of AI models requires an understanding of the social and structural forces that shape health data.
Without this lens, AI systems may produce outputs that sound convincing but are incomplete, biased or unsafe. As AI becomes woven into everyday clinical care, we can’t rely on algorithms alone. Human expertise, in combination with explainable AI tools, becomes essential.
What is explainable AI and why is it important in evaluating AI models?
Explainable AI (XAI) is about understanding why a model made the decisions it did. It provides insight by peeling back what the AI is doing so we can understand how the model arrived at its determinations and predictions.
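To make this concrete, the sketch below shows one common XAI technique, permutation feature importance, which estimates how much each input feature drives a model's predictions. The data, model choice and feature names are illustrative assumptions, not part of the study.

```python
# Minimal sketch of an XAI technique: permutation feature importance.
# All data and feature names below are hypothetical and for illustration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical tabular patient data: rows are patients, columns are features.
X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
feature_names = ["age", "blood_pressure", "lab_value", "device_type",
                 "visit_count", "zip_code_income"]  # illustrative names only

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Ask "why": how much does shuffling each feature hurt the model's accuracy?
result = permutation_importance(model, X_test, y_test, n_repeats=20,
                                random_state=0)
for name, score in sorted(zip(feature_names, result.importances_mean),
                          key=lambda pair: -pair[1]):
    print(f"{name:>18}: {score:.3f}")
```

Features the model leans on most rise to the top of this ranking, which is exactly the kind of output a human review panel can then question.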
How would human reviewers assess bias in XAI models?
Our study has shown that a panel of experts from several disciplines can look closely at XAI model output and provide additional contextual interpretation of whether the results make sense in the real world. In the study, this panel included experts from medicine, epidemiology, behavioral science, engineering and data science.
The study also recommends including community members and patient advocates. Their lived experience offers insight that traditional experts may miss and can help ensure AI tools reflect the needs of the communities they serve.
This interdisciplinary framework shows how bringing diverse voices into the process makes AI not only more accurate, but more equitable, more trustworthy and more reliable.
How does an interdisciplinary panel assess XAI results?
When an XAI tool highlights why a model makes certain predictions, it often reveals patterns.
When reviewing XAI findings, interdisciplinary experts could then ask:
- Could this pattern be caused by differences in the dataset?
- Is this result linked to how patients interact with medical devices?
- Does this reflect a social or structural issue rather than a medical one?
This process helps uncover where AI may be relying on “shortcut features” — patterns that look meaningful but actually reflect bias in the data.
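As an illustration of how such a shortcut might be probed (the data, column names and subgroups below are hypothetical, not from the study), a reviewer might check whether a feature the XAI tool flags as important simply tracks a subgroup label, and whether the model's errors concentrate in one subgroup:

```python
# Hypothetical sketch of two checks a review panel might request.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "subgroup": rng.choice(["A", "B"], size=n),    # e.g., device type or clinic site
    "flagged_feature": rng.normal(size=n),         # feature highlighted by the XAI tool
    "y_true": rng.integers(0, 2, size=n),          # observed outcome
    "y_pred": rng.integers(0, 2, size=n),          # model prediction
})

# 1) Does the flagged feature differ systematically between subgroups?
print(df.groupby("subgroup")["flagged_feature"].mean())

# 2) Do the model's errors cluster in one subgroup?
df["error"] = (df["y_true"] != df["y_pred"]).astype(int)
print(df.groupby("subgroup")["error"].mean())
```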
How can you turn this XAI study into real-world practice?
Our work included a case study of how this interdisciplinary panel of experts reviewed real-world XAI results from medical imaging and suggested clear next steps for research and practice.
By combining technical tools with human judgment, this approach can also be used in other cases, improving accuracy and grounding results in context. In practice, you can establish teams ahead of time to gather the right types of expertise at the AI decision-making table. This improves implementation and trust between data scientists, clinicians, patients and communities.
“As AI becomes woven into everyday clinical care, we can’t rely on algorithms alone. Human expertise and explainable AI tools become essential.” —Professor Courtney Lyles, director of the UC Davis Center for Healthcare Policy and Research
How do private-public partnerships shape AI development in health care?
Moving forward, we need much more intentional private-public partnerships. For example, we established UC S.O.L.V.E Health Tech for this purpose. It is focused on UC researchers partnering with the private sector to advance equity in their products. Similar to this study, we bring people together in structured ways to facilitate collaboration between sectors that are often siloed.
Industry partners are looking for academic expertise that is pragmatic and allows them to take a step toward implementing technology in health care.
What are some additional examples of how UC Davis Health uses AI?
UC Davis Health is a national leader in the implementation of AI in many clinical areas.
First, we have a strong AI governance committee that has been developing and reviewing new AI models for several years, led by Professor Jason Adams, director of Data and Analytics Strategy.
Second, our IT leadership team is also a national leader in the equitable evaluation and rollout of AI at UC Davis Health. For example, a team led by Professor Reshma Gupta, director of Population Health and Accountable Care, has developed a process for reducing bias when developing and implementing AI predictive models. This process uses a model to identify patients who may be at higher risk for readmission to UC Davis. Our population health team uses this model to carefully consider patient subgroups and to identify and address potential barriers at each step in the AI development and dissemination process; a simple illustration of this kind of subgroup check follows below.
Third, UC Davis Health has implemented and evaluated the use of AI Scribe for notetaking during clinical encounters.
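To illustrate the kind of subgroup check mentioned above (this is not UC Davis Health's actual pipeline; the data, subgroup labels and risk threshold are all hypothetical), a readmission-risk model's performance can be compared across patient groups before rollout:

```python
# Hypothetical sketch: compare readmission-model performance across subgroups.
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(1)
n = 1000
subgroup = rng.choice(["group_1", "group_2"], size=n)   # e.g., insurance type or language
y_true = rng.integers(0, 2, size=n)                     # actual readmission within 30 days
risk_score = rng.random(size=n)                         # model's predicted risk
y_pred = (risk_score >= 0.5).astype(int)                # patients flagged for outreach

for g in ["group_1", "group_2"]:
    mask = subgroup == g
    print(g,
          "recall:", round(recall_score(y_true[mask], y_pred[mask]), 3),
          "precision:", round(precision_score(y_true[mask], y_pred[mask]), 3))
```

Large gaps in recall or precision between groups would prompt the team to revisit the data, the threshold or the outreach workflow before the model is used in care.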
What is AI Scribe and how does it work?
In 2024, UC Davis Health launched an AI Scribe program, which uses AI to generate notes for clinicians during visits. After the patient’s approval, our doctors can start an audio recording of their interactions with patients. The AI application summarizes the discussion into a standard clinical note format, saving the physician the time of tediously transcribing the visit details. We did a pilot study to assess whether the AI Scribe application generated accurate notes.
The results from this study, led by principal biostatistician Sandra Taylor in the Department of Public Health Sciences, were just published in the Journal of Medical Informatics Research. We found that the AI-generated clinical notes were generally of high quality, with 94.7% free from significant errors.
This study also highlighted the need for clinicians to continually review the AI Scribe output to catch and fix the small number of errors — again underscoring the importance of keeping humans in the loop with these tools.
About The Center for Healthcare Policy and Research
The Center for Healthcare Policy and Research’s mission is to facilitate research, promote education, and inform policy about health and health care. The goal is to improve the health of the public by contributing new knowledge about access, delivery, cost, quality and outcomes related to health care and providing rigorous evidence to policymakers and other stakeholders. CHPR executes its mission through interdisciplinary and collaborative research; education and career development; and research synthesis and dissemination. Learn about CHPR’s weekly seminar series and research themes and projects.