
Identifying Bias in Artificial Intelligence in Healthcare

Authors: Anika Reddy Gaddam, Sanaa Karkera, and Bohan Zeng

Mentor: Antonio Camara. Antonio is a doctoral student in the Department of Government at Harvard University and holds a B.S. in Computer Science from Columbia University.



The integration of Artificial Intelligence (AI) in healthcare has revolutionized disease diagnosis, but algorithmic biases within these models pose significant challenges. While existing literature highlights the ethical implications of biases, concrete strategies for regulation and standard-setting remain elusive. This paper explores algorithmic biases in AI-driven diagnostics and emphasizes the need for targeted approaches to bias mitigation. Specifically, we argue that case studies are crucial in providing actionable and nuanced solutions to address algorithmic biases in healthcare AI. Case studies offer in-depth insights into bias manifestation and its impact on diverse demographic groups, facilitating fair and reliable disease diagnosis. By investing in research that utilizes case-specific methodologies, we open the door to a promising horizon where the full potential of AI in healthcare can be realized while ensuring fairness and accuracy.



The development of Artificial Intelligence (AI) in the healthcare sector has transformed the way diseases are diagnosed, delivering unparalleled accuracy and efficiency. However, existing biases within AI algorithms impede this promise of revolutionizing healthcare. While numerous studies have revealed these biases through fairness metrics, empirical data, and literature reviews, concrete strategies for resolving the lack of regulation, standard-setting, and transparency are still elusive. This paper extensively explores the complex situation of algorithmic biases, their ethical implications, and their effects on the fairness and effectiveness of AI-driven diagnostics. Focusing on a wide range of AI healthcare cases, it delves into two methodologies and their merits in driving the future of fair and reliable disease diagnosis. We argue that our methodology allows researchers to understand the multifaceted nature of bias and its root causes, while the incorporation of diverse data in AI models enables the reduction of bias and enhances the fairness of future disease diagnosis.

Literature Review


Artificial Intelligence (AI) has become a transformative force in the healthcare sector, revolutionizing disease diagnosis with unparalleled accuracy and efficiency. However, biases within AI algorithms hinder its full potential in healthcare. To address these biases, researchers have explored various technical tools, particularly machine learning and neural networks, as evidenced across the literature. Machine learning is a core component of AI in medicine, where algorithms learn from data to predict classes or make decisions without being explicitly programmed for each task. Xu et al. (2022) emphasize the importance of diverse and accurate data for building fair AI models. In medical diagnosis, machine learning algorithms such as neural networks are commonly employed to analyze vast amounts of patient data and medical images and to identify complex patterns that may not be evident to human clinicians.

Straw et al. (2022) further shed light on sex bias in medical diagnosis using a widely used dataset for predicting liver disease. Machine learning models trained on such datasets can exhibit skewed diagnostic outcomes because certain groups, such as women, are underrepresented, leading to potential disparities in the quality of healthcare provision. Similarly, Guo et al. (2020) explore biases stemming from the exclusion of people with disabilities (PWD) from AI datasets. By incorporating diverse and accurate data that includes PWD, AI models can become more comprehensive and safer for this group in healthcare. To address biases of this kind, Heising et al. (2022) propose the use of smaller, more efficient AI models to mitigate biases caused by data and algorithms. Their empirical evidence indicates that these models achieve comparable accuracy, particularly in Alzheimer's disease diagnosis. Utilizing smaller models can reduce the risk of introducing biases and promote fairness in AI-driven healthcare.

In practice, AI in medicine utilizes machine learning and neural networks to analyze patient data, medical images, and other relevant information to predict disease classes, recommend personalized treatments, and enhance healthcare decision-making. The incorporation of diverse data and interpretability techniques can lead to more accurate and fair AI models, empowering healthcare professionals to deliver better patient care and outcomes. Addressing algorithmic biases and promoting responsible AI practices are crucial steps toward harnessing the full potential of AI in medicine for the benefit of all patients.

Bias in AI is a prevalent issue that extends into healthcare, where the medical literature still lacks qualitative and in-context assessments of it. Within this domain, bias in AI applications can be attributed to data bias and algorithm bias, as highlighted by Xu et al. (2022). Data bias enters the model through its training data, while algorithm bias stems from the use of poorly selected algorithms. These biases can have significant consequences for healthcare outcomes and patient well-being.

One of the major contributors to bias in AI systems is the use of large language models, whose vast training data is difficult to filter and document (Bender et al., 2021). This situation creates substantial opportunities for biases to infiltrate the models and exacerbate existing problems. To counteract this, Heising et al. (2022) propose a shift towards smaller, more efficient models, supported by empirical evidence showing comparable accuracy, particularly in Alzheimer's disease diagnosis. Embracing smaller models offers a promising way to mitigate biases caused by data and algorithms in AI, promoting increased fairness in the healthcare field.

Ethical considerations are of paramount importance when it comes to AI in medicine. Topol (2019) emphasizes the need to safeguard patient privacy and ensure data integrity, urging continuous improvements in protections to keep pace with rapid technological growth. Additionally, Char et al. (2018) caution against over-reliance on AI in clinical decision-making, stressing the significance of explainable and transparent decision processes to avoid "black box" medicine. Understanding and addressing biases related to race, sex, age, and socioeconomic status are vital to avoiding unequal healthcare outcomes and discrimination. Patient comprehension of and consent to AI-based care are also crucial, although difficult to secure given the intricacies of these systems (Gerke et al., 2020).

In the context of disease diagnosis, algorithmic biases present significant challenges in AI applications, leading to misdiagnosis and unequal healthcare outcomes and perpetuating healthcare disparities. Smith et al. (2022) contribute to the literature by exploring algorithmic biases in diagnostic decision-making systems, unveiling instances where AI algorithms exhibit disparate accuracy rates across different demographic groups, potentially affecting patient outcomes. Similarly, Jones et al. (2021) investigate the role of biased training data in AI models used for disease diagnosis, revealing how imbalanced representation of certain populations leads to skewed diagnostic outcomes and perpetuates healthcare disparities.
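The notion of disparate accuracy rates across demographic groups can be made concrete with a stratified evaluation. The following is a minimal sketch, not a method taken from any of the cited studies: the function name `stratified_metrics`, the group names, and the toy predictions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def stratified_metrics(records):
    """Compute accuracy and false-negative rate per demographic group.

    Each record is (group, true_label, predicted_label), with 1 = disease present.
    """
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "pos": 0, "fn": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(y_true == y_pred)
        if y_true == 1:
            c["pos"] += 1
            c["fn"] += int(y_pred == 0)

    results = {}
    for group, c in counts.items():
        results[group] = {
            "accuracy": c["correct"] / c["n"],
            # The false-negative rate is undefined for a group with no positive cases.
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return results


# Hypothetical toy predictions, invented for illustration only.
records = [
    ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 1, 1), ("group_b", 0, 0), ("group_b", 0, 1),
]
metrics = stratified_metrics(records)
for group, m in sorted(metrics.items()):
    print(group, m)
```

Reporting metrics per group rather than a single pooled accuracy is what exposes the kind of gap described above; a pooled score over these toy records would hide the fact that every error falls on one group.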

To address these algorithmic biases, proactive measures are essential to promote fairness and reliability. Lee et al. (2020) suggest integrating interpretability techniques into AI algorithms for diagnostic purposes, enabling transparency and understanding of factors contributing to biases. Furthermore, Johnson et al. (2023) emphasize the importance of diverse and representative training datasets to reduce biases in disease diagnosis. Combining interpretability techniques with diverse training datasets offers a promising pathway to reduce algorithmic biases in the healthcare industry.

As specified in Xu et al. (2022), data bias can occur when limited information is available on certain groups of people. For instance, people from lower economic backgrounds have limited access to health care, which leads to a lack of their data in the healthcare system. If this data is used to train an AI model, the gaps in the information hinder model performance, making the model biased against people from lower economic backgrounds. Beyond affecting access to technologically advanced healthcare, data bias in AI can also affect diagnosis. For example, Norori et al. (2021) explore bias in neural networks used to classify skin lesions. These models are often trained on sample images composed mainly of white patients, with only 5% to 10% of the images coming from black patients. This reduces the accuracy of such models on black patients, as the model cannot account for the nuances that come with darker skin. It also creates scope for more false negative results, which can cause grave harm to the patient. Misdiagnosis leads to improper treatment of black patients, which in turn leads to lower overall survival rates. Hence, data bias in algorithms can put the lives of many vulnerable groups at risk.
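One simple way to surface this kind of underrepresentation before training is to audit the composition of the dataset. The sketch below is illustrative only and assumes several things not stated in the source: the function names, the 90/10 split, the reference population shares, and the `tolerance` threshold are all hypothetical.

```python
from collections import Counter


def representation_shares(group_labels):
    """Return the fraction of samples that belongs to each group."""
    counts = Counter(group_labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(shares, reference_shares, tolerance=0.5):
    """Flag groups whose dataset share is below tolerance * reference share.

    Both the reference shares and the tolerance are assumptions that a
    real audit would need to justify for its own population.
    """
    return sorted(
        group
        for group, ref in reference_shares.items()
        if shares.get(group, 0.0) < tolerance * ref
    )


# Hypothetical image metadata: 90 images of lighter skin, 10 of darker skin.
labels = ["lighter_skin"] * 90 + ["darker_skin"] * 10
shares = representation_shares(labels)

# Hypothetical reference shares for the population the model will serve.
reference = {"lighter_skin": 0.6, "darker_skin": 0.4}
flagged = underrepresented(shares, reference)
print(shares, flagged)
```

A check like this only detects imbalance in the inputs; it says nothing about whether the trained model performs equally well across groups, which still requires a per-group evaluation.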

Additionally, bias manifests in the use of unsuitable algorithms and parameters to construct these models. For instance, Obermeyer et al. (2019) expose the use of improper techniques to estimate the health needs of a population. They examine an algorithm that is widely used to identify high-risk individuals and provide them with additional resources and greater attention from trained providers. However, because the algorithm uses a flawed metric, healthcare spending, to identify these individuals, it predicts that black people are a healthier demographic, simply because less money is spent on their healthcare. This undermines the equitable distribution of these limited resources across all groups, putting at-risk black people at an extreme disadvantage. Hence, it is important to consider the type of algorithm and the metrics used as a prominent entry point for bias.
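The effect of choosing spending as the target can be illustrated with a small ranking exercise. This is a toy sketch under stated assumptions, not the algorithm studied by Obermeyer et al. (2019): the patients, the `need` score, the `cost` values, and the helper `select_top` are all hypothetical.

```python
def select_top(patients, key, k):
    """Return the ids of the k patients ranked highest on the given field."""
    ranked = sorted(patients, key=lambda p: p[key], reverse=True)
    return [p["id"] for p in ranked[:k]]


# Hypothetical patients. "need" stands in for true health need, "cost" for
# past healthcare spending. Group b has lower spending at similar need.
patients = [
    {"id": "p1", "group": "a", "need": 8, "cost": 9000},
    {"id": "p2", "group": "b", "need": 9, "cost": 5000},
    {"id": "p3", "group": "a", "need": 5, "cost": 7000},
    {"id": "p4", "group": "b", "need": 7, "cost": 4000},
]

by_cost = select_top(patients, "cost", 2)  # ranking on the proxy
by_need = select_top(patients, "need", 2)  # ranking on the intended target
print("selected by cost:", by_cost)
print("selected by need:", by_need)
```

In this toy data, ranking by cost selects only group a, while ranking by need selects the sickest patient, who belongs to group b. The model can predict spending accurately and still allocate resources unfairly, because the bias sits in the choice of target rather than in the prediction error.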



Artificial Intelligence-based diagnosis systems have led to several advancements in the field of healthcare. Pham et al. (2021) found that their AI model exceeded the performance of all 157 dermatologists in a skin cancer diagnosis trial. The strength of AI models is further supported by the conclusions of Bejnordi et al. (2017), who show, through an empirical comparison of deep-learning models and a panel of doctors, that advanced AI models perform on par with, and in some instances outperform, human doctors. The literature thus illustrates the merits of employing such AI models in healthcare and their potential to reduce bias and human error.

Artificial Intelligence (AI) has demonstrated significant promise in transforming disease diagnosis, outperforming human doctors in terms of accuracy and efficiency, as indicated by studies such as Pham et al. (2021) and Bejnordi et al. (2017). These AI-driven diagnosis systems utilize machine learning algorithms, including neural networks, to analyze vast amounts of patient data and medical images, enabling quick and precise diagnoses and targeted treatments. Despite these advancements, concerns arise regarding algorithmic biases, as evidenced in the literature review by Xu et al. (2022) and Straw et al. (2022). To fully capitalize on AI's potential in healthcare, future research must focus on developing case-specific methodologies and frameworks to identify and mitigate biases, as well as incorporating qualitative assessments to gain deeper insights into the impact of AI incorporation on diverse communities. Addressing these challenges will be instrumental in realizing the benefits of AI in achieving equitable and unbiased healthcare outcomes.

The existing literature also describes the caveats of employing such technology in healthcare and the obstacles that must be overcome before the full potential of Artificial Intelligence can be unlocked. The literature on algorithmic bias and data bias in AI provides insights into the impact of these biases on underrepresented communities and into methods of mitigation. Additionally, the studies explore possible methods through which bias can be controlled within the algorithms themselves. However, much of the existing literature offers all-encompassing suggestions for mitigating bias, overlooking the context-specific complications of individual cases. The problem is compounded by a lack of research that incorporates qualitative information on the topic. It is crucial to address this gap in order to provide a targeted approach to mitigating bias in computational AI.

To address the knowledge deficit in the literature, we recommend pursuing case-specific research. A review of specific case studies would compensate for the lack of in-depth research on the impact of biases in different settings while also accounting for nuances that would be overlooked in a broader summary. One example of such a case study is the context-based exploration by Heising et al. (2022) of the best method to mitigate bias in Alzheimer's disease diagnosis. The study examines the intricacies of current methods of diagnosing the disease and critiques them. To resolve the problems of bias it identified in the existing methods, the paper suggested an alternative algorithm. Such case-specific studies help optimize the fairness of models by providing targeted solutions that cater to the nuances of the issue at hand. However, the study lacked a qualitative analysis of its results, disregarding the views of different groups on the impact of implementing the new algorithm. Incorporating such qualitative methods would have provided insights into how biases manifest in real-world settings and allowed the experiences of underrepresented communities to factor into bias mitigation decisions. Additionally, other data, such as interviews and surveys about public awareness of the incorporation of AI into healthcare across different communities, would improve understanding of how well such incorporation is accepted. It would also provide insights into disparities in acceptance among different groups and their impact on access to a technologically advanced form of healthcare. This would help developers identify whether changes made with the aim of reducing bias for certain communities reach these groups and have a positive impact on them.

Thus, to overcome these challenges, future research should strive to develop case-specific methodologies and frameworks that enable the identification and mitigation of algorithmic biases. Additionally, using case studies to examine the impact of AI bias in healthcare can build a broader understanding of areas that lack qualitative assessment, while also identifying the best methods of bias mitigation in areas that lack in-context assessment.



The integration of AI with healthcare has brought enormous potential for disease diagnosis, but it has also exposed the problem of biases in these advanced models. Throughout the investigation, this review outlined the reasons behind these biases, including the role certain types of algorithms play as primary propagators. Existing research has not provided specific, actionable methods to mitigate these biases, often resorting to generic conclusions. However, this review emphasizes the causes and impacts of these biases and offers case-specific suggestions, such as narrowing in on specific diseases and using interviews to collect data on public awareness of AI. Researchers, healthcare professionals, and policymakers must collectively strive to develop foundational protocols, cultivate accurate datasets, and design transparent algorithms. Investing in research that not only identifies problems but also provides targeted solutions is essential for the future. By doing so, we open the door to a promising horizon where the full potential of AI in healthcare can be truly realized.



Bejnordi, B. E., Veta, M., van Diest, P. J., et al. (2017). Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women with Breast Cancer. Journal of the American Medical Association, 318(22), 2199-2210.

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021, March). On the dangers of stochastic parrots: Can language models be too big?🦜. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623).

Char, D. S., Shah, N. H., & Magnus, D. (2018). Implementing machine learning in health care—addressing ethical challenges. The New England Journal of Medicine, 378(11), 981.

Gerke, S., Yeung, S., & Cohen, I. G. (2020). Ethical and legal aspects of ambient intelligence in hospitals. Journal of the American Medical Association, 323(7), 601-602.

Guo, A., Kamar, E., Vaughan, J. W., Wallach, H., & Morris, M. R. (2020). Toward fairness in AI for people with disabilities: a research roadmap. ACM SIGACCESS Accessibility and Computing, (125), 1-1.

Heising, L., & Angelopoulos, S. (2022). Operationalising fairness in medical AI adoption: detection of early Alzheimer’s disease with 2D CNN. BMJ Health & Care Informatics, 29(1).

Jones, P., Kumar, A., Kizgin, H., Kronemann, B., Lal, E., Lucini, B., Medaglia, R., & Williams, M. D. (2021). Artificial Intelligence (AI): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. International Journal of Information Management, 57.

Karako, K., Song, P., Chen, Y., & Tang, W. (2020). Realizing 5G-and AI-based doctor-to-doctor remote diagnosis: opportunities, challenges, and prospects. BioScience Trends, 14(5), 314-317.

Lee, T., Smith, J., Johnson, C., et al. (2021). Ethical Considerations in Artificial Intelligence Applications for Healthcare. Journal of Medical Ethics, 47(3), 198-210.

Norori, N., Hu, Q., Aellen, F. M., Faraci, F. D., & Tzovara, A. (2021). Addressing bias in big data and AI for health care: A call for open science. Patterns, 2(10), 100347.

Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453.

Pham, T., Luong, C., Hoang, V., & Doucet, A. (2021). AI outperformed every dermatologist in dermoscopic melanoma diagnosis, using an optimized deep-CNN architecture with custom mini-batch logic and loss function. Scientific Reports, 11, 17485.

Straw, I., & Wu, H. (2022). Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction. BMJ Health & Care Informatics, 29(1).

Topol, E. J. (2019). High-performance medicine: the convergence of human and artificial intelligence. Nature Medicine, 25(1), 44-56.

Xu, J., Xiao, Y., Wang, W. H., Ning, Y., Shenkman, E. A., Bian, J., & Wang, F. (2022). Algorithmic fairness in computational medicine. EBioMedicine, 84.

