TY - GEN
T1 - Slicing Through Bias: Explaining Performance Gaps in Medical Image Analysis using Slice Discovery Methods
AU - Olesen, Vincent
AU - Weng, Nina
AU - Feragen, Aasa
AU - Petersen, Eike
PY - 2025
Y1 - 2025
N2 - Machine learning models have achieved high overall accuracy in medical image analysis. However, performance disparities on specific patient groups pose challenges to their clinical utility, safety, and fairness. This can affect known patient groups, such as those based on sex, age, or disease subtype, as well as previously unknown and unlabeled groups. Furthermore, the root cause of such observed performance disparities is often challenging to uncover, hindering mitigation efforts. In this paper, to address these issues, we leverage Slice Discovery Methods (SDMs) to identify interpretable underperforming subsets of data and formulate hypotheses regarding the cause of observed performance disparities. We introduce a novel SDM and apply it in a case study on the classification of pneumothorax and atelectasis from chest X-rays. Our study demonstrates the effectiveness of SDMs in hypothesis formulation and yields an explanation of previously observed but unexplained performance disparities between male and female patients in widely used chest X-ray datasets and models. Our findings indicate shortcut learning in both classification tasks, through the presence of chest drains and ECG wires, respectively. Sex-based differences in the prevalence of these shortcut features appear to cause the observed classification performance gap, representing a previously underappreciated interaction between shortcut learning and model fairness analyses.
KW - Slice Discovery Methods
KW - Algorithmic Fairness
KW - Shortcut Learning
KW - Chest X-ray
KW - Model Debugging
DO - 10.1007/978-3-031-72787-0_1
M3 - Article in proceedings
SN - 978-3-031-72786-3
T3 - Lecture Notes in Computer Science
SP - 3
EP - 13
BT - Proceedings of Ethics and Fairness in Medical Imaging
PB - Springer
T2 - 27th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2024
Y2 - 6 October 2024 through 10 October 2024
ER -