TY - CHAP
T1 - Difficulty Estimation for Image-Specific Medical Image Segmentation Quality Control
AU - Fournel, Joris
AU - Bartoli, Axel
AU - Marchi, Baptiste
AU - Maurin, Arnaud
AU - Bigdeli, Siavash Arjomand
AU - Jacquier, Alexis
AU - Feragen, Aasa
PY - 2026
Y1 - 2026
AB - In clinical decisions, trusting erroneous information can be as harmful as discarding crucial data. Without accurate quality assessment of medical image segmentation, both can occur. In current segmentation quality control, any segmentation with a Dice Similarity Coefficient (DSC) above a set threshold would be considered “good enough”, while segmentations below the threshold would be discarded. However, such global thresholds ignore input-specific factors, increasing the risk of accepting inaccurate segmentations into clinical workflows or discarding valuable information. To address this, we introduce a new paradigm for segmentation quality control: image-specific segmentation quality thresholds, based on inter-observer agreement prediction. We illustrate this on a multi-annotator COVID-19 lesion segmentation dataset. To better understand the factors that contribute to segmentation difficulty, we categorize radiomic features into four distinct groups (imaging, texture, border and geometry) to identify factors influencing expert disagreement, finding that lesion texture and geometry were most influential. In a simulated clinical setting, our proposed ensemble regressor, using automated segmentations and uncertainty maps, achieved a 5.6% MAE when predicting the mean annotator DSC, enhancing precision by a factor of two compared to case-invariant global thresholding. By shifting to image-specific segmentation quality levels, our approach not only reduces the likelihood of accepting erroneous segmentations but also increases the chances of including accurate ones in clinical decision-making.
KW - Automatic Quality Control
KW - Medical Image Segmentation
U2 - 10.1007/978-3-032-05169-1_12
DO - 10.1007/978-3-032-05169-1_12
M3 - Book chapter
SN - 9783032051691
T3 - Lecture Notes in Computer Science
SP - 118
EP - 127
BT - Proceedings of 28th International Conference on Medical Image Computing and Computer Assisted Intervention
PB - Springer
T2 - 28th International Conference on Medical Image Computing and Computer Assisted Intervention
Y2 - 23 September 2025 through 27 September 2025
ER -