This study evaluates the performance and robustness of 22 established and newly proposed glare prediction metrics. Experimental datasets from daylight-dominated workplaces in office-like test rooms were collected from studies by seven research groups in six locations (Argentina, Denmark, Germany, Israel, Japan and the United States). The variability in experimental setups, locations and research teams allowed a reliable evaluation of how well the glare metrics perform, and how robust they are, in daylight-dominated workplaces. Independent statistical methods were applied both to the individual datasets and to one combined dataset to evaluate the 22 glare metrics. As performance and robustness are not clearly defined in the literature, we defined performance as: (1) the ability of the metric value to describe the glare scale (evaluated by Spearman rank correlation), and (2) the ability of the metric to distinguish between disturbing and non-disturbing situations (evaluated by diagnostic receiver operating characteristic (ROC) curve analysis). Furthermore, we defined robustness as the ability of a metric to deliver meaningful results when applied to different datasets and to fail as few statistical tests as possible. For several of the metrics, average Spearman rank correlation coefficients of 0.55–0.60 and average prediction rates of 70–75% for distinguishing disturbing from non-disturbing glare indicate their reliability. The results also show that metrics that use the saturation effect as a main input in their equation perform better and are more robust in daylight-dominated workplaces than purely contrast-based or purely empirical metrics. In this study, the daylight glare probability (DGP) delivered the highest performance amongst the tested metrics and was also found to be the most robust.
Future research should aim to optimise the terms of glare equations that combine contrast and saturation effects, such as DGP, PGSV or UGR_exp, to obtain metrics that also perform reliably in lighting conditions dimmer than those explored in this study.
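The two performance criteria defined above (rank correlation with the glare scale, and ROC-based separation of disturbing from non-disturbing situations) can be illustrated with a minimal Python sketch. It is not the study's analysis code. It assumes a hypothetical four-point glare scale (1 = imperceptible to 4 = intolerable), treats ratings of 3 or higher as "disturbing", and uses made-up toy values rather than data from the study. The "prediction rate" is taken here, as an assumption, to be the share of correctly classified cases at the threshold that maximises Youden's J; the study's exact procedure may differ. All function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (tie-aware)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def roc_auc(scores, disturbing):
    """ROC area as the probability that a disturbing case scores higher than a
    non-disturbing one, counting ties as 1/2 (Mann-Whitney formulation)."""
    pos = [s for s, d in zip(scores, disturbing) if d]
    neg = [s for s, d in zip(scores, disturbing) if not d]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(scores, disturbing):
    """Choose the cut-off maximising Youden's J = sensitivity + specificity - 1.
    Returns (threshold, share of cases classified correctly at that threshold)."""
    n_pos = sum(disturbing)
    n_neg = len(disturbing) - n_pos
    best = None
    for t in sorted(set(scores)):
        # Predict "disturbing" whenever the metric value is >= t.
        tp = sum(1 for s, d in zip(scores, disturbing) if s >= t and d)
        tn = sum(1 for s, d in zip(scores, disturbing) if s < t and not d)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[0]:
            best = (j, t, (tp + tn) / len(scores))
    return best[1], best[2]


# Hypothetical toy data: metric values (e.g. DGP) and glare ratings on a 1-4 scale.
dgp = [0.20, 0.25, 0.30, 0.34, 0.38, 0.42, 0.47, 0.55]
rating = [1, 1, 2, 1, 3, 2, 4, 4]
disturbing = [r >= 3 for r in rating]  # assumed cut: "disturbing" or worse

print("Spearman rho:", round(spearman_rho(dgp, rating), 3))
print("ROC AUC:", round(roc_auc(dgp, disturbing), 3))
print("threshold, prediction rate:", best_threshold(dgp, disturbing))
```

Average ranks are used because glare ratings are ordinal with many ties; a library routine such as a tie-aware Spearman implementation would give the same coefficient, but the stdlib version keeps the sketch self-contained.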