Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity

Dennis Ulmer*, Jes Frellsen, Christian Hardmeier

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

We investigate the problem of determining the predictive confidence (or, conversely, uncertainty) of a neural classifier through the lens of low-resource languages. By training models on sub-sampled datasets in three different languages, we assess the quality of estimates from a wide array of approaches and their dependence on the amount of available data. We find that while approaches based on pre-trained models and ensembles achieve the best results overall, the quality of uncertainty estimates can surprisingly suffer with more data. We also perform a qualitative analysis of uncertainties on sequences, discovering that a model's total uncertainty seems to be influenced to a large degree by its data uncertainty, not model uncertainty. All model implementations are open-sourced in a software package.
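Note on terminology: the abstract's distinction between total, data, and model uncertainty is commonly made via an information-theoretic decomposition over an ensemble (or Monte Carlo dropout samples). The following is a standard formulation of that decomposition, given here only as an illustrative sketch and not necessarily the exact metric used in the paper:

\mathcal{H}\Big[\tfrac{1}{M}\textstyle\sum_{m=1}^{M} p(y \mid x, \theta_m)\Big]
\;=\;
\underbrace{\tfrac{1}{M}\textstyle\sum_{m=1}^{M} \mathcal{H}\big[p(y \mid x, \theta_m)\big]}_{\text{data (aleatoric) uncertainty}}
\;+\;
\underbrace{\mathcal{I}\big[y; \theta \mid x\big]}_{\text{model (epistemic) uncertainty}}

Here \mathcal{H} denotes Shannon entropy, the \theta_m are the parameters of the M ensemble members, and the left-hand side (entropy of the averaged predictive distribution) is the total uncertainty. The abstract's finding is that, in the sequences analysed, the first term dominates the second.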
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: EMNLP 2022
Publisher: Association for Computational Linguistics
Publication date: 2022
Pages: 2707-2735
Publication status: Published - 2022
Event: 2022 Conference on Empirical Methods in Natural Language Processing - Abu Dhabi National Exhibition Centre, Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 - 11 Dec 2022
https://2022.emnlp.org/

Conference

Conference: 2022 Conference on Empirical Methods in Natural Language Processing
Location: Abu Dhabi National Exhibition Centre
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 07/12/2022 - 11/12/2022
Internet address: https://2022.emnlp.org/
