Generalizing Fairness to Generative Language Models via Reformulation of Non-discrimination Criteria

Sara Sterlie*, Nina Weng, Aasa Feragen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Abstract

Generative AI, such as large language models, has undergone rapid development in recent years. As these models become increasingly available to the public, concerns arise about them perpetuating and amplifying harmful biases in downstream applications. Gender stereotypes, whether they take the form of misrepresentation or discrimination, can be harmful and limiting for the individuals they target. Recognizing gender bias as a pervasive societal construct, this paper studies how to uncover and quantify the presence of gender bias in generative language models. In particular, we derive generative AI analogues of three well-known non-discrimination criteria from classification: independence, separation and sufficiency. To demonstrate these criteria in action, we design prompts for each criterion with a focus on occupational gender stereotypes, specifically utilizing a medical test scenario to introduce ground truth into the generative AI context. Our results reveal the presence of occupational gender bias within such conversational language models. Our code is public at https://github.com/sterlie/fairness-criteria-LLM.
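For context, the three classification criteria that the paper reformulates are standard in the algorithmic fairness literature (see, e.g., Barocas, Hardt and Narayanan, Fairness and Machine Learning). For a predictor Ŷ, target Y and sensitive attribute A, they are the following conditional-independence statements:

```latex
% Standard non-discrimination criteria for a classifier \hat{Y},
% target Y, and sensitive attribute A.
\begin{align*}
\text{Independence:} \quad & \hat{Y} \perp A        \\
\text{Separation:}   \quad & \hat{Y} \perp A \mid Y \\
\text{Sufficiency:}  \quad & Y \perp A \mid \hat{Y}
\end{align*}
```

As a rough illustration of how an independence-style analogue could be probed in a generative setting, one can compare the empirical distributions of model outputs for prompts that differ only in the stated gender. The following is a minimal sketch under our own assumptions, not the paper's released code; query_model is a hypothetical stand-in for any chat-model API.

```python
# Illustrative sketch, not the paper's implementation: probe an
# independence-style criterion by comparing the occupations a model
# generates for prompts that differ only in the stated gender.
from collections import Counter

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a chat-model API call."""
    raise NotImplementedError("plug in a real LLM client here")

def occupation_distribution(gender: str, n_samples: int = 100) -> Counter:
    """Empirical distribution of occupations the model suggests for one gender."""
    prompt = (f"Suggest one occupation for a {gender} person. "
              "Answer with a single word.")
    return Counter(query_model(prompt).strip().lower() for _ in range(n_samples))

def total_variation(p: Counter, q: Counter) -> float:
    """Total variation distance between two empirical distributions;
    a value near 0 is consistent with the independence analogue."""
    n_p, n_q = sum(p.values()), sum(q.values())
    return 0.5 * sum(abs(p[o] / n_p - q[o] / n_q) for o in set(p) | set(q))

# Usage: total_variation(occupation_distribution("female"),
#                        occupation_distribution("male"))
```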
Original language: English
Title of host publication: Proceedings of the Fairness and ethics towards transparent AI: facing the chalLEnge through model Debiasing (FAILED) 2024
Number of pages: 27
Publisher: Springer
Publication status: Accepted/In press - 2025
Event: Fairness and ethics towards transparent AI: facing the chalLEnge through model Debiasing - Workshop at ECCV 2024, Milano, Italy
Duration: 29 Sept 2024 - 29 Sept 2024

Workshop

Workshop: Fairness and ethics towards transparent AI: facing the chalLEnge through model Debiasing
Country/Territory: Italy
City: Milano
Period: 29/09/2024 - 29/09/2024
