Anonymizing Speaker Voices: Easy to Imitate, Difficult to Recognize?

Jennifer Williams, Karla Pizzi, Natalia Tomashenko, Sneha Das

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; Peer-reviewed

Abstract

Characterizing how different speakers perform in voice privacy tasks is a vastly under-explored area of speech anonymization. In this paper, we present a deeper analysis by creating and analyzing groups of challenging speakers, categorized by their performance in two related facets of voice anonymization evaluation: (1) speaker similarity using automatic speaker verification (ASV) and (2) human perception using a large-scale A/B listening test. We group speakers into four categories (sheep, goats, lambs, and wolves) based on their anonymization properties. We present an extension of voice anonymization evaluation by identifying speakers who are easy to imitate or difficult to recognize. This knowledge is important for trustworthy anonymization evaluation, and it has the potential to influence how evaluation datasets are created from a pool of speakers. We also provide further insight into how individual speakers influence anonymized speech, as seen through human perception and through automatic speaker similarity scoring.
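The four categories follow the "biometric menagerie" naming usually associated with Doddington et al.: goats are hard to recognize (low genuine scores), lambs are easy to imitate (high impostor scores when they are the target), wolves are good at imitating others (high scores when they act as the impostor), and sheep are everyone else. The sketch below shows one way to assign such labels from ASV trial scores alone. It is not the paper's procedure: the trial format, the hypothetical function `categorize_speakers`, the use of per-speaker mean scores, the assumption that a higher score means more similar voices, and the threshold values are all illustrative assumptions, and the human A/B listening side is not modeled.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial format: (enrolled_speaker, test_speaker, asv_score).
# Assumption: a higher score means the ASV system finds the voices more similar.
Trial = tuple[str, str, float]


def categorize_speakers(
    trials: list[Trial],
    goat_threshold: float,
    lamb_threshold: float,
    wolf_threshold: float,
) -> dict[str, set[str]]:
    """Assign Doddington-style zoo labels from ASV scores (illustrative sketch)."""
    genuine = defaultdict(list)      # same-speaker trial scores, keyed by speaker
    as_target = defaultdict(list)    # impostor scores where the speaker was enrolled
    as_impostor = defaultdict(list)  # impostor scores where the speaker was the test voice

    for enrolled, test, score in trials:
        if enrolled == test:
            genuine[enrolled].append(score)
        else:
            as_target[enrolled].append(score)
            as_impostor[test].append(score)

    speakers = set(genuine) | set(as_target) | set(as_impostor)
    labels: dict[str, set[str]] = {}
    for spk in sorted(speakers):
        tags: set[str] = set()
        # Goat: difficult to recognize, i.e. low average genuine score.
        if genuine[spk] and mean(genuine[spk]) < goat_threshold:
            tags.add("goat")
        # Lamb: easy to imitate, i.e. other voices score high against this speaker.
        if as_target[spk] and mean(as_target[spk]) > lamb_threshold:
            tags.add("lamb")
        # Wolf: imitates others well, i.e. scores high against other speakers.
        if as_impostor[spk] and mean(as_impostor[spk]) > wolf_threshold:
            tags.add("wolf")
        # Sheep: none of the problematic behaviours above.
        labels[spk] = tags or {"sheep"}
    return labels


if __name__ == "__main__":
    # Toy scores invented purely to exercise the function.
    trials = [
        ("A", "A", 0.9), ("A", "A", 0.8),
        ("B", "B", 0.2), ("B", "B", 0.3),
        ("C", "C", 0.85), ("D", "D", 0.9),
        ("C", "A", 0.7), ("C", "B", 0.6), ("B", "A", 0.6),
        ("A", "B", 0.1), ("A", "C", 0.1), ("B", "C", 0.1),
    ]
    print(categorize_speakers(trials, 0.5, 0.5, 0.5))
    # prints {'A': {'wolf'}, 'B': {'goat'}, 'C': {'lamb'}, 'D': {'sheep'}}
```

In this sketch a speaker can carry several labels at once; collapsing them into a single category would need an additional (equally hypothetical) priority rule.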
Original language: English
Title of host publication: Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Publication date: 2024
Pages: 12491-12495
ISBN (Print): 979-8-3503-4486-8
ISBN (Electronic): 979-8-3503-4485-1
Publication status: Published - 2024
Event: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024

Conference

Conference: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/2024 to 19/04/2024

Keywords

  • Anonymization perception
  • Speaker characterization
  • Voice anonymization
