Influence of binary mask estimation errors on robust speaker identification

Research output: Contribution to journal, Journal article, Research, peer-review


Abstract

Missing-data strategies have been developed to improve the noise-robustness of automatic speech recognition systems in adverse acoustic conditions. This is achieved by classifying time-frequency (T-F) units into reliable and unreliable components, as indicated by a so-called binary mask. Different approaches have been proposed to handle unreliable feature components, each with distinct advantages. The direct masking (DM) approach attenuates unreliable T-F units in the spectral domain, which allows the extraction of conventionally used mel-frequency cepstral coefficients (MFCCs). Instead of attenuating unreliable components in the feature extraction front-end, full marginalization (FM) discards unreliable feature components in the classification back-end. Finally, bounded marginalization (BM) can be used to combine the evidence from both reliable and unreliable feature components during classification. Since each of these approaches utilizes the knowledge about reliable and unreliable feature components in a different way, they will respond differently to estimation errors in the binary mask. The goal of this study was to identify the most effective strategy to exploit knowledge about reliable and unreliable feature components in the context of automatic speaker identification (SID). A systematic evaluation under ideal and non-ideal conditions demonstrated that the robustness to errors in the binary mask varied substantially across the different missing-data strategies. Moreover, full and bounded marginalization showed complementary performance in stationary and non-stationary background noise, and the two were subsequently combined using a simple score fusion. This fused system consistently outperformed the individual SID systems in all considered experimental conditions.
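The difference between full and bounded marginalization can be illustrated with a minimal sketch. It is not the implementation used in the article: it assumes a diagonal-covariance Gaussian mixture speaker model, non-negative spectral features, a lower integration bound of 0, and a weighted sum of scores as the "simple score fusion". The names `frame_log_likelihood`, `fuse_scores`, and the weight `alpha` are hypothetical and chosen only for this illustration.

```python
import math


def log_gauss(x, mu, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def gauss_cdf(x, mu, var):
    """Cumulative distribution function of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def frame_log_likelihood(x, mask, gmm, bounded=False, lower=0.0, floor=1e-300):
    """Log-likelihood of one feature frame under a diagonal GMM.

    x      : list of feature values (assumed non-negative spectral features)
    mask   : list of 0/1 flags, 1 = reliable, 0 = unreliable
    gmm    : list of (weight, means, variances) tuples (hypothetical layout)
    bounded: False = full marginalization, True = bounded marginalization
    """
    per_component = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for x_d, reliable, mu, var in zip(x, mask, means, variances):
            if reliable:
                # Reliable components are scored as usual.
                ll += log_gauss(x_d, mu, var)
            elif bounded:
                # Assumption: the clean value lies between `lower` and the
                # noisy observation, so integrate the Gaussian over that range.
                mass = gauss_cdf(x_d, mu, var) - gauss_cdf(lower, mu, var)
                ll += math.log(max(mass, floor))
            # Full marginalization: unreliable components contribute nothing.
        per_component.append(ll)
    return logsumexp(per_component)


def fuse_scores(fm_scores, bm_scores, alpha=0.5):
    """Weighted sum of per-speaker scores (one assumed form of score fusion)."""
    return [alpha * f + (1.0 - alpha) * b for f, b in zip(fm_scores, bm_scores)]


# Toy usage: one frame, two feature dimensions, the second marked unreliable.
toy_gmm = [(1.0, [1.0, 2.0], [1.0, 1.0])]
fm = frame_log_likelihood([1.0, 5.0], [1, 0], toy_gmm, bounded=False)
bm = frame_log_likelihood([1.0, 5.0], [1, 0], toy_gmm, bounded=True)
fused = fuse_scores([fm], [bm], alpha=0.5)
```

In this sketch a mask error affects the two strategies differently: a unit wrongly labelled unreliable is simply ignored under full marginalization, whereas under bounded marginalization it still constrains the score through the integration bounds. Some formulations of bounded marginalization additionally normalize the integral by the width of the bounds, which is omitted here.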
Original language: English
Journal: Speech Communication
Volume: 87
Pages (from-to): 40-48
Number of pages: 9
ISSN: 0167-6393
Publication status: Published - 2017

Bibliographical note

Published under a Creative Commons license

Keywords

  • Bounded marginalization
  • Direct masking
  • Estimated binary mask
  • Full marginalization
  • Ideal binary mask
  • Missing data
  • Speaker identification
