Influence of binary mask estimation errors on robust speaker identification

    Research output: Contribution to journal › Journal article › Research › peer-review

    Abstract

    Missing-data strategies have been developed to improve the noise-robustness of automatic speech recognition systems in adverse acoustic conditions. This is achieved by classifying time-frequency (T-F) units into reliable and unreliable components, as indicated by a so-called binary mask. Different approaches have been proposed to handle unreliable feature components, each with distinct advantages. The direct masking (DM) approach attenuates unreliable T-F units in the spectral domain, which allows the extraction of conventionally used mel-frequency cepstral coefficients (MFCCs). Instead of attenuating unreliable components in the feature extraction front-end, full marginalization (FM) discards unreliable feature components in the classification back-end. Finally, bounded marginalization (BM) can be used to combine the evidence from both reliable and unreliable feature components during classification. Since each of these approaches utilizes the knowledge about reliable and unreliable feature components in a different way, they will respond differently to estimation errors in the binary mask. The goal of this study was to identify the most effective strategy to exploit knowledge about reliable and unreliable feature components in the context of automatic speaker identification (SID). A systematic evaluation under ideal and non-ideal conditions demonstrated that the robustness to errors in the binary mask varied substantially across the different missing-data strategies. Moreover, full and bounded marginalization showed complementary performances in stationary and non-stationary background noises and were subsequently combined using a simple score fusion. This approach consistently outperformed individual SID systems in all considered experimental conditions.
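    The difference between full and bounded marginalization, and the score fusion that combines them, can be illustrated with a minimal sketch. It is not the system evaluated in the article: it assumes a single diagonal Gaussian per speaker instead of a full speaker model, assumes that an unreliable feature is bounded between a hypothetical lower limit `low` and the observed noisy value, and assumes an equal-weight average as the "simple score fusion". All function and parameter names are illustrative. Direct masking acts in the feature-extraction front-end and is not covered here.

```python
import math


def log_gauss(x, mu, sigma):
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * ((x - mu) / sigma) ** 2


def gauss_cdf(x, mu, sigma):
    """Cumulative distribution function of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def frame_loglik(x, mask, mu, sigma, mode="fm", low=0.0, floor=1e-300):
    """Log-likelihood of one frame under a diagonal Gaussian.

    mask[d] == 1 marks feature d as reliable, 0 as unreliable.
    mode "fm": unreliable features are discarded (full marginalization).
    mode "bm": unreliable features contribute the probability mass between
               `low` and the observed value (bounded marginalization,
               assumed bounds).
    """
    total = 0.0
    for xd, md, m, s in zip(x, mask, mu, sigma):
        if md:
            total += log_gauss(xd, m, s)
        elif mode == "bm":
            mass = gauss_cdf(xd, m, s) - gauss_cdf(low, m, s)
            total += math.log(max(mass, floor))  # floor avoids log(0)
        # mode "fm": an unreliable feature adds nothing to the score
    return total


def score_speakers(frames, masks, models, mode):
    """Sum frame log-likelihoods for every speaker model."""
    return {
        name: sum(frame_loglik(x, m, mu, sigma, mode) for x, m in zip(frames, masks))
        for name, (mu, sigma) in models.items()
    }


def fuse(scores_a, scores_b, weight=0.5):
    """Weighted average of two score sets (equal weights are an assumption)."""
    return {k: weight * scores_a[k] + (1.0 - weight) * scores_b[k] for k in scores_a}


# Toy data: two speakers, two features; the second feature is masked as unreliable.
models = {"a": ([1.0, 5.0], [1.0, 1.0]), "b": ([4.0, 5.0], [1.0, 1.0])}
frames = [[1.2, 9.0]]
masks = [[1, 0]]

fm_scores = score_speakers(frames, masks, models, "fm")
bm_scores = score_speakers(frames, masks, models, "bm")
fused = fuse(fm_scores, bm_scores)
best = max(fused, key=fused.get)
```

    In this toy case the unreliable feature is ignored by full marginalization and contributes only a bounded probability mass under bounded marginalization, so an error in the mask changes the two scores in different ways. That difference is what makes combining them by fusion plausible.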
    Original language: English
    Journal: Speech Communication
    Volume: 87
    Pages (from-to): 40-48
    Number of pages: 9
    ISSN: 0167-6393
    Publication status: Published - 2017

    Bibliographical note

    Published under a Creative Commons license

    Keywords

    • Bounded marginalization
    • Direct masking
    • Estimated binary mask
    • Full marginalization
    • Ideal binary mask
    • Missing data
    • Speaker identification
