The impact of exploiting spectro-temporal context in computational speech segregation

Thomas Bentsen*, Abigail Anne Kressner, Torsten Dau, Tobias May

*Corresponding author for this work

Research output: Contribution to journal, Journal article, Research, peer-review


Abstract

Computational speech segregation aims to automatically segregate speech from interfering noise, often by estimating the ideal binary mask. Several studies have tried to exploit contextual information in speech to improve mask estimation accuracy, using two frequently used strategies: (1) incorporating delta features and (2) employing support vector machine (SVM) based integration. In this study, two experiments were conducted. In Experiment I, the impact of exploiting spectro-temporal context using these strategies was investigated in stationary and six-talker noise. In Experiment II, the delta features were explored in detail and tested in a setup that considered novel noise segments of the six-talker noise. Computing delta features led to higher intelligibility than employing SVM-based integration, and intelligibility increased with the amount of spectral information exploited via the delta features. The system did not, however, generalize well to novel segments of the six-talker noise. Measured intelligibility was subsequently compared to extended short-term objective intelligibility, hit–false alarm rate, and the amount of mask clustering. None of these objective measures alone could account for measured intelligibility. The findings may have implications for the design of speech segregation systems and for the selection of a cost function that correlates with intelligibility.
Original language: English
Journal: Journal of the Acoustical Society of America
Volume: 143
Issue number: 1
Pages (from-to): 248-259
ISSN: 0001-4966
Publication status: Published - 2018
