The role of temporal resolution in modulation-based speech segregation

Tobias May, Thomas Bentsen, Torsten Dau

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

This study is concerned with the challenge of automatically segregating a target speech signal from interfering background noise. A computational speech segregation system is presented which exploits logarithmically-scaled amplitude modulation spectrogram (AMS) features to distinguish between speech and noise activity on the basis of individual time-frequency (T-F) units. One important parameter of the segregation system is the window duration of the analysis-synthesis stage, which determines the lower limit of modulation frequencies that can be represented but also the temporal acuity with which the segregation system can manipulate individual T-F units. To clarify the consequences of this trade-off on modulation-based speech segregation performance, the influence of the window duration was systematically investigated.
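
The trade-off described in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the authors' implementation: it assumes that the lowest representable modulation frequency corresponds to one full modulation period per analysis window (1 / window duration) and that frames advance with 50% overlap. The function name `window_tradeoff`, the overlap value, and the example durations are all hypothetical choices made for this illustration.

```python
def window_tradeoff(window_ms: float, overlap: float = 0.5) -> dict:
    """Return the two quantities traded off by the analysis window duration.

    Assumptions (not taken from the paper): the lowest representable
    modulation frequency is one full modulation period per window,
    i.e. 1 / window duration, and frames advance by window * (1 - overlap).
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    window_s = window_ms / 1000.0
    return {
        "window_ms": window_ms,
        # Longer windows capture slower modulations (lower limit drops).
        "min_mod_freq_hz": 1.0 / window_s,
        # Longer windows also mean coarser temporal steps between T-F units.
        "frame_step_ms": window_ms * (1.0 - overlap),
    }


if __name__ == "__main__":
    # Illustrative durations only; they are not the values studied in the paper.
    for window_ms in (8.0, 32.0, 128.0):
        r = window_tradeoff(window_ms)
        print(
            f"{r['window_ms']:6.1f} ms window: "
            f"lowest modulation frequency about {r['min_mod_freq_hz']:.2f} Hz, "
            f"frame step {r['frame_step_ms']:.1f} ms"
        )
```

Under these assumptions, lengthening the window lowers the modulation-frequency limit but coarsens the time step at which T-F units can be manipulated, which is the tension the study investigates.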
Original language: English
Title of host publication: Proceedings of Interspeech 2015
Number of pages: 5
Publication date: 2015
Publication status: Published - 2015
Event: INTERSPEECH 2015: Speech beyond Speech - Dresden, Germany
Duration: 6 Sep 2015 - 10 Sep 2015

Conference

Conference: INTERSPEECH 2015
Country: Germany
City: Dresden
Period: 06/09/2015 - 10/09/2015

Keywords

  • Speech segregation
  • Ideal binary mask
  • Amplitude modulation spectrogram features
  • Temporal resolution

Cite this

May, T., Bentsen, T., & Dau, T. (2015). The role of temporal resolution in modulation-based speech segregation. In Proceedings of Interspeech 2015.

@inproceedings{178470a56dc242888220d2038691647b,
title = "The role of temporal resolution in modulation-based speech segregation",
abstract = "This study is concerned with the challenge of automatically segregating a target speech signal from interfering background noise. A computational speech segregation system is presented which exploits logarithmically-scaled amplitude modulation spectrogram (AMS) features to distinguish between speech and noise activity on the basis of individual time-frequency (T-F) units. One important parameter of the segregation system is the window duration of the analysis-synthesis stage, which determines the lower limit of modulation frequencies that can be represented but also the temporal acuity with which the segregation system can manipulate individual T-F units. To clarify the consequences of this trade-off on modulation-based speech segregation performance, the influence of the window duration was systematically investigated.",
keywords = "Speech segregation, Ideal binary mask, Amplitude modulation spectrogram features, Temporal resolution",
author = "Tobias May and Thomas Bentsen and Torsten Dau",
year = "2015",
language = "English",
booktitle = "Proceedings of Interspeech 2015",
}

May, T, Bentsen, T & Dau, T 2015, The role of temporal resolution in modulation-based speech segregation. in Proceedings of Interspeech 2015. INTERSPEECH 2015, Dresden, Germany, 06/09/2015.

The role of temporal resolution in modulation-based speech segregation. / May, Tobias; Bentsen, Thomas; Dau, Torsten.

Proceedings of Interspeech 2015. 2015.

TY - GEN

T1 - The role of temporal resolution in modulation-based speech segregation

AU - May, Tobias

AU - Bentsen, Thomas

AU - Dau, Torsten

PY - 2015

Y1 - 2015

AB - This study is concerned with the challenge of automatically segregating a target speech signal from interfering background noise. A computational speech segregation system is presented which exploits logarithmically-scaled amplitude modulation spectrogram (AMS) features to distinguish between speech and noise activity on the basis of individual time-frequency (T-F) units. One important parameter of the segregation system is the window duration of the analysis-synthesis stage, which determines the lower limit of modulation frequencies that can be represented but also the temporal acuity with which the segregation system can manipulate individual T-F units. To clarify the consequences of this trade-off on modulation-based speech segregation performance, the influence of the window duration was systematically investigated.

KW - Speech segregation

KW - Ideal binary mask

KW - Amplitude modulation spectrogram features

KW - Temporal resolution

M3 - Article in proceedings

BT - Proceedings of Interspeech 2015

ER -

May T, Bentsen T, Dau T. The role of temporal resolution in modulation-based speech segregation. In Proceedings of Interspeech 2015. 2015.