The blind segregation of acoustic sources from a mixture of different sounds remains one of the main challenges in the computer-based analysis of audio signals. One approach to this segregation is to divide the audio input into spectro-temporal segments, each of which is assumed to be dominated by a single source. These segments are also referred to as glimpses of the locally dominant source and can be used to reconstruct or analyze the corresponding source signal. This contribution is concerned with the source-independent segmentation of acoustic scenes by extracting glimpses based on locally observable feature contrasts between neighboring time-frequency units. The goal of this data-driven approach is to avoid source-specific assumptions and to achieve greater robustness to unknown acoustic scenes than class-based systems. The presented algorithm combines different acoustic features to derive a map of feature contrasts that indicates onsets and offsets of acoustic sources. Areas enclosed by high contrasts are assumed to exhibit consistent features and thus to originate from the same source. Such regions are then converted into spectro-temporal glimpses by applying two different image segmentation methods (graph-based superpixels and region growing).
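The idea of a contrast map followed by region growing can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the feature array layout `(n_freq, n_time, n_feat)`, the Euclidean distance as contrast measure, the fixed `threshold`, the 4-connected neighborhood, and the function names `contrast_map` and `grow_regions` are all assumptions made here for illustration.

```python
from collections import deque

import numpy as np


def contrast_map(features):
    """Per-unit contrast from a (n_freq, n_time, n_feat) feature array.

    Assumption: contrast is the Euclidean feature distance to the
    preceding unit in frequency or time, whichever is larger.
    """
    f = np.asarray(features, dtype=float)
    contrast = np.zeros(f.shape[:2])
    d_freq = np.linalg.norm(np.diff(f, axis=0), axis=-1)  # (n_freq-1, n_time)
    d_time = np.linalg.norm(np.diff(f, axis=1), axis=-1)  # (n_freq, n_time-1)
    contrast[1:, :] = d_freq
    contrast[:, 1:] = np.maximum(contrast[:, 1:], d_time)
    return contrast


def grow_regions(contrast, threshold):
    """Label 4-connected areas of low contrast; high-contrast units get 0."""
    low = contrast < threshold
    labels = np.zeros(contrast.shape, dtype=int)
    n_freq, n_time = contrast.shape
    current = 0
    for seed in zip(*np.nonzero(low)):
        if labels[seed]:
            continue  # unit already belongs to an earlier region
        current += 1
        labels[seed] = current
        queue = deque([seed])
        while queue:
            i, j = queue.popleft()
            for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                inside = 0 <= ni < n_freq and 0 <= nj < n_time
                if inside and low[ni, nj] and not labels[ni, nj]:
                    labels[ni, nj] = current
                    queue.append((ni, nj))
    return labels


# Toy scene: the feature value steps from 0 to 1 between time frames 2 and 3.
toy = np.zeros((4, 6, 1))
toy[:, 3:, 0] = 1.0
glimpses = grow_regions(contrast_map(toy), threshold=0.5)
```

In this toy scene the step in the feature value yields a column of high contrast, and the low-contrast areas on either side receive separate labels, which play the role of glimpses. Units on the contrast boundary keep the label 0 in this sketch; how such units are assigned in the published algorithm is not stated in the abstract.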
|Title of host publication||Proceedings of Forum Acusticum 2020|
|Publisher||European Acoustics Association|
|Publication status||Published - 2020|
|Event||Forum Acusticum 2020 - Virtual event (Duration: 7 Dec 2020 → 11 Dec 2020)|
|Conference||Forum Acusticum 2020|
|Period||07/12/2020 → 11/12/2020|