A robust acoustic localization model will be presented, which is based on the supervised learning of azimuth-dependent binaural feature maps consisting of interaural time differences (ITD) and interaural level differences (ILD). Motivated by the robust localization performance of the human auditory system, a model of the auditory periphery is used in this study as a front-end for binaural cue extraction. Multi-conditional training is performed to account for the variability of the binaural features that results from the combination of multiple sources, the effect of reverberation, and changes in the source/receiver configuration. One way of accumulating evidence about possible sound source locations is to combine information across auditory channels. Alternatively, integrating evidence across groups of time-frequency (T-F) units, so-called fragments, which are believed to belong to a single source, was reported to significantly improve ITD-based localization performance [Christensen et al., Proc. of Interspeech, 2769-2772 (2007)]. Instead of accumulating the localization cue directly, the proposed model combines likelihoods, taking into account the uncertainty associated with the azimuth estimate of a particular T-F unit. Various procedures for controlling the spectro-temporal integration will be discussed, and their influence on sound source localization will be presented.
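The idea of combining likelihoods rather than averaging the cue itself can be illustrated with a minimal sketch. It is not the model described in the abstract: the azimuth grid, the sine-law `expected_itd` (a toy stand-in for a trained ITD feature map), the Gaussian likelihood, and the per-unit standard deviations are all assumptions chosen for illustration, and ILD is ignored.

```python
import math

# Hypothetical azimuth grid in degrees (an assumption, not from the abstract).
AZIMUTHS = list(range(-90, 91, 5))


def expected_itd(azimuth_deg, max_itd=0.7e-3):
    """Toy stand-in for a trained ITD feature map: a sine law, in seconds."""
    return max_itd * math.sin(math.radians(azimuth_deg))


def log_likelihood(itd, sigma, azimuth_deg):
    """Gaussian log-likelihood of one observed ITD under one candidate azimuth."""
    mu = expected_itd(azimuth_deg)
    return -0.5 * ((itd - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def localize_fragment(units):
    """Sum log-likelihoods over the T-F units of one fragment.

    units: list of (itd_seconds, sigma_seconds); sigma models the uncertainty
    of each unit. Returns the best azimuth and the per-azimuth scores.
    """
    scores = [
        sum(log_likelihood(itd, sigma, az) for itd, sigma in units)
        for az in AZIMUTHS
    ]
    best_index = max(range(len(AZIMUTHS)), key=scores.__getitem__)
    return AZIMUTHS[best_index], scores


# Two reliable units near the ITD of 30 degrees, plus one unreliable outlier.
units = [(0.35e-3, 0.05e-3), (0.36e-3, 0.05e-3), (-0.50e-3, 0.50e-3)]
best_azimuth, _ = localize_fragment(units)
```

In this toy setting the outlier carries a large standard deviation, so it contributes little to the summed log-likelihood and the estimate stays at the azimuth supported by the two reliable units. Averaging the three ITD values directly would give 0.07 ms, which under the same sine law corresponds to roughly 6 degrees.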
|Title of host publication||Proceedings of ISAAR 2009 : Binaural Processing and Spatial Hearing.|
|Publication status||Published - 2009|
|Event||2nd International Symposium on Auditory and Audiological Research: Binaural Processing and Spatial Hearing, Marienlyst, Helsingør, Denmark (26 Aug 2009 → 28 Aug 2009)|