We propose a simple Bayesian model for single-channel speech separation using factorized source priors in a sliding-window, linearly transformed domain. Modelling each source band with a one-dimensional mixture of Gaussians yields fast, tractable inference for the source signals. Simulations separating a male and a female speaker, using priors trained on the same speakers, show performance comparable with the blind separation approach of G.-J. Jang and T.-W. Lee (see NIPS, vol. 15, 2003), with an SNR improvement of 4.9 dB for both the male and the female speaker. Mixing coefficients can be estimated quite precisely using ML-II; this estimation is sensitive to the accuracy of the priors, whereas the source-separation quality for known mixing coefficients is quite insensitive to it. Finally, we discuss how to improve our approach while keeping the complexity low, using machine learning and CASA (computational auditory scene analysis) approaches (Jang and Lee, 2003; Roweis, S.T., 2001; Wang, D.L. and Brown, G.J., 1999; Hu, G. and Wang, D., 2003).
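The tractable inference the abstract refers to can be sketched for a single transform coefficient: with known mixing coefficients, an observation y = a1·s1 + a2·s2 whose sources have one-dimensional mixture-of-Gaussians priors admits a closed-form MMSE estimate, obtained by weighting each pair of prior components by its posterior responsibility. The function name, interface, and the example parameter values below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def separate_mog(y, a1, a2, pi1, mu1, v1, pi2, mu2, v2):
    """MMSE estimates of two sources from y = a1*s1 + a2*s2, where each
    source coefficient has a 1-D mixture-of-Gaussians prior given by
    component weights pi, means mu, and variances v."""
    # Under component pair (k, l) the observation is Gaussian:
    # y ~ N(a1*mu1[k] + a2*mu2[l], a1^2*v1[k] + a2^2*v2[l])
    m = a1 * mu1[:, None] + a2 * mu2[None, :]
    v = a1**2 * v1[:, None] + a2**2 * v2[None, :]
    # Posterior responsibility of each component pair given y
    logw = (np.log(pi1)[:, None] + np.log(pi2)[None, :]
            - 0.5 * np.log(2 * np.pi * v) - 0.5 * (y - m) ** 2 / v)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Conditional posterior mean of s1 within each pair (Gaussian algebra)
    s1_pair = mu1[:, None] + a1 * v1[:, None] / v * (y - m)
    s1 = (w * s1_pair).sum()
    # The MMSE estimates satisfy the mixing constraint exactly
    s2 = (y - a1 * s1) / a2
    return s1, s2
```

Applied independently per band and per window position, this costs only O(K1·K2) per coefficient, which is what keeps the overall complexity low.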
|Title of host publication||Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing|
|Publication status||Published - 2004|
|Event||IEEE International Conference on Acoustics, Speech, and Signal Processing 2004 - Montreal, Quebec, Canada|
Duration: 17 May 2004 → 21 May 2004
|Conference||IEEE International Conference on Acoustics, Speech, and Signal Processing 2004|
|Period||17/05/2004 → 21/05/2004|