We present a method for suppressing non-stationary noise in single-channel recordings of speech. The method is based on a non-negative latent variable decomposition model for the speech and noise signals, learned directly from the noisy mixture. An overcomplete noise basis is learned in non-speech regions and then used to jointly estimate the speech and the noise from the mixture. We compare the method to the classical spectral subtraction approach, in which the noise spectrum is estimated as the average over non-speech frames. The proposed method significantly outperforms the classical approach, especially when the noise is highly non-stationary and at low signal-to-noise ratios.
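The two approaches contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in standard KL-divergence NMF with multiplicative updates as a stand-in for the non-negative latent variable model, uses a small basis with no sparsity constraint rather than an overcomplete one, and assumes a magnitude spectrogram `V` (frequency × frames) and a boolean mask `noise_frames` marking non-speech frames are already available. All function names, parameter names, and default values are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def spectral_subtraction(V, noise_frames, floor=0.0):
    """Baseline: subtract the mean noise spectrum (averaged over
    non-speech frames) from every frame, clipping at `floor`."""
    noise_mean = V[:, noise_frames].mean(axis=1, keepdims=True)
    return np.maximum(V - noise_mean, floor)


def kl_nmf(V, W, H, n_iter, update_cols):
    """Multiplicative KL-NMF updates for V ~ W @ H.
    Only the columns of W flagged in `update_cols` are changed."""
    for _ in range(n_iter):
        R = V / (W @ H + EPS)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + EPS)
        R = V / (W @ H + EPS)
        W_new = W * (R @ H.T) / (H.sum(axis=1)[None, :] + EPS)
        W[:, update_cols] = W_new[:, update_cols]
    return W, H


def nmf_denoise(V, noise_frames, n_noise=8, n_speech=16, n_iter=100, seed=0):
    """Assumed two-step scheme: learn a noise basis from non-speech
    frames, then keep it fixed while fitting a speech basis and all
    activations on the full mixture."""
    rng = np.random.default_rng(seed)
    F, T = V.shape

    # Step 1: noise basis from non-speech frames only.
    Vn = V[:, noise_frames]
    Wn = rng.random((F, n_noise)) + EPS
    Hn = rng.random((n_noise, Vn.shape[1])) + EPS
    Wn, _ = kl_nmf(Vn, Wn, Hn, n_iter, np.ones(n_noise, dtype=bool))

    # Step 2: joint fit on the mixture with the noise basis held fixed.
    W = np.hstack([Wn, rng.random((F, n_speech)) + EPS])
    H = rng.random((n_noise + n_speech, T)) + EPS
    update = np.r_[np.zeros(n_noise, dtype=bool), np.ones(n_speech, dtype=bool)]
    W, H = kl_nmf(V, W, H, n_iter, update)

    speech = W[:, n_noise:] @ H[n_noise:]
    noise = W[:, :n_noise] @ H[:n_noise]
    # Soft mask: share of each time-frequency bin attributed to speech.
    return V * speech / (speech + noise + EPS)
```

In this sketch the baseline removes the same average spectrum from every frame, whereas the decomposition lets the noise activations vary from frame to frame, which is one plausible reading of why a learned noise basis copes better with non-stationary noise.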
|Title of host publication||Machine Learning for Signal Processing, IEEE Workshop on|
|Publication status||Published - 2008|
|Event||2008 IEEE International Workshop on Machine Learning for Signal Processing - Cancún, Mexico, 16 Oct 2008 → 19 Oct 2008|