Abstract
We present a monaural approach to speech segregation that estimates the ideal binary mask (IBM) by combining amplitude modulation spectrogram (AMS) features, pitch-based features, and speech presence probability (SPP) features derived from noise statistics. To maintain high mask estimation accuracy in the presence of various background noises, the system employs environment-specific segregation models and automatically selects the appropriate model for a given input signal. Furthermore, instead of classifying each time-frequency (T-F) unit independently, the a posteriori probabilities of speech and noise presence are evaluated by considering adjacent T-F units. The proposed system achieves high classification accuracy.
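For readers unfamiliar with the estimation target, the following is a minimal sketch of how an ideal binary mask is conventionally defined, not the authors' implementation. It assumes access to the premixed speech and noise power in each T-F unit, labels a unit 1 when its local SNR exceeds a local criterion, and scores an estimated mask by the fraction of T-F units labeled correctly. The function names, the `lc_db` parameter, and the 0 dB default are illustrative assumptions; the abstract does not specify them.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0, eps=1e-12):
    """Label each T-F unit 1 if its local SNR in dB exceeds lc_db, else 0.

    speech_power and noise_power are equally shaped lists of rows holding
    the premixed speech and noise power per T-F unit. lc_db (the local
    criterion) is an assumed parameter, not a value taken from the paper.
    """
    mask = []
    for speech_row, noise_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(speech_row, noise_row):
            # eps guards against division by zero and log of zero.
            local_snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask


def mask_accuracy(estimated_mask, ideal_mask):
    """Return the fraction of T-F units where the two masks agree."""
    total = sum(len(row) for row in ideal_mask)
    correct = sum(
        1
        for est_row, ideal_row in zip(estimated_mask, ideal_mask)
        for est, ideal in zip(est_row, ideal_row)
        if est == ideal
    )
    return correct / total
```

In practice the premixed signals are unavailable at test time, which is why a system such as the one described must estimate the mask from features of the noisy mixture alone.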
| Original language | English |
|---|---|
| Title of host publication | 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics |
| Number of pages | 4 |
| Publisher | IEEE |
| Publication date | 2013 |
| DOIs | |
| Publication status | Published - 2013 |
| Event | 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics - Mohonk Mountain House, New Paltz, United States. Duration: 20 Oct 2013 → 23 Oct 2013 |
Conference
| Conference | 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics |
|---|---|
| Location | Mohonk Mountain House |
| Country/Territory | United States |
| City | New Paltz |
| Period | 20/10/2013 → 23/10/2013 |
Keywords
- Ideal binary mask estimation
- Speech segregation
- Background noise classification
Fingerprint
Dive into the research topics of 'Environment-aware ideal binary mask estimation using monaural cues'. Together they form a unique fingerprint.