Abstract
The vast majority of text-independent speaker recognition systems rely on intermediate-sized vectors (i-vectors), which are compared by probabilistic linear discriminant analysis (PLDA). This paper proposes a PLDA-like approach with restricted Boltzmann machines for i-vector based speaker recognition: two deep architectures are presented and examined, which aim at suppressing channel effects and recovering speaker-discriminative information on back-ends trained on a small dataset. Experiments are carried out on the MOBIO SRE'13 database, which is a challenging and publicly available dataset for mobile speaker recognition with limited amounts of training data. The experiments show that the proposed system outperforms the baseline i-vector/PLDA approach by relative gains of 31% on female and 9% on male speakers in terms of half total error rate.
Original language | English |
---|---|
Title of host publication | Proceedings of 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Number of pages | 5 |
Publisher | IEEE |
Publication date | 2016 |
Publication status | Published - 2016 |
Event | 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing - Shanghai, China. Duration: 20 Mar 2016 → 25 Mar 2016. Conference number: 41 |
Conference
Conference | 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing |
---|---|
Number | 41 |
Country/Territory | China |
City | Shanghai |
Period | 20/03/2016 → 25/03/2016 |
Series | I E E E International Conference on Acoustics, Speech and Signal Processing. Proceedings |
ISSN | 1520-6149 |
Keywords
- Deep learning
- MOBIO
- PLDA-RBM
- Speaker recognition