Single-Channel Speech Separation using Sparse Non-Negative Matrix Factorization

Mikkel N. Schmidt, Rasmus Kongsgaard Olsson

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


    Abstract

    We apply machine learning techniques to the problem of separating multiple speech sources from a single microphone recording. The method of choice is a sparse non-negative matrix factorization algorithm, which can learn sparse representations of the data in an unsupervised manner. This is applied to the learning of personalized dictionaries from a speech corpus, which in turn are used to separate the audio stream into its components. We show that computational savings can be achieved by segmenting the training data on a phoneme level. To split the data, a conventional speech recognizer is used. Both the unsupervised and the supervised adaptation schemes result in significant improvements in terms of the target-to-masker ratio.
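
The dictionary-based separation outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Euclidean cost with an L1 penalty on the activations, standard multiplicative updates, and unit-norm dictionary columns, and it assumes the input is some non-negative time-frequency representation such as a magnitude spectrogram. The names `sparse_nmf`, `separate`, `sparsity`, and `n_atoms` are hypothetical and chosen only for this illustration.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def sparse_nmf(V, n_atoms, sparsity=0.1, n_iter=200, W_fixed=None, seed=0):
    """Approximate V (freq x frames) as W @ H with non-negative W and H.

    Assumed cost: ||V - W H||_F^2 + sparsity * sum(H). This is a simplified
    stand-in, not necessarily the exact algorithm used in the paper.
    If W_fixed is given, only the activations H are estimated.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_atoms)) + EPS if W_fixed is None else W_fixed
    H = rng.random((W.shape[1], n_frames)) + EPS
    for _ in range(n_iter):
        # Multiplicative update for H; the L1 penalty appears in the denominator.
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
        if W_fixed is None:
            # Multiplicative update for the dictionary.
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
            # Rescale atoms to unit norm and compensate in H so W @ H is unchanged.
            norms = np.linalg.norm(W, axis=0) + EPS
            W /= norms
            H *= norms[:, None]
    return W, H


def separate(V_mix, dictionaries, sparsity=0.1, n_iter=200):
    """Split a mixture using one pre-trained dictionary per speaker."""
    W = np.hstack(dictionaries)  # concatenated, fixed dictionary
    _, H = sparse_nmf(V_mix, W.shape[1], sparsity=sparsity,
                      n_iter=n_iter, W_fixed=W)
    estimates, start = [], 0
    for D in dictionaries:
        stop = start + D.shape[1]
        # Each speaker is reconstructed from its own atoms and activations.
        estimates.append(D @ H[start:stop])
        start = stop
    return estimates
```

In this sketch a dictionary would first be learned per speaker, for example `W_a, _ = sparse_nmf(V_train_a, n_atoms=64)`, and the mixture would then be decomposed with `separate(V_mix, [W_a, W_b])`. The phoneme-level segmentation mentioned in the abstract is not modelled here.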
    Original language: English
    Title of host publication: Spoken Language Processing, ISCA International Conference on (INTERSPEECH)
    Publication date: 2007
    Publication status: Published - 2007
    Event: Spoken Language Processing, ISCA International Conference on (INTERSPEECH)
    Duration: 1 Jan 2007 → …

