Linear Regression on Sparse Features for Single-Channel Speech Separation

Mikkel N. Schmidt, Rasmus Kongsgaard Olsson

    Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review


    Abstract

    In this work we address the problem of separating multiple speakers from a single microphone recording. We formulate a linear regression model for estimating each speaker based on features derived from the mixture. The employed feature representation is a sparse, non-negative encoding of the speech mixture in terms of pre-learned speaker-dependent dictionaries. Previous work has shown that this feature representation by itself provides some degree of separation. We show that the performance is significantly improved when regression analysis is performed on the sparse, non-negative features, both compared to linear regression on spectral features and compared to separation based directly on the non-negative sparse features.
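The pipeline outlined in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the multiplicative-update sparse coder, the plain least-squares fit, the random toy dictionaries, and every function name and parameter below (`sparse_nonneg_code`, `fit_linear_map`, `toy_example`, `sparsity`, the array sizes) are assumptions made for illustration only. It encodes a synthetic two-speaker mixture over fixed, concatenated speaker dictionaries, then compares three estimates of speaker 1: direct reconstruction from that speaker's part of the code, linear regression on the sparse code, and linear regression on the mixture spectrum.

```python
import numpy as np


def sparse_nonneg_code(V, D, sparsity=0.1, n_iter=200, eps=1e-9):
    """Sparse, non-negative code H with V ~= D @ H for a fixed dictionary D.

    Uses multiplicative updates for 0.5 * ||V - D @ H||_F^2 + sparsity * sum(H).
    This is one common choice, assumed here; the paper may use another coder.
    """
    rng = np.random.default_rng(0)
    H = rng.random((D.shape[1], V.shape[1])) + eps
    DtV = D.T @ V
    DtD = D.T @ D
    for _ in range(n_iter):
        # Ratio of non-negative terms keeps H non-negative throughout.
        H *= DtV / (DtD @ H + sparsity + eps)
    return H


def add_bias_row(X):
    """Append a row of ones so the linear map includes an offset."""
    return np.vstack([X, np.ones((1, X.shape[1]))])


def fit_linear_map(X, S):
    """Least-squares W such that S ~= W @ [X; 1] (features are columns)."""
    W_t, *_ = np.linalg.lstsq(add_bias_row(X).T, S.T, rcond=None)
    return W_t.T


def toy_example(n_freq=32, n_atoms=8, n_frames=400, seed=0):
    """Synthetic two-speaker mixture; all sizes and data are made up."""
    rng = np.random.default_rng(seed)
    # Hypothetical "pre-learned" speaker-dependent dictionaries.
    D1 = rng.random((n_freq, n_atoms))
    D2 = rng.random((n_freq, n_atoms))
    # Sparse non-negative activations (roughly 20% of entries are nonzero).
    A1 = rng.random((n_atoms, n_frames)) * (rng.random((n_atoms, n_frames)) < 0.2)
    A2 = rng.random((n_atoms, n_frames)) * (rng.random((n_atoms, n_frames)) < 0.2)
    S1, S2 = D1 @ A1, D2 @ A2
    V = S1 + S2  # magnitudes assumed additive in this toy setting

    # Encode the mixture over the concatenated dictionaries.
    H = sparse_nonneg_code(V, np.hstack([D1, D2]))

    # (a) Separation directly from the sparse code of speaker 1.
    direct = D1 @ H[:n_atoms]
    # (b) Linear regression on the sparse features.
    reg_sparse = fit_linear_map(H, S1) @ add_bias_row(H)
    # (c) Linear regression on the spectral features (the mixture itself).
    reg_spec = fit_linear_map(V, S1) @ add_bias_row(V)

    errs = {
        "direct": float(np.sum((S1 - direct) ** 2)),
        "regression_on_sparse": float(np.sum((S1 - reg_sparse) ** 2)),
        "regression_on_spectral": float(np.sum((S1 - reg_spec) ** 2)),
    }
    return H, errs


if __name__ == "__main__":
    _, errs = toy_example()
    for name, value in errs.items():
        print(f"{name}: {value:.3f}")
```

The errors here are measured on the same frames used to fit the regression, so the sketch only shows how the three estimators are wired together; it says nothing about held-out performance and does not reproduce the results reported in the paper.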
    Original language: English
    Title of host publication: Applications of Signal Processing to Audio and Acoustics: IEEE Workshop on (WASPAA)
    Publisher: IEEE
    Publication date: 2007
    ISBN (Print): 978-1-4244-1620-2
    DOI: https://doi.org/10.1109/ASPAA.2007.4393010
    Publication status: Published - 2007
    Event: Applications of Signal Processing to Audio and Acoustics, IEEE Workshop on (WASPAA) - New Paltz, NY, USA
    Duration: 1 Jan 2007 → …

    Bibliographical note

    Copyright: 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE

    Cite this

    Schmidt, M. N., & Olsson, R. K. (2007). Linear Regression on Sparse Features for Single-Channel Speech Separation. In Applications of Signal Processing to Audio and Acoustics: IEEE Workshop on (WASPAA) IEEE. https://doi.org/10.1109/ASPAA.2007.4393010
    @inproceedings{3beb62ec25a847a3a5bf0cf5ad2d88f5,
    title = "Linear Regression on Sparse Features for Single-Channel Speech Separation",
    abstract = "In this work we address the problem of separating multiple speakers from a single microphone recording. We formulate a linear regression model for estimating each speaker based on features derived from the mixture. The employed feature representation is a sparse, non-negative encoding of the speech mixture in terms of pre-learned speaker-dependent dictionaries. Previous work has shown that this feature representation by itself provides some degree of separation. We show that the performance is significantly improved when regression analysis is performed on the sparse, non-negative features, both compared to linear regression on spectral features and compared to separation based directly on the non-negative sparse features.",
    author = "Schmidt, {Mikkel N.} and Olsson, {Rasmus Kongsgaard}",
    note = "Copyright: 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE",
    year = "2007",
    doi = "10.1109/ASPAA.2007.4393010",
    language = "English",
    isbn = "978-1-4244-1620-2",
    booktitle = "Applications of Signal Processing to Audio and Acoustics",
    publisher = "IEEE",
    address = "United States",

    }

