Structure Learning in Audio

Andreas Brinch Nielsen

    Research output: Book/Report; Ph.D. thesis; Research

    Abstract

    By having information about the setting a user is in, a computer can make decisions proactively to facilitate tasks for the user. This thesis takes two approaches to obtaining more information about an audio environment. The first is audio classification, for which a new approach based on pitch dynamics is suggested. The second is finding structures in the mixing of multiple sources, based on the assumption that the sources are statistically independent. Three audio classification tasks have been investigated. First, audio is classified into three classes (music, noise and speech) using novel features based on pitch dynamics. Second, two different harmonic models have been compared for instrument classification. Finally, voiced/unvoiced segmentation of popular music is performed using mel-frequency cepstral coefficients (MFCCs) and autoregressive (AR) coefficients. For the structures in the mixing of multiple sources, a fast and computationally simple approach has been developed that compares recordings and classifies whether they come from the same audio environment. It shows very high accuracy and can also synchronize recordings made by devices that are not connected to each other. A more general model is proposed based on Independent Component Analysis (ICA). It relies on sequential pruning of the parameters in the mixing matrix, and versions using a fixed source distribution and a parameterized source distribution are presented. The parameterized version can model both sub- and super-Gaussian source distributions, which allows a much wider use of the method. A common theme of the thesis is that all methods use a variety of classification models and model selection algorithms.
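The abstract does not describe how recordings are compared or synchronized. As a purely illustrative sketch, and not the method of the thesis, one common generic technique is to search for the lag that maximizes the normalized cross-correlation between two recordings: the lag gives the synchronization offset, and a high correlation peak suggests the recordings share an audio environment. The function names `overlap_correlation`, `estimate_delay` and `same_environment`, and the 0.5 decision threshold, are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def overlap_correlation(x: Sequence[float], y: Sequence[float], d: int) -> float:
    """Correlation coefficient between x[n - d] and y[n] over the samples where both exist."""
    lo = max(0, d)
    hi = min(len(y), len(x) + d)
    n = hi - lo
    if n < 2:
        return 0.0
    xs = [x[i - d] for i in range(lo, hi)]
    ys = [y[i] for i in range(lo, hi)]
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant segment carries no alignment information
    return sxy / math.sqrt(sxx * syy)


def estimate_delay(x: Sequence[float], y: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Search lags in [-max_lag, max_lag]; return (d, peak) such that y[n] ~ x[n - d]."""
    scores = {d: overlap_correlation(x, y, d) for d in range(-max_lag, max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]


def same_environment(x: Sequence[float], y: Sequence[float], max_lag: int,
                     threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: a high correlation peak means a shared audio environment."""
    _, peak = estimate_delay(x, y, max_lag)
    return peak >= threshold


if __name__ == "__main__":
    rng = random.Random(0)
    scene = [rng.gauss(0.0, 1.0) for _ in range(1200)]
    device_a = scene[100:1100]  # device A starts recording at sample 100
    device_b = scene[70:1070]   # device B started 30 samples earlier
    print(estimate_delay(device_a, device_b, max_lag=50))
```

Running the example should report a delay of 30 samples with a peak close to 1. Correlating raw waveforms like this is sensitive to reverberation, differing microphone responses and clock drift between unconnected devices, so it is only a starting point for comparing recordings.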
    Original language: English
    Place of publication: Kgs. Lyngby, Denmark
    Publisher: Technical University of Denmark, DTU Informatics, Building 321
    Publication status: Published, May 2009
    Series: IMM-PHD-2008-208

    Cite this

    Nielsen, A. B. (2009). Structure Learning in Audio. Kgs. Lyngby, Denmark: Technical University of Denmark, DTU Informatics, Building 321. IMM-PHD-2008-208
    Nielsen, Andreas Brinch. / Structure Learning in Audio. Kgs. Lyngby, Denmark : Technical University of Denmark, DTU Informatics, Building 321, 2009. (IMM-PHD-2008-208).
    @phdthesis{5ae7b0f7fd284dfe9baa6aee449b3458,
    title = "Structure Learning in Audio",
    author = "Nielsen, {Andreas Brinch}",
    year = "2009",
    month = "5",
    language = "English",
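    publisher = "Technical University of Denmark, DTU Informatics, Building 321",
    school = "Technical University of Denmark",
    address = "Kgs. Lyngby, Denmark",
    series = "IMM-PHD-2008-208",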
    }

    Nielsen, AB 2009, Structure Learning in Audio. IMM-PHD-2008-208, Technical University of Denmark, DTU Informatics, Building 321, Kgs. Lyngby, Denmark.

    Structure Learning in Audio. / Nielsen, Andreas Brinch.

    Kgs. Lyngby, Denmark : Technical University of Denmark, DTU Informatics, Building 321, 2009. (IMM-PHD-2008-208).


    TY - BOOK

    T1 - Structure Learning in Audio

    AU - Nielsen, Andreas Brinch

    PY - 2009/5

    Y1 - 2009/5

    M3 - Ph.D. thesis

    BT - Structure Learning in Audio

    PB - Technical University of Denmark, DTU Informatics, Building 321

    CY - Kgs. Lyngby, Denmark

    ER -

    Nielsen AB. Structure Learning in Audio. Kgs. Lyngby, Denmark: Technical University of Denmark, DTU Informatics, Building 321, 2009. (IMM-PHD-2008-208).