By having information about the setting a user is in, a computer is able to make
decisions proactively to facilitate tasks for the user. Two approaches are taken
in this thesis to obtain more information about an audio environment. One
approach is that of classifying audio, and a new approach using pitch dynamics
is suggested. The other approach is finding structures in the mixings
of multiple sources, based on an assumption of statistical independence of the
sources.
Three different audio classification tasks have been investigated. The first is
audio classification into three classes, music, noise and speech, using novel
features based on pitch dynamics. Second, within instrument classification, two
different harmonic models have been compared. Finally, voiced/unvoiced
segmentation of popular music is done based on MFCCs and AR coefficients.
The structures in the mixings of multiple sources have been investigated. A fast
and computationally simple approach that compares recordings and classifies
whether they are from the same audio environment has been developed. It shows
very high accuracy and is able to synchronize recordings made by recording
devices that are not connected. A more general model is proposed
based on Independent Component Analysis. It is based on sequential pruning
of the parameters in the mixing matrix, and two versions are presented: one
based on a fixed source distribution and one based on a parameterized
distribution. The parameterized version has the advantage of modeling both
sub- and super-Gaussian source distributions, allowing a much wider use of
the method.
All methods use a variety of classification models and model selection
algorithms, which is a common theme of the thesis.