Semi-supervised source localization in reverberant environments using deep generative modeling

Michael J. Bianco, Sharon Gannot, Efren Fernandez Grande, Peter Gerstoft

Research output: Contribution to journal › Conference abstract in journal › Research › peer-review



We present a method for acoustic source localization in reverberant environments based on semi-supervised machine learning (ML) with deep generative models. Source localization in the presence of reverberation remains a major challenge, which recent ML techniques have shown promise in addressing. Although data volumes are often large, the number of labels available for supervised learning in reverberant environments is usually small. In semi-supervised learning, ML systems are trained on many examples of which only a few are labeled, with the goal of exploiting the natural structure of the data. We use variational autoencoders (VAEs), which are generative neural networks (NNs) that rely on explicit probabilistic representations, to model the latent distribution of reverberant acoustic data. A VAE consists of an encoder NN, which maps complex input distributions to simpler parametric distributions (e.g., Gaussian), and a decoder NN, which approximates the training examples from samples of that latent distribution. The VAE is trained to generate the phase of relative transfer functions (RTFs) between two microphones in reverberant environments, in parallel with a direction-of-arrival (DOA) classifier, on both labeled and unlabeled RTF samples. The performance of this VAE-based approach is compared with conventional and ML-based localization in simulated and real-world scenarios.
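As a minimal sketch of the two ingredients the abstract names, the pure-Python code below (i) estimates the RTF phase between two microphones from short frames and (ii) evaluates per-sample loss terms for a semi-supervised VAE. Several pieces are assumptions for illustration and are not confirmed by the abstract. These include the RTF estimator (averaged cross-power spectrum divided by the reference microphone's auto-power spectrum), the naive DFT, and the loss structure, which follows the "M2" semi-supervised VAE objective of Kingma et al. (2014). The abstract does not specify the authors' estimator, network architecture, or loss. All function names are hypothetical. The encoder, decoder, and classifier networks are omitted; their outputs (mu, logvar, class probabilities, reconstruction error) are passed in as plain numbers.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def rtf_phase(frames_ref, frames_other, eps=1e-12):
    """Phase of the relative transfer function (RTF) of mic 2 w.r.t. mic 1.

    Assumed estimator: H[k] = sum_l X2[l,k] * conj(X1[l,k]) / sum_l |X1[l,k]|^2,
    i.e. averaged cross-power spectrum over the averaged auto-power spectrum
    of the reference microphone. Only the wrapped phase angle is returned.
    """
    n = len(frames_ref[0])
    cross = [0j] * n
    auto = [0.0] * n
    for f1, f2 in zip(frames_ref, frames_other):
        x1, x2 = dft(f1), dft(f2)
        for k in range(n):
            cross[k] += x2[k] * x1[k].conjugate()
            auto[k] += abs(x1[k]) ** 2
    return [cmath.phase(cross[k] / (auto[k] + eps)) for k in range(n)]


def reparameterize(mu, logvar, rng=None):
    """Draw z = mu + sigma * eps, eps ~ N(0, 1) (the VAE reparameterization trick)."""
    if rng is None:
        rng = random.Random()
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)): the VAE latent regularizer."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def labeled_loss(recon_err, mu, logvar, class_probs, label, alpha=1.0):
    """Negative ELBO for a labeled sample plus a weighted DOA cross-entropy term.

    recon_err: decoder reconstruction error, e.g. -log p(x | y, z), given as a number.
    class_probs: classifier output q(y | x) over the DOA classes.
    The constant label-prior term -log p(y) is omitted.
    """
    neg_elbo = recon_err + gaussian_kl(mu, logvar)
    return neg_elbo - alpha * math.log(class_probs[label])


def unlabeled_loss(neg_elbo_per_class, class_probs):
    """Unlabeled loss with the DOA label marginalized out by the classifier.

    U(x) = sum_y q(y|x) * L(x, y) - H(q(y|x)), where L(x, y) is the labeled
    negative ELBO evaluated with each candidate DOA class y.
    """
    expected = sum(q * l for q, l in zip(class_probs, neg_elbo_per_class))
    entropy = -sum(q * math.log(q) for q in class_probs if q > 0.0)
    return expected - entropy
```

For a pure d-sample circular delay between the two microphones' frames, the estimated phase in bin k is -2πkd/N wrapped to (-π, π]. This is the linear-phase pattern of a single direct path, and reverberation distorts it. In a full system the scalar inputs to the loss functions would come from the encoder, decoder, and DOA classifier networks, and the total objective would sum `labeled_loss` over labeled RTF samples and `unlabeled_loss` over unlabeled ones.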
Original language: English
Journal: Journal of the Acoustical Society of America
Issue number: 4
Pages (from-to): 2662
Number of pages: 1
Publication status: Published - 2020
Event: 179th Meeting of the Acoustical Society of America - Acoustics Virtually Everywhere
Duration: 7 Dec 2020 - 11 Dec 2020


Conference: 179th Meeting of the Acoustical Society of America
Location: Acoustics Virtually Everywhere

