Spectrogram inversion and potential applications for hearing research

Remi Julien Blaise Decorsière

Research output: Book/Report › Ph.D. thesis


A common way of analyzing signals in a joint time-frequency domain is the spectrogram, which can be interpreted as a multi-channel envelope representation of the signal. The envelope of a single channel cannot fully represent a signal because it only reflects slow changes in the signal's amplitude and lacks information about its fast variations, the temporal fine structure (TFS). However, the main hypothesis explored in this thesis is that a spectrogram could be a faithful representation of a signal, that is, TFS information could be recovered by across-channel comparison of envelopes. Based on this consideration, an approach for spectrogram inversion was proposed: time-domain signals were recovered from spectrograms computed using both inner hair-cell envelope (i.e., traditional half-wave rectification followed by low-pass filtering) and Hilbert envelope definitions. The high accuracy of the inversion scheme (as measured by root mean square error and spectral convergence) implies that the main hypothesis holds true for the designs chosen. Two practical applications of this result were then presented. (1) Spectrograms that are computed using the inner hair-cell (IHC) envelope definition are a reasonable model of the signal processing performed by the human cochlea. The robustness of the reconstruction from such spectrograms with regard to the properties of the cochlear model showed that, for previously documented IHC models as well as for more restrictive conditions, the TFS-related information is retained by the (modeled) cochlear processing even at high audio frequencies. (2) Using the inversion framework, it is possible to manipulate signals in the modulation domain while preserving their long-term power spectra. This enabled the creation of mixtures of speech and noise where the signal-to-noise ratio in the envelope domain (SNRenv) was directly controlled.
Behavioral measures of the intelligibility of such mixtures were compared to predictions from a model of speech intelligibility. Conditions where noise was processed led to modest intelligibility improvements for increased SNRenv, providing v
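
The two envelope definitions named in the abstract can be sketched for a single channel as follows. This is a minimal illustration, not the thesis implementation: the function names, the first-order low-pass filter, and the 1 kHz cutoff are assumptions chosen for the example, and a full spectrogram would apply such an envelope to every output of a filterbank.

```python
import numpy as np


def hilbert_envelope(x):
    """Hilbert envelope: magnitude of the analytic signal, built via the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


def ihc_envelope(x, fs, cutoff_hz=1000.0):
    """IHC-style envelope: half-wave rectification, then low-pass filtering.

    The first-order filter and the default cutoff are assumptions made
    for this sketch only.
    """
    rectified = np.maximum(x, 0.0)
    # Smoothing coefficient of a one-pole low-pass filter.
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / fs)
    env = np.empty_like(rectified)
    state = 0.0
    for i, sample in enumerate(rectified):
        state += alpha * (sample - state)
        env[i] = state
    return env


# Example: envelopes of a 1 kHz tone sampled at 16 kHz (100 full cycles).
fs = 16000
t = np.arange(1600) / fs
tone = np.cos(2 * np.pi * 1000 * t)
env_hilbert = hilbert_envelope(tone)
env_ihc = ihc_envelope(tone, fs)
```

For a steady tone the Hilbert envelope is flat, while the rectify-and-filter envelope still carries some ripple at the tone frequency; how much of that ripple survives depends on the low-pass cutoff.
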
Original language: English
Publisher: Technical University of Denmark, Department of Electrical Engineering
Publication status: Published - 2013

