Fishing for meaningful units in connected speech
Publication: Research - peer-review › Article in proceedings – Annual report year: 2009
Standard
Fishing for meaningful units in connected speech. / Henrichsen, Peter Juel; Christiansen, Thomas Ulrich.
In: Proceedings of ISAAR 2009. ed. / Jörg Buchholz; Torsten Dau; Jakob Christensen-Dalsgaard; Torben Poulsen. 2009.
RIS
TY - GEN
T1 - Fishing for meaningful units in connected speech
A1 - Henrichsen,Peter Juel
A1 - Christiansen,Thomas Ulrich
AU - Henrichsen,Peter Juel
AU - Christiansen,Thomas Ulrich
PY - 2009
Y1 - 2009
N2 - In many branches of spoken language analysis including ASR, the set of smallest meaningful units of speech is taken to coincide with the set of phones or phonemes. However, fishing for phones is difficult, error-prone, and computationally expensive. We present an experiment, based on machine learning, with an alternative approach. Instead of stipulating a basic set of target units, the determination of the set is considered to be part of the learning task. Given 18 recordings of Danish talkers performing a simple lab task, our algorithm produced a set of acoustically well-defined units sufficient for identifying all the major semantic elements (be they parts of words, words or several words), relevant to the task. As the sound encoding used was very simple – fundamental frequency (F0), Harmonicity-to-Noise-Ratio (HNR), and Intensity samples only – the computational complexity involved was far lower than for phonemic recognition. Our findings show that it is possible to automatically characterize a linguistic message, without detailed spectral information or presumptions about the target units. Further, fishing for simple meaningful cues and enhancing these selectively would potentially be a more effective way of achieving intelligibility transfer, which is the end goal for speech transducing technologies.
SN - 87-990013-2-2
BT - Proceedings of ISAAR 2009
T2 - Proceedings of ISAAR 2009
A2 - Poulsen,Torben
ED - Poulsen,Torben
ER -