Spectro-Temporal Analysis of Speech for Spanish Phoneme Recognition

Sara Sharifzadeh, Javier Serrano, Jordi Carrabina

    Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-reviewed

    Abstract

    State-of-the-art automatic speech recognition (ASR) systems mostly use Mel-frequency cepstral coefficients (MFCCs) as acoustic features. In this paper, we propose a new discriminative analysis of acoustic features based on spectrogram analysis, which considers both the spectral and the temporal variations of the speech signal. This improves recognition performance, especially for noisy speech and for phonemes with time-domain modulations, such as stops. In this method, the 2D discrete cosine transform (DCT) is applied to small, overlapping, 2D Hamming-windowed patches of the spectrograms of Spanish phonemes and enhanced by means of bicubic interpolation. An adaptive strategy is proposed for the size of the patches over time, so that feature vectors of equal length are constructed for different phonemes. These vectors are classified using K-nearest neighbor (KNN), linear discriminant analysis (LDA), and reduced-rank LDA (RLDA). Experimental results demonstrate improved recognition performance for noisy speech signals and stops.
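
    The patch-based feature extraction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the number of time patches, the frequency patch size, the 50% overlap, and the number of retained low-order coefficients are all assumptions, and the bicubic interpolation step is omitted. The time width of each patch is adapted to the phoneme duration so that phonemes of different lengths yield feature vectors of the same length.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]   # DCT frequency index
    i = np.arange(n)[None, :]   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)     # rescale the DC row so that m @ m.T == I
    return m


def patch_dct_features(spec, n_time_patches=4, patch_freq=8, overlap=0.5, keep=3):
    """Fixed-length feature vector from a spectrogram (freq bins x time frames).

    All parameter values are hypothetical defaults chosen for illustration.
    """
    spec = np.asarray(spec, dtype=float)
    n_freq, n_time = spec.shape

    # Adaptive time width: n_time_patches overlapping patches span the phoneme.
    width = int(np.ceil(n_time / (1 + (n_time_patches - 1) * (1 - overlap))))
    if width < keep or patch_freq < keep or n_freq < patch_freq:
        raise ValueError("spectrogram too small for the requested patch layout")

    # Patch origins: evenly spread in time, fixed hop in frequency.
    t_starts = np.linspace(0, n_time - width, n_time_patches).round().astype(int)
    f_hop = max(1, int(patch_freq * (1 - overlap)))
    f_starts = range(0, n_freq - patch_freq + 1, f_hop)

    # Separable 2D Hamming window and DCT bases for one patch.
    window = np.outer(np.hamming(patch_freq), np.hamming(width))
    d_f, d_t = dct_matrix(patch_freq), dct_matrix(width)

    feats = []
    for f0 in f_starts:
        for t0 in t_starts:
            patch = spec[f0:f0 + patch_freq, t0:t0 + width] * window
            coeffs = d_f @ patch @ d_t.T               # 2D DCT of the patch
            feats.append(coeffs[:keep, :keep].ravel())  # low-order block only
    return np.concatenate(feats)


# Two synthetic "phonemes" of different duration give equal-length vectors.
rng = np.random.default_rng(0)
short_vec = patch_dct_features(rng.random((32, 20)))
long_vec = patch_dct_features(rng.random((32, 45)))
print(short_vec.shape == long_vec.shape)
```

    Vectors of this kind could then be passed to a KNN or LDA classifier; which coefficients to keep and how the patch sizes are chosen in the paper itself is not stated in the abstract.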
    Original language: English
    Title of host publication: Proceedings: IWSSIP 2012, 11-13 April 2012, Vienna, Austria
    Publication date: 2012
    Pages: 566-569
    ISBN (Print): 978-3-200-02588-2
    Publication status: Published - 2012
    Event: 19th International Conference on Systems, Signals and Image Processing (IWSSIP 2012) - Vienna, Austria
    Duration: 11 Apr 2012 to 13 Apr 2012
    http://www.iwssip2012.com/index.php?id=59

    Conference

    Conference: 19th International Conference on Systems, Signals and Image Processing (IWSSIP 2012)
    Country/Territory: Austria
    City: Vienna
    Period: 11/04/2012 to 13/04/2012
    Bibliographical note

    Best Student Paper in the Field of Speech processing.
