Castsearch - Context Based Spoken Document Retrieval

Lasse Lohilahti Mølgaard, Kasper Winther Jørgensen, Lars Kai Hansen

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

    Abstract

    The paper describes our work on the development of a system for retrieval of relevant stories from broadcast news. The system combines audio processing and text mining. The audio processing begins with a segmentation step that partitions the audio into speech and music. The speech is further segmented into speaker segments and then transcribed using an automatic speech recognition system, yielding text input for clustering using non-negative matrix factorization (NMF). The resulting semantic topics are used to evaluate topic detection performance. Based on these topics we show that a novel query expansion can be performed to return more intelligent search results. We also show that the query expansion helps overcome errors of the automatic transcription.
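
The text-mining half of the pipeline described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the toy term-by-segment count matrix, the helper names `nmf` and `expand_query`, the use of multiplicative-update NMF, and the rule of expanding a query with the strongest terms of its best-matching topic. The abstract does not specify how the paper's query expansion is computed. Plain Python is used to keep the example dependency-free; a library NMF would normally be used in practice.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative X (terms x segments) as W @ H.

    Uses multiplicative updates for the squared-error objective.
    W is terms x k (term weights per topic), H is k x segments.
    """
    rng = random.Random(seed)
    W = [[rng.random() + eps for _ in range(k)] for _ in X]
    H = [[rng.random() + eps for _ in X[0]] for _ in range(k)]
    for _ in range(n_iter):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

def expand_query(query_terms, vocab, W, n_extra=2):
    """Hypothetical expansion rule: append the top terms of the best-matching topic."""
    rows = [W[vocab.index(t)] for t in query_terms if t in vocab]
    if not rows:
        return list(query_terms)  # no known terms, nothing to expand with
    topic_scores = [sum(col) for col in zip(*rows)]
    topic = topic_scores.index(max(topic_scores))
    ranked = sorted(range(len(vocab)), key=lambda i: -W[i][topic])
    extra = [vocab[i] for i in ranked if vocab[i] not in query_terms][:n_extra]
    return list(query_terms) + extra

# Toy term-by-segment counts (invented data): two politics and two weather segments.
vocab = ["election", "vote", "senate", "storm", "rain", "flood"]
X = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
    [0, 0, 1, 2],
]
W, H = nmf(X, k=2)
expanded = expand_query(["storm"], vocab, W)
print(expanded)
```

Under this sketch, a segment in which the query word itself was misrecognized can still be matched through the added topic terms, which is one plausible reading of how expansion could compensate for transcription errors.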
    Original language: English
    Title of host publication: IEEE International Conference on Acoustics, Speech and Signal Processing, 2007. ICASSP 2007.
    Volume: 4
    Publisher: IEEE
    Publication date: 2007
    ISBN (Print): 1-4244-0727-3
    Publication status: Published - 2007
    Event: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - Honolulu, United States
    Duration: 15 Apr 2007 → 20 Apr 2007
    Conference number: 32

    Conference

    Conference: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing
    Number: 32
    Country/Territory: United States
    City: Honolulu
    Period: 15/04/2007 → 20/04/2007

    Bibliographical note

    Copyright: 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Keywords

    • Nonnegative Matrix Factorization
    • Audio Retrieval
    • Text Mining
    • Document Clustering
