Two methods for improving performance of an HMM and their application for gene finding

Anders Stærmose Krogh

    Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review

    Abstract

    A hidden Markov model for gene finding consists of submodels for coding regions, splice sites, introns, intergenic regions and possibly more. It is described how to estimate the model as a whole from labeled sequences instead of estimating the individual parts independently from subsequences. It is argued that the standard maximum likelihood estimation criterion is not optimal for training such a model. Instead of maximizing the probability of the DNA sequence, one should maximize the probability of the correct prediction. Such a criterion, called conditional maximum likelihood, is used for the gene finder "HMMgene". A new (approximative) algorithm is described, which finds the most probable prediction summed over all paths yielding the same prediction. We show that these methods contribute significantly to the high performance of HMMgene.
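
The two ideas in the abstract can be illustrated on a toy model. The sketch below is not the paper's algorithm: it enumerates every state path by brute force, which is only feasible for very short sequences, whereas the abstract describes an approximative method. The state names, label alphabet ('C' for coding, 'N' for non-coding), all probabilities, and the function names are hypothetical, chosen only for illustration. It computes the conditional likelihood P(y | x) = P(x, y) / P(x) that conditional maximum likelihood training would maximize, and contrasts the labeling of the single most probable path with the most probable labeling summed over all paths that yield it.

```python
from itertools import product

# Hypothetical toy model (not from the paper): three states, each with a label.
# Two distinct states share the label 'C', so several paths give one labeling.
STATES = ["c1", "c2", "n"]
LABEL = {"c1": "C", "c2": "C", "n": "N"}
START = {"c1": 0.3, "c2": 0.3, "n": 0.4}
TRANS = {
    "c1": {"c1": 0.5, "c2": 0.3, "n": 0.2},
    "c2": {"c1": 0.3, "c2": 0.5, "n": 0.2},
    "n": {"c1": 0.1, "c2": 0.1, "n": 0.8},
}
EMIT = {
    "c1": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "c2": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "n": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def path_prob(x, path):
    """Joint probability P(x, path) of a sequence and one state path."""
    p = START[path[0]] * EMIT[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= TRANS[path[i - 1]][path[i]] * EMIT[path[i]][x[i]]
    return p


def labeling_probs(x):
    """Map each labeling y to P(x, y), summed over all paths that yield y."""
    totals = {}
    for path in product(STATES, repeat=len(x)):
        y = "".join(LABEL[s] for s in path)
        totals[y] = totals.get(y, 0.0) + path_prob(x, path)
    return totals


def conditional_likelihood(x, y):
    """P(y | x) = P(x, y) / P(x): the quantity a CML criterion maximizes."""
    totals = labeling_probs(x)
    return totals.get(y, 0.0) / sum(totals.values())


def best_path_labeling(x):
    """Labeling of the single most probable state path (Viterbi-style answer)."""
    best = max(product(STATES, repeat=len(x)), key=lambda p: path_prob(x, p))
    return "".join(LABEL[s] for s in best)


def best_labeling(x):
    """Most probable labeling when summing over all paths with that labeling."""
    totals = labeling_probs(x)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    x = "CGCG"
    print("labeling of best single path:", best_path_labeling(x))
    print("best labeling summed over paths:", best_labeling(x))
    print("P(y | x) for y = 'CCCC':", conditional_likelihood(x, "CCCC"))
```

The two decoders can disagree whenever probability mass for one labeling is split across several paths, which is the situation the summed-over-paths criterion is meant to handle. Ordinary maximum likelihood on labeled data would maximize P(x, y), while the conditional criterion divides by P(x) so that only the probability of the labeling given the sequence matters.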
    Original language: English
    Title of host publication: Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology
    Place of Publication: Menlo Park, California
    Publisher: AAAI Press
    Publication date: 1997
    Pages: 179-186
    Publication status: Published - 1997
    Event: 5th International Conference on Intelligent Systems for Molecular Biology - Halkidiki, Greece
    Duration: 21 Jun 1997 to 26 Jun 1997
    Conference number: 5
    http://www.aaai.org/Library/ISMB/ismb97contents.php

    Conference

    Conference: 5th International Conference on Intelligent Systems for Molecular Biology
    Number: 5
    Country: Greece
    City: Halkidiki
    Period: 21/06/1997 to 26/06/1997
