Improved ontology for eukaryotic single-exon coding sequences in biological databases

Roddy Jorquera, Carolina González, Philip Clausen, Bent Petersen, David S Holmes*

*Corresponding author for this work

    Research output: Contribution to journalJournal articleResearchpeer-review

    91 Downloads (Pure)

    Abstract

    Efficient extraction of knowledge from biological data requires the development of structured vocabularies to unambiguously define biological terms. This paper proposes descriptions and definitions to disambiguate the term ‘single-exon gene’. Eukaryotic Single-Exon Genes (SEGs) have been defined as genes that do not have introns in their protein coding sequences. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancer and neurological/developmental disorders and many exhibit tissue-specific transcription. Unfortunately, the term ‘SEGs’ is rife with ambiguity, leading to biological misinterpretations. In the classic definition, no distinction is made between SEGs that harbor introns in their untranslated regions (UTRs) versus those without. This distinction is important to make because the presence of introns in UTRs affects transcriptional regulation and post-transcriptional processing of the mRNA. In addition, recent whole-transcriptome shotgun sequencing has led to the discovery of many examples of single-exon mRNAs that arise from alternative splicing of multi-exon genes, these single-exon isoforms are being confused with SEGs despite their clearly different origin. The increasing expansion of RNA-seq datasets makes it imperative to distinguish the different SEG types before annotation errors become indelibly propagated in biological databases. This paper develops a structured vocabulary for their disambiguation, allowing a major reassessment of their evolutionary trajectories, regulation, RNA processing and transport, and provides the opportunity to improve the detection of gene associations with disorders including cancers, neurological and developmental diseases.
    Original languageEnglish
    Article numberbay089
    JournalDatabase
    Volume2018
    Issue number1
    Number of pages6
    ISSN1758-0463
    DOIs
    Publication statusPublished - 2018

    Bibliographical note

    This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

    Cite this

    Jorquera, Roddy ; González, Carolina ; Clausen, Philip ; Petersen, Bent ; Holmes, David S. / Improved ontology for eukaryotic single-exon coding sequences in biological databases. In: Database. 2018 ; Vol. 2018, No. 1.
    @article{7b8ea55cd8e1458e90c27d0a6fd8cea1,
    title = "Improved ontology for eukaryotic single-exon coding sequences in biological databases",
    abstract = "Efficient extraction of knowledge from biological data requires the development of structured vocabularies to unambiguously define biological terms. This paper proposes descriptions and definitions to disambiguate the term ‘single-exon gene’. Eukaryotic Single-Exon Genes (SEGs) have been defined as genes that do not have introns in their protein coding sequences. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancer and neurological/developmental disorders and many exhibit tissue-specific transcription. Unfortunately, the term ‘SEGs’ is rife with ambiguity, leading to biological misinterpretations. In the classic definition, no distinction is made between SEGs that harbor introns in their untranslated regions (UTRs) versus those without. This distinction is important to make because the presence of introns in UTRs affects transcriptional regulation and post-transcriptional processing of the mRNA. In addition, recent whole-transcriptome shotgun sequencing has led to the discovery of many examples of single-exon mRNAs that arise from alternative splicing of multi-exon genes, these single-exon isoforms are being confused with SEGs despite their clearly different origin. The increasing expansion of RNA-seq datasets makes it imperative to distinguish the different SEG types before annotation errors become indelibly propagated in biological databases. This paper develops a structured vocabulary for their disambiguation, allowing a major reassessment of their evolutionary trajectories, regulation, RNA processing and transport, and provides the opportunity to improve the detection of gene associations with disorders including cancers, neurological and developmental diseases.",
    author = "Roddy Jorquera and Carolina Gonz{\'a}lez and Philip Clausen and Bent Petersen and Holmes, {David S}",
    note = "This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.",
    year = "2018",
    doi = "10.1093/database/bay089",
    language = "English",
    volume = "2018",
    journal = "Database",
    issn = "1758-0463",
    publisher = "Oxford University Press",
    number = "1",

    }

    Improved ontology for eukaryotic single-exon coding sequences in biological databases. / Jorquera, Roddy; González, Carolina; Clausen, Philip ; Petersen, Bent; Holmes, David S.

    In: Database, Vol. 2018, No. 1, bay089, 2018.

    Research output: Contribution to journalJournal articleResearchpeer-review

    TY - JOUR

    T1 - Improved ontology for eukaryotic single-exon coding sequences in biological databases

    AU - Jorquera, Roddy

    AU - González, Carolina

    AU - Clausen, Philip

    AU - Petersen, Bent

    AU - Holmes, David S

    N1 - This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

    PY - 2018

    Y1 - 2018

    N2 - Efficient extraction of knowledge from biological data requires the development of structured vocabularies to unambiguously define biological terms. This paper proposes descriptions and definitions to disambiguate the term ‘single-exon gene’. Eukaryotic Single-Exon Genes (SEGs) have been defined as genes that do not have introns in their protein coding sequences. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancer and neurological/developmental disorders and many exhibit tissue-specific transcription. Unfortunately, the term ‘SEGs’ is rife with ambiguity, leading to biological misinterpretations. In the classic definition, no distinction is made between SEGs that harbor introns in their untranslated regions (UTRs) versus those without. This distinction is important to make because the presence of introns in UTRs affects transcriptional regulation and post-transcriptional processing of the mRNA. In addition, recent whole-transcriptome shotgun sequencing has led to the discovery of many examples of single-exon mRNAs that arise from alternative splicing of multi-exon genes, these single-exon isoforms are being confused with SEGs despite their clearly different origin. The increasing expansion of RNA-seq datasets makes it imperative to distinguish the different SEG types before annotation errors become indelibly propagated in biological databases. This paper develops a structured vocabulary for their disambiguation, allowing a major reassessment of their evolutionary trajectories, regulation, RNA processing and transport, and provides the opportunity to improve the detection of gene associations with disorders including cancers, neurological and developmental diseases.

    AB - Efficient extraction of knowledge from biological data requires the development of structured vocabularies to unambiguously define biological terms. This paper proposes descriptions and definitions to disambiguate the term ‘single-exon gene’. Eukaryotic Single-Exon Genes (SEGs) have been defined as genes that do not have introns in their protein coding sequences. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancer and neurological/developmental disorders and many exhibit tissue-specific transcription. Unfortunately, the term ‘SEGs’ is rife with ambiguity, leading to biological misinterpretations. In the classic definition, no distinction is made between SEGs that harbor introns in their untranslated regions (UTRs) versus those without. This distinction is important to make because the presence of introns in UTRs affects transcriptional regulation and post-transcriptional processing of the mRNA. In addition, recent whole-transcriptome shotgun sequencing has led to the discovery of many examples of single-exon mRNAs that arise from alternative splicing of multi-exon genes, these single-exon isoforms are being confused with SEGs despite their clearly different origin. The increasing expansion of RNA-seq datasets makes it imperative to distinguish the different SEG types before annotation errors become indelibly propagated in biological databases. This paper develops a structured vocabulary for their disambiguation, allowing a major reassessment of their evolutionary trajectories, regulation, RNA processing and transport, and provides the opportunity to improve the detection of gene associations with disorders including cancers, neurological and developmental diseases.

    U2 - 10.1093/database/bay089

    DO - 10.1093/database/bay089

    M3 - Journal article

    VL - 2018

    JO - Database

    JF - Database

    SN - 1758-0463

    IS - 1

    M1 - bay089

    ER -