TY - JOUR
T1 - Analysis of two large functionally uncharacterized regions in the Methanopyrus kandleri AV19 genome
AU - Jensen, Lars Juhl
AU - Skovgaard, Marie
AU - Sicheritz-Pontén, Thomas
AU - Jørgensen, Merete Kjær
AU - Lundegaard, Claus
AU - Pedersen, Corinna Cavan
AU - Petersen, Nanna
AU - Ussery, David
PY - 2003
Y1 - 2003
N2 - Background: For most sequenced prokaryotic genomes, about a third of the protein coding genes annotated are "orphan proteins", that is, they lack homology to known proteins. These hypothetical genes are typically short and randomly scattered throughout the genome. This trend is seen for most of the bacterial and archaeal genomes published to date. Results: In contrast we have found that a large fraction of the genes coding for such orphan proteins in the Methanopyrus kandleri AV19 genome occur within two large regions. These genes have no known homologs except from other M. kandleri genes. However, analysis of their lengths, codon usage, and Ribosomal Binding Site (RBS) sequences shows that they are most likely true protein coding genes and not random open reading frames. Conclusions: Although these regions can be considered as candidates for massive lateral gene transfer, our bioinformatics analysis suggests that this is not the case. We predict many of the organism specific proteins to be transmembrane and belong to protein families that are non-randomly distributed between the regions. Consistent with this, we suggest that the two regions are most likely unrelated, and that they may be integrated plasmids.
AB - Background: For most sequenced prokaryotic genomes, about a third of the protein coding genes annotated are "orphan proteins", that is, they lack homology to known proteins. These hypothetical genes are typically short and randomly scattered throughout the genome. This trend is seen for most of the bacterial and archaeal genomes published to date. Results: In contrast we have found that a large fraction of the genes coding for such orphan proteins in the Methanopyrus kandleri AV19 genome occur within two large regions. These genes have no known homologs except from other M. kandleri genes. However, analysis of their lengths, codon usage, and Ribosomal Binding Site (RBS) sequences shows that they are most likely true protein coding genes and not random open reading frames. Conclusions: Although these regions can be considered as candidates for massive lateral gene transfer, our bioinformatics analysis suggests that this is not the case. We predict many of the organism specific proteins to be transmembrane and belong to protein families that are non-randomly distributed between the regions. Consistent with this, we suggest that the two regions are most likely unrelated, and that they may be integrated plasmids.
U2 - 10.1186/1471-2164-4-12
DO - 10.1186/1471-2164-4-12
M3 - Journal article
SN - 1471-2164
VL - 4
SP - 12
JO - BMC Genomics
JF - BMC Genomics
ER -