The genetic diversity and comparative analyses of eukaryotes based on large-scale genomic data

Ting Yang*

*Corresponding author for this work

Research output: Book/Report (Ph.D. thesis)

Genome sequencing has grown rapidly over the past 20 years. From a single genome, we can determine the genome size, gene content, gene families, and biosynthetic pathways of a species. By comparing multiple genomes, we can further infer phylogenetic relationships, genomic changes, and shared eukaryotic biology. This is particularly relevant for fungi and plants, which play critical roles in ecosystems and provide us with food, medicine, and other basic necessities. In the genomics era, it is critical to assess the conservation status of populations, make use of their valuable traits, and preserve their biodiversity.

There are around 350,000 recognized plant species on the planet. The plastid performs various vital cellular processes, including processes essential to plant growth and development. Complete plastid genomes can be used to enhance our understanding of plant biology and diversity and to provide evidence for identifying new species. Using these large-scale data, the organization of plastid genomes was compared in terms of gene gain/loss, gene copy number, GC content, gene conservation, and gene blocks. A robust phylogeny of green plants was constructed from three data sets: all nucleotide positions (nt123), only the first and second codon positions (nt12), and amino acids (AA).
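As an illustration only, the three data-set types named above (nt123, nt12, AA) and the GC content measure can be sketched for a single coding sequence. This is a minimal sketch, not the pipeline used in the thesis: the function names and the example sequence are hypothetical, the standard codon assignments are assumed, and real analyses would operate on codon-aware alignments of many genes.

```python
from itertools import product

# Standard codon assignments, built in the conventional T, C, A, G order.
# Assumption: the standard amino acid assignments apply to the sequence.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(cds):
    """Split an in-frame coding sequence into codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def nt123(cds):
    """All nucleotide positions (the sequence itself, upper-cased)."""
    return "".join(codons(cds))


def nt12(cds):
    """Only the first and second codon positions."""
    return "".join(c[:2] for c in codons(cds))


def translate(cds):
    """Amino acid sequence; codons not in the table (gaps, N) become 'X'."""
    return "".join(CODON_TABLE.get(c, "X") for c in codons(cds))


def gc_content(seq):
    """Fraction of G and C among all characters of the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    example = "ATGGCCAAATAA"  # hypothetical toy sequence
    print(nt123(example), nt12(example), translate(example), gc_content(example))
```

Dropping the third codon position, as in `nt12`, is commonly done because that position tends to be the most saturated with substitutions across deep divergences; comparing the trees from the three data sets is one way to judge how robust a topology is.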

Filamentous fungi are known for their natural production of many commercial enzymes and biologically active drugs. In particular, Aspergillus and Penicillium, the most dominant genera of filamentous fungi, are widely used in industry, food, fermentation, and drug discovery. As of January 2020, 128 Aspergillus genomes from NCBI and 87 from JGI, as well as 68 Penicillium genomes from NCBI and 32 from JGI, had been published. In my PhD study, a comprehensive analysis was performed on 146 genomes from 22 sections of the genus Aspergillus and 13 sections of the genus Penicillium. This large body of complete genome data was used to resolve the phylogenetic relationships among Aspergillus and Penicillium. Gene content, particularly genes encoding carbohydrate-active enzymes (CAZyme genes), was compared within and between sections. By identifying gained, expanded, and unique gene families in each species, we sought to discover genes associated with ecological niches. Secondary metabolism gene clusters were predicted and compared in 114 Aspergillus and 32 Penicillium genomes. An examination of 99 published secondary metabolism gene clusters showed that some important clusters retain a conserved gene order across genomes, suggesting that they descend from the same ancestral elements in the last common ancestor.

In conclusion, large-scale analysis of eukaryotic genomes is a powerful method for revealing genomic diversity and conservation, including gene gain, gene loss, unique gene families, and gene clusters, thereby enabling a better understanding of the functional annotation of genes and the evolutionary history of genomes.
Original language: English
Place of publication: Kgs. Lyngby, Denmark
Publisher: DTU Bioengineering
Number of pages: 142
Publication status: Published - 2022

