Background: Non-ribosomal peptide synthetases (NRPSs) are large multimodular enzymes that synthesize a wide range of biologically active natural peptide compounds, many of which are pharmacologically important. Peptide bond formation is catalyzed by the Condensation (C) domain. Various functional subtypes of the C domain exist: an LCL domain (written C-L(L) below) catalyzes formation of a peptide bond between two L-amino acids; a DCL domain (written C-D(L) below) links an L-amino acid to a growing peptide ending with a D-amino acid; a Starter C domain (first named and classified as a separate subtype here) acylates the first amino acid with a beta-hydroxy-carboxylic acid (typically a beta-hydroxy fatty acid); and Heterocyclization (Cyc) domains catalyze both peptide bond formation and subsequent cyclization of cysteine, serine or threonine residues. The homologous Epimerization (E) domain flips the chirality of the last amino acid in the growing peptide; Dual E/C domains catalyze both epimerization and condensation. Results: In this paper, we report on the reconstruction of the phylogenetic relationship of NRPS C domain subtypes and analyze in detail the sequence motifs of recently discovered subtypes (Dual E/C, C-D(L) and Starter domains) and their characteristic sequence differences, both among themselves and in comparison with C-L(L) domains. Based on their phylogeny and the comparison of their sequence motifs, C-L(L) and Starter domains appear to be more closely related to each other than to other subtypes, though pronounced differences in some segments of the protein account for the different donor substrates (amino acid vs. beta-hydroxy-carboxylic acid). Furthermore, on the basis of phylogeny and the comparison of sequence motifs, we conclude that Dual E/C and C-D(L) domains share a common ancestor. In the same way, the evolutionary origin of a C domain of unknown function in glycopeptide (GP) NRPSs can be determined to be a C-L(L) domain.
In the case of two GP C domains that are most similar to C-D(L) but have C-L(L) activity, we postulate convergent evolution. Conclusion: We systematize all C domain subtypes, including the novel Starter C domain. Our results make it easier to determine the subtype of unknown C domains, as we provide profile Hidden Markov Models (pHMMs) for the sequence motifs as well as for the entire sequences. The specificity-conferring positions we determined will be helpful for mutating one subtype into another, e.g. turning C-D(L) into C-L(L), which can be a useful step towards obtaining novel products.