A major challenge in lead discovery is the rapid detection of well-known and trivial compounds, a process known as dereplication, so that isolation, structure elucidation, and pharmacological investigations can be focused on novel compounds. In this paper, we present a new algorithm, X-hitting, based on cross-sample comparison of full UV spectra from HPLC analysis of highly complex natural product extracts and samples. X-hitting allows automatic identification of known compounds and, more importantly, the detection of potentially new or similar compounds. We demonstrate the algorithm on HPLC data from the analysis of fungal extracts through two tasks: automatic identification of known structures, which we call cross-hitting, and tentative identification of potentially new bioactive compounds, which we call new-hitting. Both tasks are illustrated using 18 important reference compounds and complex fungal extracts obtained from isolates in the IBT Culture Collection held at BioCentrum-DTU, Technical University of Denmark. Receiver operating characteristic (ROC) analysis is used to evaluate the performance of the compound predictor, and compounds could be identified with high confidence (AUC ≈ 0.98). Building on this high confidence in retrieving identical spectra, the method is extended to retrieve spectra that are similar but not identical.
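The evaluation idea can be illustrated with a minimal sketch: score each candidate spectrum against a reference spectrum, then summarize how well the scores separate true matches from non-matches with the area under the ROC curve. The abstract does not state which similarity measure X-hitting uses, so cosine similarity is an assumption made here purely for illustration. The function names and the toy spectra are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra sampled on the same wavelength grid.

    Assumption: cosine similarity stands in for the (unspecified) spectral
    comparison used by X-hitting.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero spectrum carries no shape information
    return dot / (norm_a * norm_b)


def roc_auc(scores, labels):
    """AUC as the fraction of (match, non-match) pairs ranked correctly.

    Ties between a match and a non-match count as half a correct pair.
    """
    pos = [s for s, is_match in zip(scores, labels) if is_match]
    neg = [s for s, is_match in zip(scores, labels) if not is_match]
    if not pos or not neg:
        raise ValueError("need at least one match and one non-match")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy absorbance values (invented for illustration, not measured data).
reference = [0.10, 0.80, 0.30, 0.05]
candidates = [
    [0.12, 0.79, 0.28, 0.06],  # labeled as the same compound
    [0.09, 0.82, 0.33, 0.04],  # labeled as the same compound
    [0.70, 0.20, 0.05, 0.60],  # labeled as a different compound
    [0.40, 0.10, 0.90, 0.30],  # labeled as a different compound
]
labels = [True, True, False, False]

scores = [cosine_similarity(reference, c) for c in candidates]
auc = roc_auc(scores, labels)
```

An AUC near 1 means that true matches almost always receive higher similarity scores than non-matches, which is the sense in which the reported AUC of about 0.98 indicates high confidence.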
Hansen, M. E., Smedsgaard, J., & Larsen, T. O. (2005). X-hitting: A new algorithm for novelty detection and dereplication by UV spectra of complex mixtures of natural products. Analytical Chemistry, 77(21), 6805-6817. https://doi.org/10.1021/ac040191e