TY - JOUR
T1 - X-hitting: A new algorithm for novelty detection and dereplication by UV spectra of complex mixtures of natural products
AU - Hansen, Michael Edberg
AU - Smedsgaard, Jørn
AU - Larsen, Thomas Ostenfeld
PY - 2005
Y1 - 2005
N2 - A major challenge in lead discovery is to detect well-known and trivial compounds rapidly, a process known as dereplication, so that isolation, structure elucidation, and pharmacological investigations can be focused on novel compounds. In this paper, we present a new algorithm, X-hitting, based on cross-sample comparison of full UV spectra from HPLC analysis of highly complex natural product extracts/samples. X-hitting allows automatic identification of known compounds but, more importantly, also allows potentially new or similar compounds to be found. We demonstrate this new algorithm by automatic identification of known structures, a task we call cross-hitting, and tentative identification of potentially new bioactive compounds, a task we call new-hitting, in HPLC data from analysis of fungal extracts. Both tasks are illustrated using 18 important reference compounds and complex fungal extracts obtained from isolates in the IBT Culture Collection held at BioCentrum-DTU, Technical University of Denmark. The receiver operating characteristic (ROC) statistic is used to evaluate the performance of the compound predictor, and it was found that compounds could be identified with high confidence (AUC ≈ 0.98). Based on high confidence in retrieving identical spectra, the method is extended to include similar but still different spectra.
AB - A major challenge in lead discovery is to detect well-known and trivial compounds rapidly, a process known as dereplication, so that isolation, structure elucidation, and pharmacological investigations can be focused on novel compounds. In this paper, we present a new algorithm, X-hitting, based on cross-sample comparison of full UV spectra from HPLC analysis of highly complex natural product extracts/samples. X-hitting allows automatic identification of known compounds but, more importantly, also allows potentially new or similar compounds to be found. We demonstrate this new algorithm by automatic identification of known structures, a task we call cross-hitting, and tentative identification of potentially new bioactive compounds, a task we call new-hitting, in HPLC data from analysis of fungal extracts. Both tasks are illustrated using 18 important reference compounds and complex fungal extracts obtained from isolates in the IBT Culture Collection held at BioCentrum-DTU, Technical University of Denmark. The receiver operating characteristic (ROC) statistic is used to evaluate the performance of the compound predictor, and it was found that compounds could be identified with high confidence (AUC ≈ 0.98). Based on high confidence in retrieving identical spectra, the method is extended to include similar but still different spectra.
U2 - 10.1021/ac040191e
DO - 10.1021/ac040191e
M3 - Journal article
SN - 0003-2700
VL - 77
SP - 6805
EP - 6817
JO - Analytical Chemistry
JF - Analytical Chemistry
IS - 21
ER -