TY - JOUR
T1 - NetAllergen, a random forest model integrating MHC-II presentation propensity for improved allergenicity prediction
AU - Li, Yuchen
AU - Sackett, Peter Wad
AU - Nielsen, Morten
AU - Barra, Carolina
N1 - Publisher Copyright:
© 2023 The Author(s). Published by Oxford University Press.
PY - 2023
Y1 - 2023
N2 - Motivation: Allergy is a pathological immune reaction towards innocuous protein antigens. Although only a narrow fraction of plant or animal proteins induce allergy, atopic disorders affect millions of children and adults and cost billions in healthcare systems worldwide. In silico predictors can aid in the development of more innocuous food sources. Previous allergenicity predictors used sequence similarity, common structural domains, and amino acid physicochemical features. However, these predictors strongly rely on sequence similarity to known allergens and fail to predict protein allergenicity accurately when similarity diminishes. Results: To overcome these limitations, we collected allergens from AllergenOnline, a curated database of IgE-inducing allergens, carefully removed allergen redundancy with a novel protein partitioning pipeline, and developed a new allergen prediction method, introducing MHC presentation propensity as a novel feature. NetAllergen outperformed a sequence similarity-based BLAST baseline approach, and previous allergenicity predictor AlgPred 2 when similarity to known allergens is limited.
AB - Motivation: Allergy is a pathological immune reaction towards innocuous protein antigens. Although only a narrow fraction of plant or animal proteins induce allergy, atopic disorders affect millions of children and adults and cost billions in healthcare systems worldwide. In silico predictors can aid in the development of more innocuous food sources. Previous allergenicity predictors used sequence similarity, common structural domains, and amino acid physicochemical features. However, these predictors strongly rely on sequence similarity to known allergens and fail to predict protein allergenicity accurately when similarity diminishes. Results: To overcome these limitations, we collected allergens from AllergenOnline, a curated database of IgE-inducing allergens, carefully removed allergen redundancy with a novel protein partitioning pipeline, and developed a new allergen prediction method, introducing MHC presentation propensity as a novel feature. NetAllergen outperformed a sequence similarity-based BLAST baseline approach, and previous allergenicity predictor AlgPred 2 when similarity to known allergens is limited.
U2 - 10.1093/bioadv/vbad151
DO - 10.1093/bioadv/vbad151
M3 - Journal article
C2 - 37901344
AN - SCOPUS:85177064465
SN - 2635-0041
VL - 3
JO - Bioinformatics Advances
JF - Bioinformatics Advances
IS - 1
M1 - vbad151
ER -