Abstract
Asthma, rhinitis, and eczema are among the most prevalent allergic diseases worldwide. They have strong genetic and epigenetic contributions. We hypothesized that integrating multiple omics layers with machine learning methods can accurately predict allergy.
We combined data on environmental and genetic risk scores with blood and nasal DNA methylation data from 348 subjects aged 16 years from the Dutch PIAMA (Prevention and Incidence of Asthma and Mite Allergy) birth cohort. After assessing multiple machine learning methods, we selected Elastic Net for its accuracy, low overfitting, and interpretability.
The majority of predictive power could be attributed to nasal DNA methylation. Using strict feature selection, we created a parsimonious allergy prediction model, based on just three nasal CpG sites, that robustly predicts allergic disease. This model achieved a ROC AUC of 0.86 in the discovery PIAMA cohort and 0.82 in a Puerto Rican replication cohort of similar age. Lower performance was observed in two younger replication cohorts, one Dutch and one Danish, both at age 6 years, which could be explained by differing, age-dependent methylation levels. The DNA methylation levels of the model’s three CpG sites are related to IgE sensitization and allergic disease comorbidity and can differentiate between symptomatic and asymptomatic allergy. The transcriptomic features associated with methylation at these CpG sites indicated increased presence or activity of T cells and myeloid cells.
Our study provides novel insights into the strong predictive power of nasal DNA methylation and offers promising non-invasive biomarkers that could be used to diagnose childhood allergy in clinical practice.
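For readers less familiar with the ROC AUC values quoted in the abstract (0.86 and 0.82), the metric can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The following is a minimal sketch of that definition only. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the study's data or code.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (case, control) pairs in which the case
    has the higher score; tied pairs count as half a win."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 1 = allergic, 0 = non-allergic (not from the study).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.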
Original language | English |
---|---|
Journal | The European Respiratory Journal |
Volume | 58 |
ISSN | 0903-1936 |
DOIs | |
Publication status | Published - 2021 |