TY - JOUR
T1 - On using kernel integration by graphical LASSO to study partial correlations between heterogeneous data sets
AU - Nørgaard, Sarah Kristine
AU - Linder-Steinlein, Kristoffer
AU - Eliasen, Anders Ulrik
AU - Stokholm, Jakob
AU - Chawes, Bo L.
AU - Bønnelykke, Klaus
AU - Bisgaard, Hans
AU - Smilde, Age K.
AU - Rasmussen, Morten Arendt
PY - 2021
Y1 - 2021
N2 - Integration of unstructured and very diverse data is often required for a deeper understanding of complex biological systems. In order to uncover commonalities between heterogeneous data, the data are often harmonized by constructing a kernel and performing numerical integration. In this study, we propose a method for data integration in the framework of an undirected graphical model, where the nodes represent individual data sources of varying nature in terms of complexity and underlying distribution and where the edges represent the partial correlations between two blocks of data. We propose a modified GLASSO for estimation of the graph, with a combination of cross-validation and the extended Bayesian information criterion for sparsity tuning. Furthermore, hierarchical clustering on the weighted consensus kernels from a fixed network is used to partition the samples into different classes. Simulations show increasing ability to uncover true edges with increasing sample size and signal-to-noise ratio. Likewise, identification of nonexisting edges towards disconnected nodes is feasible. The framework is demonstrated for integration of longitudinal symptom burden data, from the second and third year of life, combined with 21 disease precursors and information on the development of asthma and eczema at the age of 6 years, from 403 children from the COPSAC2010 mother-child cohort. The results suggest that maternal predisposition as well as being born preterm indirectly lead to a higher risk of asthma via an increased respiratory symptom burden.
AB - Integration of unstructured and very diverse data is often required for a deeper understanding of complex biological systems. In order to uncover commonalities between heterogeneous data, the data are often harmonized by constructing a kernel and performing numerical integration. In this study, we propose a method for data integration in the framework of an undirected graphical model, where the nodes represent individual data sources of varying nature in terms of complexity and underlying distribution and where the edges represent the partial correlations between two blocks of data. We propose a modified GLASSO for estimation of the graph, with a combination of cross-validation and the extended Bayesian information criterion for sparsity tuning. Furthermore, hierarchical clustering on the weighted consensus kernels from a fixed network is used to partition the samples into different classes. Simulations show increasing ability to uncover true edges with increasing sample size and signal-to-noise ratio. Likewise, identification of nonexisting edges towards disconnected nodes is feasible. The framework is demonstrated for integration of longitudinal symptom burden data, from the second and third year of life, combined with 21 disease precursors and information on the development of asthma and eczema at the age of 6 years, from 403 children from the COPSAC2010 mother-child cohort. The results suggest that maternal predisposition as well as being born preterm indirectly lead to a higher risk of asthma via an increased respiratory symptom burden.
KW - Dual-primal optimization
KW - GLASSO
KW - Heterogeneous data integration
KW - Kernelization
KW - Undirected graphical models
U2 - 10.1002/cem.3324
DO - 10.1002/cem.3324
M3 - Journal article
SN - 0886-9383
VL - 35
JO - Journal of Chemometrics
JF - Journal of Chemometrics
IS - 10
M1 - e3324
ER -