TY - JOUR
T1 - Compass: A hybrid method for clinical and biobank data mining
AU - Krysiak-Baltyn, Konrad
AU - Petersen, Thomas Nordahl
AU - Audouze, Karine Marie Laure
AU - Jørgensen, Niels
AU - Ängquist, L.
AU - Brunak, Søren
PY - 2014
Y1 - 2014
AB - We describe a new method for identification of confident associations within large clinical data sets. The method is a hybrid of two existing methods: Self-Organizing Maps and Association Mining. We utilize Self-Organizing Maps as the initial step to reduce the search space, and then apply Association Mining in order to find association rules. We demonstrate that this procedure has a number of advantages compared to traditional Association Mining: it allows for handling numerical variables without a priori binning and is able to generate variable groups which act as “hotspots” for statistically significant associations. We showcase the method on infertility-related data from Danish military conscripts. The clinical data we analyzed contained both categorical questionnaire data and continuous variables generated from biological measurements, including missing values. From this data set, we successfully generated a number of interesting association rules, each of which relates an observation to a specific consequence and reports the p-value for that finding. Additionally, we demonstrate that the method can be used on non-clinical data containing chemical–disease associations in order to find associations between different phenotypes, such as prostate cancer and breast cancer.
KW - Data mining
KW - Clinical data
KW - Rule extraction
KW - Self-Organizing Map
KW - Association mining
U2 - 10.1016/j.jbi.2013.10.007
DO - 10.1016/j.jbi.2013.10.007
M3 - Journal article
SN - 1532-0464
VL - 47
SP - 160
EP - 170
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
ER -