Owing to its increasing availability, electronic literature has become a potential source of information for developing complex Bayesian networks (BNs) when human expertise is missing or when data are scarce or noisy. This opportunity raises the question of how to integrate information from free-text resources with statistical data when learning Bayesian networks. First, we report on the collection of prior-information resources in the ovarian cancer domain, which include "kernel" annotations of the domain variables. We then introduce methods that use these annotations and the literature to derive informative pairwise dependency measures. One is based on the statistical co-occurrence of the variable names, one on the similarity of the variables' "kernel" descriptions, and one is a combined method. We perform a wide-scale evaluation of these text-based dependency scores against an expert reference and against data-based scores (mutual information (MI) and a Bayesian score). Next, we transform the text-based dependency measures into informative text-based priors over Bayesian network structures. Finally, we report the benefit of such text-based priors for the performance of a Bayesian network that classifies ovarian tumors from clinical data.
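As a rough illustration of the pipeline summarized above, the following is a minimal Python sketch. It covers a text-based pairwise score, a data-based MI score for comparison, and a structure prior built from the text scores. The abstract does not give the exact measures, so everything here is an assumption. The co-occurrence score is taken to be pointwise mutual information over documents, clipped at zero, with each document treated as a set of single-token variable names. The squashing of scores into edge probabilities via the hypothetical parameters `p_min`/`p_max` is an illustrative choice. The structure prior is assumed to factorize over unordered variable pairs. The toy corpus is invented for illustration and is not data from the study.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_score(docs, a, b):
    """Hypothetical text-based dependency score (assumption, not the paper's):
    pointwise mutual information of names a and b appearing in the same
    document, clipped at 0. Each doc is a set of variable-name tokens."""
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    if n_ab == 0:
        return 0.0  # never mentioned together: treat as "no textual evidence"
    return max(0.0, math.log(n_ab * n / (n_a * n_b)))


def text_edge_priors(variables, docs, p_min=0.1, p_max=0.9):
    """Map each unordered pair's score s to an edge probability in [p_min, p_max).
    The 1 - exp(-s) squashing is a hypothetical choice for illustration."""
    priors = {}
    for a, b in combinations(variables, 2):
        s = cooccurrence_score(docs, a, b)
        priors[frozenset((a, b))] = p_min + (p_max - p_min) * (1.0 - math.exp(-s))
    return priors


def log_structure_prior(variables, edges, priors):
    """Unnormalized log P(G), assuming independent pairwise adjacency priors.
    `edges` are (parent, child) tuples; edge direction is ignored here."""
    adjacent = {frozenset(e) for e in edges}
    total = 0.0
    for a, b in combinations(variables, 2):
        pair = frozenset((a, b))
        p = priors[pair]
        total += math.log(p) if pair in adjacent else math.log(1.0 - p)
    return total


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete data columns,
    i.e. a data-based score that text-based scores can be compared against."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


# Toy usage with an invented corpus (not data from the study).
docs = [{"ca125", "malignancy", "ascites"}, {"ca125", "malignancy"}, {"age", "parity"}, {"ascites"}]
variables = ["ca125", "malignancy", "age"]
priors = text_edge_priors(variables, docs)
print(log_structure_prior(variables, [("ca125", "malignancy")], priors))
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
```

Factorizing the prior over pairs keeps it cheap to add to a decomposable data score during structure search. In this sketch, a pair with no textual evidence falls back to `p_min` rather than being forbidden outright, so the data can still override the literature.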
Journal: Artificial Intelligence in Medicine
Publication status: Published, 2004