Abstract
Statistical methods are often motivated by real problems. We consider methods inspired by problems in biology and medicine. The thesis is in two parts.
In the first part we consider data in the form of graphs (or networks). These occur naturally in many contexts, such as social and biological networks. We specifically consider the setting where we have multiple graphs on the same set of nodes. We propose a model for this setting called the multiple random dot product graph model. Fitting the model is an optimization problem, which we solve efficiently using a new alternating minimization algorithm. We also propose a hypothesis test, within the model framework, for whether two graphs are drawn from the same distribution. Both the fitting algorithm and the test are evaluated in simulation studies. The model is also generalized to weighted graphs, where we specifically consider Poisson and normally distributed weights. Similar hypothesis tests are proposed in these settings, and again we evaluate their performance through simulation studies.
The second part of the thesis considers prediction of disease progression. We compare three common approaches for disease prediction and apply them to a diabetes data set. In these data, the time until a patient goes on to insulin treatment is of interest, especially whether progression is fast or slow. The methods are a Cox proportional hazards model, a random forest method for survival data, and a neural network approach. The prediction performance and the pros and cons of the methods are discussed.
Original language  English 

Publisher  Technical University of Denmark 

Number of pages  157 
Publication status  Published  2019 
Series  DTU Compute PHD2018 

Volume  488 
ISSN  0909-3192
Title  Statistical Learning with Applications in Biology
Projects
 1 Finished

Big Data Modelling with Applications to Airports
Nielsen, A. M., Clemmensen, L. K. H., Dahl, A. B., Ersbøll, B. K., Dahl, V. A., Sporring, J. & Jenssen, R.
01/08/2015 → 10/04/2019
Project: PhD