Computational Methods for Large Ordinal Data with Crossed Random Effects

Research output: Book/Report › Ph.D. thesis

Abstract

We are constantly reminded of the strong presence of data in our society. Regardless of whether the topic is crime, tax systems, surveillance systems, self-driving cars, chatbots or systems which give us recommendations on anything and everything, data is mentioned. Data grows not only in public awareness but also in both amount and size. However, we can only exploit the data if we can analyse it and thereby gain insight into the knowledge hidden within its structures.

This thesis studies a special type of data called ordinal data with crossed random effects. This type of data is harvested in large quantities from online recommendation systems, but it is also found within many different areas of research. The data essentially consists of an ordered categorical (ordinal) response and two crossed factors, which means that each element of one factor can co-appear with each element of the other factor. It may be depicted as a two-way table, where the elements of one factor constitute the rows, the elements of the other factor constitute the columns, and the cells contain the ordinal response. A classic example is movie recommender systems, where users rate movies on a scale from one to five stars. The stars represent an ordinal response, while the users and the movies are elements of the two crossed factors. When one is not concerned with the specific elements of the factors, and the elements can be assumed to be randomly drawn from some large population of possible elements, it is relevant to treat the factors as random. In movie recommender systems, this is relevant if, for example, the aim of the analysis is to find out whether the genre of a movie affects the score for a general user and a general movie.
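As an illustration of the data layout described above, the following minimal sketch simulates star ratings and arranges them as a two-way table. It is not taken from the thesis: the function names, the cut-points, the effect variances and the choice of a logit link are all assumptions made for the example.

```python
import math
import random


def simulate_ratings(n_users=5, n_movies=4, n_obs=12, seed=0):
    """Simulate (user, movie, stars) triples from a cumulative logit model.

    All numeric settings below are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    thresholds = [-2.0, -0.7, 0.7, 2.0]  # assumed cut-points for 5 star levels
    user_effect = [rng.gauss(0.0, 1.0) for _ in range(n_users)]    # random factor 1
    movie_effect = [rng.gauss(0.0, 0.5) for _ in range(n_movies)]  # random factor 2

    # Crossed factors: any user may co-appear with any movie, but only
    # some of the user/movie cells are actually observed.
    cells = [(i, j) for i in range(n_users) for j in range(n_movies)]
    observed = rng.sample(cells, n_obs)

    rows = []
    for i, j in observed:
        eta = user_effect[i] + movie_effect[j]
        # Cumulative probabilities P(stars <= k) under a logit link.
        cum = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]
        draw = rng.random()
        stars = 1 + sum(draw > c for c in cum)  # ordinal response in 1..5
        rows.append((i, j, stars))
    return rows


def to_two_way_table(rows, n_users, n_movies):
    """Arrange long-format rows as a users-by-movies table (None = unobserved)."""
    table = [[None] * n_movies for _ in range(n_users)]
    for i, j, stars in rows:
        table[i][j] = stars
    return table


if __name__ == "__main__":
    ratings = simulate_ratings()
    for row in to_two_way_table(ratings, 5, 4):
        print(row)
```

In real recommender data the table is typically very sparse, so the long format (one row per observed rating) is the more practical representation; the two-way table is mainly a way to visualise the crossing.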

An obvious choice of model for analysing this type of data is the ordinal mixed effects model, which treats the response as truly ordinal and includes both fixed and random effects. But the computations required to estimate these models do not scale to large data sets that have both many observations and a large number of elements in each of the crossed random factors. This thesis therefore studies computational methods for large ordinal data sets with crossed random effects. The goal is to scale estimation and inference such that it is possible to analyse data sets of ever-increasing size.
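One common way to write such a model is as a cumulative link mixed model. The notation below is an assumption made for illustration and is not necessarily that of the thesis: $Y_{ij}$ is the response for element $i$ of the first factor and element $j$ of the second, $\theta_k$ are ordered thresholds, $x_{ij}$ holds the fixed-effect covariates, $F$ is the inverse link (for example the logistic distribution function), and $\mathcal{O}$ is the set of observed pairs.

```latex
\begin{align*}
P(Y_{ij} \le k \mid u_i, v_j)
  &= F\!\left(\theta_k - x_{ij}^\top \beta - u_i - v_j\right),
  \qquad k = 1, \dots, K-1, \\
u_i \sim N(0, \sigma_u^2),
  &\qquad v_j \sim N(0, \sigma_v^2), \\
L(\theta, \beta, \sigma_u, \sigma_v)
  &= \int \prod_{(i,j) \in \mathcal{O}}
     P(Y_{ij} = y_{ij} \mid u_i, v_j)\,
     \phi(u; 0, \sigma_u^2 I)\, \phi(v; 0, \sigma_v^2 I)\, du\, dv .
\end{align*}
```

Because the factors are crossed, the integral in the marginal likelihood $L$ does not factor into a product of low-dimensional integrals; its dimension equals the total number of elements in the two factors, which is what makes exact estimation expensive for large data sets.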

The thesis is based on four papers. In Paper A we use the R package TMB to optimize the Laplace approximation to the marginal log-likelihood and thereby estimate ordinal models with crossed random effects. We show that this is faster than existing software in R, but does not scale better. The remaining papers (B, C and D) examine and develop different computational methods to scale estimation and inference of ordinal models with crossed random effects. These include, among others, Gaussian variational approximation, stochastic optimization, approximations based on the delta method, optimization based on iterative fixation of variables, and diagonal approximations. Numerical studies show that, for one of the methods, estimation and inference scale close to linearly with the number of observations, while the estimates are similar in accuracy to estimates based on the Laplace approximation. The thesis ends with a summary of the most important points from the papers and a discussion of future work.
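For reference, the Laplace approximation mentioned for Paper A can be stated in its textbook form as follows. The symbols are assumptions introduced here, not the thesis's notation: $b$ collects all random effects, $q$ is its dimension, $\psi$ collects the model parameters, and $g(b; \psi) = \log p(y \mid b; \psi) + \log p(b; \psi)$ is the joint log-density.

```latex
\begin{align*}
\ell(\psi) &= \log \int \exp\{ g(b; \psi) \}\, db, \\
\hat b(\psi) &= \arg\max_b \, g(b; \psi),
  \qquad
  H(\psi) = -\nabla_b^2\, g(b; \psi)\,\big|_{b = \hat b(\psi)}, \\
\ell_{\mathrm{LA}}(\psi) &= g(\hat b(\psi); \psi)
  + \frac{q}{2} \log(2\pi)
  - \frac{1}{2} \log\det H(\psi).
\end{align*}
```

Each evaluation of $\ell_{\mathrm{LA}}$ requires an inner optimization over $b$ and a log-determinant of the $q \times q$ matrix $H$. With crossed random effects, $H$ is not block-diagonal, which is one plausible reason why the approach becomes costly as the number of factor elements grows.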
Original language: English
Publisher: Technical University of Denmark
Number of pages: 282
Publication status: Published - 2023
