Embedding Representations for Discrete Choice and Travel Demand Models

Ioanna Arkoudi

Research output: Book/Report › Ph.D. thesis


Including richer data in Discrete Choice Models (DCM) and Activity-Based Models is crucial for advancing transport research. However, the default method of encoding categorical variables, “one-hot encoding”, constrains the number of explanatory variables that can be included. Because it adds an extra variable per category, the model’s complexity grows in proportion to the cardinality of the categorical variables considered. This can pose severe challenges in statistical modeling, since the sample size required to avoid problems such as overfitting or poor parameter estimation increases exponentially. Given that travel data collection is the most resource-intensive part of the transportation model development process, a more efficient treatment than increasing the sample size is to find alternative methods for encoding categorical variables that allow for more compact yet informative representations of the categorical data.
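
To make the dimensionality argument concrete, the following is a minimal sketch and not the thesis's implementation. It contrasts a one-hot vector with a row lookup in an embedding table. The variable (50 levels), the embedding dimension (4) and the random initialisation are all assumptions chosen for illustration; in the approach described here the table rows would be learned by a neural network.

```python
import random


def one_hot(index, num_categories):
    """Return a one-hot vector: one input dimension per category."""
    vec = [0.0] * num_categories
    vec[index] = 1.0
    return vec


def make_embedding_table(num_categories, dim, seed=0):
    """Build a lookup table with one dense row per category.

    Rows are random here (assumption); a real model would learn them.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_categories)]


# Hypothetical categorical variable with 50 levels.
NUM_CATEGORIES = 50
EMBEDDING_DIM = 4  # assumed value, for illustration only

table = make_embedding_table(NUM_CATEGORIES, EMBEDDING_DIM)

category = 17
x_one_hot = one_hot(category, NUM_CATEGORIES)  # 50 model inputs
x_embedded = table[category]                   # 4 model inputs

print(len(x_one_hot), len(x_embedded))  # prints: 50 4
```

Looking up a row of the table is equivalent to multiplying the one-hot vector by the table, so the embedding can be read as a learned, lower-dimensional re-encoding of the same categorical information.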

In this thesis we address this need by introducing a novel, data-driven Neural Network (NN) approach for encoding categorical and discrete travel variables into continuous vector representations, called embeddings. This approach is strongly inspired by Natural Language Processing (NLP) techniques and yields compact yet meaningful representations that capture semantic associations between the encoded categories, as well as contextual information relevant to a specific prediction task. By integrating embedding representations into Discrete Choice and Activity-Based Models, we aim to develop hybrid models that not only outperform previously suggested approaches but also produce interpretable and behaviorally meaningful outputs that provide a more comprehensive understanding of travel behavior. In achieving these aims, this thesis seeks to contribute to future transport research by providing a more efficient and interpretable framework for modeling travel behavior, ultimately leading to more informed decision-making in transportation planning and policy implementation.

In the first part of the thesis, “Embedding representations in Discrete Choice Models”, we build on previous work combining theory-driven and data-driven choice models and introduce more efficient embedding model architectures that aim to improve both performance and interpretability within DCMs. In the second part, “Latent representations in Travel Demand Models”, our focus shifts from embedding representations within static DCMs to their use with time-series data, specifically in Activity-Based Models for daily trip sequence generation and single-trip forecasting. Within this context, we present two methods for learning and employing embedding representations: (i) within the framework of a Dynamic Bayesian Network, where they constitute an integral component of the model architecture, and (ii) through an auto-encoder that produces geospatial embeddings representing origin and destination areas, which are subsequently used to augment the feature space of a long short-term memory (LSTM) model. The thesis concludes with a discussion of the wider context of recent advancements in Artificial Intelligence (AI), specifically in the language domain that inspired our research, aiming to identify potential avenues for future research in travel behavior modeling.
Original language: English
Publisher: Technical University of Denmark
Number of pages: 154
Publication status: Published - 2024

