Abstract
Since their introduction, smartphones have steadily increased their market share. They support user identification, authentication, and billing. From a transport science perspective, a smartphone can serve as a complex multi-sensor platform that enables passive collection of data on human travel behavior. Over the same period, people and smartphones have become almost inseparable, especially during travel.
Smartphones can reveal new knowledge on transport behavior variations both between and within users. While traditional approaches already measure behavior variations between users, we need higher resolution to measure these variations within the same user. For example, one person may alternate between bike and car according to weather conditions. For others, the alternation could depend on the day of the week, the season, or the needs of family members. Cross-sectional interview-based surveys are unable to capture such details. Smartphones and their sensors, on the other hand, may offer unprecedented spatial and contextual resolution.
Handling such a higher resolution, however, poses a new and complex set of challenges. Let us imagine a scenario in which smartphones and vehicles active on the transport network are continuously connected to the communication network for the purpose of providing an intelligent transportation service. The resulting data footprint of both sensors and algorithms would be huge. Moreover, the “intelligence” of such a system should follow each passenger at each instant of their journey. Regardless of the ongoing engineering challenges, which are mostly unsolved, learning people’s transport behavior in this scenario requires rich and efficient data representations, and knowledge of each trip’s ground truth at the same scale as the data. This is in itself a gargantuan challenge, and this work takes initial steps toward easing it.
The ability to measure behavior variations within the same user could enable discoveries we cannot predict. Further, these measures may reveal causes of human transport behavior that existing measurement systems cannot provide. Although people travel only a fraction of the day, the purpose behind each trip is one of many activities defining their lives. The thesis contributes towards better measurements of transport behavior.
To achieve significant measures of transport behavior variations within each user, while also measuring variations between users, this Ph.D. thesis makes the following main contributions.
We first pinpoint and examine the problems limiting prior research. This step exposes the drivers for selecting and ranking the machine-learning algorithms used to process data generated by smartphones. It also identifies the main physical limitations and gives an overview of the methodological frameworks deployed for measuring transport behavior variations. The output consists of a defined relationship among user interaction, methods, and data.
Next, we focus on two fundamental binary classification problems on Global Positioning System (GPS) trajectories. Both underpin many current and upcoming smartphone-based technologies deployed to measure transport behavior variations during a journey: one problem is stop detection; the other is identifying users’ presence inside or outside the transport network. Most of the problems relevant for detecting transport behavior variations belong to one or both of these two large categories.
In both cases, the solutions share a framework of methodological-, technological-, and sensorial-information convergence. The quality of the solutions directly affects the quality of transport behavior measures, such as the inference of departure/arrival time, transport mode, trip purpose, and transit flows through the transport network.
For stop detection, we combine GPS time series with spatial context information retrieved from a Geographic Information System (GIS), which we represent as multi-dimensional tensors. This line of work explores both simple and advanced data representations, benchmarked through specialized artificial neural networks, random forests, and unsupervised machine learning baselines.
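To make the stop-detection task concrete, the following is a minimal sketch of a rule-based dwell heuristic that labels each GPS fix as stop (1) or movement (0). It is not the neural-network or GIS-tensor approach of the thesis; the function names, the 50 m radius, and the 180 s dwell threshold are illustrative assumptions only.

```python
import math
from typing import List, Tuple

# One GPS fix: (timestamp in seconds, latitude in degrees, longitude in degrees).
Fix = Tuple[float, float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 coordinates."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def label_stops(fixes: List[Fix], radius_m: float = 50.0, min_dwell_s: float = 180.0) -> List[int]:
    """Label each fix 1 (stop) or 0 (moving) with a simple dwell rule.

    Hypothetical thresholds: a run of consecutive fixes that stays within
    radius_m of its first fix for at least min_dwell_s counts as a stop.
    """
    labels = [0] * len(fixes)
    i = 0
    while i < len(fixes):
        j = i + 1
        # Extend the window while fixes stay within radius_m of the anchor fix i.
        while j < len(fixes) and haversine_m(
            fixes[i][1], fixes[i][2], fixes[j][1], fixes[j][2]
        ) <= radius_m:
            j += 1
        if fixes[j - 1][0] - fixes[i][0] >= min_dwell_s:
            for k in range(i, j):
                labels[k] = 1
            i = j
        else:
            i += 1
    return labels


# Synthetic trajectory: five minutes standing still, then steady movement north.
still = [(60.0 * k, 55.7850, 12.5200) for k in range(6)]
moving = [(360.0 + 60.0 * k, 55.7850 + 0.001 * (k + 1), 12.5200) for k in range(4)]
labels = label_stops(still + moving)
```

A heuristic of this kind uses no spatial context; it only illustrates the binary labelling that the richer representations are benchmarked on.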
To classify whether a user is inside or outside the transport network, we combine independent sensors measuring the same interactions between smartphone and infrastructure. We leverage signals that allow short-range implicit interactions between devices. To assess their potential, we verify how robust these signals and the related machine learning classifiers are against the noise typical of realistic contexts.
We developed a proprietary smartphone-sensing platform that collects these independent, concurrent signals from devices installed on the infrastructure (buses, in our use case), together with the GPS locations of both buses and smartphones. In a real experiment, we collected smartphone sensor data together with ground truth of varying quality. We then simulate human errors in the labelling process, which are known to occur in smartphone surveys when people validate travel diaries.
In large-scale multi-modal deployments, widespread technologies for sensing people’s presence within the transportation system, such as implicit Walk-in/Walk-out (WIWO) and explicit Check-in/Check-out (CICO), present limitations. For example, accuracy depends on the reliability of the ground truth, and scalability depends on the sustainability of collecting reliable ground truth. These limitations prevent Intelligent Transportation Systems from supporting the analysis, optimization, and control of transport comfort, safety, and efficiency. Implicit smartphone sensing also aims to close this gap. We propose the Cause-Effect Multitask Wasserstein Autoencoder. This method acts as a powerful dimensionality reduction tool and obtains an auto-validated representation of a latent space describing users’ smartphones within the transport system. Such a representation allows meaningful clustering, consistent with the problem at hand, via DBSCAN (Density-Based Spatial Clustering of Applications with Noise). Consequently, this method enables the output of ground truth at Big Data scale.
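As an illustration of the clustering step only, the sketch below runs a plain, pure-Python DBSCAN on a handful of two-dimensional points that stand in for latent vectors. The autoencoder itself is not reproduced; the points, `eps`, and `min_samples` are hypothetical values chosen for the example.

```python
import math
from typing import List, Sequence

NOISE = -1       # label for points that belong to no cluster
UNVISITED = -2   # internal marker for points not yet processed


def region_query(points: Sequence[Sequence[float]], idx: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[idx] (the point itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Return one cluster label per point (0, 1, ...), or NOISE for outliers."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
                continue
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # core point: keep growing the cluster
    return labels


# Hypothetical 2-D latent points: two dense groups and one isolated point.
latent = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0),
]
cluster_labels = dbscan(latent, eps=0.3, min_samples=3)
```

In this toy input the two dense groups become separate clusters and the isolated point is marked as noise, which is the behavior that makes a density-based method attractive when the number of groups is not known in advance.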
A general contribution across the work presented above stems from the ablation studies. Noisy signals affect classification performance, but the impact of this noise is not always intuitive. Consider, for example, a very noisy dataset. If the noise affects the signal, the classification accuracy computed after data cleansing does not account for the large fraction of data lost in the cleansing step. If the noise affects the ground truth, some false positives may actually be true positives, and some false negatives may actually be true negatives. Consequently, to support optimal decision-making, we propose two perspectives that complement metrics derived from the confusion matrix, such as accuracy or F1-score. We measure the impact of noise on both the GPS signal and the ground truth. In the first case, we look at the correlation coefficient. In the second case, we simulate labelling errors. These measures of the impact of noise on classification performance can serve as key performance indicators, facilitating comparison across different classifiers. Ground truth quality, like other signals, is a random variable underpinning both the scalability and the performance of any classifier.
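The idea of simulating labelling errors can be sketched as follows, under a simple assumed noise model: each ground-truth label is flipped independently with a fixed probability, and accuracy and F1-score are recomputed against the corrupted labels. The noise model, function names, and toy labels are illustrative assumptions, not the protocol used in the thesis.

```python
import random
from typing import List, Sequence, Tuple


def flip_labels(labels: Sequence[int], error_rate: float, seed: int = 0) -> List[int]:
    """Flip each binary label independently with probability error_rate (assumed noise model)."""
    rng = random.Random(seed)
    return [1 - y if rng.random() < error_rate else y for y in labels]


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and F1-score derived from the confusion matrix."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def noise_sweep(
    y_true: Sequence[int], y_pred: Sequence[int], rates: Sequence[float], seed: int = 0
) -> List[Tuple[float, float, float]]:
    """For each error rate, score the fixed predictions against corrupted ground truth."""
    return [(r, *accuracy_and_f1(flip_labels(y_true, r, seed), y_pred)) for r in rates]


# Toy labels: the classifier output stays fixed while the "ground truth" degrades.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
sweep = noise_sweep(y_true, y_pred, rates=[0.0, 0.2, 1.0])
```

Plotting the measured score against the error rate for several classifiers gives one way to compare how sensitive their reported performance is to ground truth quality.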
In conclusion, the thesis provides the basis for methods enabling higher-resolution measurements of human transport behavior variations at Big Data scale, and the contributions mentioned above represent a promising step. The novel data structures and methodologies offer the potential to reduce bias in the measurements. At the same time, the impact of reduced bias on the evaluation of methods is direct and immediate.
| Original language | English |
|---|---|
| Publisher | Technical University of Denmark |
| Number of pages | 212 |
| Publication status | Published - 2021 |
Thesis title: Mining User Transport Behavior from Smartphones
Projects
Mining User Behaviour from Smartphone-data
Servizi, V. (PhD Student), Fang, Z. (Examiner), Rodrigues, F. (Examiner), Pereira, F. C. (Main Supervisor), Nielsen, O. A. (Supervisor) & Tørset, T. (Examiner)
01/11/2018 → 09/06/2022
Project: PhD