Predicting injury-severity for cyclist crashes using natural language processing and neural network modelling

Kira Hyldekær Janstrup*, Bojan Kostic, Mette Møller, Filipe Rodrigues, Stanislav Borysov, Francisco Camara Pereira

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review


The use of machine learning techniques in safety research has increased, as has interest in new data sources. This study's unique contribution is the application of text mining to perceived cyclist safety and crash occurrence in an urban environment. We analysed crash data collected by the emergency services in the Capital Region of Denmark from 2013 to 2017 and self-reported textual data provided by cyclists from 2018 to 2019. The analysis used natural language processing and topic modelling with Latent Dirichlet Allocation (LDA) to identify topics in the self-reports that represent environmental characteristics cyclists perceive as unsafe. A multi-output neural network regression model was then applied to predict the injury-severity distribution of cyclists involved in crashes, measured by emergency response level (ERL), from the obtained topic distributions together with additional variables such as cycle flow. We identified six LDA topics, addressing: buses and cycle paths; conflicts with parked cars; roundabouts and inadequate maintenance; fast-moving cars and lack of cycle paths; school zones and heavy traffic; and intersections and interactions with vehicles. Cycle flow had the greatest influence on ERL prediction, but other factors also affected ERLs, especially school zones and heavy traffic. The results provide new insights into both perceived and actual cyclist safety. They also contribute a novel procedure for joint correlation analysis that applies machine learning techniques to self-reported textual data, thereby providing a better tool for infrastructure planning. The findings show the importance of including perceived safety in crash modelling and indicate that authorities should focus on safety around schools and at intersections to improve cyclist safety in urban environments.
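The abstract does not state the network architecture, loss function, number of ERL categories, or how inputs were scaled. As a rough illustration of the general idea only, the sketch below assumes a single tanh hidden layer, a softmax output that yields a probability distribution over a hypothetical three ERL levels, and a soft-label cross-entropy loss against the observed ERL proportions at each location. Inputs are assumed to be the six LDA topic proportions plus a rescaled cycle-flow value. The class name `ERLDistributionNet`, the layer sizes, and the toy numbers are all hypothetical and are not taken from the study.

```python
import math
import random

# Hypothetical sketch, not the authors' model: a one-hidden-layer network whose
# softmax output is a probability distribution over (assumed) ERL categories.


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


class ERLDistributionNet:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        # Hidden layer with tanh activation, then linear logits.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.W2, self.b2)]
        return h, softmax(z)

    def predict(self, x):
        return self.forward(x)[1]

    def loss(self, X, Y):
        # Mean cross-entropy between observed ERL proportions y and predictions p.
        total = 0.0
        for x, y in zip(X, Y):
            p = self.predict(x)
            total -= sum(yk * math.log(pk + 1e-12) for yk, pk in zip(y, p))
        return total / len(X)

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        # For softmax + cross-entropy with targets summing to 1, dLoss/dlogits = p - y.
        dz = [pk - yk for pk, yk in zip(p, y)]
        # Backpropagate to the hidden layer before W2 is updated.
        dh = [sum(dz[k] * self.W2[k][j] for k in range(len(dz))) * (1 - h[j] ** 2)
              for j in range(len(h))]
        for k in range(len(dz)):
            for j in range(len(h)):
                self.W2[k][j] -= lr * dz[k] * h[j]
            self.b2[k] -= lr * dz[k]
        for j in range(len(dh)):
            for i in range(len(x)):
                self.W1[j][i] -= lr * dh[j] * x[i]
            self.b1[j] -= lr * dh[j]

    def fit(self, X, Y, epochs=200, lr=0.1):
        # Plain per-sample stochastic gradient descent.
        for _ in range(epochs):
            for x, y in zip(X, Y):
                self.train_step(x, y, lr)
        return self


# Hypothetical toy inputs (not data from the study): six topic proportions
# followed by one rescaled cycle-flow value per location.
X = [
    [0.50, 0.10, 0.10, 0.10, 0.10, 0.10, 0.2],
    [0.10, 0.50, 0.10, 0.10, 0.10, 0.10, 0.9],
    [0.10, 0.10, 0.10, 0.10, 0.50, 0.10, 0.5],
    [0.20, 0.20, 0.20, 0.20, 0.10, 0.10, 0.7],
]
# Hypothetical observed share of crashes at each of three assumed ERL levels.
Y = [
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
    [0.5, 0.3, 0.2],
]

model = ERLDistributionNet(n_in=7, n_hidden=5, n_out=3)
before = model.loss(X, Y)
model.fit(X, Y, epochs=300, lr=0.1)
after = model.loss(X, Y)
print(f"loss before: {before:.3f}, after: {after:.3f}")
print("predicted ERL distribution:", [round(v, 3) for v in model.predict(X[0])])
```

A softmax output is one simple way to make a multi-output regression return a valid distribution (non-negative and summing to one) rather than independent per-level values; the study may have enforced this differently or not at all.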

Original language: English
Article number: 106153
Journal: Safety Science
Number of pages: 14
Publication status: Published - 2023


Keywords
  • Cyclist crashes
  • Latent Dirichlet allocation
  • Multi-output neural network regression model
  • Natural language processing
  • New data sources
  • Perceived safety

