can-sleuth: Investigating and Evaluating Automotive Intrusion Detection Datasets

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review



The modern automobile is a network of computers, specifically a controller area network (CAN). Automotive computers manage the engine (e.g., fuel injection), the transmission (e.g., automatic shifting), the vehicle speed (e.g., cruise control), and many more systems. Therefore, a vehicle's CAN bus is safety critical; by design, it is robust, reliable, and error tolerant. Unfortunately, it is not secure: it was developed in the 1980s, when it was a closed system with no Internet access. The modern automobile is not a closed system, yet the CAN bus remains insecure. Automotive researchers are gravitating toward intrusion detection as one possible solution to the problem of automotive [in]security. To build and evaluate an intrusion detection system (IDS), however, researchers need adequate training and testing data. In this paper, we investigate and evaluate the following automotive intrusion detection datasets: (1) the HCRL Car Hacking dataset, (2) the HCRL Survival Analysis dataset, and (3) the can-train-and-test dataset. The HCRL Car Hacking dataset (hcrl-ch) and the HCRL Survival Analysis dataset (hcrl-sa) are well established in the literature, whereas the can-train-and-test dataset is a promising new dataset. First, we investigate the can-train-and-test dataset; in particular, we evaluate the impact of various features on the performance of sixteen machine learning IDSs. Second, we compare can-train-and-test to hcrl-ch and hcrl-sa. We find that, compared to the two established datasets, can-train-and-test offers new and greater insights for researchers interested in automotive intrusion detection, automotive firewalls and filtering, and more. With an order of magnitude more training and testing data, can-train-and-test enables data-intensive machine learning models to demonstrate their full potential, with eight of the sixteen models achieving an average F1-score above 0.9.
Moreover, can-train-and-test maintains ample differentiation power; the standard deviation of the models’ average F1-scores was 0.2247, which exceeds the standard deviations of hcrl-ch (0.2202) and hcrl-sa (0.2243).
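The abstract relies on two summary statistics: each model's average F1-score, and the standard deviation of those averages across the sixteen models, which serves as a proxy for a dataset's "differentiation power" (how well it separates strong models from weak ones). The sketch below shows one way such figures could be computed. It is a minimal illustration under stated assumptions, not the paper's evaluation code: it assumes F1 is computed for the attack (positive) class, that each model's F1-scores are averaged over a dataset's test sets, and that the population standard deviation is used. The function names `f1_score` and `summarize` are hypothetical.

```python
from __future__ import annotations

from statistics import mean, pstdev


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the attack (positive) class: 2TP / (2TP + FP + FN).

    Returns 0.0 when the score is undefined, i.e. when there are no
    true positives, false positives, or false negatives.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(per_model_f1: dict[str, list[float]], threshold: float = 0.9):
    """Summarize per-model F1-scores across a dataset's test sets.

    Returns (averages, spread, above):
      averages: each model's mean F1 over its test sets
      spread:   population standard deviation of those means (an assumption;
                a sample standard deviation would be slightly larger)
      above:    sorted names of models whose mean F1 exceeds `threshold`
    """
    averages = {name: mean(scores) for name, scores in per_model_f1.items()}
    spread = pstdev(list(averages.values()))
    above = sorted(name for name, avg in averages.items() if avg > threshold)
    return averages, spread, above


if __name__ == "__main__":
    # Hypothetical scores for three models on two test sets (not from the paper).
    toy = {
        "model_a": [0.97, 0.95],
        "model_b": [0.88, 0.91],
        "model_c": [0.40, 0.55],
    }
    averages, spread, above = summarize(toy)
    print(averages, round(spread, 4), above)
```

Comparing `spread` across datasets, as the abstract does with 0.2247, 0.2202, and 0.2243, is only meaningful when the same set of models and the same averaging and standard deviation conventions are applied to every dataset.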
Original language: English
Title of host publication: Proceedings of the EICC 2024: European Interdisciplinary Cybersecurity Conference
Publisher: Association for Computing Machinery
Publication date: 2024
ISBN (Electronic): 979-8-4007-1651-5
Publication status: Published - 2024
Event: European Interdisciplinary Cybersecurity Conference 2024 - Xanthi, Greece
Duration: 5 Jun 2024 → 6 Jun 2024


Conference: European Interdisciplinary Cybersecurity Conference 2024


