Detecting Ambiguous Phishing Certificates using Machine Learning

Kaspar Hageman, Sajad Homayoun, Sam Afzal-Houshmand, Christian D. Jensen, Jens M. Pedersen

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review


Abstract

Recent phishing attacks have started to migrate to HTTP over TLS (HTTPS), making a phishing web page appear safe to the user’s browser despite its malicious purpose. This paper proposes new data features as well as machine learning-based solutions to classify digital certificates involved in HTTPS as phishing or benign. In contrast to previous works that treat this as a binary classification problem, we take into account that a certificate can be partially benign and partially phishy at the same time. We propose a multi-class classifier and a regressor to classify these ambiguous certificates, in addition to benign and phishing certificates; for the regressor, the ‘phishyness’ of a certificate is expressed as a value between 0 and 1. We apply our method to a set of certificates obtained from certificate transparency logs and show that we can classify them with high performance. We extend our validation by evaluating the performance of the model over time, showing that our model generalizes over time on our training data set.
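As an illustration of how a regressor's 'phishyness' value between 0 and 1 could relate to the three classes named in the abstract (benign, ambiguous, phishing), the following is a minimal sketch. The cut-off values `BENIGN_MAX` and `PHISHING_MIN` and the function names are hypothetical and are not taken from the paper; the abstract does not state how, or whether, the authors map scores to classes.

```python
from typing import List

# Hypothetical cut-offs chosen only for illustration; not from the paper.
BENIGN_MAX = 0.25    # scores at or below this are treated as benign
PHISHING_MIN = 0.75  # scores at or above this are treated as phishing


def clamp_score(score: float) -> float:
    """Clamp a raw regressor output into the interval [0, 1]."""
    return max(0.0, min(1.0, score))


def label_from_score(score: float) -> str:
    """Map a phishyness score to 'benign', 'ambiguous', or 'phishing'."""
    s = clamp_score(score)
    if s <= BENIGN_MAX:
        return "benign"
    if s >= PHISHING_MIN:
        return "phishing"
    return "ambiguous"


def label_all(scores: List[float]) -> List[str]:
    """Label a batch of phishyness scores."""
    return [label_from_score(s) for s in scores]
```

Under these assumed cut-offs, a score of 0.5 would fall in the ambiguous band, while scores near 0 or 1 would be labelled benign or phishing respectively.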
Original language: English
Title of host publication: Proceedings of 36th International Conference on Information Networking
Number of pages: 6
Publisher: IEEE
Publication date: 2022
Publication status: Published - 2022
Event: 36th International Conference on Information Networking - Jeju Island, Korea, Republic of
Duration: 12 Jan 2022 to 15 Jan 2022
http://icoin.org

Conference

Conference: 36th International Conference on Information Networking
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 12/01/2022 to 15/01/2022

Keywords

  • Digital Certificate
  • Phishing
  • Machine Learning
  • Feature Extraction
