Addressing data scarcity in ML-based failure-cause identification in optical networks through generative models

Meadhbh Healy, Andreas Baum, Francesco Musumeci*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

We consider the issue of data scarcity with class imbalance in failure-cause identification for optical fiber systems using Machine Learning (ML) techniques. We use an open dataset comprising real Optical Time-Domain Reflectometer (OTDR) traces, gathered in an artificial setup spanning tens of kilometers, consistent with a long-haul network. Whilst ML methods have shown satisfactory results for automating the identification of failure causes in optical fiber networks, the solutions are generally strongly dependent on available labeled datasets and require extensive data for training and validation. However, in the case of failure management in optical networks, building a valuable dataset with sufficiently informative samples is in general a hard process because, by nature, failures occur infrequently. In addition, data labeling is time- and resource-intensive for domain experts. We therefore seek to mitigate these issues by exploring two generative models, namely a conditional Generative Adversarial Network (cGAN) and a conditional Variational Autoencoder (cVAE), to balance the number of failure samples in a multiclass dataset. In order to balance the dataset with accurate synthetic data across the different failure causes, we adopt generative models that are conditioned on the failure class, the signal-to-noise ratio (SNR) level of the trace and the maximum amplitude of the signal. These approaches are compared to the Synthetic Minority Over-sampling TEchnique (SMOTE). We compare the approaches by training an autoencoder classifier on each resulting dataset and testing it against three holdout datasets. Results show that, with the cGAN and cVAE, failure-cause identification can be improved by more than 5% in terms of global accuracy when compared to the imbalanced dataset and, in particular for scarcely represented failure classes, our generative models provide an improvement in the F1 scores of over 50%.
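As a rough illustration of the SMOTE baseline mentioned in the abstract, the sketch below generates synthetic minority-class samples by interpolating between a sample and one of its nearest minority-class neighbours. It is a minimal NumPy sketch of the general technique, not the authors' implementation: the function name `smote_oversample`, its parameters, and the random toy arrays standing in for OTDR traces are all assumptions made for the example.

```python
import numpy as np


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples from minority-class rows (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between all minority samples.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest neighbours per sample

    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment between the two samples.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[chosen] - minority[base])


# Toy stand-in data (an assumption): 5 minority-class "traces" of 30 points each.
toy_traces = np.random.default_rng(1).normal(size=(5, 30))
synthetic = smote_oversample(toy_traces, n_new=20)
```

Because each synthetic sample is a convex combination of two real samples, SMOTE stays within the span of the observed minority data. The conditional generative models discussed in the abstract instead learn a distribution conditioned on the failure class, SNR level and maximum amplitude.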
Original language: English
Article number: 104137
Journal: Optical Fiber Technology
Volume: 90
Number of pages: 12
ISSN: 1068-5200
Publication status: Published - 2025

Keywords

  • Conditional GAN
  • Conditional VAE
  • Data augmentation
  • Failure-cause identification
  • OTDR
  • Optical networks
  • SMOTE
