2024
Journal Article
Title
Experimental dataset for developing and testing ML models in optical communication systems
Abstract
Due to the scarcity of diverse and well-organized public datasets, individual research organizations are often forced to develop and use their own datasets. However, the use of machine learning (ML) models in optical communications and networks depends heavily on the existence of high-quality datasets, especially ones covering the various parameters to be optimized in wavelength-division multiplexing (WDM) systems. In this work, we present a public dataset for developing and testing ML models. The dataset was developed in a laboratory setting and comprises 12,672 samples spanning different modulation formats, symbol rates, distances, WDM channel allocation profiles, etc. Each data point offers more than 60 features, revealing almost every aspect of the transmission setup. Moreover, we provide optical spectra of the entire C-band as well as a constellation diagram of the channel under test for all the data points. The diversity and extensiveness of the dataset, together with a well-structured document, allow a wide range of use cases and studies, including quality of transmission (QoT) studies, optical spectrum analysis, constellation diagram modeling, and digital twin evaluation. Similar to our previous efforts, the current dataset aims to facilitate collaboration by offering a way to fairly compare research outcomes in data analysis within the domain of optical communication systems.
Author(s)