A Multi-Split Cross-Strategy for Enhancing Machine Learning Algorithms Prediction Results with Data Generated by Conditional Generative Adversarial Network

Abdelfattah Abassi; Brahim Bakkas; Mostapha El Jai; Ahmed Arid; Hussain Benazza

doi:10.3844/jcssp.2024.700.707

Research Article Open Access

Abdelfattah Abassi¹, Brahim Bakkas²,³, Mostapha El Jai²,⁴, Ahmed Arid² and Hussain Benazza²

¹ Department of Computer Science, Ecole Nationale Supérieure d'Arts et Métiers (ENSAM-MEKNES), Moulay Ismail University, Meknes, Morocco
² Department of Computer Science, Ecole Nationale Supérieure d'Arts et Métiers (ENSAM-MEKNES), Moulay Ismail University, Meknes, Morocco
³ Department of Computer Science, Regional Center for Teaching and Training Professions, Meknes, Morocco
⁴ Euromed Center of Research, Euromed Polytechnic School, Euromed University, Fez, Morocco

Abstract

In this study, we present a Multi-Split Cross-Strategy (MSC-Strategy) designed to leverage synthetic tabular data generated by a Conditional Generative Adversarial Network (CGAN). Our study investigates the potential of synthetic data, compared with real-world data, for improving the prediction results of machine learning models. First, we develop a CGAN architecture tailored to generating synthetic tabular data and train it on a comprehensive real-world dataset. Second, we validate the generated data to ensure its statistical fidelity to, and resemblance with, the distribution of the real data. Finally, we select a subset of the generated data and apply our strategy to build a new combined training set consisting of the real-data training set and the chosen subset of generated data. To validate our approach, we employ six diverse regression models, including Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF), XGB Regressor (XGB), and Support Vector Regressor (SVR). Each model is trained and tested with four training sets: the real-data training set, the generated data, the combined data (the real-data training set plus the generated data), and the set formed by our MSC-Strategy. Our findings indicate that the training set formed by the MSC-Strategy yields better predictive performance than the real-world and generated training sets, demonstrating that the strategy can enhance the predictions of machine learning models while using only a subset of the generated data.
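The abstract does not state which fidelity tests or which subset-selection rule the authors use. As a hedged illustration only, the sketch below shows two generic ingredients of such a pipeline in plain Python: a per-column two-sample Kolmogorov-Smirnov (KS) statistic for comparing the marginal distributions of real and synthetic columns, and the assembly of the four kinds of training set the abstract compares. The function names (`ks_statistic`, `fidelity_report`, `build_training_sets`) and the random-subset rule are hypothetical assumptions; in particular, the random subset is a placeholder and not the MSC-Strategy selection.

```python
import bisect
import random
from typing import Dict, List, Sequence, Tuple

Row = Tuple[float, ...]


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # Both empirical CDFs only jump at observed values, so checking every
    # pooled value is enough to find the supremum of |F_a - F_b|.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / n  # fraction of a <= x
        cdf_b = bisect.bisect_right(b_sorted, x) / m  # fraction of b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


def fidelity_report(real: List[Row], synthetic: List[Row]) -> List[float]:
    """Per-column KS statistic between real and synthetic rows (0.0 = identical marginals)."""
    n_cols = len(real[0])
    return [
        ks_statistic([r[j] for r in real], [s[j] for s in synthetic])
        for j in range(n_cols)
    ]


def build_training_sets(
    real_train: List[Row], synthetic: List[Row], n_subset: int, seed: int = 0
) -> Dict[str, List[Row]]:
    """Assemble the four training-set variants named in the abstract.

    HYPOTHETICAL: 'real+subset' draws a plain random subset of the synthetic
    rows as a stand-in; it is NOT the paper's MSC-Strategy selection rule.
    """
    rng = random.Random(seed)
    subset = rng.sample(synthetic, n_subset)
    return {
        "real": list(real_train),
        "generated": list(synthetic),
        "combined": list(real_train) + list(synthetic),
        "real+subset": list(real_train) + subset,
    }


if __name__ == "__main__":
    # Toy data: the synthetic rows are the real rows shifted slightly.
    real = [(float(i), 2.0 * i) for i in range(10)]
    synth = [(float(i) + 0.5, 2.0 * i + 1.0) for i in range(10)]
    print(fidelity_report(real, synth))
    sets = build_training_sets(real, synth, n_subset=3)
    print({name: len(rows) for name, rows in sets.items()})
```

A KS statistic near 0 means a synthetic column's marginal distribution closely tracks the real one; it says nothing about joint (cross-column) structure, so marginal checks are usually paired with other tests before generated rows are mixed into a training set.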

Journal of Computer Science

Volume 20 No. 7, 2024, 700-707

Submitted On: 7 February 2024
Published On: 9 April 2024

How to Cite: Abassi, A., Bakkas, B., Jai, M. E., Arid, A. & Benazza, H. (2024). A Multi-Split Cross-Strategy for Enhancing Machine Learning Algorithms Prediction Results with Data Generated by Conditional Generative Adversarial Network. Journal of Computer Science, 20(7), 700-707. https://doi.org/10.3844/jcssp.2024.700.707

Copyright: © 2024 Abdelfattah Abassi, Brahim Bakkas, Mostapha El Jai, Ahmed Arid and Hussain Benazza. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Conditional Generative Adversarial Networks
Tabular Data Generation
Machine Learning