Comparative Analysis of SMOTE and ROSE Oversampling Techniques for kNN-Based Autonomous Vehicle Behavior Modeling

In this paper, we present a comparative analysis of the Synthetic Minority Oversampling TEchnique (SMOTE) and Random OverSampling Examples (ROSE) oversampling techniques for k-Nearest Neighbors (kNN)-based autonomous vehicle behavior modeling. We address the challenges posed by imbalanced and mixed datase...

Bibliographic Details
Main Authors: Celine Serbouh Touazi, Iness Ahriz, Ndeye Niang, Alain Piperno
Format: Article
Language: English
Published: Croatian Communications and Information Society (CCIS) 2025-03-01
Series: Journal of Communications Software and Systems
Subjects: ia, autonomous vehicles, smote, rose
Online Access: https://jcoms.fesb.unist.hr/10.24138/jcomss-2024-0121/
collection DOAJ
description In this paper, we present a comparative analysis of the Synthetic Minority Oversampling TEchnique (SMOTE) and Random OverSampling Examples (ROSE) oversampling techniques for k-Nearest Neighbors (kNN)-based autonomous vehicle behavior modeling. We address the challenges posed by imbalanced and mixed datasets in the context of autonomous vehicle testing, where the majority of test outcomes are classified as "OK" (safe) and fewer as "KO" (unsafe). We propose an enhanced approach that extends our previous work by incorporating ROSE as an alternative to SMOTE for generating synthetic samples. We integrate these resampling techniques with Leave-One-Out Cross-Validation (LOO-CV), applying resampling at each iteration to ensure data balancing is tailored to each training set. Additionally, we investigate the impact of different encoding strategies for categorical variables, including one-hot encoding, binary encoding, and Factor Analysis of Mixed Data (FAMD). Our research aims to develop a robust classification model capable of accurately predicting autonomous vehicle behavior while effectively managing class imbalance and mixed data types, despite the limited availability of data due to costly and time-consuming testing procedures.
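The abstract's central procedural point, resampling inside each leave-one-out iteration so that the held-out sample never influences the synthetic data, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the function names (`smote_like`, `balance`, `knn_predict`, `loo_cv_predictions`), and the parameter values are all assumptions made for illustration. It uses a simplified SMOTE-style interpolation on numeric features only; ROSE and the categorical encodings (one-hot, binary, FAMD) are not shown.

```python
import math
import random


def smote_like(minority, n_new, k, rng):
    """Simplified SMOTE-style step: interpolate between a minority point
    and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        # Nothing to interpolate with: fall back to plain duplication.
        return [list(minority[0]) for _ in range(n_new)] if minority else []
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def balance(X, y, k, rng):
    """Oversample the rarer class until both classes have equal counts."""
    counts = {label: y.count(label) for label in set(y)}
    minority_label = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority_label]
    minority = [x for x, label in zip(X, y) if label == minority_label]
    new = smote_like(minority, deficit, k, rng)
    return X + new, y + [minority_label] * len(new)


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(zip(X_train, y_train),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)


def loo_cv_predictions(X, y, k_knn=3, k_smote=2, seed=0):
    """Leave-one-out CV with resampling applied to each training fold only."""
    rng = random.Random(seed)
    preds = []
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        X_bal, y_bal = balance(X_train, y_train, k_smote, rng)
        preds.append(knn_predict(X_bal, y_bal, X[i], k_knn))
    return preds


# Hypothetical toy data: 8 "OK" outcomes and 3 "KO" outcomes, two numeric features.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8], [0.8, 0.2],
     [0.4, 0.6], [5, 5], [5, 6], [6, 5]]
y = ["OK"] * 8 + ["KO"] * 3

predictions = loo_cv_predictions(X, y)
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
```

Balancing the whole dataset once before cross-validation would let synthetic points derived from the held-out sample leak into its own training fold; performing the resampling inside the loop avoids that.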
id doaj-art-8cdaa2da4e42447fa0c35c1fd1b63fbd
institution OA Journals
issn 1845-6421
1846-6079
spelling doaj-art-8cdaa2da4e42447fa0c35c1fd1b63fbd | 2025-08-20T02:24:51Z | eng | Croatian Communications and Information Society (CCIS) | Journal of Communications Software and Systems | 1845-6421 | 1846-6079 | 2025-03-01 | 21 | 2 | 132 | 143 | 10.24138/jcomss-2024-0121
topic ia
autonomous vehicles
smote
rose