Comparative Analysis of SMOTE and ROSE Oversampling Techniques for kNN-Based Autonomous Vehicle Behavior Modeling

Bibliographic Details
Main Authors: Celine Serbouh Touazi, Iness Ahriz, Ndeye Niang, Alain Piperno
Format: Article
Language: English
Published: Croatian Communications and Information Society (CCIS) 2025-03-01
Series: Journal of Communications Software and Systems
Online Access: https://jcoms.fesb.unist.hr/10.24138/jcomss-2024-0121/
Description
Summary: In this paper, we present a comparative analysis of two oversampling techniques, the Synthetic Minority Oversampling TEchnique (SMOTE) and Random OverSampling Examples (ROSE), for k-Nearest Neighbors (kNN)-based autonomous vehicle behavior modeling. We address the challenges posed by imbalanced and mixed datasets in the context of autonomous vehicle testing, where most test outcomes are classified as "OK" (safe) and few as "KO" (unsafe). We propose an enhanced approach that extends our previous work by incorporating ROSE as an alternative to SMOTE for generating synthetic samples. We integrate these resampling techniques with Leave-One-Out Cross-Validation (LOO-CV), applying resampling at each iteration so that data balancing is tailored to each training set. Additionally, we investigate the impact of different encoding strategies for categorical variables, including one-hot encoding, binary encoding, and Factor Analysis of Mixed Data (FAMD). Our research aims to develop a robust classification model capable of accurately predicting autonomous vehicle behavior while effectively managing class imbalance and mixed data types, despite the limited availability of data due to costly and time-consuming testing procedures.
ISSN: 1845-6421, 1846-6079
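
The summary describes applying resampling inside each Leave-One-Out iteration, so that only the training portion is balanced and the held-out test case never influences the synthetic samples. The following is a minimal sketch of that idea and not the authors' implementation. It assumes two classes, purely numeric features, and a simplified SMOTE-style interpolation written from scratch. The function names, parameter defaults, and toy data are all hypothetical, and ROSE and the categorical encodings are not covered.

```python
import random
from collections import Counter

def knn_indices(X, query, k, exclude=None):
    """Indices of the k points in X closest to query (squared Euclidean distance)."""
    dists = [(sum((a - b) ** 2 for a, b in zip(x, query)), i)
             for i, x in enumerate(X) if i != exclude]
    return [i for _, i in sorted(dists)[:k]]

def smote_like(X, y, minority, k, rng):
    """Simplified SMOTE-style oversampling for two classes (illustrative only).

    Adds points interpolated between a minority sample and one of its
    minority-class neighbours until both classes have the same size.
    """
    X_min = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(X_min)) - len(X_min)  # majority count minus minority count
    if len(X_min) < 2 or n_needed <= 0:
        return list(X), list(y)
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        i = rng.randrange(len(X_min))
        neighbours = knn_indices(X_min, X_min[i], min(k, len(X_min) - 1), exclude=i)
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between the two points
        X_new.append([a + gap * (b - a) for a, b in zip(X_min[i], X_min[j])])
        y_new.append(minority)
    return X_new, y_new

def knn_predict(X, y, query, k):
    """Majority vote among the k nearest training points."""
    votes = Counter(y[i] for i in knn_indices(X, query, k))
    return votes.most_common(1)[0][0]

def loo_cv_with_resampling(X, y, minority="KO", k_smote=3, k_knn=3, seed=0):
    """Leave-one-out CV where oversampling is redone on every training set."""
    rng = random.Random(seed)
    preds = []
    for held_out in range(len(X)):
        X_train = [x for i, x in enumerate(X) if i != held_out]
        y_train = [label for i, label in enumerate(y) if i != held_out]
        # Balance only the training data; the held-out point stays untouched.
        X_bal, y_bal = smote_like(X_train, y_train, minority, k_smote, rng)
        preds.append(knn_predict(X_bal, y_bal, X[held_out], k_knn))
    return preds

# Hypothetical toy data: eight "OK" outcomes and three "KO" outcomes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1],
     [0.2, 0.5], [0.5, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
y = ["OK"] * 8 + ["KO"] * 3
preds = loo_cv_with_resampling(X, y)
```

Resampling before the split instead would let synthetic points derived from the held-out case leak into training, which is why the balancing step sits inside the loop here.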