Enhancing Binary Classification Performance in Biomedical Datasets: Regularized ELM with SMOTE and Quantile Transforms Focused on Breast Cancer Analysis

Bibliographic Details
Main Authors: Brilliant Friezka Aina, Meta Kallista, Ig. Prasetya Dwi Wibawa, Ginaldi Ari Nugroho, Ivana Meiska, Syifa Melinda Naf’an
Format: Article
Language: English
Published: Mathematics Department UIN Maulana Malik Ibrahim Malang 2024-11-01
Series: Cauchy: Jurnal Matematika Murni dan Aplikasi
Online Access: https://ejournal.uin-malang.ac.id/index.php/Math/article/view/28785
Description
Summary: This study addresses class imbalance in binary classification tasks on microarray datasets. The objective is to improve classification performance by combining a regularized Extreme Learning Machine (ELM) with the Synthetic Minority Over-sampling Technique (SMOTE) for over-sampling and a Quantile Transformer for scaling. Biomedical datasets were gathered from reputable sources such as UCI and Kaggle, including Pima Indian Diabetes, Heart Disease, and Wisconsin Breast Cancer. SMOTE was applied during dataset preparation to correct the class imbalance. The data were then split into training (80%) and testing (20%) sets and scaled with a Quantile Transformation, which maps numerical input variables to a Gaussian or uniform probability distribution. ELMs were then trained, with an emphasis on adding regularization to improve accuracy. Regularized ELM (R-ELM) outperforms ELM in terms of AUC, although ELM is faster to compute. The choice of the regularization parameter (C) in R-ELM affects both the model's performance and its computation time. Overall, R-ELM with SMOTE gives encouraging results for classifying biomedical datasets. Further investigation and validation on additional datasets are needed to establish its generalizability and robustness.
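The regularized ELM step described in the summary can be illustrated with a minimal NumPy sketch. It assumes the commonly used ridge-style solution for the output weights, beta = (I/C + H^T H)^-1 H^T y, with random, untrained hidden weights. The function names `train_relm` and `predict_relm`, the tanh activation, the hidden-layer size of 50, and the synthetic two-class data are all illustrative assumptions, not settings taken from the article. SMOTE and the quantile transform are left out so the example stays self-contained.

```python
import numpy as np

def train_relm(X, y, n_hidden=50, C=1.0, seed=0):
    """Fit a regularized ELM (hypothetical helper); returns (W, b, beta)."""
    rng = np.random.default_rng(seed)
    # Input weights and biases are drawn once at random and never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer output matrix
    # Ridge-style output weights: beta = (I/C + H^T H)^-1 H^T y.
    # A larger C means weaker regularization.
    A = np.eye(n_hidden) / C + H.T @ H
    beta = np.linalg.solve(A, H.T @ y)
    return W, b, beta

def predict_relm(X, W, b, beta):
    """Return continuous scores; threshold at 0.5 to get 0/1 labels."""
    return np.tanh(X @ W + b) @ beta

# Synthetic two-class data (an assumption, used only for illustration).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1.0, 1.0, size=(200, 5)),
               rng.normal(1.0, 1.0, size=(200, 5))])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Shuffle, then split 80% training / 20% testing as in the summary.
idx = rng.permutation(len(y))
n_train = int(0.8 * len(y))
train_idx, test_idx = idx[:n_train], idx[n_train:]

W, b, beta = train_relm(X[train_idx], y[train_idx], n_hidden=50, C=1.0)
scores = predict_relm(X[test_idx], W, b, beta)
acc = float(np.mean((scores >= 0.5) == (y[test_idx] == 1)))
print(f"test accuracy: {acc:.3f}")
```

Because the output weights come from a single linear solve, refitting with several values of C is cheap, which is one way to explore the sensitivity to C that the summary mentions.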
ISSN: 2086-0382
2477-3344