Fast binary logistic regression

This study presents a novel numerical approach that improves the training efficiency of binary logistic regression, a popular statistical model in the machine learning community. Our method achieves training times an order of magnitude faster than traditional logistic regression by employing a novel Soft-Plus approximation, which enables reformulation of logistic regression parameter estimation into matrix-vector form. We also adopt the Lf-norm penalty, which allows using fractional norms, including the L2-norm, L1-norm, and L0-norm, to regularize the model parameters. We put the Lf-norm formulation in matrix-vector form, providing flexibility to include or exclude penalization of the intercept term when applying regularization. Furthermore, to address the common problem of collinear features, we apply singular value decomposition (SVD), resulting in a low-rank representation commonly used to reduce computational complexity while preserving essential features and mitigating noise. Moreover, our approach incorporates a randomized SVD alongside a newly developed SVD with row reduction (SVD-RR) method, which aims to manage datasets with many rows and features efficiently. This computational efficiency is crucial in developing a generalized model that requires repeated training over various parameters to balance bias and variance. We also demonstrate the effectiveness of our fast binary logistic regression (FBLR) method on various datasets from the OpenML repository in addition to synthetic datasets.
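
The abstract above names the main ingredients of the method: a soft-plus form of the logistic loss, an Lf-norm penalty that can optionally skip the intercept, and a low-rank feature representation obtained with (randomized) SVD. The sketch below is not the authors' FBLR implementation; their Soft-Plus approximation, matrix-vector solution, and SVD-RR routine are not reproduced, and a generic L-BFGS optimizer stands in for their reformulated solver. It is only a minimal NumPy/SciPy illustration of those ingredients, and every function name and default value in it is an assumption made for this sketch.

# Illustrative sketch only (not the authors' FBLR code): binary logistic
# regression written with the soft-plus loss, an Lf-norm penalty, and a
# randomized-SVD low-rank projection of the features. All names, defaults,
# and the use of a generic L-BFGS solver are assumptions for this sketch.
import numpy as np
from scipy.optimize import minimize

def softplus(z):
    # softplus(z) = log(1 + exp(z)), computed stably for large |z|
    return np.logaddexp(0.0, z)

def randomized_svd(X, rank, n_oversamples=10, seed=0):
    # Basic randomized SVD: sketch the column space with a Gaussian test
    # matrix, orthonormalize, then take an exact SVD of the small matrix.
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((X.shape[1], rank + n_oversamples))
    Q, _ = np.linalg.qr(X @ G)
    Ub, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

def fit_logreg_lowrank(X, y, rank=10, lam=1e-2, f=2.0, penalize_intercept=False):
    # Project the features onto the top right singular vectors (Z = X @ Vt.T),
    # then fit a penalized logistic regression in that reduced space.
    _, _, Vt = randomized_svd(X, rank)
    Z = X @ Vt.T                                    # n x rank
    Z1 = np.hstack([np.ones((Z.shape[0], 1)), Z])   # prepend intercept column

    def objective(w):
        margins = Z1 @ w
        # Logistic negative log-likelihood for labels y in {0, 1}, written
        # through the soft-plus function: sum_i softplus(m_i) - y_i * m_i.
        nll = np.sum(softplus(margins) - y * margins)
        coef = w if penalize_intercept else w[1:]
        return nll + lam * np.sum(np.abs(coef) ** f)

    w0 = np.zeros(Z1.shape[1])
    res = minimize(objective, w0, method="L-BFGS-B")
    return res.x, Vt   # weights in the reduced space, plus the projection

# Small synthetic example
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 40))
y = (X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.standard_normal(500) > 0).astype(float)
w, Vt = fit_logreg_lowrank(X, y, rank=5)
print(w.shape)   # (6,) = intercept + rank coefficients

With f = 2.0 the penalty is the familiar ridge term; for f <= 1 (the L1- and L0-like cases mentioned in the abstract) the penalty is no longer smooth, which is where the paper's dedicated Lf-norm formulation, rather than a generic solver, would matter.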

Bibliographic Details
Main Authors: Nurdan Ayse Saran (Department of Computer Engineering, Cankaya University, Ankara, Türkiye), Fatih Nar (Department of Computer Engineering, Ankara Yildirim Beyazit University, Ankara, Türkiye)
Format: Article
Language: English
Published: PeerJ Inc., 2025-01-01
Series: PeerJ Computer Science, 11:e2579
DOI: 10.7717/peerj-cs.2579
ISSN: 2376-5992
Subjects: Logistic regression; Low-rank; Singular value decomposition; Lf-norm regularization
Online Access: https://peerj.com/articles/cs-2579.pdf
Collection: DOAJ
Institution: Kabale University
Record ID: doaj-art-a30209b30fc344df8a321d788295ec23