The good, the better and the challenging: Insights into predicting high-growth firms using machine learning

Bibliographic Details
Main Authors: Sermet Pekin, Aykut Şengül
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series: Borsa Istanbul Review
Subjects: C40, C55, C60, C81, L25
Online Access: http://www.sciencedirect.com/science/article/pii/S2214845024001558
author Sermet Pekin
Aykut Şengül
collection DOAJ
description This study aims to classify high-growth firms using several machine learning algorithms, including K-Nearest Neighbors, Logistic Regression with L1 (Lasso) and L2 (Ridge) Regularization, XGBoost, Gradient Descent, Naive Bayes and Random Forest. Leveraging a dataset composed of financial metrics and firm characteristics between 2009 and 2022 with 1,318,799 unique firms (averaging 554,178 annually), we evaluate the performance of each model using metrics such as MCC, ROC AUC, accuracy, precision, recall and F1-score. In our study, ROC AUC values ranged from 0.53 to 0.87 for employee-high growth and from 0.53 to 0.91 for turnover-high growth, depending on the method used. Our findings indicate that XGBoost achieves the highest performance, followed by Random Forest and Logistic Regression, demonstrating their effectiveness in distinguishing between high-growth and non-high-growth firms. Conversely, KNN and Naive Bayes yield lower accuracy. Furthermore, our findings reveal that growth opportunity emerges as the most significant factor in our study. This research contributes valuable insights to financial analysts and investors in identifying high-growth firms and underscores the potential of machine learning in economic prediction.
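The evaluation metrics named in the description (MCC, ROC AUC, accuracy, precision, recall and F1-score) can be illustrated with a minimal sketch in plain Python. The labels and scores below are made-up toy values, not data from the study, and the function names `classification_metrics` and `roc_auc` are hypothetical helpers introduced only for this illustration; the record does not say which software the authors used.

```python
import math


def classification_metrics(y_true, y_pred):
    """Compute threshold-based metrics from binary labels (1 = high-growth)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is O(n^2) and is
    meant for illustration, not for a dataset with over a million firms.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 8 hypothetical firms, 4 of them high-growth.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

MCC and ROC AUC are often preferred over plain accuracy when one class is much rarer than the other, since a model that labels every firm as non-high-growth can still score a high accuracy.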
format Article
id doaj-art-11f792fbfb3a44a389cc72961fed34cf
institution Kabale University
issn 2214-8450
language English
publishDate 2024-12-01
publisher Elsevier
record_format Article
series Borsa Istanbul Review
spelling doaj-art-11f792fbfb3a44a389cc72961fed34cf
2025-01-22T05:42:31Z
eng
Elsevier
Borsa Istanbul Review
2214-8450
2024-12-01
244760
The good, the better and the challenging: Insights into predicting high-growth firms using machine learning
Sermet Pekin (corresponding author; Central Bank of the Republic of Türkiye, Research and Monetary Policy Department, Türkiye)
Aykut Şengül (Central Bank of the Republic of Türkiye, Research and Monetary Policy Department, Türkiye)
http://www.sciencedirect.com/science/article/pii/S2214845024001558
C40
C55
C60
C81
L25
title The good, the better and the challenging: Insights into predicting high-growth firms using machine learning
topic C40
C55
C60
C81
L25
url http://www.sciencedirect.com/science/article/pii/S2214845024001558