The good, the better and the challenging: Insights into predicting high-growth firms using machine learning

Bibliographic Details
Main Authors:	Sermet Pekin, Aykut Şengül
Format:	Article
Language:	English
Published:	Elsevier 2024-12-01
Series:	Borsa Istanbul Review
Subjects:	C40 C55 C60 C81 L25
Online Access:	http://www.sciencedirect.com/science/article/pii/S2214845024001558

author	Sermet Pekin Aykut Şengül
collection	DOAJ
description	This study aims to classify high-growth firms using several machine learning algorithms, including K-Nearest Neighbors, Logistic Regression with L1 (Lasso) and L2 (Ridge) Regularization, XGBoost, Gradient Descent, Naive Bayes and Random Forest. Leveraging a dataset composed of financial metrics and firm characteristics between 2009 and 2022 with 1,318,799 unique firms (averaging 554,178 annually), we evaluate the performance of each model using metrics such as MCC, ROC AUC, accuracy, precision, recall and F1-score. In our study, ROC AUC values ranged from 0.53 to 0.87 for employee-high growth and from 0.53 to 0.91 for turnover-high growth, depending on the method used. Our findings indicate that XGBoost achieves the highest performance, followed by Random Forest and Logistic Regression, demonstrating their effectiveness in distinguishing between high-growth and non-high-growth firms. Conversely, KNN and Naive Bayes yield lower accuracy. Furthermore, our findings reveal that growth opportunity emerges as the most significant factor in our study. This research contributes valuable insights to financial analysts and investors in identifying high-growth firms and underscores the potential of machine learning in economic prediction.
format	Article
id	doaj-art-11f792fbfb3a44a389cc72961fed34cf
institution	Kabale University
issn	2214-8450
language	English
publishDate	2024-12-01
publisher	Elsevier
record_format	Article
series	Borsa Istanbul Review
spelling	doaj-art-11f792fbfb3a44a389cc72961fed34cf | 2025-01-22T05:42:31Z | eng | Elsevier | Borsa Istanbul Review | 2214-8450 | 2024-12-01 | 244760 | The good, the better and the challenging: Insights into predicting high-growth firms using machine learning | Sermet Pekin (corresponding author; Central Bank of the Republic of Türkiye, Research and Monetary Policy Department, Türkiye) | Aykut Şengül (Central Bank of the Republic of Türkiye, Research and Monetary Policy Department, Türkiye) | This study aims to classify high-growth firms using several machine learning algorithms, including K-Nearest Neighbors, Logistic Regression with L1 (Lasso) and L2 (Ridge) Regularization, XGBoost, Gradient Descent, Naive Bayes and Random Forest. Leveraging a dataset composed of financial metrics and firm characteristics between 2009 and 2022 with 1,318,799 unique firms (averaging 554,178 annually), we evaluate the performance of each model using metrics such as MCC, ROC AUC, accuracy, precision, recall and F1-score. In our study, ROC AUC values ranged from 0.53 to 0.87 for employee-high growth and from 0.53 to 0.91 for turnover-high growth, depending on the method used. Our findings indicate that XGBoost achieves the highest performance, followed by Random Forest and Logistic Regression, demonstrating their effectiveness in distinguishing between high-growth and non-high-growth firms. Conversely, KNN and Naive Bayes yield lower accuracy. Furthermore, our findings reveal that growth opportunity emerges as the most significant factor in our study. This research contributes valuable insights to financial analysts and investors in identifying high-growth firms and underscores the potential of machine learning in economic prediction. | http://www.sciencedirect.com/science/article/pii/S2214845024001558 | C40 C55 C60 C81 L25
title	The good, the better and the challenging: Insights into predicting high-growth firms using machine learning
topic	C40 C55 C60 C81 L25
url	http://www.sciencedirect.com/science/article/pii/S2214845024001558
