The good, the better and the challenging: Insights into predicting high-growth firms using machine learning

Bibliographic Details
Main Authors: Sermet Pekin, Aykut Şengül
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series: Borsa Istanbul Review
Online Access: http://www.sciencedirect.com/science/article/pii/S2214845024001558
Description
Summary: This study classifies high-growth firms using several machine learning algorithms: K-Nearest Neighbors (KNN), Logistic Regression with L1 (Lasso) and L2 (Ridge) regularization, XGBoost, Gradient Descent, Naive Bayes and Random Forest. The dataset combines financial metrics and firm characteristics for 2009 to 2022 and covers 1,318,799 unique firms (averaging 554,178 annually). Each model is evaluated with the Matthews correlation coefficient (MCC), ROC AUC, accuracy, precision, recall and F1-score. Depending on the method, ROC AUC values ranged from 0.53 to 0.87 for high growth measured by employees and from 0.53 to 0.91 for high growth measured by turnover. XGBoost achieves the highest performance, followed by Random Forest and Logistic Regression, which shows that these models can distinguish high-growth from non-high-growth firms. Conversely, KNN and Naive Bayes yield lower accuracy. Growth opportunity emerges as the most significant factor. The research offers insights to financial analysts and investors seeking to identify high-growth firms and underscores the potential of machine learning in economic prediction.
ISSN: 2214-8450
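
The evaluation metrics named in the summary (MCC, ROC AUC, accuracy, precision, recall and F1-score) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `classification_metrics` and `roc_auc`, the 0.5 decision threshold and the eight toy labels and scores are all assumptions made for illustration, and none of the numbers come from the article's data.

```python
import math

def classification_metrics(y_true, y_pred):
    """Threshold-based metrics for binary labels (1 = high-growth, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC uses all four confusion-matrix cells, so it stays informative
    # when one class is much rarer than the other.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in
    the sample size, so it suits small illustrations only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: true labels and model scores for eight firms.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print("ROC AUC:", auc)
```

ROC AUC is computed from the raw scores and does not depend on the threshold, while the other five metrics change with it. That is one reason a study might report both kinds of measure side by side.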