A Massively Parallel SMC Sampler for Decision Trees

Bayesian approaches to decision trees (DTs) using Markov Chain Monte Carlo (MCMC) samplers have recently demonstrated state-of-the-art accuracy when training DTs to solve classification problems. Despite this competitive classification accuracy, MCMC can require a long runtime to converge. A widely used way to reduce an algorithm's runtime is to employ modern multi-core computer architectures, with either shared memory (SM) or distributed memory (DM), and accelerate the algorithm through parallel computing. However, the inherently sequential nature of MCMC makes it unsuitable for parallel implementation unless accuracy is sacrificed. This issue is particularly evident on DM architectures, which normally provide access to larger numbers of cores than SM. Sequential Monte Carlo (SMC) samplers are a parallel alternative to MCMC that does not trade accuracy for parallelism. However, the performance of SMC samplers in the context of DTs is underexplored, and their parallelization is complicated by the difficulty of parallelizing their bottleneck, namely redistribution, especially for variable-size data types such as DTs. In this work, we study the problem of parallelizing SMC in the context of DTs on both SM and DM. On both memory architectures, we show that the proposed parallelization strategies achieve asymptotically optimal O(log₂ N) time complexity. Numerical results are presented for a 32-core SM machine and a 256-core DM cluster. For both computer architectures, the experimental results show that our approach has comparable or better accuracy than MCMC but runs up to 51 times faster on SM and 640 times faster on DM. In this paper, we share the GitHub link to the source code.
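The abstract's two key technical ingredients, an SMC sampler that evolves a population of N weighted particles and the resampling ("redistribution") step that dominates its cost, can be illustrated with a short sketch. The following is a minimal, sequential Python illustration under stated assumptions: init_particle, log_target, and propose are hypothetical placeholders, the weight update is a simplification, and the resampling shown is plain O(N) sequential code, not the paper's O(log₂ N) parallel redistribution for variable-size particles such as DTs.

```python
import numpy as np

# Minimal, sequential sketch of a generic SMC sampler loop, for
# illustration only. It is NOT the paper's parallel algorithm: the
# decision-tree proposal, the target density, and the O(log2 N)
# parallel redistribution are placeholders or simplifications here.

def systematic_resample(weights, rng):
    """Return indices of particles to copy (the 'redistribution' step).

    In the paper this step is the parallelization bottleneck; here it
    is an ordinary O(N) sequential implementation.
    """
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    return np.searchsorted(cumulative, positions)

def smc_sampler(init_particle, log_target, propose,
                n_particles=512, n_iterations=50, seed=0):
    rng = np.random.default_rng(seed)
    particles = [init_particle() for _ in range(n_particles)]
    log_w = np.zeros(n_particles)
    for _ in range(n_iterations):
        # Propose a move for each particle (in the Bayesian-DT setting
        # this would, e.g., grow/prune/modify a tree node) and update
        # the log-weight by the change in log target density. This is
        # a simplification of a full SMC weight update.
        for i, p in enumerate(particles):
            new_p = propose(p, rng)
            log_w[i] += log_target(new_p) - log_target(p)
            particles[i] = new_p
        # Normalize weights and resample when the effective sample
        # size drops below half the population.
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < n_particles / 2:
            idx = systematic_resample(w, rng)
            particles = [particles[i] for i in idx]
            log_w[:] = 0.0
    return particles
```

The point of comparison is the resampling step: it is straightforwardly O(N) sequentially, and the paper's contribution is to perform the equivalent redistribution in asymptotically optimal O(log₂ N) parallel time on both SM and DM, even when each particle is a variable-size decision tree.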

Bibliographic Details
Main Authors: Efthyvoulos Drousiotis, Alessandro Varsi, Alexander M. Phillips, Simon Maskell (Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool L69 3GJ, UK); Paul G. Spirakis (Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK)
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Algorithms
ISSN: 1999-4893
DOI: 10.3390/a18010014
Subjects: parallel algorithms; machine learning; Bayesian decision trees; sequential Monte Carlo samplers; Markov Chain Monte Carlo; shared memory
Online Access: https://www.mdpi.com/1999-4893/18/1/14