PARALLEL ALGORITHMS OF RANDOM FORESTS FOR CLASSIFYING VERY LARGE DATASETS

Bibliographic Details
Main Authors: Do Thanh Nghi, Pham Nguyen Khang, Nguyen Van Hoa, Ly Hoang Trong
Format: Article
Language: English
Published: Dalat University, 2013-06-01
Series: Tạp chí Khoa học Đại học Đà Lạt (Dalat University Journal of Science)
Online Access: https://tckh.dlu.edu.vn/index.php/tckhdhdl/article/view/247
Collection: DOAJ
Description: The random forests algorithm proposed by Breiman is an ensemble-based approach with very high accuracy. However, learning and classifying with a set of decision trees is time-consuming, which makes the algorithm intractable for very large datasets. There is therefore a need to scale up random forests to handle massive datasets. We propose parallel algorithms of random forests that take advantage of grid computing. These algorithms improve training and classification time compared with the original (sequential) algorithms. Experimental results on large datasets from the UCI data repository, including Forest cover type, KDD Cup 1999, and Connect-4, show that the parallel algorithms significantly reduce training and classification time.
ISSN: 0866-787X
Volume: 3, Issue: 2 (2013)
DOI: 10.37569/DalatUniversity.3.2.247(2013)
Affiliations: College of Information Technology, Cantho University (Do Thanh Nghi, Pham Nguyen Khang, Ly Hoang Trong); Faculty of Technology, Engineering and Environment, Angiang University (Nguyen Van Hoa)
Subjects: Random forest; Decision tree; Bagging; Boosting; MPI; Grids
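The description says the parallel algorithms rely on MPI and grid computing but gives no further detail. The following is a minimal Python sketch, not the authors' implementation, of one common way to parallelize random forests, assuming the trees are split evenly among workers: each worker grows its share of trees on bootstrap samples with a random feature subset at every split, and classification sums each worker's per-class vote counts, the step an MPI reduction would perform. Threads from `concurrent.futures` stand in for MPI processes, so under CPython's global interpreter lock this shows the work decomposition rather than a real speed-up. All function names and parameters (`train_forest`, `classify`, `n_workers`, `max_depth`) are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, rng, max_features, max_depth, min_size=2):
    """Grow one CART-style tree; every split considers a random feature subset.

    A leaf is a (scalar) class label; an internal node is
    (feature, threshold, left_subtree, right_subtree).
    """
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(y) < min_size or len(set(y)) == 1:
        return majority
    best = None
    for f in rng.sample(range(len(X[0])), max_features):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [i for i, row in enumerate(X) if row[f] <= threshold]
            right = [i for i, row in enumerate(X) if row[f] > threshold]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:  # no usable split among the sampled features
        return majority
    _, f, threshold, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          rng, max_features, max_depth - 1, min_size)

    return (f, threshold, grow(left), grow(right))


def predict_tree(node, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def train_worker(X, y, n_trees, seed, max_features, max_depth):
    """Work done by one (hypothetical) MPI rank: grow its share of the trees."""
    rng = random.Random(seed)  # independent random stream per worker
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                rng, max_features, max_depth))
    return trees


def train_forest(X, y, n_trees=20, n_workers=4, max_depth=8, seed=0):
    """Split n_trees evenly across workers; return one list of trees per worker."""
    max_features = max(1, int(len(X[0]) ** 0.5))  # sqrt(p) features per split
    shares = [n_trees // n_workers + (1 if w < n_trees % n_workers else 0)
              for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(train_worker, X, y, k, seed + w, max_features, max_depth)
                   for w, k in enumerate(shares)]
        return [fut.result() for fut in futures]


def worker_votes(trees, rows):
    """Per-row class vote counts from one worker's trees."""
    return [Counter(predict_tree(t, row) for t in trees) for row in rows]


def classify(forest_parts, rows):
    """Each worker votes with its own trees; partial counts are then summed."""
    with ThreadPoolExecutor(max_workers=len(forest_parts)) as pool:
        partial = list(pool.map(lambda trees: worker_votes(trees, rows), forest_parts))
    predictions = []
    for i in range(len(rows)):
        total = Counter()
        for votes in partial:  # the combination an MPI reduce would perform
            total.update(votes[i])
        predictions.append(total.most_common(1)[0][0])
    return predictions


if __name__ == "__main__":
    rng = random.Random(42)
    X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
         + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)])
    y = [0] * 50 + [1] * 50
    forest = train_forest(X, y, n_trees=10, n_workers=4)
    preds = classify(forest, X)
    print("training accuracy:", sum(p == t for p, t in zip(preds, y)) / len(y))
```

Because the trees are independent of one another, this scheme needs communication only to share the training data and to gather trees or vote counts, which is why random forests lend themselves to this kind of parallelization.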