A Survey of Data Partitioning and Sampling Methods to Support Big Data Analysis

Computer clusters with the shared-nothing architecture are the major computing platforms for big data processing and analysis. In cluster computing, data partitioning and sampling are two fundamental strategies to speed up the computation of big data and increase scalability. In this paper, we prese...

Bibliographic Details
Main Authors:	Mohammad Sultan Mahmud, Joshua Zhexue Huang, Salman Salloum, Tamer Z. Emara, Kuanishbay Sadatdiynov
Format:	Article
Language:	English
Published:	Tsinghua University Press 2020-06-01
Series:	Big Data Mining and Analytics
Subjects:	big data analysis; data partitioning; data sampling; distributed and parallel computing; approximate computing
Online Access:	https://www.sciopen.com/article/10.26599/BDMA.2019.9020015

Similar Items

Survey of Distributed Computing Frameworks for Supporting Big Data Analysis
by: Xudong Sun, et al.
Published: (2023-06-01)

Comprehensive Survey of Big Data Mining Approaches in Cloud Systems
by: Zainab Salih Ageed, et al.
Published: (2021-04-01)

Big Data with Cloud Computing: Discussions and Challenges
by: Amanpreet Kaur Sandhu
Published: (2022-03-01)

Design and Application of a Data-computation Integrated Database for Meteorological Grid Data
by: Wang Shu, et al.
Published: (2025-01-01)

Super Partition: fast, flexible, and interpretable large-scale data reduction in R
by: Katelyn J. Queen, et al.
Published: (2025-01-01)

A Methodology of Real-Time Data Fusion for Localized Big Data Analytics
by: Sohail Jabbar, et al.
Published: (2018-01-01)

On the research for big data uses for public good purposes
by: Adeline Decuyper
Published: (2016-12-01)

Beyond boundaries: Charting the frontier of healthcare with big data and ai advancements in pharmacovigilance
by: Arohi Agarwal, et al.
Published: (2025-03-01)

BioLake: an RNA expression analysis framework for prostate cancer biomarker powered by data lakehouse
by: Qiaowang Li, et al.
Published: (2025-02-01)

Features of the use of big data in the financial sector
by: F. O. Chernenkov
Published: (2024-08-01)

Mining Conditional Functional Dependency Rules on Big Data
by: Mingda Li, et al.
Published: (2020-03-01)

Optimizing healthcare big data performance through regional computing
by: Tariq Alsahfi, et al.
Published: (2025-01-01)

Unstructured Big Data Threat Intelligence Parallel Mining Algorithm
by: Zhihua Li, et al.
Published: (2024-06-01)

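Uso de metodologias ativas no ensino de Big Data e Data Analytics (BD/DA): uma análise sob a ótica dos discentes de ciências contábeis [Use of active methodologies in teaching Big Data and Data Analytics (BD/DA): an analysis from the perspective of accounting students]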
by: Elvis Araujo Albertin, et al.
Published: (2023-03-01)

Big Data Analytics for Healthcare Industry: Impact, Applications, and Tools
by: Sunil Kumar, et al.
Published: (2019-03-01)

An improved K‐means algorithm for big data
by: Fatemeh Moodi, et al.
Published: (2022-02-01)

Review of prime issues in big data storage
by: Zhi Zhou, et al.
Published: (2021-12-01)

A Mini-Review of Machine Learning in Big Data Analytics: Applications, Challenges, and Prospects
by: Isaac Kofi Nti, et al.
Published: (2022-06-01)

QoE-Driven Big Data Management in Pervasive Edge Computing Environment
by: Qianyu Meng, et al.
Published: (2018-09-01)

A comprehensive review of big data applications
by: Soheil Fakheri
Published: (2022-03-01)

Quantum-inspired framework for big data analytics: evaluating the impact of movie trailers and its financial returns
by: Jaiteg Singh, et al.
Published: (2025-02-01)

Editorial: Visualizing big culture and history data
by: Florian Windhager, et al.
Published: (2025-02-01)

AI, big data, and robots for the evolution of biotechnology
by: Haseong Kim
Published: (2019-11-01)

BIG DATA: HOW CAN INFORMATION TECHNOLOGIES HELP THE STATISTICS SERVICE TO INCREASE EFFICIENCY OF CALCULATION OF THE CONSUMER PRICE INDEX
by: A. Mikhajlova
Published: (2018-04-01)

The use of big data in interdisciplinary research on example of the Greater Mediterranean macroregion
by: O. V. Yarmak, et al.
Published: (2022-09-01)

A Survey of Data Mining Implementation in Smart City Applications
by: Zainab Salih Ageed, et al.
Published: (2021-04-01)

DeepEye: An Automatic Big Data Visualization Framework
by: Xuedi Qin, et al.
Published: (2018-03-01)

Big Data Analytics For Organizations: Challenges and Opportunities and Its Effect on International Business Education
by: Twana Saeed Ali, et al.
Published: (2019-12-01)

Improving The Performance of Big Data Databases
by: Nzar Abdulqadir Ali, et al.
Published: (2019-12-01)

A Big Data Architecture for Digital Twin Creation of Railway Signals Based on Synthetic Data
by: Giulio Salierno, et al.
Published: (2024-01-01)

Researching Organic Solar Cells from the Perspective of Literature Big Data Analysis
by: Qing Wang, et al.
Published: (2025-01-01)

IoTDQ: An Industrial IoT Data Analysis Library for Apache IoTDB
by: Pengyu Chen, et al.
Published: (2024-03-01)

Efficacy of Bluetooth-Based Data Collection for Road Traffic Analysis and Visualization Using Big Data Analytics
by: Ashish Rajeshwar Kulkarni, et al.
Published: (2023-06-01)

Big Data Governance Challenges Arising From Data Generated by Intelligent Systems Technologies: A Systematic Literature Review
by: Yunusa Adamu Bena, et al.
Published: (2025-01-01)

Influence of partitioning methods on computational cost of cfd simulations applied to hydrocyclones
by: Felipe Orlando da Costa, et al.
Published: (2020-10-01)

Providing a Framework for Implementing Agile Big Data-based Supply Chain (Case Study: FMCG Companies)
by: Hamed Nozari, et al.
Published: (2021-08-01)

Big Data Oriented Novel Background Subtraction Algorithm for Urban Surveillance Systems
by: Ling Hu, et al.
Published: (2018-06-01)

A Novel Clustering Technique for Efficient Clustering of Big Data in Hadoop Ecosystem
by: Sunil Kumar, et al.
Published: (2019-12-01)

Designing a Model for Implementing Digital Banking Policy Based on Using Big Data in Iranian Banking Industry
by: Rahmatollah Gholipour Souteh, et al.
Published: (2024-11-01)

Parallelizing the Computation of Grid Resistance to Measure the Strength of Skyline Tuples
by: Davide Martinenghi
Published: (2025-01-01)