Incremental Closed Frequent Itemsets Mining-Based Approach Using Maximal Candidates

Bibliographic Details
Main Authors: Mohammed A. Al-Zeiadi, Basheer M. Al-Maqaleh
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10892134/
Description
Summary: Incremental frequent itemset mining aims to efficiently update frequent itemsets without recalculating them from scratch, making it suitable for streaming data and real-time analytics. Within incremental frequent itemset mining, approximate methods are popular because they attempt to improve the accuracy of global frequent itemsets and minimize computational overhead by removing candidates at an early stage of the mining process. However, these methods are hampered by the generation of a large number of candidates, imprecise support-count estimation, and the lack of optimal methods in the data-splitting process. To address these challenges, a Maximal Candidates-based Incremental Closed Frequent Itemsets Mining approach, called MC-ICFIM, is proposed. This approach minimizes computational costs by reducing the generation of candidate itemsets: it discovers closed itemsets from maximal candidates rather than generating all frequent itemsets. The proposed approach uses the train_test_split method to randomly divide a dataset into partitions at a specified ratio, which aids in the effective identification of closed frequent itemsets that are consistent across partitions. It further reduces the search space and the computational cost of incremental updates by applying similarity measurements to local maximal candidates instead of local closed candidates, thereby pruning numerous uninteresting candidates. Experimental findings show that this approach is more accurate and efficient than existing methods on large and dynamic datasets.
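
To make the summarized workflow concrete, the following is a minimal Python sketch of that kind of pipeline, not the authors' MC-ICFIM implementation. It assumes scikit-learn's train_test_split for partitioning and mlxtend's TransactionEncoder and fpmax for mining local maximal candidates; the toy transactions, the 0.3 split ratio, the 0.4 minimum support, and the 0.5 Jaccard threshold are purely illustrative choices, not values from the paper.

from itertools import combinations

import pandas as pd
from sklearn.model_selection import train_test_split           # random data splitting
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpmax                     # maximal frequent itemsets

# Toy transaction database (hypothetical).
transactions = [
    ["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"],
    ["a", "b", "c", "d"], ["b", "d"], ["a", "b", "c"], ["c", "d"],
]

# 1) Randomly split the transactions into two partitions at a specified ratio.
part1, part2 = train_test_split(transactions, test_size=0.3, random_state=42)

def mine_maximal(partition, min_support=0.4):
    """Mine local maximal frequent itemsets from one partition."""
    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit_transform(partition), columns=te.columns_)
    result = fpmax(onehot, min_support=min_support, use_colnames=True)
    return [frozenset(s) for s in result["itemsets"]], onehot

def closed_from_maximal(maximal, onehot):
    """Expand maximal candidates into their subsets and keep only the closed ones,
    i.e. itemsets with no proper superset of identical support."""
    candidates = set()
    for m in maximal:
        items = sorted(m)
        for r in range(1, len(items) + 1):
            candidates.update(frozenset(c) for c in combinations(items, r))
    support = {c: onehot[list(c)].all(axis=1).mean() for c in candidates}
    return {
        c: s for c, s in support.items()
        if not any(c < d and abs(support[d] - s) < 1e-12 for d in candidates)
    }

def jaccard(a, b):
    """Similarity between two itemsets, used here to prune local maximal candidates."""
    return len(a & b) / len(a | b)

max1, onehot1 = mine_maximal(part1)
max2, onehot2 = mine_maximal(part2)

# 2) Prune by comparing maximal candidates across partitions: keep a local maximal
#    candidate only if it is similar enough to some candidate from the other partition.
kept = [m for m in max1 if any(jaccard(m, n) >= 0.5 for n in max2)]

# 3) Derive closed frequent itemsets only from the surviving maximal candidates.
closed = closed_from_maximal(kept, onehot1)
for itemset, sup in sorted(closed.items(), key=lambda kv: -kv[1]):
    print(set(itemset), round(sup, 2))

Restricting both the cross-partition similarity check and the subset expansion to maximal candidates, as in this sketch, is what keeps the candidate space small relative to enumerating all frequent or all closed itemsets directly.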
ISSN: 2169-3536