DSMC: A Novel Distributed Store-Retrieve Approach of Internet Data Using MapReduce Model and Community Detection in Big Data

The processing of big data is an active topic in scientific research. Data on the Internet is vast and highly valuable to researchers, so capturing and storing Internet data is a top priority. Traditional single-host web spider and data storage approaches ha...

Bibliographic Details
Main Authors: Xu Xu, Jia Zhao, Gaochao Xu, Yan Ding, Yunmeng Dong
Format: Article
Language: English
Published: Wiley 2014-11-01
Series: International Journal of Distributed Sensor Networks
Online Access: https://doi.org/10.1155/2014/430848
author Xu Xu
Jia Zhao
Gaochao Xu
Yan Ding
Yunmeng Dong
collection DOAJ
description The processing of big data is an active topic in scientific research. Data on the Internet is vast and highly valuable to researchers, so capturing and storing Internet data is a top priority. Traditional single-host web spider and data storage approaches suffer from problems such as low efficiency and large memory requirements, so this paper proposes DSMC (distributed store-retrieve approach using MapReduce model and community detection), a big data store-retrieve approach based on distributed processing. First, a distributed capture method that uses MapReduce to deduplicate big data is presented. Second, a storage optimization method is put forward; it uses lightweight hash functions and community detection to address the storage structure and solve the data retrieval problems. DSMC achieves high performance in comparing and storing large volumes of web data while also providing efficient data retrieval. Experimental results on the CloudSim platform show that, compared with a traditional web spider, the proposed DSMC approach achieves better efficiency and performance.
format Article
id doaj-art-5980908ae8dd4c61b4581911b5394260
institution Kabale University
issn 1550-1477
language English
publishDate 2014-11-01
publisher Wiley
record_format Article
series International Journal of Distributed Sensor Networks
spelling doaj-art-5980908ae8dd4c61b4581911b5394260
2025-02-03T06:43:00Z
eng
Wiley
International Journal of Distributed Sensor Networks
1550-1477
2014-11-01
10
10.1155/2014/430848
430848
DSMC: A Novel Distributed Store-Retrieve Approach of Internet Data Using MapReduce Model and Community Detection in Big Data
Xu Xu (0), Jia Zhao (1), Gaochao Xu (2), Yan Ding (3), Yunmeng Dong (4)
College of Computer Science and Technology, Jilin University, Changchun, Jilin 130000, China
College of Computer Science and Engineering, ChangChun University of Technology, Changchun, Jilin 130000, China
College of Computer Science and Technology, Jilin University, Changchun, Jilin 130000, China
College of Computer Science and Technology, Jilin University, Changchun, Jilin 130000, China
College of Computer Science and Technology, Jilin University, Changchun, Jilin 130000, China
The processing of big data is an active topic in scientific research. Data on the Internet is vast and highly valuable to researchers, so capturing and storing Internet data is a top priority. Traditional single-host web spider and data storage approaches suffer from problems such as low efficiency and large memory requirements, so this paper proposes DSMC (distributed store-retrieve approach using MapReduce model and community detection), a big data store-retrieve approach based on distributed processing. First, a distributed capture method that uses MapReduce to deduplicate big data is presented. Second, a storage optimization method is put forward; it uses lightweight hash functions and community detection to address the storage structure and solve the data retrieval problems. DSMC achieves high performance in comparing and storing large volumes of web data while also providing efficient data retrieval. Experimental results on the CloudSim platform show that, compared with a traditional web spider, the proposed DSMC approach achieves better efficiency and performance.
https://doi.org/10.1155/2014/430848
title DSMC: A Novel Distributed Store-Retrieve Approach of Internet Data Using MapReduce Model and Community Detection in Big Data
url https://doi.org/10.1155/2014/430848