An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling

Web Crawling has acquired tremendous significance in recent times and it is aptly associated with the substantial development of the World Wide Web. Web Search Engines face new challenges due to the availability of vast amounts of web documents, thus making the retrieved results less applicable to t...

Bibliographic Details
Main Authors:	R. Suganya Devi, D. Manjula, R. K. Siddharth
Format:	Article
Language:	English
Published:	Wiley 2015-01-01
Series:	The Scientific World Journal
Online Access:	http://dx.doi.org/10.1155/2015/739286

_version_	1832563816245755904
author	R. Suganya Devi D. Manjula R. K. Siddharth
collection	DOAJ
description	Web Crawling has acquired tremendous significance in recent times and it is aptly associated with the substantial development of the World Wide Web. Web Search Engines face new challenges due to the availability of vast amounts of web documents, thus making the retrieved results less applicable to the analysers. However, recently, Web Crawling solely focuses on obtaining the links of the corresponding documents. Today, there exist various algorithms and software which are used to crawl links from the web, which have to be further processed for future use, thereby increasing the load on the analyser. This paper concentrates on crawling the links and retrieving all information associated with them to facilitate easy processing for other uses. In this paper, the links are first crawled from the specified uniform resource locator (URL) using a modified version of the Depth First Search Algorithm, which allows for complete hierarchical scanning of the corresponding web links. The links are then accessed via the source code, and their metadata, such as title, keywords, and description, are extracted. This content is essential for any type of analyser work to be carried out on the Big Data obtained as a result of Web Crawling.
format	Article
id	doaj-art-fc97640b922a4e52a37a91f52951d457
institution	Kabale University
issn	2356-6140 1537-744X
language	English
publishDate	2015-01-01
publisher	Wiley
record_format	Article
series	The Scientific World Journal
spelling	doaj-art-fc97640b922a4e52a37a91f52951d457 | 2025-02-03T01:12:30Z | eng | Wiley | The Scientific World Journal | 2356-6140, 1537-744X | 2015-01-01 | 2015 | 10.1155/2015/739286 | 739286 | An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling | R. Suganya Devi (0), D. Manjula (1), R. K. Siddharth (2) | Affiliation (0, 1, 2): Department of Computer Science and Engineering, College of Engineering, Guindy, Anna University, Chennai 600025, India | http://dx.doi.org/10.1155/2015/739286
title	An Efficient Approach for Web Indexing of Big Data through Hyperlinks in Web Crawling
url	http://dx.doi.org/10.1155/2015/739286
work_keys_str_mv	AT rsuganyadevi anefficientapproachforwebindexingofbigdatathroughhyperlinksinwebcrawling AT dmanjula anefficientapproachforwebindexingofbigdatathroughhyperlinksinwebcrawling AT rksiddharth anefficientapproachforwebindexingofbigdatathroughhyperlinksinwebcrawling AT rsuganyadevi efficientapproachforwebindexingofbigdatathroughhyperlinksinwebcrawling AT dmanjula efficientapproachforwebindexingofbigdatathroughhyperlinksinwebcrawling AT rksiddharth efficientapproachforwebindexingofbigdatathroughhyperlinksinwebcrawling
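
The description above outlines two steps: a depth-first crawl of the links reachable from a starting URL, followed by extraction of each page's title, keywords, and description. The following is a minimal sketch of that general idea using only the Python standard library. It is a plain depth-first traversal, not the authors' modified algorithm, which this record does not specify. The names `PageParser`, `crawl`, `fetch`, `max_depth`, and the in-memory `PAGES` site are all hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects hyperlinks plus title/keywords/description from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.meta = {"title": "", "keywords": "", "description": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta["title"] += data.strip()


def crawl(start_url, fetch, max_depth=2):
    """Depth-first crawl from start_url; returns {url: metadata} in visit order.

    `fetch` is an assumed callable mapping a URL to its HTML, or None on failure.
    """
    index = {}
    stack = [(start_url, 0)]
    while stack:
        url, depth = stack.pop()
        if url in index or depth > max_depth:
            continue  # already indexed, or deeper than the allowed limit
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        index[url] = parser.meta
        # Push children in reverse so the first link on the page is explored first.
        for href in reversed(parser.links):
            stack.append((urljoin(url, href), depth + 1))
    return index


# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "http://example.test/": '<title>Home</title><meta name="keywords" content="a,b">'
                            '<a href="/x">x</a><a href="/y">y</a>',
    "http://example.test/x": '<title>X</title><meta name="description" content="page x">'
                             '<a href="/z">z</a>',
    "http://example.test/y": "<title>Y</title>",
    "http://example.test/z": '<title>Z</title><a href="/">home</a>',
}

index = crawl("http://example.test/", PAGES.get)
```

Passing `fetch` as a parameter keeps the traversal independent of how pages are retrieved; a real crawler would supply a function that performs HTTP requests and would also need politeness controls such as rate limiting and robots.txt handling, which are outside this sketch.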

Similar Items