A Novel Approach for Extraction of Relevant Web Pages from WWW Using Data Mining

By Jyoti Pandey, Ashok Sharma and Amit Goel.

Published by The Technology Collection


The World Wide Web (WWW) is a large, distributed hypertext repository of information that people navigate through browsers on their terminals. With the exponential growth of data on the WWW, providing relevant information that meets users' needs is the primary goal of website owners. The key factors in the success of the WWW are its enormous body of information and its decentralized control. Nevertheless, both of these factors are also the major obstacles to searching for relevant information. On the WWW, information is represented as a collection of web pages, and a tool called a search engine is employed to search them. A search engine provides an interface through which the user retrieves web pages pertaining to the queried keywords. For example, a user who wants web pages about “computer” enters a few keywords, which are sent to the search engine. A typical search engine then returns tens to hundreds of web pages that score highest in similarity to the requested keywords. To do so, the search engine consults its repository for the number of occurrences of the keywords in the various web pages, and for the context in which they occur, to compute a weight for each web page. Weights based on link analysis and user feedback may also be added to the total score of the web page. This score contributes to the page rank of the web page, on the basis of which (or with the help of some other mechanism) the search engine lists links to the various web pages containing information about computers. In the background, search engines deploy a program called a ‘crawler’, which automatically traverses the web, retrieves web pages, and builds up a repository of the portion of the web that it has visited. However, the rapid growth of the WWW and the rate at which its content changes pose unprecedented scaling challenges for search engines.
Thus, the critical issue for search engines is to devise an efficient page ranking algorithm that identifies the most relevant information in the collection of web pages. In this paper, data mining is employed to extract the most appropriate web pages from the search engine for presentation at the user interface. Data mining, also known as Knowledge Discovery in Databases (KDD), is the practice of automatically searching large stores of data for meaningful patterns (knowledge). The goals of data mining can be achieved with various mining methods such as classification, association rule mining, and clustering. Association rule mining helps find interesting association relationships among the items stored in a database. The current work therefore uses an association rule based method called ‘A-priori’ to identify the relevance of web pages to the keywords entered in the user query; the web pages are then ranked with the proposed mechanism, called the relevance_retrieval algorithm.
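
As an illustration of the association rule idea mentioned in the abstract, the following is a minimal sketch of frequent-itemset mining in the style of Apriori, written in Python. It is not the authors' relevance_retrieval algorithm, which the abstract does not specify. The page names, keyword sets, and the `min_support` threshold are hypothetical values chosen only for the example; each page is treated as a "transaction" whose items are the keywords it contains.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    transactions: iterable of keyword collections (one per web page).
    min_support:  minimum number of pages that must contain the itemset.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many pages contain each single keyword.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical pages and the keywords found on each (illustrative data only).
pages = {
    "page_a": {"computer", "hardware", "memory"},
    "page_b": {"computer", "software", "memory"},
    "page_c": {"computer", "hardware", "software", "memory"},
    "page_d": {"garden", "software"},
}

frequent = apriori(pages.values(), min_support=2)

# Support of a two-keyword query: the number of pages containing both keywords.
query = frozenset({"computer", "memory"})
print(sorted(query), frequent.get(query, 0))  # support is 3 for this data
```

In this sketch, keyword sets that co-occur on at least `min_support` pages are kept as frequent itemsets, and the support of the set of query keywords could serve as one input to a relevance score. How the paper actually turns such associations into a ranking is not stated in the abstract.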

Keywords: Internet Technologies, Search Engine, Crawler, Data Mining

The International Journal of Technology, Knowledge and Society, Volume 3, Issue 2, pp.109-118. Article: Print (Spiral Bound). Article: Electronic (PDF File; 663.460KB).

Jyoti Pandey

Lecturer, Computer Engg., YMCA Institute of Engineering, Faridabad, India

Lecturer at YMCAIE, Faridabad.

Prof. Ashok Sharma

YMCA Institute of Engineering, India


Dr. Amit Goel

Lecturer, Department of Computer Engineering, YMCA Institute of Engineering, India

Sh. Amit Goel received his M.Tech. (Comp. Engg.) with Hons. from the National Institute of Technology (NIT), Trichy (India) in 2003. He is a recipient of the EFIP scholarship sponsored by AICTE. From February 2003 to June 2003, he served at NIT Kurukshetra (India). Since July 2003, he has been working as a Lecturer in Computer Engg. at YMCA Institute of Engineering, Faridabad (India). He is currently pursuing his Ph.D. in the areas of Mobile Ad Hoc Networks and Internet Technologies.

