Graph based crawler seed selection

Shuyi Zheng, Pavel Dmitriev, C. Lee Giles

2009

This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper seeds can increase the number of pages a crawler will discover, and can result in a collection with more "good" and fewer "bad" pages. Based on an analysis of the graph structure of the web, we propose several seed selection algorithms. The effectiveness of these algorithms is demonstrated by our experimental results on real web data.

Keywords:

PageRank
World Wide Web
Computer science
Data mining
Power graph analysis
Focused crawler
Information retrieval
Web crawler
Graph
graph based
