A Task Scheduling Strategy Based on Weighted Round-Robin for Distributed Crawler

2014 
With the rapid growth of the Web, stand-alone crawlers struggle to discover and gather the massive volume of information available, and crawlers are gradually becoming distributed. This paper proposes a task scheduling strategy based on weighted round-robin for small-scale distributed crawlers, and formulates the weight of each node from its current crawling efficiency so that load is balanced across nodes. An error recovery mechanism and a node table give the crawling nodes flexible scalability and fault tolerance. Finally, experiments demonstrate the good load-balancing performance of the system.
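As an illustration of the general idea only, the sketch below shows one common way to schedule tasks by weighted round-robin. It uses the "smooth" weighted round-robin variant, which is swapped in here for concreteness and is not claimed to be the scheduler described in the paper. The abstract does not give the weight formula, so the efficiency measure (pages crawled per second) and the names `Node`, `WeightedRoundRobin`, `update_weight`, `remove`, and `next_node` are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    weight: float          # assumed efficiency measure, e.g. pages per second
    current: float = 0.0   # running counter used by smooth weighted round-robin


class WeightedRoundRobin:
    """Hypothetical scheduler: picks nodes in proportion to their weights."""

    def __init__(self, weights):
        # The dict plays the role of a simple node table: name -> Node.
        self.nodes = {name: Node(name, float(w)) for name, w in weights.items()}

    def update_weight(self, name, pages, seconds):
        # Assumed formula: weight = observed crawl throughput.
        self.nodes[name].weight = pages / seconds

    def remove(self, name):
        # Drop a failed node so no further tasks are assigned to it.
        self.nodes.pop(name, None)

    def next_node(self):
        if not self.nodes:
            raise RuntimeError("no crawling nodes available")
        total = sum(n.weight for n in self.nodes.values())
        # Every node earns credit equal to its weight on each round.
        for n in self.nodes.values():
            n.current += n.weight
        # The node with the most credit gets the task and pays back the total.
        best = max(self.nodes.values(), key=lambda n: n.current)
        best.current -= total
        return best.name


if __name__ == "__main__":
    scheduler = WeightedRoundRobin({"node-a": 3, "node-b": 1})
    print([scheduler.next_node() for _ in range(8)])
```

With weights 3 and 1, the first node receives three of every four tasks, and the assignments are interleaved rather than sent in bursts. Calling `update_weight` periodically would let the shares follow each node's measured throughput, and `remove` gives a crude stand-in for dropping a failed node from the node table.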