Simple web crawler speed

I created a very simple web crawler in PHP where I crawl some soccer sites for match results.

But each site takes about 0.5-1 seconds to crawl, so if I have many URLs it will take a long time.

This is the beginning of my code for crawling the site:

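// Collect libxml warnings about malformed real-world HTML instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
// Fetches and parses the page in one call (requires allow_url_fopen).
$doc->loadHTMLFile("http://resultater.dai-sport.dk/tms/Turneringer-og-resultater/Pulje-Stilling.aspx?PuljeId=229");
$xpath = new DOMXPath($doc);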

      

I wrote the finder myself, so maybe there is a better or faster way to do this? Or are my expectations for speed too high?



2 answers


Take a look at this library for an asynchronous implementation of your crawler. It uses the "yield" keyword (generators) introduced in PHP 5.5: https://github.com/icicleio/Icicle



The library's examples include a usage sample.
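
If adding a library is not an option, the same idea (overlapping the network waits instead of fetching pages one after another) can be sketched with PHP's built-in curl_multi functions. This is a different technique from Icicle's coroutines and does not use its API. The helper name fetchAll and the timeout value are assumptions for illustration, and the cURL extension must be enabled.

```php
<?php
// Hypothetical helper (not part of any library): fetch several URLs
// concurrently. Returns an array mapping each URL to its response body,
// or null if that transfer failed.
function fetchAll(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return body as a string
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0);               // wait for socket activity
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the handles whose transfer ended with an error.
    $failed = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = $info['handle'];
        }
    }

    $bodies = [];
    foreach ($handles as $url => $ch) {
        $bodies[$url] = in_array($ch, $failed, true) ? null : curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $bodies;
}
```

With this approach the total time tends toward that of the slowest page rather than the sum of all pages, although the remote server may still throttle many simultaneous requests. Each returned body can then be passed to DOMDocument::loadHTML() for parsing.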



If you don't plan on using a ready-made module, the way you did it is fine; just make sure to load and parse each URL only once. Here is an example from an older post: "How to create a simple crawler in PHP?"
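
A minimal sketch of the "parse once" advice: load the HTML into one DOMDocument, build one DOMXPath, and reuse it for every query instead of reloading the page per lookup. The helper name extractRows and the sample table markup are assumptions; the real results page will have a different structure.

```php
<?php
// Hypothetical helper: parse the HTML a single time and run all XPath
// queries against the same document. Returns one array of cell texts
// per table row that contains <td> cells.
function extractRows(string $html): array
{
    libxml_use_internal_errors(true);   // tolerate malformed real-world HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);        // created once, reused below
    $rows = [];
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells !== []) {            // skip header rows that only have <th>
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Assumed sample markup, used only to make the sketch runnable offline.
$html = '<table><tr><th>Team</th><th>Pts</th></tr>'
      . '<tr><td>Alpha</td><td>12</td></tr>'
      . '<tr><td>Beta</td><td>9</td></tr></table>';
$rows = extractRows($html);
```

Parsing is usually cheap compared with the HTTP request, so the main saving comes from never downloading the same URL twice.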



If you decide to try a ready-made module, take a look at http://phpcrawl.cuab.de/, which is a very good option.
