Simple web crawler speed
I created a very simple web crawler in PHP where I crawl some soccer sites for match results.
But when I run it, it takes about 0.5-1 seconds to crawl a single page. So if I have many URLs to crawl, it will take a long time.
This is the beginning of my code for crawling the site:
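// Download the results page and parse it into a DOM tree.
$doc = new DOMDocument();
$doc->loadHTMLFile("http://resultater.dai-sport.dk/tms/Turneringer-og-resultater/Pulje-Stilling.aspx?PuljeId=229");
// XPath object used to query the parsed document.
$xpath = new DOMXPath($doc);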
I wrote the lookup code myself, so maybe there is a better or faster way to do this? Or are my expectations for speed too high?
Take a look at this library if you want to make your crawler asynchronous. It builds on generators (the "yield" keyword introduced in PHP 5.5): https://github.com/icicleio/Icicle
The library's examples directory includes a usage example.
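To illustrate why fetching concurrently helps, here is a minimal sketch that does not use Icicle at all. It swaps in PHP's bundled cURL extension and its multi interface instead, so several pages download at the same time and the total wait is closer to the slowest request than to the sum of all of them. The helper names fetchAll and htmlToXPath, the 10-second timeout, and the XPath in the usage comment are assumptions made for this example, not part of the original code.

```php
<?php
// Hypothetical helper: download several URLs concurrently with curl_multi.
// Returns an array mapping each URL to its body, or null if the request failed.
function fetchAll(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body in memory
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none are still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000); // select can fail on some systems; avoid a busy loop
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $url => $ch) {
        // An HTTP code of 0 means no response was received at all.
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $bodies[$url] = ($code >= 200 && $code < 300) ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);

    return $bodies;
}

// Hypothetical helper: parse an HTML string and return an XPath object for it.
function htmlToXPath(string $html): DOMXPath
{
    $previous = libxml_use_internal_errors(true); // silence warnings from sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return new DOMXPath($doc);
}

// Usage (assumed URL list and XPath expression):
// foreach (fetchAll($urls) as $url => $html) {
//     if ($html === null || $html === '') { continue; }
//     $rows = htmlToXPath($html)->query('//table//tr');
// }
```

Parsing still happens one page at a time here, which is usually fine because the network wait, not the DOM parsing, tends to dominate a 0.5-1 second page fetch. Be considerate about how many requests you send to the same site at once.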
If you don't plan to use a ready-made module, the way you did it is fine; just make sure to parse each URL only once. Here is an example from an older post: How to create a simple crawler in PHP?
If you decide to try a ready-made module, have a look at http://phpcrawl.cuab.de/, which is a very good option.
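One way to read "parse the URL once" is to keep a set of URLs you have already handled and skip repeats. The sketch below shows that idea under this assumption; the function names normaliseUrl and takeUnseen are hypothetical, and the normalisation is deliberately minimal (it only trims whitespace and drops the #fragment).

```php
<?php
// Hypothetical helper: make trivially different spellings of a URL compare equal.
function normaliseUrl(string $url): string
{
    $url = trim($url);
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $url = substr($url, 0, $hashPos); // the fragment is never sent to the server
    }
    return $url;
}

// Return the URLs from $candidates that have not been seen yet,
// and record them in $seen so later calls skip them.
function takeUnseen(array $candidates, array &$seen): array
{
    $fresh = [];
    foreach ($candidates as $url) {
        $key = normaliseUrl($url);
        if ($key === '' || isset($seen[$key])) {
            continue; // empty or already handled
        }
        $seen[$key] = true;
        $fresh[] = $key;
    }
    return $fresh;
}
```

Call takeUnseen on every batch of links you discover, passing the same $seen array each time, and only fetch what it returns.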