How fast is simplexml_load_file()?

I am pulling a lot of user data from the last.fm API for my mashup. I do this on a weekly basis, as I have to collect listening data.

I fetch the data through their REST API as XML, more specifically with simplexml_load_file().

The script takes ridiculously long: for roughly 2,300 users it takes 30 minutes just to display the artist names. I have to fix this now, otherwise my hosting company will shut me down. I have ruled everything else out; it is the XML handling that slows the script down.

Now I need to figure out whether last.fm has a slow API (or throttles calls without telling us), or whether parsing XML in PHP is actually just slow.

One thing I did figure out is that each XML response contains a lot more than I need, and I cannot limit it via the API (i.e. ask for only the 3 fields I need instead of all 70). Still, even the "large" XML files are only about 20 KB. Could that be what slows the script down, loading 20 KB into an object for each of the 2,300 users?

I can't see what else it might be... I just need confirmation that the bottleneck is probably last.fm's slow API, or that it isn't.

Any other help you can provide would be appreciated.

+2




5 answers


I don't think SimpleXML itself is slow; it is a parser, so it takes some time, but I think the 2,300 curl / file_get_contents requests take a lot longer. Also, why not fetch the data and just parse it with simplexml_load_string()? Do you really need to write each response to the server's disk first?

At the very least, loading from memory should speed things up. Also, what kind of processing are you doing on the loaded XML? Are you sure that processing is efficient?
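A minimal sketch of that approach, fetching into memory with curl and parsing with simplexml_load_string(); the exact last.fm URL and response layout here are assumptions for illustration only:

$url = 'http://ws.audioscrobbler.com/2.0/'
     . '?method=user.gettopartists&user=someuser&api_key=YOUR_KEY'; // hypothetical example

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$body = curl_exec($ch);
curl_close($ch);

if ($body !== false) {
    $xml = simplexml_load_string($body); // parse from memory, no file on disk
    if ($xml !== false) {
        foreach ($xml->topartists->artist as $artist) { // assumed response layout
            echo $artist->name, "\n";
        }
    }
}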

+1




20 KB * 2,300 users ≈ 45 MB. If you download at ~25 KB/sec, it takes about 30 minutes just to fetch the data, let alone parse it.



+1




Make sure the XML you load from last.fm is gzipped. You probably need to send the appropriate HTTP request header (Accept-Encoding: gzip) to tell the server that you support gzip. This will speed up the download, but it will cost a bit more CPU for the decompression.
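A sketch of how to request that with curl, assuming the last.fm servers honour Accept-Encoding; passing an empty string to CURLOPT_ENCODING makes curl advertise every encoding it supports and decompress the response transparently:

$ch = curl_init('http://example.com/feed.xml'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Sends an Accept-Encoding header listing all encodings this curl build
// supports (gzip, deflate, ...) and decodes the response automatically.
curl_setopt($ch, CURLOPT_ENCODING, '');
$body = curl_exec($ch);
curl_close($ch);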

Also consider using asynchronous downloads to free up server resources. This doesn't necessarily speed up the process, but it should make the server administrators happy.
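PHP has no truly asynchronous I/O built in here, but curl_multi runs several downloads in parallel, which is usually what is meant; a rough sketch (the URLs are placeholders, and in practice you would batch the 2,300 requests rather than open all the handles at once):

$urls = array('http://example.com/user1.xml', 'http://example.com/user2.xml'); // placeholders

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers until none are still running.
$active = null;
do {
    curl_multi_exec($mh, $active);
    curl_multi_select($mh); // wait for network activity instead of spinning
} while ($active > 0);

foreach ($handles as $ch) {
    $body = curl_multi_getcontent($ch);
    // ... parse $body with simplexml_load_string() ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);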

If the XML itself is large, use a SAX parser instead of a DOM parser.
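If the documents really are big, PHP's XMLReader is the streaming option (a pull parser, which serves the same purpose as SAX here: it never builds the whole tree in memory). A sketch, assuming the artist names sit in <name> elements:

$reader = new XMLReader();
$reader->open('large.xml'); // or $reader->XML($string) for an in-memory string
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'name') {
        echo $reader->readString(), "\n"; // element text, no DOM built
    }
}
$reader->close();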

0




I think there is a limit of 1 API call per second. I'm not sure whether this policy is enforced in code, but it might have something to do with it: 2,300 calls at one per second already comes to roughly 38 minutes, which is in the same ballpark as what you are seeing. You can ask the last.fm staff on IRC at irc.last.fm in #audioscrobbler if you think this is the case.
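If that limit is real, a simple client-side throttle keeps you under it; a sketch ($users stands in for your list of 2,300 names):

foreach ($users as $user) {
    $start = microtime(true);
    // ... fetch and parse this user's XML ...
    $elapsed = microtime(true) - $start;
    if ($elapsed < 1.0) {
        usleep((int)((1.0 - $elapsed) * 1000000)); // pad out to 1 call per second
    }
}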

0




As suggested, fetch the data yourself and parse it with simplexml_load_string() instead of relying on simplexml_load_file(); it is about twice as fast. Here is the code:

function simplexml_load_file2($url, $timeout = 30) {
    // Pull the host, path and query string out of the URL.
    $url_parts = parse_url($url);
    if (!$url_parts || !array_key_exists('host', $url_parts)) return false;

    $fp = fsockopen($url_parts['host'], 80, $errno, $errstr, $timeout);
    if ($fp)
    {
        $path = array_key_exists('path', $url_parts) ? $url_parts['path'] : '/';
        if (array_key_exists('query', $url_parts))
        {
            $path .= '?' . $url_parts['query'];
        }

        // Make the request. HTTP/1.0 is used so the server does not reply
        // with a chunked transfer encoding, which this naive reader would
        // not decode.
        $out = "GET $path HTTP/1.0\r\n";
        $out .= "Host: " . $url_parts['host'] . "\r\n";
        $out .= "Connection: Close\r\n\r\n";

        fwrite($fp, $out);

        // Read the whole response.
        $resp = "";
        while (!feof($fp))
        {
            $resp .= fgets($fp, 128);
        }
        fclose($fp);

        // Split the headers from the body and check for a 200 status.
        $parts = explode("\r\n\r\n", $resp);
        $headers = array_shift($parts);

        $status_regex = "/HTTP\/1\.\d\s(\d+)/";
        if (preg_match($status_regex, $headers, $matches) && $matches[1] == 200)
        {
            // Re-join in case the body itself contained a blank line.
            $xml = join("\r\n\r\n", $parts);
            return @simplexml_load_string($xml);
        }
    }
    return false;
}
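
For illustration, it is called just like the built-in function; the URL and response layout here are again only assumptions:

$xml = simplexml_load_file2('http://ws.audioscrobbler.com/2.0/'
    . '?method=user.gettopartists&user=someuser&api_key=YOUR_KEY');
if ($xml !== false) {
    foreach ($xml->topartists->artist as $artist) { // assumed layout
        echo $artist->name, "\n";
    }
}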

      

0

