file_get_contents() returns an unreadable string

I am executing the following code:

<?php
$html = file_get_contents('http://actualidad.rt.com/actualidad');
var_dump($html);
?>

The result is very strange. I have been using file_get_contents() for a long time, but I have no idea what could be causing this.

Any help? Thanks a lot for reading.



1 answer


The site is technically broken: it sends the page gzip-encoded regardless of whether the client has said it can handle gzip. This goes unnoticed in modern web browsers, because they either request compressed pages by default or can handle a gzip response even when they did not ask for one.
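One way to confirm the diagnosis is to look at the first two bytes of the string file_get_contents() returned: every gzip stream starts with the magic bytes 0x1f 0x8b. The sketch below is only an illustration; the function name looks_gzipped() is hypothetical, and the usage lines assume the zlib extension is available for gzencode().

```php
<?php
// Hypothetical helper: true if $data begins with the gzip magic bytes 0x1f 0x8b.
function looks_gzipped(string $data): bool
{
    return strncmp($data, "\x1f\x8b", 2) === 0;
}

// Usage: a gzip-compressed string is detected, plain HTML is not.
var_dump(looks_gzipped(gzencode('<html></html>'))); // bool(true)
var_dump(looks_gzipped('<html></html>'));           // bool(false)
```

If this returns true for the string you got back, you are looking at compressed bytes rather than HTML.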

You can follow the route suggested in the answer to the question Wouter points out, but I would suggest using PHP's curl library instead, which should be able to decode the compressed page transparently.

For example:



$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://actualidad.rt.com/actualidad');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of printing it
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');     // send "Accept-Encoding: gzip" and decompress the reply
$html = curl_exec($ch);
if ($html === false) {
    die('curl error: ' . curl_error($ch));
}
curl_close($ch);
echo $html;

You should find that this outputs the actual HTML of the web page. This is because I set the CURLOPT_ENCODING option to "gzip": curl then sends an Accept-Encoding: gzip header, expects a gzipped response, and unpacks it for you.

I think this is a better solution than unpacking the page manually: if the site is fixed in the future so that it sends uncompressed pages to clients that do not advertise gzip support, this code should continue to work, whereas code that always decompresses the response would break.
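If you do stay with file_get_contents(), a defensive variant of manual unpacking is to decompress only when the body actually starts with the gzip magic bytes, so it keeps working if the site stops compressing. This is a minimal sketch, not the approach from the linked answer; decode_body() is a hypothetical name, and gzdecode() assumes the zlib extension (PHP 5.4 or later).

```php
<?php
// Hypothetical helper: decompress $body only if it is a gzip stream,
// otherwise return it unchanged.
function decode_body(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // not gzip (e.g. the site has been fixed): use as-is
    }
    $decoded = gzdecode($body);
    if ($decoded === false) {
        throw new RuntimeException('Response looked gzipped but could not be decoded');
    }
    return $decoded;
}

// With the URL from the question this would be (performs a network request):
// $html = decode_body(file_get_contents('http://actualidad.rt.com/actualidad'));
echo decode_body(gzencode('<p>hola</p>')), "\n"; // <p>hola</p>
echo decode_body('<p>hola</p>'), "\n";           // <p>hola</p>
```

The curl version above remains simpler, since curl both advertises gzip support and handles either kind of response for you.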
