file_get_contents returns a different page from Google than the one shown in the browser

I am using file_get_contents to fetch the results of this search URL:

http://www.google.com/search?q=*a*+site:www.reddit.com/r/+-inurl:(/shirt/|/related/|/domain/|/new/|/top/|/controversial/|/widget/|/buttons/|/about/|/duplicates/|dest=|/i18n)&num=1&sort=date-sdate

If I open this URL in my browser, the page displayed is different from the one I see when I echo the result of file_get_contents:

$url = "http://www.google.com/search?q=*a*+site:www.reddit.com/r/+-inurl:(/shirt/|/related/|/domain/|/new/|/top/|/controversial/|/widget/|/buttons/|/about/|/duplicates/|dest=|/i18n)&num=1&sort=date-sdate";
$google_search = file_get_contents($url);

      

What's wrong with my code?



2 answers


Nothing. The problem is that the page uses JavaScript and AJAX to load its content. So, to get a snapshot of the page as the browser shows it, you need to "run" it. That is, the JavaScript has to be executed, which PHP does not do.

Your best bet is to use a headless browser such as PhantomJS. If you search, you will find several guides explaining how to do this.



Note

If all you're looking for is a way to extract the raw data from a search, you can try using the Google search API.
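As a rough illustration of the API route, the sketch below assumes the "Google search API" here means the Custom Search JSON API. The function names build_api_url and extract_links are hypothetical, and the API key and search engine ID are placeholders you would have to obtain yourself. It only builds the request URL and pulls the result links out of a JSON response; the request itself is left to you.

```php
<?php
// Assumption: the Custom Search JSON API endpoint and its key/cx/q/num parameters.
// build_api_url() and extract_links() are hypothetical helper names.

function build_api_url(string $apiKey, string $engineId, string $query, int $num = 1): string
{
    $params = [
        'key' => $apiKey,    // placeholder: your API key
        'cx'  => $engineId,  // placeholder: your search engine ID
        'q'   => $query,
        'num' => $num,
    ];
    // http_build_query() URL-encodes characters such as "|", "(" and spaces.
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query($params, '', '&');
}

function extract_links(string $json): array
{
    $data = json_decode($json, true);
    // Return an empty list when the body is not JSON or contains no results.
    if (!is_array($data) || !isset($data['items']) || !is_array($data['items'])) {
        return [];
    }
    // Collect the "link" field of every result item.
    return array_column($data['items'], 'link');
}
```

The body returned by file_get_contents(build_api_url(...)) can then be passed to extract_links(), which avoids parsing the HTML results page altogether.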



My guess is that Google checks the user agent to block automated searches.

So, you should at least use cURL and set a user agent string that matches a common browser's, to "trick" Google.



I suspect it won't be that easy to fool Google, but maybe I'm just paranoid, and at least you will learn something about cURL.
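A minimal sketch of that idea, assuming the PHP cURL extension is installed. The function names build_search_url and fetch_as_browser are hypothetical, and the user agent passed in is only whatever browser string you choose to imitate; there is no guarantee Google will accept it.

```php
<?php
// Hypothetical helpers illustrating a cURL request with a browser-like User-Agent.

function build_search_url(string $query, int $num = 1): string
{
    // http_build_query() encodes "|", "(", ")" and spaces, which the raw URL leaves unescaped.
    return 'http://www.google.com/search?' . http_build_query(['q' => $query, 'num' => $num], '', '&');
}

function fetch_as_browser(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects
        CURLOPT_USERAGENT      => $userAgent, // the string Google sees as the client
        CURLOPT_TIMEOUT        => 10,         // give up after 10 seconds
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);

    return [
        'status' => $status,                        // HTTP status code, 0 if the request failed
        'body'   => $body === false ? null : $body, // null when cURL reported an error
        'error'  => $error,
    ];
}
```

Calling fetch_as_browser(build_search_url('*a* site:www.reddit.com/r/'), $userAgent) with a user agent copied from your own browser returns the status code along with the body, so you can check whether Google answered with an error or a block page instead of results.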
