Python: Scraping Facebook comments from a site

I am trying to scrape Facebook comments using Beautiful Soup from the page below.

import BeautifulSoup
import urllib2

url = 'http://techcrunch.com/2012/05/15/facebook-lightbox/'

# Download the page and parse it with BeautifulSoup 3
fd = urllib2.urlopen(url)
soup = BeautifulSoup.BeautifulSoup(fd)

# Grab the first div that should contain the comment text
comment_div = soup.find("div", {"class": "postText"})
fb_comment = comment_div.find(text=True) if comment_div else None

print fb_comment

The output is empty. However, I can clearly see that the Facebook comment is inside those tags when I use Inspect Element on the TechCrunch page. I'm not new to Python, but I'm wondering whether the approach is correct and where I am going wrong.



3 answers


As Christopher and Thiefmaster said: this is all down to JavaScript.



But if you really need this information, you can still get it with Selenium (http://seleniumhq.org) to render the page in a real browser, and then run BeautifulSoup on that output.
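A minimal sketch of that idea, assuming the Firefox WebDriver is installed and reusing the postText class from the question (whether the rendered comments actually end up under that class, or inside an iframe you would first have to switch into, is an assumption to verify in the browser):

from selenium import webdriver
import BeautifulSoup

url = 'http://techcrunch.com/2012/05/15/facebook-lightbox/'

# Let a real browser load the page so Facebook's JavaScript runs
driver = webdriver.Firefox()
driver.get(url)
html = driver.page_source   # HTML after the scripts have executed
driver.quit()

# Now BeautifulSoup sees the rendered markup, not just the static source
soup = BeautifulSoup.BeautifulSoup(html)
for div in soup.findAll("div", {"class": "postText"}):   # class name assumed from the question
    print ''.join(div.findAll(text=True))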



Facebook comments are loaded dynamically with AJAX. You can scrape the original page to get this tag:

<fb:comments href="http://techcrunch.com/2012/05/15/facebook-lightbox/" num_posts="25" width="630"></fb:comments>



After that, you need to send a request to a Facebook API that will give you the comments for the URL in that tag.
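For example, at the time of this question the Graph API could be asked for the comments attached to a URL roughly like this. The graph.facebook.com/comments/?ids=... endpoint and the shape of its response are assumptions about the API of that era; today an access token and a different endpoint may be required:

import json
import urllib
import urllib2

# The href in the fb:comments tag is just the article URL itself
page_url = 'http://techcrunch.com/2012/05/15/facebook-lightbox/'

# NOTE: this endpoint is an assumption about the old Graph API, not a guaranteed current one
api_url = 'https://graph.facebook.com/comments/?ids=' + urllib.quote_plus(page_url)
response = json.load(urllib2.urlopen(api_url))

# Print the raw JSON so you can inspect how the comments are structured
print json.dumps(response, indent=2)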



The parts of the page you are looking for are not included in the original HTML. You can see this for yourself by viewing the page source in a browser.

You will need to use something like pywebkitgtk to execute the JavaScript before passing the document to BeautifulSoup.







