Python: Scraping Facebook comments from a site

I am trying to scrape Facebook comments using Beautiful Soup from the page below.

import BeautifulSoup
import urllib2

url = 'http://techcrunch.com/2012/05/15/facebook-lightbox/'

# Download the page and parse it with BeautifulSoup 3
fd = urllib2.urlopen(url)
soup = BeautifulSoup.BeautifulSoup(fd)

# Grab the first div that should contain the comment text
comment_div = soup.find("div", {"class": "postText"})
fb_comment = comment_div.find(text=True) if comment_div else None

print fb_comment

The output is empty. However, I can clearly see that the Facebook comment is inside those tags when I use Inspect Element on the TechCrunch page. I'm not new to Python, but I'm wondering whether the approach is correct and where I am going wrong.



3 answers


As Christopher and Thiefmaster said: this is all down to JavaScript.



But if you really need this information, you can still get it with Selenium (http://seleniumhq.org) to render the page in a real browser, and then run BeautifulSoup on that output.
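A minimal sketch of that idea, assuming the Firefox WebDriver is installed and reusing the postText class from the question (whether the rendered comments actually end up under that class, or inside an iframe you would first have to switch into, is an assumption to verify in the browser):

from selenium import webdriver
import BeautifulSoup

url = 'http://techcrunch.com/2012/05/15/facebook-lightbox/'

# Let a real browser load the page so Facebook's JavaScript runs
driver = webdriver.Firefox()
driver.get(url)
html = driver.page_source   # HTML after the scripts have executed
driver.quit()

# Now BeautifulSoup sees the rendered markup, not just the static source
soup = BeautifulSoup.BeautifulSoup(html)
for div in soup.findAll("div", {"class": "postText"}):   # class name assumed from the question
    print ''.join(div.findAll(text=True))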



Facebook comments are loaded dynamically with AJAX. You can scrape the original page to get this tag:

<fb:comments href="http://techcrunch.com/2012/05/15/facebook-lightbox/" num_posts="25" width="630"></fb:comments>



After that, you need to send a request to a Facebook API that will give you the comments for the URL in that tag.
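For example, at the time of this question the Graph API could be asked for the comments attached to a URL roughly like this. The graph.facebook.com/comments/?ids=... endpoint and the shape of its response are assumptions about the API of that era; today an access token and a different endpoint may be required:

import json
import urllib
import urllib2

# The href in the fb:comments tag is just the article URL itself
page_url = 'http://techcrunch.com/2012/05/15/facebook-lightbox/'

# NOTE: this endpoint is an assumption about the old Graph API, not a guaranteed current one
api_url = 'https://graph.facebook.com/comments/?ids=' + urllib.quote_plus(page_url)
response = json.load(urllib2.urlopen(api_url))

# Print the raw JSON so you can inspect how the comments are structured
print json.dumps(response, indent=2)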



The parts of the page you are looking for are not included in the original HTML. You can see this for yourself by viewing the page source in a browser.

You will need to use something like pywebkitgtk to execute the JavaScript before passing the document to BeautifulSoup.







