Can anyone clarify some options for Python web automation?

I am trying to write a simple Python script that scans a tweet for a link and then visits that link. I'm having trouble figuring out which direction to go from here. From what I've researched, it seems I can use either Selenium or Mechanize, both of which can be used to automate a browser. Can I use them to browse the Internet?


Alternatively, I could learn one of the Twitter APIs, the Requests library, and Pyjamas (which converts Python code to JavaScript), so that I can write a simple script and load it into a Google Chrome or Firefox extension.

What's the best option?



3 answers

There are many different ways to do web automation. Since you are working with Twitter, you can try the Twitter API. If you are doing any other task, there are more options.
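As a rough illustration of the "scan a tweet for a link, then visit it" task from the question, here is a minimal sketch using only the standard library. It assumes you already have the tweet text as a string (for example, from the Twitter API); the names `extract_links`, `visit`, and `URL_PATTERN` are made up for this example, and the regular expression is a simple approximation rather than a complete URL parser.

```python
import re
import urllib.request

# Simple pattern for http(s) links; an approximation, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(tweet_text):
    """Return all http(s) URLs found in the tweet text, in order."""
    links = []
    for match in URL_PATTERN.findall(tweet_text):
        # Trim punctuation that commonly trails a link in ordinary prose.
        links.append(match.rstrip(".,;:!?)"))
    return links


def visit(url, timeout=10):
    """Fetch the URL, following redirects, and return (final_url, status)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl(), response.status
```

Calling `visit(link)` for each item returned by `extract_links(tweet_text)` performs a plain HTTP fetch. This does not run JavaScript, so pages that depend on it would still need a real browser driven by something like Selenium.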

Once you get the page you want, I recommend that you use BeautifulSoup to parse the data you want.
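A small sketch of what parsing with BeautifulSoup might look like, assuming the `bs4` package is installed and that you already have the page's HTML as a string. The function name `page_summary` and the sample HTML are invented for illustration.

```python
from bs4 import BeautifulSoup


def page_summary(html):
    """Return the page title and every link target found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the document has no <title> element.
    title = soup.title.get_text(strip=True) if soup.title else None
    # href=True skips anchors that have no href attribute.
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return title, links


sample_html = """
<html><head><title>Example page</title></head>
<body><a href="https://example.com/a">A</a> <a name="x">no href</a>
<a href="/b">B</a></body></html>
"""
title, links = page_summary(sample_html)
print(title)
print(links)
```

The built-in `html.parser` backend is used here so that no extra parser needs to be installed.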

I hope this takes you in the right direction for website automation.



I am not an expert in web scraping, but I've had some experience with both Mechanize and Selenium. I think in your case either Mechanize or Selenium will suit your needs well, but you should also spend some time looking into these Python libraries: Beautiful Soup, urllib, and urllib2.

In my humble opinion, you should use Mechanize over Selenium in your case, because Selenium is not as lightweight as Mechanize. Selenium is used to emulate a real web browser, so you can actually perform a "click action".

Mechanize does have some drawbacks. You will find that Mechanize gives you a hard time when you try to press a button... Also, Mechanize doesn't understand JavaScript, so many times I have had to imitate what the JavaScript does in my own Python code.

One last tip, in case you decide to choose Selenium over Mechanize in the future: use a headless browser like PhantomJS rather than Chrome or Firefox to reduce Selenium's computation time. Hope this helps, and good luck.



For web scraping, Scrapy seems like a better framework.

It is very well documented and easy to use.


