Can anyone clarify some options for Python web automation?
I am trying to write a simple Python script that scans a tweet for a link and then visits that link. I'm having trouble deciding which direction to take. From what I've researched, it seems I could use Selenium or Mechanize. Can they be used to automate a browser and navigate the web?
Or I could use one of the Twitter APIs together with the Requests library and Pyjamas (which compiles Python to JavaScript) to make a simple script and load it as a Google Chrome / Firefox extension.
What's the best option?
There are many different ways to do web automation. Since you are working with Twitter, you can try the Twitter API first. For other tasks, there are more options:
- Selenium: very useful when you need to click buttons or enter values in a form. Its only drawback is that it opens a separate browser window.
- Mechanize: unlike Selenium, it does not open a browser window, and it is also suitable for working with buttons and forms. The same task may take a few more lines of code, though.
- Urllib/Urllib2: this is what I use. Some people find it a little difficult at first, but once you know what you are doing it is very fast and gets the job done. You can also work with cookies and proxies. It is a built-in library, so there is nothing to install.
- Requests: as good as Urllib, but I don't have much experience with it. You can do things like add custom headers. It is a very good library.
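As a minimal sketch of the Urllib approach in Python 3 (the URL and User-Agent string below are placeholders, not values from the original post):

```python
import urllib.request  # in Python 2 this was urllib2

# Placeholder URL; in the real script this would be the link pulled from the tweet.
url = "https://example.com/some-article"

# Many sites reject the default Python User-Agent, so supply one explicitly.
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})

# Uncomment to actually fetch the page (requires network access):
# with urllib.request.urlopen(req) as resp:
#     html = resp.read().decode("utf-8", errors="replace")
```

The same fetch with Requests is a one-liner, `requests.get(url, headers=...)`, which is why many people prefer it.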
Once you have the page you want, I recommend using BeautifulSoup to parse out the data you need.
I hope this points you in the right direction for website automation.
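For example, pulling the first link out of fetched tweet HTML might look like this (the HTML snippet and URL are made up for illustration; assumes the beautifulsoup4 package is installed):

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Stand-in for HTML fetched with Urllib or Requests.
html = '<p class="tweet-text">Check this out: <a href="https://example.com/article">link</a></p>'

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")        # first <a> tag in the document
print(link["href"])          # https://example.com/article
print(link.get_text())       # link
```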
I am not an expert in web scraping, but I have some experience with Mechanize and Selenium. I think either one will suit your needs well, but it is also worth spending some time learning the Python libraries Beautiful Soup, urllib, and urllib2.
In my humble opinion, I would recommend Mechanize over Selenium in your case, because Selenium is not as lightweight as Mechanize. Selenium emulates a real web browser, so you can actually perform a real "click action".
There are some drawbacks to Mechanize, though. You will sometimes struggle to get Mechanize to press the right button. Also, Mechanize does not understand JavaScript, so I often have to imitate what the JavaScript does in my own Python code.
One last tip, if you do decide to choose Selenium over Mechanize in the future: use a headless browser like PhantomJS rather than Chrome or Firefox to reduce Selenium's computation time. Hope this helps, and good luck.