Python: check if Wikipedia article exists
4 answers
>>> import urllib
>>> print urllib.urlopen("https://en.wikipedia.org/wiki/Food").getcode()
200
>>> print urllib.urlopen("https://en.wikipedia.org/wiki/Fod").getcode()
404
Is this what you need? Or, to act on the result:
>>> a = urllib.urlopen("https://en.wikipedia.org/wiki/Fod").getcode()
>>> if a == 404:
...     print "Wikipedia does not have an article with this exact name."
...
Wikipedia does not have an article with this exact name.
+4
Even though Wikipedia displays a page for a missing article, the request and response data for that page show:
- Status: Not Found
- Code: 404
In Python 2.6 and later, you can use
import urllib
urllib.urlopen("https://some-url").getcode()
to get the status code of the response and check it in your code.
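The snippets in this thread are Python 2. In Python 3 the equivalent call is `urllib.request.urlopen`, which raises `urllib.error.HTTPError` for a 404 instead of returning a response object, so the check has to catch that exception. The following is a minimal sketch under that assumption; `article_exists`, `WIKI_BASE`, the `opener` parameter, and the `USER_AGENT` value are hypothetical names introduced here for illustration, not part of the original answer.

```python
import urllib.error
import urllib.parse
import urllib.request

WIKI_BASE = "https://en.wikipedia.org/wiki/"
# Hypothetical User-Agent string; replace it with one that identifies your script.
USER_AGENT = "article-exists-sketch/0.1 (example)"


def article_exists(title, opener=urllib.request.urlopen):
    """Return True if the article URL answers 200, False if it answers 404."""
    # Wikipedia article URLs use underscores in place of spaces.
    url = WIKI_BASE + urllib.parse.quote(title.replace(" ", "_"))
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        response = opener(request)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error does not tell us whether the article exists
    try:
        return response.getcode() == 200
    finally:
        response.close()
```

Calling `article_exists("Food")` and `article_exists("Fod")` should mirror the 200 and 404 results shown earlier, provided the network is reachable. The `opener` parameter only exists so the function can be exercised without a live request.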
+1
Most websites and web services report a status code in the response to each HTTP request.
In your case, you can simply check whether the status code is 404, which is what Wikipedia returns when the article does not exist, even though your browser renders the page as a normal result.
import requests
result = requests.get('https://en.wikipedia.org/wiki/Food')
if result.status_code == 200:  # the article exists
    pass  # handle the existing article here
+1
You can use the wikipedia package for Python and search for the article by keyword. It also returns closely related articles. See the example below:
>>> import wikipedia
>>> wikipedia.search("Barack")
[u'Barak (given name)', u'Barack Obama', u'Barack (brandy)', u'Presidency of Barack Obama', u'Family of Barack Obama', u'First inauguration of Barack Obama', u'Barack Obama presidential campaign, 2008', u'Barack Obama, Sr.', u'Barack Obama citizenship conspiracy theories', u'Presidential transition of Barack Obama']
>>> wikipedia.search("Ford", results=3)
[u'Ford Motor Company', u'Gerald Ford', u'Henry Ford']
Here is the link to the python module.
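Note that a search returns related titles, not a yes/no answer for one exact title. As a different technique from the package above, the MediaWiki query API can be asked about specific titles directly; pages that do not exist carry a `missing` flag in the response. The following is a rough Python 3 sketch, assuming the `formatversion=2` response layout in which `pages` is a list; `pages_exist`, `fetch_existence`, and the `USER_AGENT` value are hypothetical names chosen for this example.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent string; replace it with one that identifies your script.
USER_AGENT = "article-exists-sketch/0.1 (example)"


def pages_exist(api_response):
    """Map each page title in a parsed query response to True or False."""
    pages = api_response.get("query", {}).get("pages", [])
    # A page entry with a "missing" key has no article under that title.
    return {page["title"]: "missing" not in page for page in pages}


def fetch_existence(titles):
    """Ask the API about several titles in one request (needs network access)."""
    params = urllib.parse.urlencode({
        "action": "query",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
    })
    request = urllib.request.Request(
        API_URL + "?" + params, headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as response:
        return pages_exist(json.load(response))
```

The keys of the returned dictionary are the titles as the API reports them, which may be normalized forms of what was asked for. Separating `pages_exist` from the network call keeps the parsing logic checkable against a hand-written response.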
0