Python: check if Wikipedia article exists

I am trying to figure out how to check whether a Wikipedia article exists. For example,

https://en.wikipedia.org/wiki/Food

exists, whereas

https://en.wikipedia.org/wiki/Fod

does not, and the page just says, "Wikipedia does not have an article with this exact name."

Thanks!

+3


4 answers


>>> import urllib
>>> print urllib.urlopen("https://en.wikipedia.org/wiki/Food").getcode()
200
>>> print urllib.urlopen("https://en.wikipedia.org/wiki/Fod").getcode()
404

      

Is this what you are looking for?

Or:

>>> a = urllib.urlopen("https://en.wikipedia.org/wiki/Fod").getcode()
>>> if a == 404:
...     print "Wikipedia does not have an article with this exact name."
...
Wikipedia does not have an article with this exact name.
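
For Python 3, a rough equivalent of the snippet above might look like the sketch below. It assumes urllib.request.urlopen, which raises urllib.error.HTTPError for a 404 instead of returning a response, so the check moves into an except clause. The function name url_exists, the opener parameter, and the User-Agent string are made up for illustration; a descriptive User-Agent is sent because Wikimedia asks automated clients to identify themselves.

```python
import urllib.error
import urllib.request

# Hypothetical User-Agent; replace it with one that identifies your own script.
USER_AGENT = "article-exists-sketch/0.1 (example only)"


def url_exists(url, opener=urllib.request.urlopen):
    """Return True for a 200 response and False for a 404."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False  # Wikipedia answers 404 for a missing article
        raise  # 403, 500, ... say nothing about whether the article exists
```

With network access, url_exists("https://en.wikipedia.org/wiki/Food") and url_exists("https://en.wikipedia.org/wiki/Fod") would be expected to give True and False, matching the status codes shown above. The opener parameter is only there so the function can be exercised without hitting the network.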

      

+4



Even though Wikipedia renders a page for a missing article, the request and response data show:

  • Status: Not Found
  • Code: 404

Since Python 2.6, you can use



import urllib

urllib.urlopen("https://some-url").getcode()

      

to get the status code of the request and check it in your code.

+1



Most websites and web services report a status code for each HTTP request in the response.
In your case, you can simply check that status code: it is 404 when the article does not exist, even though your browser renders the page like a normal result.

import requests  # third-party library

result = requests.get('https://en.wikipedia.org/wiki/Food')
if result.status_code == 200:  # the article exists
    pass  # handle the existing article here
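
If you start from an article title rather than a full URL, the same idea can be wrapped in a small helper. This is only a sketch: article_url, article_exists, the get parameter, and the 10-second timeout are names and values chosen for illustration, and it assumes the title maps to a URL by replacing spaces with underscores and percent-encoding the rest.

```python
from urllib.parse import quote


def article_url(title, lang="en"):
    """Build the article URL, e.g. 'Barack Obama' -> .../wiki/Barack_Obama."""
    return "https://{}.wikipedia.org/wiki/{}".format(
        lang, quote(title.replace(" ", "_"))
    )


def article_exists(title, get=None):
    """Return True for a 200 response and False for a 404; other errors raise."""
    if get is None:
        import requests  # third-party library, imported only when needed
        get = requests.get
    result = get(article_url(title), timeout=10)
    if result.status_code == 404:
        return False
    result.raise_for_status()  # raises for any other 4xx/5xx status
    return result.status_code == 200
```

Passing get explicitly is optional; by default the helper uses requests.get, so article_exists("Food") is all a caller needs.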

      

+1



You can use the Wikipedia API for Python and search by keyword to find the article. It also returns closely related articles. See the example below:

>>> import wikipedia
>>> wikipedia.search("Barack")
[u'Barak (given name)', u'Barack Obama', u'Barack (brandy)', u'Presidency of Barack Obama', u'Family of Barack Obama', u'First inauguration of Barack Obama', u'Barack Obama presidential campaign, 2008', u'Barack Obama, Sr.', u'Barack Obama citizenship conspiracy theories', u'Presidential transition of Barack Obama']
>>> wikipedia.search("Ford", results=3)
[u'Ford Motor Company', u'Gerald Ford', u'Henry Ford']

      

Here is the link to the python module.
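
Since search returns ranked matches rather than a yes/no answer, one way to get closer to the original question is to look for the exact title among the results. The sketch below is a heuristic, not a guarantee: an existing article may not appear among the first results. The helper name has_exact_title and its search parameter are made up for illustration; by default it calls wikipedia.search as in the example above.

```python
def has_exact_title(title, search=None, results=10):
    """Return True if the search results contain the title exactly."""
    if search is None:
        import wikipedia  # third-party module from the example above
        search = wikipedia.search
    # Titles are compared as-is, so "Ford" does not match "Henry Ford".
    return title in search(title, results=results)
```

A False result here only means the title was not among the returned matches, so checking the HTTP status code remains the more direct test.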

0

