HTTP Error 403: Forbidden when downloading NLTK data

I am having trouble accessing NLTK data.

I tried nltk.download(), but the GUI shows an error: HTTP Error 403: Forbidden.

I also tried installing from the command line, as described here:

python -m nltk.downloader all

and got this error:

C:\Python36\lib\runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
[nltk_data] Error while loading: HTTP Error 403: Forbidden.

I have also looked at "How to download NLTK data?" and "Failed to load english.pickle using nltk.data.load".


3 answers


The problem comes from the NLTK download server. If you look at the GUI's configuration, the server index points to this URL:

https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

If you open this link in a browser, you get this message:

Error 403 Forbidden
Forbidden
Guru Mediation:
Details: cache-lcy1125-LCY 1501134862 2002107460
Varnish cache server
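The same check can be scripted instead of done in a browser. Below is a minimal sketch using only the Python standard library; index_status is a hypothetical helper name, not part of NLTK. It requests the index URL and reports the HTTP status code, so a 403 confirms that the server is refusing the request regardless of how NLTK is installed.

```python
import urllib.error
import urllib.request

# The index URL the NLTK downloader GUI points to (quoted in the answer above).
NLTK_INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"


def index_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for `url`.

    Hypothetical helper. urlopen raises HTTPError for 4xx/5xx responses,
    so in that case the status code is read from the exception.
    Network failures (URLError, timeouts) are not caught and propagate.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    status = index_status(NLTK_INDEX_URL)
    print(f"{NLTK_INDEX_URL} -> HTTP {status}")
```

What the script prints depends on the server at the time you run it, not on this code.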

I was about to open an issue on GitHub, but someone else already had: https://github.com/nltk/nltk/issues/1791

A workaround has been suggested: https://github.com/nltk/nltk/issues/1787



Based on the GitHub discussion:

GitHub seems to be disabling or blocking access to raw content in the repository.

The suggested workaround is to download the data manually:

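PATH_TO_NLTK_DATA=/home/username/nltk_data/
# Download the whole gh-pages branch of the nltk_data repository as one zip archive
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
# The archive extracts to nltk_data-gh-pages/; move it to where NLTK looks for data
mv nltk_data-gh-pages/ "$PATH_TO_NLTK_DATA"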
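One thing worth checking after the mv: in the gh-pages branch the data files sit under a packages/ folder (the package URLs in index.xml have the form .../gh-pages/packages/tokenizers/punkt.zip), while NLTK looks for resources such as tokenizers/punkt directly under each nltk_data directory. If NLTK still cannot find the data, the category folders may need to move up one level. Below is a minimal Python sketch of that step, assuming the archive was moved to /home/username/nltk_data as above; flatten_packages is a hypothetical helper, and the .zip files are left as they are.

```python
import shutil
from pathlib import Path
from typing import List


def flatten_packages(nltk_data_dir: str) -> List[str]:
    """Move each category folder from <nltk_data_dir>/packages/ up into
    <nltk_data_dir>/, so that e.g. tokenizers/punkt.zip sits at the top level.

    Hypothetical helper; assumes the gh-pages archive layout described above.
    Returns the names of the category folders it processed.
    """
    root = Path(nltk_data_dir).expanduser()
    packages = root / "packages"
    processed: List[str] = []
    if not packages.is_dir():
        return processed  # no packages/ folder: nothing to flatten
    for category in sorted(packages.iterdir()):
        if not category.is_dir():
            continue
        target = root / category.name
        if target.exists():
            # Merge into the existing folder; files already there are not overwritten.
            for item in category.iterdir():
                dest = target / item.name
                if not dest.exists():
                    shutil.move(str(item), str(dest))
        else:
            shutil.move(str(category), str(target))
        processed.append(category.name)
    return processed


if __name__ == "__main__":
    print(flatten_packages("/home/username/nltk_data"))
```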

People have also suggested pointing the downloader at an alternative index file, like this:

python -m nltk.downloader -u https://pastebin.com/raw/D3TBY4Mj punkt
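The -u option makes the downloader read a different index file. An NLTK index is an XML document whose package entries carry, among other attributes, an id and the url to download that package from, so an alternative index only helps if those URLs point somewhere reachable. Below is a minimal standard-library sketch for checking which hosts an index would send the downloader to; package_hosts is a hypothetical helper, and SAMPLE_INDEX is made-up data for illustration, not the contents of any real index.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse


def package_hosts(index_xml: str) -> Counter:
    """Count how many <package> entries in an NLTK-style index point at each host.

    Assumes each <package> element carries a `url` attribute, as in the
    index.xml files NLTK reads; entries without one are skipped.
    """
    root = ET.fromstring(index_xml)
    hosts: Counter = Counter()
    for pkg in root.iter("package"):
        url = pkg.get("url")
        if url:
            hosts[urlparse(url).netloc] += 1
    return hosts


# Illustrative, made-up index with one entry per host.
SAMPLE_INDEX = """<nltk_data>
  <packages>
    <package id="punkt" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip"/>
    <package id="brown" url="https://example.org/mirror/corpora/brown.zip"/>
  </packages>
</nltk_data>"""

if __name__ == "__main__":
    print(package_hosts(SAMPLE_INDEX))
```

To inspect a real index, fetch it (for example with urllib.request.urlopen), decode the body, and pass the text to package_hosts.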



Open nltk/downloader.py in your NLTK installation

and change the default URL from:

DEFAULT_URL = 'http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml'



to

DEFAULT_URL = 'http://nltk.github.com/nltk_data/'
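If you do edit the installed file, the change amounts to rewriting one assignment. Below is a minimal sketch of a hypothetical patch_default_url helper that does this on the file's text, assuming DEFAULT_URL is assigned on a single line (any indentation, single or double quotes, no trailing comment).

```python
import re

# Matches a single-line assignment such as:    DEFAULT_URL = '...'
# Group "indent" keeps the original indentation; group 2 is the opening quote,
# and \2 requires the same quote character to close the string.
_DEFAULT_URL_RE = re.compile(
    r"^(?P<indent>[ \t]*)DEFAULT_URL\s*=\s*(['\"]).*?\2[ \t]*$", re.MULTILINE
)


def patch_default_url(source: str, new_url: str) -> str:
    """Return `source` with every DEFAULT_URL assignment pointing at `new_url`.

    Raises ValueError if no matching assignment is found.
    """
    def _replace(match):
        return f"{match.group('indent')}DEFAULT_URL = {new_url!r}"

    patched, count = _DEFAULT_URL_RE.subn(_replace, source)
    if count == 0:
        raise ValueError("no DEFAULT_URL assignment found")
    return patched
```

The file to patch is the one printed by python -c "import nltk.downloader; print(nltk.downloader.__file__)". An alternative that avoids modifying the installed package: the Downloader class in nltk.downloader accepts a server_index_url argument (this is what the -u command-line option sets), so Downloader(server_index_url=...).download('punkt') uses a different index without editing any file.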



The best solution for me is:

PATH_TO_NLTK_DATA=/home/username/nltk_data/
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
mv nltk_data-gh-pages/ $PATH_TO_NLTK_DATA

The alternative solution did not work for me:

python -m nltk.downloader -u https://pastebin.com/raw/D3TBY4Mj punkt
