HTTP Error 403: Forbidden when downloading NLTK data
I am having trouble accessing the NLTK data. When I run nltk.download(), the GUI shows the error HTTP Error 403: Forbidden. I also tried installing from the command line, using the command provided here:
python -m nltk.downloader all
and got this error:
C:\Python36\lib\runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
[nltk_data] Error while loading: HTTP Error 403: Forbidden.
I have also looked at the questions "How to download NLTK data?" and "Failed to load english.pickle using nltk.data.load".
The problem is on the NLTK download server side. The downloader GUI's configuration points to this index URL:
https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
If you open this link in a browser, you get the following response:
Error 403 Forbidden.
Forbidden.
Guru Mediation:
Details: cache-lcy1125-LCY 1501134862 2002107460
Varnish cache server
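To check from your own machine that the 403 comes from the index server and not from your NLTK installation, you can request the index URL directly. This is a minimal sketch using only the standard library; `index_status` and its injectable `opener` parameter are hypothetical names chosen for illustration, not part of NLTK.

```python
import urllib.error
import urllib.request

# Index URL that the downloader GUI's configuration points to.
INDEX_URL = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml"


def index_status(url=INDEX_URL, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for `url`, including error codes such as 403.

    `opener` is injectable (hypothetical design choice) so the function can be
    exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on err.code.
        return err.code
```

Calling index_status() with no arguments queries the real index; a return value of 403 corresponds to the Varnish error page quoted in this answer.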
I was about to open an issue on GitHub, but someone already had: https://github.com/nltk/nltk/issues/1791
A workaround was suggested in https://github.com/nltk/nltk/issues/1787.
According to the GitHub discussion, GitHub appears to be disabling or blocking access to raw content in the repository.
The suggested workaround is to download the data manually:
PATH_TO_NLTK_DATA=/home/username/nltk_data/
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
mv nltk_data-gh-pages/ $PATH_TO_NLTK_DATA
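The same manual download can be scripted in Python, which may help on Windows where wget/unzip/mv are not available. This is a sketch under an assumption that is not stated in the thread: that the archive unpacks to nltk_data-gh-pages/ with the actual data under a packages/ subfolder (corpora/, tokenizers/, ...). Because NLTK looks for resources such as tokenizers/punkt directly under a data directory, the sketch merges each packages/ category into the target directory instead of moving the whole folder. The function names are hypothetical, and shutil.copytree(..., dirs_exist_ok=True) requires Python 3.8 or newer.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# Archive URL from the workaround in the GitHub discussion.
ARCHIVE_URL = "https://github.com/nltk/nltk_data/archive/gh-pages.zip"


def download_archive(dest, url=ARCHIVE_URL):
    """Fetch the gh-pages archive to `dest` (a large file; needs network access)."""
    urllib.request.urlretrieve(url, str(dest))
    return dest


def install_from_archive(zip_path, target_dir):
    """Unpack a local gh-pages.zip so NLTK can find its contents in target_dir.

    ASSUMPTION: the archive unpacks to nltk_data-gh-pages/packages/<category>/...
    Each category folder is merged into target_dir, so for example
    tokenizers/punkt.zip ends up at <target_dir>/tokenizers/punkt.zip.
    """
    target = Path(target_dir).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(str(zip_path)) as archive:
            archive.extractall(tmp)
        packages = Path(tmp) / "nltk_data-gh-pages" / "packages"
        for category in packages.iterdir():
            if category.is_dir():
                # dirs_exist_ok (Python 3.8+) merges into folders that already exist.
                shutil.copytree(str(category), str(target / category.name),
                                dirs_exist_ok=True)
    return target
```

For example, install_from_archive(download_archive("gh-pages.zip"), "~/nltk_data"). The target should be a directory NLTK searches: ~/nltk_data is one of the defaults, and others can be added through the NLTK_DATA environment variable or by appending to nltk.data.path.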
Others have also suggested pointing the downloader at an alternative index:
python -m nltk.downloader -u https://pastebin.com/raw/D3TBY4Mj punkt
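If you want to run that command from a Python script, a minimal sketch is to build the same argument list and hand it to subprocess. The -u (index URL) and -d (download directory) options are the downloader's own command-line flags; the helper names below are hypothetical.

```python
import subprocess
import sys


def downloader_command(index_url, packages, download_dir=None):
    """Build `python -m nltk.downloader -u URL [-d DIR] PACKAGE...` as an argv list."""
    cmd = [sys.executable, "-m", "nltk.downloader", "-u", index_url]
    if download_dir is not None:
        cmd += ["-d", str(download_dir)]  # -d/--dir: where to store the data
    return cmd + list(packages)


def run_downloader(index_url, packages, download_dir=None):
    """Run the downloader in a child process.

    Raises CalledProcessError if the process exits with a non-zero status.
    """
    subprocess.run(downloader_command(index_url, packages, download_dir), check=True)
```

Using sys.executable makes sure the command runs with the same interpreter (and therefore the same NLTK installation) as the calling script.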
Another option is to open downloader.py in the nltk package directory (nltk/downloader.py) and change the default URL from
DEFAULT_URL = 'http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml'
to
DEFAULT_URL = 'http://nltk.github.com/nltk_data/'
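The edit can also be applied with a small script. This is a sketch, not part of NLTK: patch_default_url is a hypothetical helper that rewrites the first quoted value on any line assigning DEFAULT_URL, allowing either quote style and leading indentation in case the constant is defined inside a class. You can locate the file with import nltk.downloader; print(nltk.downloader.__file__). Keep a backup, since reinstalling or upgrading NLTK will overwrite the change.

```python
import re
from pathlib import Path

# Matches e.g.   DEFAULT_URL = 'http://...'   or   DEFAULT_URL = "http://..."
# (optional indentation; group 2 remembers the quote so the same one closes it).
_DEFAULT_URL_RE = re.compile(r"""^([ \t]*DEFAULT_URL[ \t]*=[ \t]*)(['"]).*?\2""",
                             re.MULTILINE)


def patch_default_url(downloader_py, new_url):
    """Rewrite DEFAULT_URL in downloader_py; return the number of replacements."""
    path = Path(downloader_py)
    source = path.read_text(encoding="utf-8")
    # A replacement function avoids backslash-escape surprises in new_url.
    patched, count = _DEFAULT_URL_RE.subn(
        lambda m: m.group(1) + m.group(2) + new_url + m.group(2), source
    )
    if count:
        path.write_text(patched, encoding="utf-8")
    return count
```

A return value of 0 means no DEFAULT_URL assignment was found and the file was left untouched.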
The best solution for me is:
PATH_TO_NLTK_DATA=/home/username/nltk_data/
wget https://github.com/nltk/nltk_data/archive/gh-pages.zip
unzip gh-pages.zip
mv nltk_data-gh-pages/ $PATH_TO_NLTK_DATA
The alternative solution did not work for me:
python -m nltk.downloader -u https://pastebin.com/raw/D3TBY4Mj punkt