Sphinx - no tokenizer for fr, cs, etc.
I am using Sphinx for documentation purposes. I want to use the spell checker in French.
So far I have done the following:
- install the sphinxcontrib-spelling extension:
sudo pip install sphinxcontrib-spelling
- install a French dictionary:
sudo apt-get install myspell-fr-fr
- add the extension to conf.py:
extensions = ["sphinxcontrib.spelling"]
spelling_lang='fr'
- add the spelling builder:
builder = ["html", "pdf", "spelling"],
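Putting the configuration pieces together, a minimal conf.py for this setup might look like the sketch below. The spelling_word_list_filename line is optional, and the filename used here is only an example:

```python
# conf.py -- minimal Sphinx configuration for spell checking (sketch)
extensions = ["sphinxcontrib.spelling"]

# Language for the spell checker; this needs both a system dictionary
# (e.g. myspell-fr-fr) and a PyEnchant tokenizer for 'fr'.
spelling_lang = "fr"

# Optional: a file of project-specific words to accept (example name).
spelling_word_list_filename = "spelling_wordlist.txt"
```

The spelling builder is then invoked explicitly, e.g. `sphinx-build -b spelling sourcedir builddir`.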
Here is the traceback I get when I run Sphinx:
Exception occurred:
File "/usr/lib/python2.7/dist-packages/sphinx/cmdline.py", line 188, in main
warningiserror, tags)
File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 134, in __init__
self._init_builder(buildername)
File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 194, in _init_builder
self.builder = builderclass(self)
File "/usr/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 57, in __init__
self.init()
File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line 253, in init
filters=filters,
File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line 181, in __init__
self.tokenizer = get_tokenizer(lang, filters)
File "/usr/lib/python2.7/dist-packages/enchant/tokenize/__init__.py", line 186, in get_tokenizer
raise TokenizerNotFoundError(msg)
TokenizerNotFoundError: No tokenizer found for language 'fr'
Any help is appreciated :-)
You need to add a tokenizer for the given language to PyEnchant.
A quick and dirty solution
Clone the pyenchant repo and cd into it:
$ git clone git@github.com:rfk/pyenchant.git
$ cd pyenchant
Change to the directory where tokenizers are defined:
$ cd enchant/tokenize
Copy the existing English tokenizer en.py to the language code you want to use (I was missing cs; you can try fr):
$ cp en.py cs.py
$ cp en.py fr.py
Install the package from this modified code:
$ cd ../.. # first return to the dir with `setup.py`
$ pip install -e .
And it will work now (it does for me).
The best solution would be to look at the copied tokenizer, adapt it where it doesn't match your language, and contribute the result back to PyEnchant.
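For French, the main thing the English tokenizer doesn't handle well is elision: articles like l', d', and qu' are glued onto the following word by an apostrophe, so a French-adapted tokenizer should break tokens at the apostrophe. Here is a minimal standalone sketch of that idea using a plain regex; it is not the real PyEnchant tokenizer class (for that, copy the subclassing structure from en.py), just an illustration of the splitting behaviour you would want:

```python
import re

# A word is a run of letters (Unicode-aware, so accented characters match).
# The apostrophe is NOT part of a word, so elided articles like "l'" are
# split off from the word that follows them.
WORD_RE = re.compile(r"[^\W\d_]+", re.UNICODE)

def tokenize_fr(text):
    """Yield (word, offset) pairs, splitting at apostrophes so that
    "l'école" yields "l" and "école" as separate tokens."""
    for match in WORD_RE.finditer(text):
        yield match.group(), match.start()

print(list(tokenize_fr("l'école est fermée")))
# → [('l', 0), ('école', 2), ('est', 8), ('fermée', 12)]
```

A real tokenizer module in enchant/tokenize would wrap this logic in the same class interface as en.py so that get_tokenizer('fr') can find and instantiate it.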
I ran into the same error, and it looks like it has nothing to do with missing dictionaries.
PyEnchant simply does not ship a French tokenizer, only an English one. As noted in the enchant.tokenize extension documentation:
The author would be very grateful for non-English tokenization routines that can be included in the main PyEnchant distribution.