Sphinx - no tokenizer for fr, cs, etc.

I am using Sphinx for documentation purposes. I want to use the spell checker in French.

So far I have done the following:

  • install the sphinxcontrib-spelling extension:

    sudo pip install sphinxcontrib-spelling

  • install the French dictionary:

    sudo apt-get install myspell-fr-fr

  • add the extension to conf.py:

    extensions = ["sphinxcontrib.spelling"]
    spelling_lang = 'fr'

  • add the spelling builder:

    builder = ["html", "pdf", "spelling"],
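
For completeness, I run the build roughly like this (assuming the default source/build layout; exact paths may differ):

sphinx-build -b spelling . _build/spelling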

Here is the traceback I get when I run Sphinx:

Exception occurred:
  File "/usr/lib/python2.7/dist-packages/sphinx/cmdline.py", line 188, in main
    warningiserror, tags)
  File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 134, in __init__
    self._init_builder(buildername)
  File "/usr/lib/python2.7/dist-packages/sphinx/application.py", line 194, in _init_builder
    self.builder = builderclass(self)
  File "/usr/lib/python2.7/dist-packages/sphinx/builders/__init__.py", line 57, in __init__
    self.init()
  File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line 253, in init
    filters=filters,
  File "/usr/lib/pymodules/python2.7/sphinxcontrib/spelling.py", line 181, in __init__
    self.tokenizer = get_tokenizer(lang, filters)
  File "/usr/lib/python2.7/dist-packages/enchant/tokenize/__init__.py", line 186, in get_tokenizer
    raise TokenizerNotFoundError(msg)
TokenizerNotFoundError: No tokenizer found for language 'fr'

Any help is appreciated :-)



2 answers


You need to add a tokenizer for your language to PyEnchant.

A quick and dirty solution

Clone the pyenchant repo and cd into it:

$ git clone git@github.com:rfk/pyenchant.git
$ cd pyenchant

Change to the directory where the tokenizers are defined:

$ cd enchant/tokenize



Copy the existing tokenizer en.py to the language code you want to use (I was missing cs; in your case it is fr):

$ cp en.py cs.py
$ cp en.py fr.py

Install the package from the modified source:

$ cd ../..  # first return to the dir with `setup.py`
$ pip install -e .

And now it will work (it did for me).

The better long-term solution would be to look through the copied tokenizer, adjust whatever does not match your language, and contribute the result back to PyEnchant.
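
For what it's worth, get_tokenizer() appears to just import enchant.tokenize.<lang> and grab its tokenize class, so instead of a full copy, a minimal fr.py can simply re-export the English one (a sketch, assuming the English word-boundary rules are close enough for French spell checking):

# enchant/tokenize/fr.py -- minimal sketch: reuse the English rules
from enchant.tokenize.en import tokenize  # noqa: F401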



I ran into the same error and it looks like it has nothing to do with missing dictionaries.

PyEnchant simply doesn't ship a French tokenizer, only an English one. As noted in the enchant.tokenize extension documentation:



The author would be very grateful for non-English tokenization routines that can be included in the main PyEnchant distribution.
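
If patching PyEnchant is not an option, newer versions of sphinxcontrib-spelling expose a separate tokenizer_lang option, so you can keep the French dictionary while falling back to the bundled English tokenizer (a sketch; check that your installed version supports this option):

# conf.py
extensions = ["sphinxcontrib.spelling"]
spelling_lang = "fr"        # dictionary used for spell checking
tokenizer_lang = "en_US"    # PyEnchant only ships an English tokenizer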







