ValueError in spaCy when using pytextrank (Python implementation of TextRank)
I am using pytextrank to extract keywords. I installed both pytextrank and spaCy with the commands below.
pip install pytextrank
pip install -U spacy
python -m spacy download en
Here is my code:
import pytextrank
import sys
path_stage0 = jsonPath
path_stage1 = "data/json/temp/o1.json"
with open(path_stage1, 'w') as f:
    for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
        f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
        # to view output in this notebook
        print(pytextrank.pretty_print(graf))
When I run this, I get the error below.
ValueError Traceback (most recent call last)
<ipython-input-12-07819fc6acea> in <module>()
6
7 with open(path_stage1, 'w') as f:
----> 8 for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
9 f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
10 # to view output in this notebook
/home/sameera/anaconda2/lib/python2.7/site-packages/pytextrank/pytextrank.pyc in parse_doc(json_iter)
259 print("graf_text:", graf_text)
260
--> 261 grafs, new_base_idx = parse_graf(meta["id"], graf_text, base_idx)
262 base_idx = new_base_idx
263
/home/sameera/anaconda2/lib/python2.7/site-packages/pytextrank/pytextrank.pyc in parse_graf(doc_id, graf_text, base_idx, spacy_nlp)
193 doc = spacy_nlp(graf_text, parse=True)
194
--> 195 for span in doc.sents:
196 graf = []
197 digest = hashlib.sha1()
/home/sameera/anaconda2/lib/python2.7/site-packages/spacy/tokens/doc.pyx in __get__ (spacy/tokens/doc.cpp:9664)()
432
433 if not self.is_parsed:
--> 434 raise ValueError(
435 "sentence boundary detection requires the dependency parse, which "
436 "requires data to be installed. If you haven't done so, run: "
ValueError: sentence boundary detection requires the dependency parse, which
requires data to be installed. If you haven't done so, run:
python -m spacy download en
to install the data
I am using Python 2.7, Anaconda 4.3, Jupyter Notebook, and Ubuntu 14.04.
It might just be a mistake in the way you copied your code to StackOverflow, but if not:
Be sure to indent everything under the with statement, including the for loop.
Basically:
with open(path_stage1, 'w') as f:
    for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
        f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
        print(pytextrank.pretty_print(graf))
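To see why the indentation matters, here is a minimal sketch that does not use pytextrank at all. The helper name write_lines and the sample items are made up for illustration. The only point is that f stays open while the loop runs inside the with block and is closed once the block ends.

```python
import os
import tempfile


def write_lines(path, items):
    """Write one item per line; the loop is indented under `with`."""
    with open(path, "w") as f:
        for item in items:
            # Still inside the with block, so f is open here.
            f.write("%s\n" % item)
    # The with block has ended, so f is closed at this point.
    return f.closed


# Hypothetical sample data standing in for the parsed paragraphs.
tmp_dir = tempfile.mkdtemp()
out_path = os.path.join(tmp_dir, "o1.json")
closed_after = write_lines(out_path, ["first", "second"])

with open(out_path) as f:
    written = f.read()

print(closed_after)
print(repr(written))
```

Any f.write call made after the with block has ended would fail, because the file is already closed by then.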
It is better to install from the requirements.txt in the pytextrank package instead of running pip install -U spacy, because spaCy is developing quickly and -U will install the latest version. These updates are not always backward compatible.
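As a rough way to check whether an environment has drifted from a pinned requirements file, something like the sketch below could help. Everything in it is an assumption for illustration: the helper names parse_pins and find_mismatches are hypothetical, and the package versions are invented, not taken from the real pytextrank requirements.txt.

```python
def parse_pins(requirements_text):
    """Return {package_name: version} for lines of the form name==version."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # skip blank lines and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed):
    """Compare pins with a {name: version} map of installed packages."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed.get(name)  # None if the package is missing
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Invented pins and installed versions, for illustration only.
example_requirements = """
# pinned dependencies (made-up versions)
spacy==1.0.0
networkx==1.11
"""
example_installed = {"spacy": "2.0.0", "networkx": "1.11"}

pins = parse_pins(example_requirements)
mismatches = find_mismatches(pins, example_installed)
print(mismatches)
```

In practice the installed map could be built from the output of pip freeze, and running pip install -r requirements.txt brings the environment back to the pinned versions.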
Also, feel free to post on the GitHub repository for pytextrank: https://github.com/ceteri/pytextrank/issues
Glad to hear it is being used :)