spaCy and displaCy outputs are different
My sentence: She had another chemotherapy protocol history with 5-FU alone before this protocol without any significant side effects.
When I put this into displaCy ( https://demos.explosion.ai/displacy/ ), the output shows 5-FU as part of a noun phrase.
However, when I parse the text myself and look at the noun chunks, 5-FU does not appear as part of any noun chunk.
import spacy
nlp = spacy.load('en')
ax = nlp(mySentence)
for w in ax.noun_chunks:
    print(w)
Edit: Also, when I look at the tags, 5-FU is tagged as NN. If spaCy's tagger understands this word as a noun surrounded by prepositions, why isn't it picked up as a noun chunk? End edit.
What am I doing wrong? Is there a version difference between the spaCy I am using and the one the demo uses? Is there a spaCy support team that can resolve this issue?
Thank you so much!
displaCy does some preprocessing when displaying the parse tree. Here is a link to the parsing service (open source) used by displaCy: https://github.com/explosion/spacy-services/blob/master/displacy/displacy_service/parse.py#L25
if collapse_phrases:
    for np in list(self.doc.noun_chunks):
        np.merge(np.root.tag_, np.root.lemma_, np.root.ent_type_)
displaCy merges each noun chunk in the sentence into a single token rather than treating its words as separate tokens, so your output is different.
Another difference may be the models you use. You may be using the smallest model, en_core_web_sm, whereas displaCy may use the larger en_core_web_md (although this is not officially mentioned anywhere).
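To illustrate what that merging does to the token list, here is a minimal sketch in plain Python. It does not call spaCy; the function name collapse_phrases, the token list, and the chunk span are all hand-written assumptions for illustration, not real parser output.

```python
def collapse_phrases(tokens, chunks):
    """Merge each (start, end) chunk span into a single token.

    tokens: list of token strings
    chunks: non-overlapping (start, end) index pairs, end exclusive
    """
    starts = {}
    for start, end in chunks:
        if end <= start:
            raise ValueError("chunk spans must cover at least one token")
        starts[start] = end

    merged = []
    i = 0
    while i < len(tokens):
        if i in starts:
            # Join all tokens of the chunk into one merged token
            end = starts[i]
            merged.append(" ".join(tokens[i:end]))
            i = end
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical tokens and chunk span, chosen by hand for illustration only
tokens = ["history", "with", "5-FU", "alone", "before", "this", "protocol"]
chunks = [(5, 7)]
print(collapse_phrases(tokens, chunks))
```

After merging, "this protocol" is one token, so the tokens and arcs drawn in the visualization no longer line up one-to-one with what you get when iterating over the unmerged Doc.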
I am trying to solve the same problem. The displaCy and spaCy outputs are different (both the POS tags and the dependency relations between words).
The preprocessing does not seem to be to blame, because you can turn it off in displaCy (Settings > Collapse Phrases), and for me the result is still not the same.
Perhaps you need to use the en_core_web_md model (not en_core_web_sm):
python -m spacy download en_core_web_md
However, I have not tested this yet.
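Before comparing results, it may help to confirm which model your own code actually loaded. The sketch below assumes the loaded pipeline exposes a meta dictionary with "lang", "name" and "version" keys (as nlp.meta does in spaCy); describe_pipeline is a hypothetical helper name, not part of spaCy.

```python
def describe_pipeline(nlp):
    """Return '<lang>_<name> <version>' built from a loaded pipeline's meta dict."""
    meta = nlp.meta
    # Assumption: meta has "lang", "name" and "version"; fall back to "?" if not
    full_name = "{}_{}".format(meta.get("lang", "?"), meta.get("name", "?"))
    return "{} {}".format(full_name, meta.get("version", "?"))


# Usage with a real pipeline (requires spaCy and the model to be installed):
#   import spacy
#   nlp = spacy.load("en_core_web_md")
#   print(describe_pipeline(nlp))
```

If this reports a small model on your side, that would be consistent with the model-size explanation, although it does not tell you which model the demo itself runs.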
I faced a similar issue while they were upgrading to v2.0, so I upgraded to v2.0 as well. To install a model, you need to download it by its full name using the --direct flag:
python -m spacy download en_core_web_sm-2.0.0-alpha --direct # English
python -m spacy download xx_ent_wiki_sm-2.0.0-alpha --direct # Multi-language NER
You can load the model by calling spaCy's loader, e.g. nlp = spacy.load('en_core_web_sm'), or import it as a module (import en_core_web_sm) and call its load() method, e.g. nlp = en_core_web_sm.load().
Follow the documentation at https://github.com/explosion/spaCy/releases/tag/v2.0.0-alpha