spaCy and displaCy outputs are different

My sentence: She had another chemotherapy protocol history with 5-FU alone before this protocol without any significant side effects.

When I put this into displaCy (https://demos.explosion.ai/displacy/), the output shows 5-FU as part of a noun phrase.

[screenshot of the displaCy parse, with 5-FU inside a noun phrase]

However, when I parse the text myself and iterate over the noun chunks, 5-FU does not show up as part of any noun chunk.

nlp = spacy.load('en')
ax = nlp(mySentence)
for w in ax.noun_chunks:
    print(w)

edit: Also, when I look at the tags with the code below, 5-FU shows up as NN. If spaCy understands this hyphenated token as a noun surrounded by prepositions, why shouldn't it be picked up as part of a noun chunk? end edit

My version: [screenshot]
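For reference, a minimal sketch of the kind of tag check meant above (the actual code is in the screenshot; token.tag_ is spaCy's fine-grained tag attribute):

import spacy

nlp = spacy.load('en')
doc = nlp(u"She had another chemotherapy protocol history with 5-FU alone "
          u"before this protocol without any significant side effects.")
for token in doc:
    print(token.text, token.tag_)   # 5-FU comes out as NN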

What am I doing wrong? Is there a version difference between the demo and the version I am using? Is there a spaCy support team that could help resolve this issue?

Thank you so much!



3 answers


displaCy does some preprocessing before displaying the parse tree. Here is a link to the (open-source) parsing service used by displaCy: https://github.com/explosion/spacy-services/blob/master/displacy/displacy_service/parse.py#L25

if collapse_phrases:
    for np in list(self.doc.noun_chunks):
        np.merge(np.root.tag_, np.root.lemma_, np.root.ent_type_)

      

The displaCy service merges the noun chunks in the sentence into single tokens rather than leaving them as separate tokens, so your output is different.
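As a rough local illustration (not the demo's exact pipeline; this just reuses the spaCy 1.x Span.merge call from the snippet above), merging the noun chunks yourself produces a token list closer to what displaCy draws:

import spacy

nlp = spacy.load('en')
doc = nlp(u"She had another chemotherapy protocol history with 5-FU alone "
          u"before this protocol without any significant side effects.")

# Merge every noun chunk into a single token, mimicking collapse_phrases
for np in list(doc.noun_chunks):
    np.merge(np.root.tag_, np.root.lemma_, np.root.ent_type_)

for token in doc:
    print(token.text, token.dep_, token.head.text)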



[screenshot of the parse with noun chunks merged]

Another difference may be the models being used. You may be using the small en_core_web_sm, whereas the displaCy demo may use the larger en_core_web_md (although this is not officially stated anywhere).



I am trying to solve the same problem. displaCy and spaCy outputs are different (both the POS tags and the relationships between words).

It doesn't seem like the visualization preprocessing is to blame, as you can turn it off in displaCy (Settings > Collapse Phrases); for me the result is still not the same.

Perhaps you need to use the en_core_web_md model (not en_core_web_sm):



python -m spacy download en_core_web_md

      

However, I have not tested this yet.
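If the model does turn out to matter, loading it afterwards would look like this (a sketch only, untested as noted):

import spacy

# assumes en_core_web_md has been downloaded as above
nlp = spacy.load('en_core_web_md')
doc = nlp(u"She had another chemotherapy protocol history with 5-FU alone "
          u"before this protocol without any significant side effects.")
print(list(doc.noun_chunks))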



Since spaCy was being upgraded to v2.0, I faced a similar issue, so I upgraded to v2.0. To install a model, you need to download it by its full name using the --direct flag:

python -m spacy download en_core_web_sm-2.0.0-alpha --direct   # English
python -m spacy download xx_ent_wiki_sm-2.0.0-alpha --direct   # Multi-language NER

      

You can load the model by calling spaCy's loader, e.g. nlp = spacy.load('en_core_web_sm'), or import it as a module (import en_core_web_sm) and call its load() method, e.g. nlp = en_core_web_sm.load().
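As a small sketch, the two loading options side by side (assuming en_core_web_sm is installed as shown above):

import spacy
import en_core_web_sm

# Option 1: the generic loader
nlp = spacy.load('en_core_web_sm')

# Option 2: the model package's own load()
nlp = en_core_web_sm.load()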

Follow the documentation at https://github.com/explosion/spaCy/releases/tag/v2.0.0-alpha
