Add word embeddings to a gensim word2vec model

I am looking for a way to dynamically add pre-trained word vectors to a gensim word2vec model.

I have a pre-trained word2vec model in a txt file (words and their embeddings) and I need to compute the Word Mover's Distance (e.g. via gensim.models.Word2Vec.wmdistance) between the documents of a specific corpus and a new document.
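
For context, this is roughly what I do today with the full model loaded. This is only a minimal sketch, assuming gensim 4.x, a word2vec text-format file named vectors.txt (placeholder), and that the optimal-transport dependency required by wmdistance is installed:

    from gensim.models import KeyedVectors

    # Load the full pre-trained vectors from the word2vec text format.
    kv = KeyedVectors.load_word2vec_format("vectors.txt", binary=False)

    # Two tokenized documents (placeholders).
    doc_a = "the quick brown fox".split()
    doc_b = "a fast dark colored fox".split()

    # Word Mover's Distance between the two token lists;
    # out-of-vocabulary tokens are simply ignored by gensim.
    print(kv.wmdistance(doc_a, doc_b))

Loading everything this way is exactly what I want to avoid, because the full vocabulary does not fit comfortably in RAM.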

To avoid loading the entire vocabulary, I would like to load only the subset of the pre-trained model's words that occur in the corpus. But if the new document contains words that are not in the corpus yet are present in the model's original vocabulary, I want to add them to the model so that they are taken into account in the calculation.
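
The filtering step I have in mind would look something like this rough sketch (vectors.txt and corpus_vocab are placeholders, and I assume the file is in the standard word2vec text format, one "word v1 v2 ... vN" line per word after the header):

    import numpy as np

    corpus_vocab = {"fox", "dog", "cat"}  # words actually seen in my corpus (placeholder)

    subset = {}
    with open("vectors.txt", encoding="utf-8") as f:
        next(f)  # skip the "num_words vector_size" header line
        for line in f:
            parts = line.rstrip().split(" ")
            word = parts[0]
            if word in corpus_vocab:
                subset[word] = np.asarray(parts[1:], dtype=np.float32)

What I don't know is how to turn this dict into something gensim can use for WMD, and how to extend it afterwards.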

What I want is to save RAM, so any of the following would help:

  • Is there a way to add word vectors directly to the model?
  • Is there a way to create a gensim model from a matrix or other object? I could keep this object in RAM and add new words to it before loading it into the model (see the sketch after this list)
  • I am not tied to gensim, so if you know another WMD implementation that accepts vectors as input, that would work too (although I do need it in Python)
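
What I imagine (but do not know whether it is supported) is something like the following sketch. I assume gensim 4.x here, where a KeyedVectors instance can be built empty and filled with add_vectors(); subset is the word-to-vector dict from the sketch above, and the extra vector values are just placeholders:

    import numpy as np
    from gensim.models import KeyedVectors

    vector_size = 300  # must match the pre-trained vectors
    kv = KeyedVectors(vector_size)

    # Load only the corpus words into the model.
    kv.add_vectors(list(subset.keys()), np.array(list(subset.values())))

    # Later, when a new document brings words that are missing here but
    # present in the original txt model, add just those vectors...
    extra_words = ["wolf"]
    extra_vecs = np.random.rand(len(extra_words), vector_size).astype(np.float32)  # placeholder values
    kv.add_vectors(extra_words, extra_vecs)

    # ...and then compute WMD as before.
    distance = kv.wmdistance(["fox", "dog"], ["wolf", "cat"])

The idea is that only the vectors actually needed ever live in RAM.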

Thanks in advance.

Tags: python, nlp, word2vec

