Add word embeddings to a word2vec gensim model

I am looking for a way to dynamically add pre-prepared word vectors to the word2vec gensim model.

I have a pre-trained word2vec model in txt format (words and their embedding vectors), and I need to compute the distance (e.g. via gensim.models.Word2Vec.wmdistance) between documents in a specific corpus and a new document.

In order not to load the entire vocabulary, I would like to load only the subset of the pre-trained model's words that appear in the corpus. But if the new document contains words that are not in the corpus yet are in the model's original vocabulary, I want to add them to the model so that they are taken into account in the calculation.

My goal is to save RAM, so any of the following would help:

  • Is there a way to add word vectors directly to the model?
  • Is there a way to build a gensim model from a matrix or another object? I could keep this object in RAM and add new words to it before loading it into the model
  • I'm not tied to gensim, so another implementation of WMD that accepts vectors as input would also work (although I do need it in Python)

Thanks in advance.
