How to increase the weight of proper nouns in scikit-learn's TfidfVectorizer

I use scikit-learn's TfidfVectorizer to extract keywords from a list of scientific articles. There is a stop_words argument for ignoring words, but I was wondering whether I could give more weight (a higher score) to relevant proper nouns like "Bor" or "Japan".

Should I implement my own custom TF-IDF vectorizer, or can I do this with the existing one?

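from sklearn.feature_extraction.text import TfidfVectorizer

# stopwords is my own list of stop words
tf = TfidfVectorizer(strip_accents='ascii',
                     min_df=1,  # keep every term
                     stop_words=stopwords,
                     lowercase=True)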




1 answer

You can do your own postprocessing on the TF-IDF matrix.

First, go through the vectorizer's vocabulary to find the column indices of all proper nouns (named entities), then go through the matrix and increase the weights in those columns.
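The two steps above could look roughly like the sketch below. It is only an illustration under a few assumptions: the proper nouns are supplied as a hand-picked list (the answer does not say how to detect them), `boost_terms`, the toy corpus, and the factor of 2.0 are all made up for the example, and the vectorizer is fitted with `lowercase=True` so that vocabulary keys are lowercase.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer


def boost_terms(matrix, vocabulary, terms, factor=2.0):
    """Multiply the TF-IDF columns of the given terms by `factor`.

    Hypothetical helper, not part of scikit-learn.
    """
    scale = np.ones(matrix.shape[1])
    for term in terms:
        # Vocabulary keys are lowercase because lowercase=True is assumed.
        column = vocabulary.get(term.lower())
        if column is not None:  # skip terms that never made it into the vocabulary
            scale[column] = factor
    # Right-multiplying by a diagonal matrix rescales each column.
    return (matrix @ sparse.diags(scale)).tocsr()


# Toy corpus and a hand-picked list of proper nouns (both made up).
docs = [
    "Bor studied physics in Japan",
    "the physics of atoms",
]
proper_nouns = ["Bor", "Japan"]

vectorizer = TfidfVectorizer(strip_accents="ascii", lowercase=True)
tfidf = vectorizer.fit_transform(docs)

# Step 1 (index lookup) and step 2 (reweighting) both happen in boost_terms.
boosted = boost_terms(tfidf, vectorizer.vocabulary_, proper_nouns, factor=2.0)
```

Note that TfidfVectorizer L2-normalizes each row by default, so after boosting the rows no longer have unit length. If that matters for what you do next (cosine similarity, for example), you may want to renormalize the rows afterwards.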


