How to increase the weight of proper nouns in scikit-learn's TfidfVectorizer

I use scikit-learn's TfidfVectorizer to extract keywords from a list of scientific articles. There is a stop_words argument, but I was wondering whether I could give more weight (a higher score) to relevant names like "Bor" or "Japan".

Should I implement my own custom TF-IDF vectorizer, or can I do this with the existing one?

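from sklearn.feature_extraction.text import TfidfVectorizer

# stopwords is a list of stop words defined elsewhere.
# min_df=1 (the default) keeps every term; recent scikit-learn
# versions reject an integer min_df of 0.
tf = TfidfVectorizer(strip_accents='ascii',
                     analyzer='word',
                     ngram_range=(1, 1),
                     min_df=1,
                     stop_words=stopwords,
                     lowercase=True)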

      



1 answer


You can do your own post-processing of the TF-IDF matrix.

First, go through the vectorizer's vocabulary to find the column indices of all proper nouns (named entities), then go through the matrix and increase the weights in those columns.
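The post-processing described above could look roughly like the sketch below. It is only an illustration under a few assumptions: the sample documents, the `proper_nouns` list and the `boost` factor are made up for the example, and the names are lowercased because the vectorizer lowercases tokens. In practice the list of names might come from a named-entity recognizer or a hand-made dictionary.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, made up for this example.
docs = [
    "Bor proposed a model of the atom",
    "Researchers in Japan studied the atom",
    "The model was later refined",
]

# Hypothetical inputs: terms to boost (lowercased to match the
# vectorizer's vocabulary) and the factor to multiply them by.
proper_nouns = ["bor", "japan"]
boost = 2.0

vectorizer = TfidfVectorizer(lowercase=True)
X = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)

# One multiplier per column: 1.0 everywhere, `boost` for the chosen terms.
weights = np.ones(X.shape[1])
for term in proper_nouns:
    idx = vectorizer.vocabulary_.get(term)  # column index, or None if absent
    if idx is not None:
        weights[idx] = boost

# Multiplying by a diagonal matrix on the right scales each column.
X_boosted = X @ sparse.diags(weights)
```

Note that after boosting, the rows no longer have unit L2 norm. If that matters for later steps (for example cosine similarity), the rows can be renormalized with `sklearn.preprocessing.normalize`.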
