How to increase the weight of proper nouns in scikit-learn's TfidfVectorizer
I use scikit-learn's TfidfVectorizer to extract keywords from a list of scientific articles. There is a stop_words argument, but I was wondering whether I could give more weight (a higher score) to relevant names such as "Bor" or "Japan".
Should I implement my own custom TF-IDF vectorizer, or can I do this with the existing one?
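```python
from sklearn.feature_extraction.text import TfidfVectorizer

stopwords = ["the", "of", "and"]  # placeholder for the question's own stop-word list

tf = TfidfVectorizer(strip_accents="ascii",
                     analyzer="word",
                     ngram_range=(1, 1),
                     min_df=1,  # an int min_df is a document count; 1 keeps every term
                     stop_words=stopwords,
                     lowercase=True)
```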