Are there any Lucene stemmers that handle Shakespearean English?
I am trying to index some old documents for search, from the 16th, 17th, and 18th centuries.
Modern stemmers, it seems, cannot cope with the archaic forms of words like works, lives, walks (worketh, liveth, walketh).
Are there stemmers that specialize in the English of Shakespeare and the King James Bible? I am currently using solr.PorterStemFilterFactory.
1 answer
It looks like the changes needed for this are minimal.
So one option is to copy and modify the PorterStemmer class and its associated factories and filters.
Alternatively, it might be possible to add these specific rules as a regex filter in front of the Porter stemmer.
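To illustrate the regex idea, here is a minimal sketch in plain Java (no Lucene dependency) that rewrites a trailing "-eth" to "-es" so that a stemmer running afterwards sees a modern-looking plural/verb ending. The class name, the three-letter minimum stem, and the exception list are all assumptions made for this example, and only the "-eth" ending is covered; real texts would need more rules (for example "-est", "doth", "hath"). In Solr, a similar pattern could probably be placed in a solr.PatternReplaceFilterFactory ahead of the Porter filter, but that wiring is not shown or tested here.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArchaicSuffixNormalizer {
    // Hypothetical exception list: words that end in "eth" but are not
    // archaic verb forms. A real list would have to be built from the corpus.
    private static final Set<String> EXCEPTIONS = Set.of("elizabeth", "twentieth");

    // A stem of at least three letters followed by "eth" at the end of the token.
    // The length limit is an assumption that keeps short words like "teeth" intact.
    private static final Pattern ETH = Pattern.compile("^(\\p{L}{3,})eth$");

    // Lowercases the token and rewrites a trailing "eth" to "es";
    // tokens that do not match, or are listed as exceptions, are only lowercased.
    public static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        if (EXCEPTIONS.contains(lower)) {
            return lower;
        }
        Matcher m = ETH.matcher(lower);
        return m.matches() ? m.group(1) + "es" : lower;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"worketh", "liveth", "walketh", "teeth"}) {
            System.out.println(t + " -> " + normalize(t));
        }
    }
}
```

The output of this step is meant to be fed to the existing stemmer, not used as a final form; whether the rewritten tokens then collapse to the same stems as their modern spellings should be checked against the actual stemmer in use.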