Are there any Lucene stemmers that handle Shakespearean English?

I am trying to index some old documents (16th, 17th and 18th century) for search.

Modern stemmers don't seem to cope with the archaic verb forms, e.g. worketh, liveth, walketh (for works, lives, walks).

Are there stemmers that specialize in the English of Shakespeare and the King James Bible? I am currently using solr.PorterStemFilterFactory.


1 answer


It looks like only minimal changes are needed for this.

So it would be possible to copy and modify the PorterStemmer class and its associated factory and filter classes.



Alternatively, it might be possible to add these specific rules as a regex filter placed before the Porter stemmer.
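A minimal sketch of the regex idea, written in plain Java so it runs without Lucene on the classpath. The class name ArchaicNormalizer, the small exception lists and the choice of rewriting -eth to -es are my own assumptions, not an existing Lucene or Solr component, and it assumes tokens have already been lowercased. The intent is that the rewritten token (e.g. liveth to lives) reaches the Porter stemmer in a form it already knows how to reduce.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArchaicNormalizer {
    // Hypothetical list of irregular archaic forms mapped straight to modern equivalents.
    private static final Map<String, String> IRREGULAR = Map.of(
            "hath", "has",
            "doth", "does");

    // Hypothetical protect-list: words ending in "eth" that are not verb forms.
    private static final Set<String> PROTECTED = Set.of("macbeth", "elizabeth", "nazareth");

    // At least three letters followed by "eth", e.g. work|eth, liv|eth.
    private static final Pattern ETH = Pattern.compile("^([a-z]{3,})eth$");

    /** Rewrites one lowercased token; returns it unchanged if no rule applies. */
    public static String normalize(String token) {
        String irregular = IRREGULAR.get(token);
        if (irregular != null) {
            return irregular;
        }
        if (PROTECTED.contains(token)) {
            return token;
        }
        Matcher m = ETH.matcher(token);
        // "liveth" -> "lives", "worketh" -> "workes": an "-es" ending that the
        // Porter stemmer already strips, so the result is only an intermediate form.
        return m.matches() ? m.group(1) + "es" : token;
    }

    public static void main(String[] args) {
        String[] words = {"worketh", "liveth", "walketh", "hath", "teeth", "macbeth"};
        for (String w : words) {
            System.out.println(w + " -> " + normalize(w));
        }
    }
}
```

Treat this as a heuristic: a real rule set would need a longer exception list (ordinals such as "twentieth" also end in -eth) and probably rules for -est ("walkest"), which are riskier because superlatives like "greatest" share the ending. In Solr, a single-pattern version of the rule could be configured with solr.PatternReplaceFilterFactory placed before solr.PorterStemFilterFactory (pattern `^([a-z]{3,})eth$`, replacement `$1es`), though that loses the exception lists.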
