Does Lucene's StandardAnalyzer remove stop words, and does it stem?

Question

I tested StandardAnalyzer with IndexWriter and found that it removes stop words automatically, even though I never supplied a stop word list. This is the code I used:

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);

Where is the default stop word list? Also, does this analyzer automatically stem words?

java search lucene

user1225072 18 March 12 at 12:45 am

1 answer

Michał Kosmulski · Accepted Answer · 2012-03-18T17:17:47+0000

According to the API docs, there is a default set of stop words (taken from English) stored in StandardAnalyzer.STOP_WORDS_SET. It is used when you create the analyzer with the constructor public StandardAnalyzer(Version matchVersion), which is what you are doing. The set is exactly the same as StopAnalyzer.ENGLISH_STOP_WORDS_SET. You can use one of the other constructors to pass a different (possibly empty) set of stop words to the analyzer.

StandardAnalyzer does not stem words. If you need stemming, use e.g. SnowballAnalyzer.
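To make the difference between stop word removal and stemming concrete, here is a rough, Lucene-free sketch. It is not how StandardAnalyzer is implemented: the class StopWordSketch, the method analyze and the short SAMPLE_STOP_WORDS list are made up for illustration, and the real tokenizer is more sophisticated than a split on non-letters. The sample words are assumed to be a subset of the default English set, so check StopAnalyzer.ENGLISH_STOP_WORDS_SET for the actual contents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: plain Java, not Lucene code.
class StopWordSketch {

    // Hypothetical sample list; assumed to be a subset of Lucene's
    // default English stop words, not the full set.
    static final Set<String> SAMPLE_STOP_WORDS = Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList("a", "an", "and", "is", "of", "the", "to")));

    // Lowercase the text, split on anything that is not a letter a-z,
    // and drop tokens found in stopWords. No stemming is applied.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The dog is running to the parks";

        // Stop words removed, but "running" and "parks" are left unstemmed.
        System.out.println(analyze(text, SAMPLE_STOP_WORDS));
        // prints [dog, running, parks]

        // An empty stop set keeps every token.
        System.out.println(analyze(text, Collections.<String>emptySet()));
        // prints [the, dog, is, running, to, the, parks]
    }
}
```

The second call mirrors what passing an empty stop word set to one of the other StandardAnalyzer constructors would do. A stemmer would additionally reduce tokens such as "running" and "parks" to their stems, which is a separate step that StandardAnalyzer does not perform.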
