Lucene: search with partial words

I am integrating Lucene into our application. Searching works for whole words: when I search for "Upload" and the document contains the text "Upload", I get a hit. But when I search for "Uplo", nothing is found. Any ideas?

Code:

    // Open the index stored at 'path' and create a searcher over it
    Directory directory = FSDirectory.open(path);
    IndexReader indexReader = DirectoryReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    // Parse the search text against the "contents" field
    QueryParser queryParser = new QueryParser("contents", new SimpleAnalyzer());
    Query query = queryParser.parse(text);

    // Fetch the top 50 hits and collect the stored "id" of each matching document
    TopDocs topDocs = indexSearcher.search(query, 50);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        org.apache.lucene.document.Document document = indexSearcher.doc(scoreDoc.doc);
        objectIds.add(Integer.valueOf(document.get("id")));
        System.out.println();
        System.out.println("id " + document.get("id"));
        System.out.println("content " + document.get("contents"));
    }
    return objectIds;

      

Thanks.



3 answers


"Upload" is stored as ONE token in your Lucene index, and a token is the smallest unit the index works with; it is not split any further, so "Uplo" does not match it. If you want to match partial words like "Uplo", it is better to switch to n-gram indexing in Lucene. Note that n-gram indexing increases the space required by your inverted index.
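To illustrate the idea, here is a Lucene-free sketch of edge n-gram tokenization, the kind of expansion Lucene's `EdgeNGramTokenFilter` (package `org.apache.lucene.analysis.ngram`) performs at index time. The class `EdgeNGramSketch` and the gram sizes 2..10 are hypothetical choices for illustration, not Lucene API. Each word is lowercased and indexed under every prefix in the size range, so a query term like "uplo" becomes an exact token match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, Lucene-free illustration of edge n-gram indexing.
class EdgeNGramSketch {

    // Returns every prefix of the lowercased word whose length is in [minGram, maxGram].
    static List<String> edgeNGrams(String word, int minGram, int maxGram) {
        String lower = word.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Index-time expansion of the token "Upload"
        List<String> indexed = edgeNGrams("Upload", 2, 10);
        System.out.println(indexed); // [up, upl, uplo, uploa, upload]

        // The lowercased query term now equals one of the indexed grams
        System.out.println(indexed.contains("uplo")); // true
    }
}
```

Edge n-grams only cover prefixes. To match fragments from the middle of a word (for example "loa"), Lucene's `NGramTokenFilter` emits every substring in the size range, at a further cost in index size. Typically the n-gram filter goes only into the index-time analyzer, so the query "Uplo" itself is not split into grams.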





You can use a wildcard search.

Use the "?" character to match a single character and the "*" character to match multiple characters (zero or more).



Example: "Seal*"



Change

Query query = queryParser.parse(text);

      

To

queryParser.setAllowLeadingWildcard(true); // a leading "*" is rejected by default
Query query = queryParser.parse("*" + text + "*");
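If `text` comes straight from the user, characters such as "+", ":" or "(" have special meaning to the classic QueryParser and can make `parse` throw a `ParseException`. A minimal sketch, assuming every character of the input should be taken literally: escape the syntax characters before adding the wildcards. Lucene provides the static `QueryParser.escape(String)` for this; the hypothetical `WildcardText` class below is a stdlib-only stand-in so the idea can be run without Lucene, with its character set modelled on the classic query syntax.

```java
// Hypothetical helper (not part of Lucene): escapes classic query-syntax
// characters in user input, then wraps it in leading and trailing wildcards.
class WildcardText {

    // Characters treated as special by the classic query syntax (assumed set).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 2);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds a "contains" pattern such as *Uplo* from raw user input.
    static String containsPattern(String text) {
        return "*" + escape(text) + "*";
    }

    public static void main(String[] args) {
        System.out.println(containsPattern("Uplo")); // *Uplo*
        System.out.println(containsPattern("C++"));  // *C\+\+*
    }
}
```

Whitespace is not escaped here, so multi-word input still produces one clause per word.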

      

Lucene supports single- and multiple-character wildcard searches within single terms (not within phrase queries).

To perform a single-character wildcard search, use the "?" symbol.

To perform a multiple-character wildcard search, use the "*" symbol.

A single-character wildcard search looks for terms that match with that one character replaced. For example, to search for "text" or "test", you can use:

te?t

      

A multiple-character wildcard search matches zero or more characters. For example, to search for "test", "tests", or "tester", you can use:

test*

      

You can also use a wildcard in the middle of a term:

te*t
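The three patterns above can be checked with a small Lucene-free sketch that models the wildcard semantics described here: "?" matches exactly one character and "*" matches zero or more. The class `WildcardSketch` and the sample term list are hypothetical; in recent versions Lucene compiles a wildcard term into an automaton and runs it against the index's term dictionary rather than using `java.util.regex`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of Lucene wildcard semantics using java.util.regex.
class WildcardSketch {

    // Translates a wildcard pattern into a regex: '?' -> exactly one character,
    // '*' -> zero or more characters, everything else is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?' || c == '*') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '?' ? "." : ".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Returns the terms that the pattern matches in full (not as a substring).
    static List<String> matching(String wildcard, List<String> terms) {
        Pattern p = toRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (p.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("test", "text", "tests", "tester", "tent", "tempt", "toast");
        System.out.println(matching("te?t", terms));  // [test, text, tent]
        System.out.println(matching("test*", terms)); // [test, tests, tester]
        System.out.println(matching("te*t", terms));  // [test, text, tent, tempt]
    }
}
```

The match is against indexed terms as a whole, not against the original text; with SimpleAnalyzer those terms are lowercased.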

      

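Note: by default, you cannot use "*" or "?" as the first character of a search term. The classic QueryParser rejects such terms unless setAllowLeadingWildcard(true) has been called, and leading wildcards can be slow on large indexes because Lucene may have to examine many or all terms in the field.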
