Lucene - exact string matching

Question

I am trying to build a Lucene 4.10 index. I just want the index to hold the exact strings I put into the document, without tokenizing them.

I am using StandardAnalyzer.

    // Open the on-disk index directory and configure the writer
    Directory dir = FSDirectory.open(new File("myDire"));
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_0, analyzer);
    iwc.setOpenMode(OpenMode.CREATE);
    IndexWriter writer = new IndexWriter(dir, iwc);
    // The document the fields are added to (declaration missing from the original snippet)
    Document doc = new Document();
    StringField field1 = new StringField("1", content1, Store.YES);
    StringField field2 = new StringField("2", content2, Store.YES);
    StringField field3 = new StringField("3", content3, Store.YES);
    doc.add(field1);
    doc.add(field2);
    doc.add(field3);
    writer.addDocument(doc, analyzer);
    writer.close();

If I print out the contents of the index, I can see that my data is stored. For example, my document has this field "3":

    stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<3:"Fuel Tank Capacity"@en>

I am trying to query the index to get it back:

    IndexSearcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer();
    QueryParser parser = new QueryParser("3", analyzer);
    String queryString = "\"\"Fuel Tank Capacity\"@en\"";
    Query query = parser.createPhraseQuery("3", QueryParser.escape(queryString));
    TopDocs docs = searcher.search(query, null, 20);

I am looking for the term "Fuel Tank Capacity"@en (including the quotes), so I tried to escape the quotes, and I added another pair of quotes around the term so that Lucene understands I am searching for the whole text.

If I print out the query, I get: 3: "Fuel tank capacity en" but I don't want the text to be split at the @ symbol.

I believe my first problem is StandardAnalyzer, because it seems to tokenize the text, if I'm not mistaken. However, I can't figure out how to query the index to get back exactly "Fuel Tank Capacity"@en (including the quotes).

Thanks

+3

java tokenize lucene

LucaT 12 Sep 14 at 13:40


2 answers

When escaping a quote (or any other special character in Lucene), you need to use \, but don't forget that the backslash must be escaped inside a Java string.

The following works for me:

    Query q = new QueryParser(
            Version.LUCENE_4_10_0,
            "",
            new StandardAnalyzer(Version.LUCENE_4_10_0)
    ).parse("3:\"\\\"Fuel Tank Capacity\\\"@en\"");

How did I arrive at this?

Took the original string: "Fuel Tank Capacity"@en
Added the escaping that Lucene requires (each " escaped with \): \"Fuel Tank Capacity\"@en
Wrapped the result in unescaped quotes at the beginning and end: "\"Fuel Tank Capacity\"@en"
Added the escaping that a Java string literal requires (each backslash becomes a double backslash, each double quote is escaped with a backslash): \"\\\"Fuel Tank Capacity\\\"@en\"
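
The same layering can be checked in plain Java. This is only a sketch: `escapeForPhrase` and `buildQuery` are hypothetical helpers written for this illustration. They escape only double quotes and backslashes, whereas Lucene's own `QueryParser.escape` handles more special characters.

```java
class EscapeLayers {
    // Hypothetical helper: applies the Lucene-level escaping from step 2,
    // limited to double quotes and backslashes for this illustration.
    static String escapeForPhrase(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical helper: wraps the escaped text in unescaped quotes (step 3)
    // and prefixes the field name.
    static String buildQuery(String field, String raw) {
        return field + ":\"" + escapeForPhrase(raw) + "\"";
    }

    public static void main(String[] args) {
        // The value as it sits in the index: "Fuel Tank Capacity"@en
        String raw = "\"Fuel Tank Capacity\"@en";
        String query = buildQuery("3", raw);

        // The text Lucene's parser would receive
        System.out.println(query);

        // Compare with the hand-written Java literal used in the answer (step 4)
        System.out.println(query.equals("3:\"\\\"Fuel Tank Capacity\\\"@en\""));
    }
}
```

Building the string programmatically avoids counting backslashes by hand; only the Java-level escaping of the raw value remains in the source.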

0

mindas 12 Sep 14 at 13:48


femtoRgon · Accepted Answer · 2014-09-12T18:13:36+0000

You can keep things simple and just take QueryParser out of the equation. Since you are using StringField, the entire content of the field is a single term, so a simple TermQuery should work well:

    Query query = new TermQuery(new Term("3","\"Fuel Tank Capacity\"@en"));
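
To see why the single-term lookup succeeds where the analyzed query does not, here is a toy model in plain Java. It does not use Lucene at all: the map stands in for a term dictionary, and `toyTokenize` is a rough, assumed imitation of what an analyzer does to query text (split on non-alphanumeric characters, lowercase). All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SingleTermDemo {
    // Toy stand-in for an untokenized field: the whole value is the only term.
    static Map<String, Set<Integer>> indexWholeValues(List<String> values) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < values.size(); docId++) {
            index.computeIfAbsent(values.get(docId), k -> new TreeSet<>()).add(docId);
        }
        return index;
    }

    // Rough imitation of analysis (an assumption, not StandardAnalyzer itself):
    // split on anything that is not a letter or digit, then lowercase.
    static List<String> toyTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Exact term lookup, the toy equivalent of a term query.
    static Set<Integer> lookup(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        String value = "\"Fuel Tank Capacity\"@en";
        Map<String, Set<Integer>> index =
                indexWholeValues(Collections.singletonList(value));

        // Exact lookup with the full string finds document 0
        System.out.println(lookup(index, value));

        // None of the analyzed tokens equals the stored term, so nothing matches
        for (String token : toyTokenize(value)) {
            System.out.println(token + " -> " + lookup(index, token));
        }
    }
}
```

The tokens produced here (fuel, tank, capacity, en) mirror the query printed in the question, and none of them is equal to the one term stored for the field.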
