Analyzer for finding phrases in ElasticSearch

I am using ElasticSearch 1.5.2. I want to enable phrases to be searched on my search engine.

Suppose a document contains the text

read with section 114 of the Indian Penal Code

With the default analyzer, I get no results for the search term

section 114 penal code

So, I added an analyzer:

        XContentBuilder settingsBuilder = XContentFactory.jsonBuilder()
            .startObject()
                .startObject("analysis")
                    .startObject("filter")
                        .startObject("filter_shingle")
                            .field("type","shingle")
                            .field("max_shingle_size",2)
                            .field("min_shingle_size",2)
                            .field("output_unigrams",false)
                        .endObject()
                        .startObject("filter_stemmer")
                            .field("type","porter_stem")
                            .field("language","English")
                        .endObject()
                    .endObject()
                    .startObject("tokenizer")
                        .startObject("my_ngram_tokenizer")
                            .field("type","nGram")
                            .field("min_gram",1)
                            .field("max_gram",1)
                        .endObject()
                    .endObject()
                    .startObject("analyzer")
                        .startObject("ShingleAnalyzer")
                            .field("tokenizer","my_ngram_tokenizer")
                            .array("filter","snowball","standard","lowercase","filter_stemmer","filter_shingle")
                        .endObject()
                    .endObject()
                .endObject()
            .endObject();

    client.admin().indices()
    .prepareCreate("temp_index").setSettings(settingsBuilder).get();

      

I index the file (which is already valid JSON) like this:

String file1 = readFile("1.txt");
IndexResponse response1 = client.prepareIndex("new_index","docs").setSource(file1).execute().actionGet();

      

and then run the following match phrase query:

MatchQueryBuilder mqb1 = QueryBuilders.matchPhraseQuery("text", str).analyzer("ShingleAnalyzer");
SearchResponse matchResponse1 = client.prepareSearch().setQuery(mqb1).execute().actionGet();

      

But I still don't get any results. Could you please suggest what I should do?

EDIT: In fact, I get no hits at all when I search with this analyzer. Even the query "section", which is present in every document I indexed, returns nothing, whereas searching with the default analyzer does return some results. Is this analyzer not working at all?

EDIT: Example doc,

{
      "docName": "Adamji Umar Dalal vs The State Of Bombay",
      "text": "1.These two appeals by special leave are limited to the question of sentence only. In case No. 1783/P of 1950, which has given rise to Criminal Appeal No. 54 of 1951, the appellant Adamji Umar Dalal was tried along with five other persons on the following charges :- Firstly,that you at Bombay on or about the 29th day of December, 1949, in contravention of Government Notification No. 342/IV B, dated 27-1-46 issued under the Essential Supplies (Temporary Powers) Act, 1946, attempted to export by rail out of the State Of Bombay to Jalna, a place beyond the limits of Bombay State, 50 barrels of kerosene oil, without having any permit in that behalf, by misdescribing or causing the misdescription of the said barrels of oil as high speed diesel oil, and thereby committed an offence punishable under sections 7 and 8 of the Essential Supplies (Temporary Powers) Act. 2.Secondly, that you at Bombay, on or about the 29th day of December, 1949, attempted to export by rail 50 barrels of Kerosene oil by misdescribing or causing the misdescription of the same as high speed diesel oil, and abetted each other in the commission of the said offence and thereby committed an offence punishable under section 106 and 107 of the Indian Railway Act, read with section 114 of the Indian Penal Code. 3.In Cases Nos. 1784/P and 1785/P of 1950 the appellant was tired along with the same persons on similar charges in respect of two other lots of 50 and 15 barrels of Kerosene oil respectively. These two cases have given rise to Appeal No. 55 of 1951. 4.The circumstances under which these three cases arose are these. On the 29th December, 1949, three consignments of 50, 50 and 15 barrels had been booked from Wadi Bundar under the description of high speed diesel oil when in fact they contained kerosene oil and were to be despatched to Jalna. The police on getting information of this fact opened the railway wagons and took charge of the barrels kept in them. 
Accused 2,3 and 4 are members of a firm of commission agents. They had purchased the barrels of oil from Sunbeam Oil Company on behalf of three different principals. The first accused is a representative of one of these firms. Accused 5 and 6 are the godown keeper and the assistant godown keeper of the supplier company. All the barrels seized bore the mark Prakash Trades High Speed Diesel Oil, U. S. A. The third accused engaged two lorries to remove 100 barrels and they were loaded in the lorries and delivered to Sattar Latif, witness, who was the forwarding and carting agent at Wadi Bundar. He was instructed by the third accused for the booking of these barrels for Jalna in Hyderabad State, along with the third lot of 15 barrels. In the consignment note which concerned the 50 barrels purchased on behalf of the first accused his firm was shown as the consignor and the consignee was self. The consignment note was signed by Sattar Latif. In these documents the goods were described as high speed diesel oil. Similar consignment notes and risk notes were prepared in respect of the other two consignments. There was a ban on the export of kerosene oil to any place outside the State of Bombay. All the barrels had a white paint on them. It appeared to be new and below the paint on the barrels the words Kerosene oil was visible. On these facts the prosecution started three separate cases in respect of the three consignments of 50,50 and 15 barrels respectively on the charges set out above against all the six accused persons. All of them pleaded not guilty. 5.The fifth accused stated that accused 2 and 3 brought to him a delivery order asking him to delivery order asking him to deliver high speed diesel oil but that he delivered to them Kerosene oil at their request. The first accused admitted that he on behalf of his firm placed an order for 65 barrels of high speed diesel oil through the second accused but denied all knowledge about the alleged delivery of kerosene oil. 
The second accused said that he placed an order for diesel oil with Sunbeam Oil Company for 65 barrels and obtained a delivery order from the company and gave it to the third accused sent him to take delivery of the barrels from the godown of the company. He denied having told the fifth accused to deliver kerosene oil instead of diesel oil. The third accused admitted having taken delivery of the barrels on the instructions of the second accused and having sent them to Wadi Bundar in two lorries. He was surprised to learn that the barrels contained Kerosene oil. He denied that he ever asked the company to deliver kerosene oil for diesel oil. The fourth accused said that he personally took no part in the transaction and had committed no offence. The sixth accused stated that he had delivered the barrels as ordered by the fifth accused and had committed no offence. The learned Presidency Magistrate convicted accused 2,3 and 5 on the charges leveled against them and acquitted accused 1, 4 and 6 as he felt some doubt in regard to them. 6.The appellant (accused 3) in these two appeals was awarded the following sentences :- 1.In case No. 1783 P of 1950 he was sentenced to six months rigorous imprisonment and a fine of Rs. 15,000 under section 7 and 8 of the Essential Supplies (Temporary Powers) Act. For default in the payment of fine he was to undergo six months rigorous imprisonment. A fine of Rs. 1000 was awarded to him under section 106 of the Indian Railways Act and in default he was to undergo one month imprisonment. 2.In Case No. 1784-P of 1950, under section 7 and 8 of the Essential Supplies (Temporary Powers) Act he was awarded rigorous imprisonment for six months and a fine of Rs. 15,000 and in default six months rigorous imprisonment. Under the Railways Act he was fined in the sum of Rs. 1000 and in default he was ordered to undergo one month imprisonment. 3.In Case No. 
1785-P of 1950, under section 7 and 8 of the Essential Supplies (Temporary Powers) Act he was awarded a sentence of one days imprisonment and a fine of Rs. 10,000 and in default rigorous imprisonment for six months. Under the Railways Act he was fined in the sum of Rs. 300 and in default he was ordered to undergo one month imprisonment. In the result in respect of these 115 barrels of oil a cumulative fine of Rs. 42,300 was imposed on the appellant besides the sentences of imprisonment. The learned Presidency Magistrate while imposing the sentence observed as follows :- Suchblack market transactions when detected must be crushed, else the common man has no escape from the plague. 7.On appeal the convictions and sentences were maintained except that the fine imposed on the fifth accused was remitted. The High Court held that having regard to the manner in which the offence was committed and the purpose for which kerosene was attempted to be sent outside the State of Bombay which obviously was to sell it in the black market the sentences passed could not be regarded as excessive. 8.The determination of the right measure of punishment is often a point of great difficulty and no hard and fast rule can be laid down, it being a matter of discretion which is to be guided by a variety of considerations, but the courts has always to bear in mind the necessity of proportion between an offence and the penalty. In imposing a fine it is necessary to have as much regard to the pecuniary circumstances of the accused persons to the character and magnitude of the offence, and where a substantial term of imprisonment is inflicted, an excessive fine should not accompany it except in exceptional cases. It seems to us that due regard has not been paid to these consideration in these cases and the zeal to crush the evil of black marketing and free the common man from this plague has perturbed the judicial mind in the determination of the measure of punishment. 
9.The appellant was acting in these transactions on behalf of the first accused and other principals in the capacity of a member of a commission agency firm. It was asserted before us that its commission in this deal was half per cent on sale price. There is no evidence on the record about the accused pecuniary condition. His learned counsel emphatically asserted at the Bar that it was impossible for him to pay even a fraction of this heavy fine. The profit made on the sale of oil in the black market would in the ordinary course of business dealings go to the principals but its extent is not known nor found on the record. The first accused who was to profit by getting kerosene oil by this device has been acquitted and is not before us. The other persons on whose behalf the oil was purchased were not brought to trial. In these circumstances there is no material on the record justifying the imposition of such heavy fines on the appellant and these seem to us to be quite disproportionate to the offences. 10.It is no doubt true that the offence of black marketing is very generally prevalent in this country at the present moment and when it is brought home against a person, no leniency in the matter of sentence should be shown and a certain amount of severity may be very appropriate and even called for. In our opinion, however, when quite a substantial sentence of imprisonment was awarded to the appellant, a person belonging to the commission agency class, imposition of unduly heavy fines which may have been justified to some extent in the case of the principals, was not called for in his case. It is not the practice of this court to interfere by special leave in the matter of punishment imposed for crimes committed, except in exceptional cases where the sentences are unduly harsh and do not really advance the ends of justice. 
12.For the reasons given above we think that it would meet the ends of justice if the fines imposed on the appellant by the Magistrate and upheld by the High Court are reduced in all cases as below :- 13.In case No. 1783-p of 1950, the sentence of fine is reduced to Rs. 1000 from Rs. 15000 and in default he will undergo imprisonment for a period of one month. 14.In case No. 1784-P of 1950, also the fine is reduced to Rs. 1000 from Rs. 15000 and in default he will undergo imprisonment for one month. 15.Similarly, in Case No. 1785-P of 1950, the sentence of fine is reduced to Rs. 1000 and in default he will undergo imprisonment for a month. 16.The fines in all the cases under the Indian Railways Act are reduced to one cumulative fine of Rs. 1000 instead of a fine of Rs. 2300 and in default he will undergo imprisonment for a month. In all other respects the appeals fail and are dismissed. 17.Sentences reduced."
    }

      



1 answer


Here is what I would start with. Keep in mind that how you search is just as important as how you index. The first thing to establish is what your users will provide as input text (free-form input, a single word, whether they can specify which terms are required and which are optional).

After that, you need to establish the matching rules: exact match, phrase match, fuzzy match, whether you care about scoring or only about whether something matches, and so on. You mentioned wanting a scoring mechanism that ranks exact matches highest, followed by the non-exact matches according to their scores (say, tf-idf).

This is where I would start:

{
  "settings": {
    "analysis": {
      "filter": {
        "filter_shingle": {
          "type": "shingle",
          "max_shingle_size": 8,
          "min_shingle_size": 2,
          "output_unigrams": false
        },
        "filter_stemmer": {
          "type": "porter_stem",
          "language": "english"
        }
      },
      "analyzer": {
        "ShingleAnalyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "snowball",
            "filter_stemmer",
            "filter_shingle"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "ShingleAnalyzer",
          "fields": {
            "raw_standard_analyzer": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}
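
To get a feel for what the shingle filter in these settings puts into the index, here is a rough plain-Java sketch. It is only an approximation under stated assumptions: it splits on non-alphanumeric characters instead of using the real standard tokenizer, it skips stemming entirely, and the class and method names (ShingleSketch, tokenize, shingles) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ShingleSketch {
    // Crude stand-in for the standard tokenizer plus lowercase filter
    // (assumption: splitting on non-alphanumerics is close enough here).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Emit every run of minSize..maxSize adjacent tokens, with no unigrams,
    // mirroring min_shingle_size, max_shingle_size and output_unigrams=false.
    static List<String> shingles(List<String> tokens, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> docShingles =
                shingles(tokenize("read with section 114 of the Indian Penal Code"), 2, 8);
        List<String> queryShingles =
                shingles(tokenize("section 114 penal code"), 2, 8);
        // Show which query shingles also occur in the document.
        for (String s : queryShingles) {
            System.out.println(s + " -> " + docShingles.contains(s));
        }
    }
}
```

In this sketch the query and the document share the shingles "section 114" and "penal code", which is the kind of overlap a match query on the shingled field can score on. By contrast, an nGram tokenizer with min_gram and max_gram both set to 1, as in the question, emits single characters, so its shingles would be character pairs rather than word pairs.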

      

And a query that can have more should clauses, depending on your rules for text matching:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "text": "section 114 penal code"
          }
        },
        {
          "match": {
            "text.raw_standard_analyzer": "section 114 penal code"
          }
        }
      ]
    }
  }
}

      



Something like this in Java:

SearchResponse response = client.prepareSearch()
        .setQuery(QueryBuilders.boolQuery()
            .should(QueryBuilders.matchQuery("text", "section 114 penal code"))
            .should(QueryBuilders.matchQuery("text.raw_standard_analyzer", "section 114 penal code")))
        .execute().actionGet();

      

Key points:

  • you want more precise matches ranked higher: use shingles in one field and then do a match on that field
  • you must also match regular words no matter where they are in the phrase: use the standard analyzer in a second field and add another should clause with match

Then test it and see what comes back. If you are not satisfied and you notice that some of the documents you wanted to rank higher did not, look at those documents, work out why they did not match, come up with a rule, find the features in ES that will help you implement the new rule, define a new field, and add another should clause for that field.
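
The idea of adding one more should clause per field can be sketched as a small helper that generates the JSON query body for any list of fields. This is a plain-Java illustration, not part of the Elasticsearch client: the names ShouldQuerySketch, escape and boolShould are hypothetical, and the escaping only handles backslashes and double quotes, so it is not a general JSON encoder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShouldQuerySketch {
    // Minimal JSON string escaping: backslashes and double quotes only
    // (assumption: no control characters in field names or query text).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Build one match clause per field and wrap them all in bool/should.
    static String boolShould(String queryText, List<String> fields) {
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            clauses.add("{\"match\":{\"" + escape(field) + "\":\""
                    + escape(queryText) + "\"}}");
        }
        return "{\"query\":{\"bool\":{\"should\":["
                + String.join(",", clauses) + "]}}}";
    }

    public static void main(String[] args) {
        // Same two fields as in the query above; a new rule means one more entry.
        System.out.println(boolShould("section 114 penal code",
                Arrays.asList("text", "text.raw_standard_analyzer")));
    }
}
```

When a new rule calls for a new field, appending that field name to the list is enough to get the extra should clause in the generated body.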
