Elasticsearch aggregations for faceted search, excluding some fields

I have a store that uses Elasticsearch 2.4 for search. For now, the existing filters (product attributes) are taken from MySQL. I want to build them with Elasticsearch aggregations instead. But I ran into a problem: I don't need to aggregate all the attributes.

What I have:

Mapping part:

...
'is_active' => [
    'type' => 'long',
    'index' => 'not_analyzed',
],
'category_id' => [
    'type' => 'long',
    'index' => 'not_analyzed',
],
'attrs' => [
    'properties' => [
        'attr_name' => ['type' => 'string', 'index'     => 'not_analyzed'],
        'value' => [
            'type' => 'string',
            'index' => 'analyzed',
            'analyzer' => 'attrs_analizer',
        ],
    ]
],
...

      

Sample data:

{
    "id": 1,
    "is_active": "1",
    "category_id": 189,
    ...
    "price": "48.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "TP-Link"
      },
      {
        "attr_name": "Model",
        "value": "TL-1"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  },
  {
    "id": 2,
    "is_active": "1",
    "category_id": 242,
    ...
    "price": "12.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Lenovo"
      },
      {
        "attr_name": "Model",
        "value": "B570"
      },
      {
        "attr_name": "OS",
        "value": "Linux"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  },
  {
    "id": 3,
    "is_active": "1",
    "category_id": 242,
    ...
    "price": "24.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Asus"
      },
      {
        "attr_name": "Model",
        "value": "QZ85"
      },
      {
        "attr_name": "OS",
        "value": "Windows"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  }

      

Attributes such as Model and Other are not used when filtering products; they only appear on the product page. For the other attributes (Brand, OS and so on) I want to get aggregations.

When I try to aggregate on the field attrs.value, of course I get aggregations for all the data (including the large "Other" fields, which may have a lot of HTML in them).

"aggs": {
    "facet_value": {
      "terms": {
        "field": "attrs.value",
        "size": 0
      }
    }
  }

      

How can I exclude "attrs.attr_name": ["Model", "Other"]?

Changing the mapping is a bad solution for me, but if it's unavoidable, how do I do it? Do I need to make "attrs" nested?

UPD:

I want to receive:

1. All attributes that products in a certain category have, except for those that I specify in my system's settings (in this example, I will exclude "Model" and "Other").
2. The number of products next to each value.

It should look like this:

For the category "Laptops":

Brand:

  • Lenovo (18)
  • Asus (19)
  • .....

OS:

  • Windows (19)
  • Linux (5)
  • ...

For "computer monitors":

Brand:

  • Samsung (18)
  • LG (19)
  • .....

Resolution:

  • 1360x768 (19)
  • 1920x1080 (22)
  • ....

This is a terms aggregation. I use it to get the number of products in each category. I've also tried it on attrs.value, but I don't know how to exclude the "attrs.value" entries that belong to "attrs.attr_name": "Model" and "attrs.attr_name": "Other".

UPD2:

In my case, if I map attrs as a nested type, the index size increases by 30%, from 2700 to 3510Mi. If there is no other option, I'll have to live with it.



1 answer


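First, you need to map attrs as a nested type and use nested aggregations. (The mapping below uses the keyword type, which exists from Elasticsearch 5.x onward; on 2.4 the equivalent is a string field with "index": "not_analyzed".)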

PUT no_play
{
  "mappings": {
    "document_type" : {
      "properties": {
        "is_active" : {
          "type": "long"
        },
        "category_id" : {
          "type": "long"
        },
        "attrs" : {
          "type": "nested", 
          "properties": {
            "attr_name" : {
              "type" : "keyword"
            },
            "value" : {
              "type" : "keyword"
            }
          }
        }
      }
    }
  }
}


POST no_play/document_type
  {
    "id": 3,
    "is_active": "1",
    "category_id": 242,
    "price": "24.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Asus"
      },
      {
        "attr_name": "Model",
        "value": "QZ85"
      },
      {
        "attr_name": "OS",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      },
      {
        "attr_name": "Other",
        "value": "<div>Some text of 'Other' property<br><img src......><ul><li>......</ul></div>"
      }
    ]
  }
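To see why the nested type matters here, the following is a minimal Python sketch (not Elasticsearch code) of how a plain, non-nested array of objects is flattened at index time. The helper name flatten_object_array is made up for illustration, and the sample values come from the question. Once flattened, nothing records that "QZ85" belonged to "Model", so a terms aggregation on attrs.value cannot be restricted by attrs.attr_name.

```python
from collections import defaultdict


def flatten_object_array(doc, field):
    """Mimic how a non-nested array of objects is indexed.

    Hypothetical helper: each leaf becomes one multi-valued field, and the
    link between values that shared an object is not kept.
    """
    flat = defaultdict(list)
    for obj in doc[field]:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


doc = {
    "id": 3,
    "attrs": [
        {"attr_name": "Brand", "value": "Asus"},
        {"attr_name": "Model", "value": "QZ85"},
        {"attr_name": "OS", "value": "Windows"},
    ],
}

flat = flatten_object_array(doc, "attrs")
print(flat["attrs.attr_name"])  # all names in one multi-valued field
print(flat["attrs.value"])      # all values in another, pairing lost
```

With the nested mapping, each attrs entry is indexed as its own hidden document, which is what lets a filter on attrs.attr_name apply to the matching attrs.value.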

      

Since you haven't specified how you want to aggregate, here are two cases.

Case 1) You want to treat each attrs entry individually. The first query below gives you the number of occurrences of each term. The second query is for case 2.

POST no_play/_search
{
  "size": 0,
  "aggs": {
    "nested_aggregation_value": {
      "nested": {
        "path": "attrs"
      },
      "aggs": {
        "value_term": {
          "terms": {
            "field": "attrs.value",
            "size": 10
          }
        }
      }
    }
  }
}

POST no_play/_search
    {
      "size": 0,
      "aggs": {
        "nested_aggregation_value": {
          "nested": {
            "path": "attrs"
          },
          "aggs": {
            "value_term": {
              "terms": {
                "field": "attrs.value",
                "size": 10
              },
              "aggs": {
                "reverse_back_to_roots": {
                  "reverse_nested": {
                  }
                }
              }
            }
          }
        }
      }
    }

      

Case 2) To get the root document count for each attrs value, you need to attach a reverse_nested aggregation (as in the second query above), which moves the aggregator back up to the root document level.

Consider the following document.



{
    "id": 3,
    "is_active": "1",
    "category_id": 242,
    "price": "24.00",
    "attrs": [
      {
        "attr_name": "Brand",
        "value": "Asus"
      },
      {
        "attr_name": "Model",
        "value": "QZ85"
      },
      {
        "attr_name": "OS",
        "value": "repeated value"
      },
      {
        "attr_name": "Other",
        "value": "repeated value"
      }
    ]
  }

      

For the first query, the count for 'repeated value' will be 2, and for the second query the reverse_nested count will be 1.
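The difference between the two counts can be reproduced outside Elasticsearch. This is only a Python sketch of the counting logic, using the document above; the function names nested_counts and root_counts are hypothetical and do not correspond to any Elasticsearch API.

```python
from collections import Counter

docs = [
    {
        "id": 3,
        "attrs": [
            {"attr_name": "Brand", "value": "Asus"},
            {"attr_name": "Model", "value": "QZ85"},
            {"attr_name": "OS", "value": "repeated value"},
            {"attr_name": "Other", "value": "repeated value"},
        ],
    },
]


def nested_counts(docs):
    # One count per nested attrs entry, like a terms aggregation
    # placed directly under the nested aggregation.
    return Counter(a["value"] for d in docs for a in d["attrs"])


def root_counts(docs):
    # One count per root document containing the value, like the
    # doc_count reported by reverse_nested under each terms bucket.
    return Counter(v for d in docs for v in {a["value"] for a in d["attrs"]})


print(nested_counts(docs)["repeated value"])  # 2
print(root_counts(docs)["repeated value"])    # 1
```

For a product counter next to a facet value, the root document count is usually the one you want.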

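Note: this is how you can filter to exclude attributes by name. The example excludes Model and Brand; list whichever names you need, such as Model and Other. As before, the first query counts nested entries and the second adds reverse_nested to count root documents.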

POST no_play/_search
{
    "size": 0,
    "aggs": {
        "nested_aggregation_value": {
            "nested": {
                "path": "attrs"
            },
            "aggs": {
                "filtered_results": {
                    "filter": {
                        "bool": {
                            "must_not": [{
                                "terms": {
                                    "attrs.attr_name": ["Model", "Brand"]
                                }
                            }]
                        }
                    },
                    "aggs": {
                        "value_term": {
                            "terms": {
                                "field": "attrs.value",
                                "size": 10
                            }
                        }
                    }
                }
            }
        }
    }
}


POST no_play/_search
 {
    "size": 0,
    "aggs": {
        "nested_aggregation_value": {
            "nested": {
                "path": "attrs"
            },
            "aggs": {
                "filtered_results": {
                    "filter": {
                        "bool": {
                            "must_not": [{
                                "terms": {
                                    "attrs.attr_name": ["Model", "Brand"]
                                }
                            }]
                        }
                    },
                    "aggs": {
                        "value_term": {
                            "terms": {
                                "field": "attrs.value",
                                "size": 10
                            },
                            "aggs": {
                                "reverse_back_to_roots": {
                                    "reverse_nested": {}
                                }
                            }
                        }
                    }
                }
            }
        }
    }
 }
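To get the layout described in the question (attribute name, then values with product counts, for one category), one possible approach is to add a terms aggregation on attrs.attr_name above the one on attrs.value and to restrict the search to the category with a query filter. The sketch below builds such a request body in Python and reshapes a response. It assumes the nested mapping above; the function names, aggregation names (attrs, wanted, by_name, by_value, products) and size defaults are my own choices, and the sample response is hand-written with counts borrowed from the question's example, not real output.

```python
import json


def build_facet_request(category_id, excluded_names, max_names=50, max_values=50):
    """Build a search body for per-attribute facets, skipping excluded names."""
    return {
        "size": 0,
        # Restrict root documents to one category.
        "query": {"bool": {"filter": [{"term": {"category_id": category_id}}]}},
        "aggs": {
            "attrs": {
                "nested": {"path": "attrs"},
                "aggs": {
                    "wanted": {
                        # Drop nested entries whose name is excluded.
                        "filter": {"bool": {"must_not": [
                            {"terms": {"attrs.attr_name": list(excluded_names)}}
                        ]}},
                        "aggs": {
                            "by_name": {
                                "terms": {"field": "attrs.attr_name", "size": max_names},
                                "aggs": {
                                    "by_value": {
                                        "terms": {"field": "attrs.value", "size": max_values},
                                        # Count products, not nested entries.
                                        "aggs": {"products": {"reverse_nested": {}}},
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


def facets_from_response(response):
    """Reshape the aggregations into {attr_name: [(value, product_count), ...]}."""
    buckets = response["aggregations"]["attrs"]["wanted"]["by_name"]["buckets"]
    return {
        nb["key"]: [(vb["key"], vb["products"]["doc_count"])
                    for vb in nb["by_value"]["buckets"]]
        for nb in buckets
    }


body = build_facet_request(242, ["Model", "Other"])
print(json.dumps(body, indent=2))  # send this as the body of POST no_play/_search

# Hand-written response fragment (illustrative counts, unused keys trimmed).
sample_response = {"aggregations": {"attrs": {"wanted": {"by_name": {"buckets": [
    {"key": "Brand", "by_value": {"buckets": [
        {"key": "Asus", "products": {"doc_count": 19}},
        {"key": "Lenovo", "products": {"doc_count": 18}},
    ]}},
    {"key": "OS", "by_value": {"buckets": [
        {"key": "Windows", "products": {"doc_count": 19}},
        {"key": "Linux", "products": {"doc_count": 5}},
    ]}},
]}}}}}

facets = facets_from_response(sample_response)
for name, values in facets.items():
    print(name + ":", ", ".join(f"{v} ({n})" for v, n in values))
```

Grouping by attr_name first also keeps values from different attributes apart, so a value that appears under both OS and Other is never merged into one bucket.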

      

Thanks.
