Get matched array element in Elasticsearch query

In a movie database, I store the ratings (0 to 5 stars) that users have given to each movie. I have the following document structure indexed in Elasticsearch (version 1.2.2):

"_index": "my_index",
"_type": "film",
"_id": "6629",
"_source": {
  "id": "6629",
  "title": "Fight Club",
  "ratings" : [
    { "user_id" : 1234, "rating_value" : 3 },
    { "user_id" : 4567, "rating_value" : 2 },
    { "user_id" : 7890, "rating_value" : 1 }
    .....
  ]
}

"_index": "my_index",
"_type": "film",
"_id": "6630",
"_source": {
  "id": "6630",
  "title": "Pulp Fiction",
  "ratings" : [
    { "user_id" : 1234, "rating_value" : 1 },
    { "user_id" : 7654, "rating_value" : 2 },
    { "user_id" : 4321, "rating_value" : 5 }
    .....
  ]
}

      

etc.

My goal is to get, in a single search, all movies rated by a given user (say user 1234), along with that user's rating_value for each movie.

If I run the following search:

GET my_index/film/_search
{
  "query": {
    "match": {
      "ratings.user_id": "1234"
    }
  }
}

      

I get the entire document for every matched movie, and I then have to scan the whole ratings array to find the element that matches my request and read the rating_value associated with user_id 1234.
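For reference, the client-side workaround described above could be sketched roughly like this. It assumes the search response has already been parsed into a Python dict (for example with json.loads); the function name ratings_for_user is made up for illustration, and the sample response is trimmed down to the two documents shown earlier.

```python
def ratings_for_user(response, user_id):
    """Map each matched movie title to the rating given by user_id."""
    result = {}
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        # Scan the whole ratings array, since the full document comes back.
        for rating in source.get("ratings", []):
            if rating["user_id"] == user_id:
                result[source["title"]] = rating["rating_value"]
                break  # assumption: at most one rating per user per movie
    return result


# Trimmed-down response built from the two sample documents above.
response = {
    "hits": {
        "hits": [
            {"_id": "6629", "_source": {"id": "6629", "title": "Fight Club", "ratings": [
                {"user_id": 1234, "rating_value": 3},
                {"user_id": 4567, "rating_value": 2},
                {"user_id": 7890, "rating_value": 1},
            ]}},
            {"_id": "6630", "_source": {"id": "6630", "title": "Pulp Fiction", "ratings": [
                {"user_id": 1234, "rating_value": 1},
                {"user_id": 7654, "rating_value": 2},
                {"user_id": 4321, "rating_value": 5},
            ]}},
        ]
    }
}

print(ratings_for_user(response, 1234))  # {'Fight Club': 3, 'Pulp Fiction': 1}
```

This works, but every hit still carries its complete ratings array over the wire, which is the overhead I would like to avoid.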

Ideally, I would like the result of this query to be:

"hits": [
  {
    "_index": "my_index",
    "_type": "film",
    "_id": "6629",
    "_source": {
      "id": "6629",
      "title": "Fight Club",
      "ratings" : [
        { "user_id" : 1234, "rating_value" : 3 } // <= only the row that matches the query
      ]
    }
  },
  {
    "_index": "my_index",
    "_type": "film",
    "_id": "6630",
    "_source": {
      "id": "6630",
      "title": "Pulp Fiction",
      "ratings" : [
        { "user_id" : 1234, "rating_value" : 1 } // <= only the row that matches the query
      ]
    }
  }
]

      

Thank you in advance

+3




2 answers


I was able to get the values using aggregations, as mentioned in my previous comment.

Here is how I did it.

First, the mapping I'm using is:

PUT test/movie/_mapping
{
  "properties": {
    "title":{
      "type": "string",
      "index": "not_analyzed"
    },
    "ratings": {
      "type": "nested"
    }
  }
}

      

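I chose not to analyze the title (it is mapped as not_analyzed, so the terms aggregation treats each title as a single value), but you could instead use the fields attribute and keep the unanalyzed value in a "raw" sub-field.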

The movies are then indexed:

PUT test/movie/6629
{
  "title": "Fight Club",
  "ratings" : [
    { "user_id" : 1234, "rating_value" : 3 },
    { "user_id" : 4567, "rating_value" : 2 },
    { "user_id" : 7890, "rating_value" : 1 }
  ]
}


PUT test/movie/4456
{
  "title": "Jumanji",
  "ratings" : [
    { "user_id" : 1234, "rating_value" : 4 },
    { "user_id" : 4567, "rating_value" : 3 },
    { "user_id" : 4630, "rating_value" : 5 }
  ]
}

PUT test/movie/6547
{
  "title": "Hook",
  "ratings" : [
    { "user_id" : 1234, "rating_value" : 4 },
    { "user_id" : 7890, "rating_value" : 1 }
  ]
}

      



Aggregation request:

GET test/movie/_search
{
  "aggs": {
    "by_movie": {
      "terms": {
        "field": "title"
      },
      "aggs": {
        "ratings_by_user": {
          "nested": {
            "path": "ratings"
          },
          "aggs": {
            "for_user_1234": {
              "filter": {
                "term": {
                  "ratings.user_id": "1234"
                }
              },
              "aggs": {
                "rating_value": {
                  "terms": {
                    "field": "ratings.rating_value"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

      

Finally, here is the output of this query against the documents indexed above:

"aggregations": {
  "by_movie": {
     "buckets": [
        {
           "key": "Fight Club",
           "doc_count": 1,
           "ratings_by_user": {
              "doc_count": 3,
              "for_user_1234": {
                 "doc_count": 1,
                 "rating_value": {
                    "buckets": [
                       {
                          "key": 3,
                          "key_as_string": "3",
                          "doc_count": 1
                       }
                    ]
                 }
              }
           }
        },
        {
           "key": "Hook",
           "doc_count": 1,
           "ratings_by_user": {
              "doc_count": 2,
              "for_user_1234": {
                 "doc_count": 1,
                 "rating_value": {
                    "buckets": [
                       {
                          "key": 4,
                          "key_as_string": "4",
                          "doc_count": 1
                       }
                    ]
                 }
              }
           }
        },
        {
           "key": "Jumanji",
           "doc_count": 1,
           "ratings_by_user": {
              "doc_count": 3,
              "for_user_1234": {
                 "doc_count": 1,
                 "rating_value": {
                    "buckets": [
                       {
                          "key": 4,
                          "key_as_string": "4",
                          "doc_count": 1
                       }
                    ]
                 }
              }
           }
        }
     ]
  }
}

The nested syntax makes it a bit verbose, but this gives you the rating provided by the user (here, user 1234) for each movie.
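If it helps, pulling the ratings out of that nested response can be sketched as below. This is only an illustration: it assumes the "aggregations" object has been parsed into a Python dict, it relies on the aggregation names used in the request above (by_movie, ratings_by_user, for_user_1234, rating_value), and the helper name ratings_from_aggs is made up. The sample data is the output above with the doc_count fields trimmed.

```python
def ratings_from_aggs(aggregations, filter_name="for_user_1234"):
    """Flatten the nested aggregation output into {title: rating_value}."""
    ratings = {}
    for movie in aggregations["by_movie"]["buckets"]:
        user_bucket = movie["ratings_by_user"][filter_name]
        value_buckets = user_bucket["rating_value"]["buckets"]
        # The request has no query, so movies the user never rated also get
        # a bucket; their rating_value bucket list is simply empty.
        if value_buckets:
            ratings[movie["key"]] = value_buckets[0]["key"]
    return ratings


def movie_bucket(title, rating):
    """Build one trimmed by_movie bucket shaped like the output above."""
    return {
        "key": title,
        "ratings_by_user": {
            "for_user_1234": {
                "rating_value": {"buckets": [{"key": rating, "doc_count": 1}]}
            }
        },
    }


aggregations = {
    "by_movie": {
        "buckets": [
            movie_bucket("Fight Club", 3),
            movie_bucket("Hook", 4),
            movie_bucket("Jumanji", 4),
        ]
    }
}

print(ratings_from_aggs(aggregations))
# {'Fight Club': 3, 'Hook': 4, 'Jumanji': 4}
```

Taking the first rating_value bucket assumes a user rates a given movie at most once.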

Hope this helps!

+3




Store the ratings as nested documents (or as child documents) so that you can query them separately.



A good explanation of the differences between nested documents and children can be found here: http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
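To picture the child-document layout, here is a rough sketch of how one movie with embedded ratings could be split into a parent document plus one small document per rating. It only reshapes the data: the mapping and indexing calls needed for a real parent/child setup are not shown, and both the helper name split_movie and the movie_id field are hypothetical.

```python
def split_movie(movie):
    """Split a movie with embedded ratings into a parent and child documents."""
    # The parent keeps everything except the embedded ratings array.
    parent = {key: value for key, value in movie.items() if key != "ratings"}
    # Each rating becomes its own document that points back at its movie.
    children = [
        {"movie_id": movie["id"], **rating}  # movie_id is a hypothetical field name
        for rating in movie.get("ratings", [])
    ]
    return parent, children


movie = {
    "id": "6629",
    "title": "Fight Club",
    "ratings": [
        {"user_id": 1234, "rating_value": 3},
        {"user_id": 4567, "rating_value": 2},
        {"user_id": 7890, "rating_value": 1},
    ],
}

parent, children = split_movie(movie)
print(parent)       # {'id': '6629', 'title': 'Fight Club'}
print(children[0])  # {'movie_id': '6629', 'user_id': 1234, 'rating_value': 3}
```

With the ratings stored as separate documents, a search for user_id 1234 would return only that user's rating documents rather than whole movies.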

+2








