Filter tuple by nested inner bag value

I am a complete beginner with Pig Latin and I need help with what I think is a basic problem.

My DESCRIBE output:

xmlToTuple: {(node_attr_id: int,tag: {(tag_attr_k: chararray,tag_attr_v: chararray)})}

and the DUMP output:

((704398904,{(lat,-13.00583333),(lon,45.24166667)}))
((1230941976,{(place,village)}))
((1230941977,{(name,Mtsahara)}))
((1751057677,{(amenity,fast_food),(name,Brochetterie)}))
((100948360,{(amenity,ferry_terminal)}))
((362795028,{(amenity,fuel),(operator,Total)}))

I want to retrieve the records that have a specific value in the tag_attr_k field. For example, which entries have tag_attr_k = amenity? The result should be:

((1751057677,{(amenity,fast_food),(name,Brochetterie)}))
((100948360,{(amenity,ferry_terminal)}))
((362795028,{(amenity,fuel),(operator,Total)}))

Can anyone explain how to do this? I'm a bit lost.

3 answers


I found it!



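    -- Unwrap the single tuple field so node_attr_id and tag become top-level fields
    XmlTag = FOREACH xmlToTuple GENERATE FLATTEN($0);
    XmlTag2 = FOREACH XmlTag {
        -- Keep only the inner tuples whose key is 'amenity'
        tag_with_amenity = FILTER tag BY (tag_attr_k == 'amenity');
        GENERATE *, COUNT(tag_with_amenity) AS count;
    };
    -- Keep nodes with at least one 'amenity' tag and drop the helper count column
    XmlTag3 = FOREACH (FILTER XmlTag2 BY count > 0) GENERATE node_attr_id, tag;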

You should use a map instead of a bag of tuples. The keys will be your tag_attr_k values, and the map values will be your tag_attr_v values. So one row of your data would look like this, for example:

(1751057677,[amenity#fast_food,name#Brochetterie])

Then you can check whether a key exists by looking it up and testing whether the result is NULL:

filtered = FILTER xml BY tag_attr#'amenity' IS NOT NULL;
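A minimal sketch of getting the data into map form in the first place, assuming the node data can be re-exported as a tab-separated file (the name `nodes.txt` and the field name `tag_attr` are hypothetical) whose second column is written in Pig's map literal syntax, e.g. `[amenity#fast_food,name#Brochetterie]`. PigStorage parses such a column when the schema declares it as a map:

    -- Hypothetical input file 'nodes.txt', one node per line, e.g.:
    -- 1751057677<TAB>[amenity#fast_food,name#Brochetterie]
    nodes = LOAD 'nodes.txt' USING PigStorage('\t')
            AS (node_attr_id:int, tag_attr:map[chararray]);
    -- Looking up a missing key yields NULL, so this keeps only nodes that have an 'amenity' key
    with_amenity = FILTER nodes BY tag_attr#'amenity' IS NOT NULL;
    DUMP with_amenity;

Compared with the nested FOREACH/COUNT approach, the map lookup needs no per-row inner bag filtering, at the cost of storing the tags in map form.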

To do this, you should use a map, not a bag of tuples. Maps are built for exactly this purpose: http://pig.apache.org/docs/r0.10.0/basic.html#data-types

To filter, you run:

B = FILTER A BY mymap#'amenity' IS NOT NULL;
