How to generate a JSON object from unstructured data in Hadoop MR?

I have a dataset as parent, child
------------
a, b
a, c
b, d
b, e
c, f
c, d
d, h
r, i
r, d
r, dd
, sd
, t
I want to convert this to a JSON object. I'm trying to do it but don't know the right approach, so I drew the dataset as a tree structure to help work it out. Can you suggest how to achieve this?
[image: the dataset drawn as trees]
My problem is how to determine the parent of a node when there are two trees, as in the image. Please suggest how I can do this.

The result should be:

{
    a:{
            b: {d,e},
            c: { g: {h,i}, f }
        },
    p:{     q:{s,t}, r }
}

      



1 answer


First of all, I would find all the roots. You could do this by building a tree structure from the input pairs with some existing module (I don't know which one ...), but there is an easier way. Create two sets, one with all the parents and one with all the children, then subtract the children from the parents. What remains are the nodes that are a parent but never a child, which are the roots.
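As a rough illustration of the set idea, here is a minimal Python sketch. The helper name `find_roots` is made up for this example, and the `pairs` list is the question's dataset with the two rows that have an empty parent (`, sd` and `, t`) left out, which is an assumption about how such rows should be treated.

```python
# (parent, child) pairs from the question's dataset.
# Assumption: rows with an empty parent are skipped.
pairs = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"),
         ("c", "f"), ("c", "d"), ("d", "h"),
         ("r", "i"), ("r", "d"), ("r", "dd")]


def find_roots(pairs):
    """Return the nodes that appear as a parent but never as a child."""
    parents = {p for p, _ in pairs}
    children = {c for _, c in pairs}
    # Set difference: parents that nobody lists as a child are roots.
    return sorted(parents - children)


roots = find_roots(pairs)
print(roots)  # ['a', 'r']
```

With this dataset the two trees are the ones rooted at `a` and `r`.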

Then, starting from the roots, you can build a dictionary and convert it to JSON (sorry, I'm not good at Java, so I'm writing this in pseudocode).

tree = {}
for r in roots:
    # Map each root to its subtree: a nested dictionary, or None (Null) for a leaf
    tree[r] = build_tree(r, pairs)

      



So build_tree() should be a recursive dictionary update: at each step you create a new dict whose keys are the children of the input node, and whose values are the result of build_tree() for that child, or Null if no more children are found. The last step, of course, is a method to save the dictionary as JSON.
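The recursion described above could look roughly like this in Python. It is only a sketch: it rescans the whole `pairs` list for every node, which is fine for small inputs but not efficient, and the shortened `pairs` list and the hard-coded `roots` are assumptions made to keep the example small.

```python
import json


def build_tree(node, pairs):
    """Return {child: subtree} for node, or None if node has no children."""
    children = [c for p, c in pairs if p == node]
    if not children:
        return None
    # Recurse into each child; leaves come back as None.
    return {c: build_tree(c, pairs) for c in children}


# A shortened version of the question's data, with a single known root.
pairs = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"), ("c", "f")]
roots = ["a"]

tree = {r: build_tree(r, pairs) for r in roots}
print(json.dumps(tree))
# {"a": {"b": {"d": null, "e": null}, "c": {"f": null}}}
```

`json.dumps` writes Python's `None` as JSON `null`, so leaves show up as keys with a `null` value rather than as bare names.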

Note: the above assumes that the data has no cycles. If it does, you need a more complex algorithm and have to make some assumptions, for example about how you stop the recursion and how you record "some" reference back to the start of the cycle.
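One possible way to guard against cycles, shown only as a sketch: keep track of the nodes on the current path and, when a child is already on it, store a marker instead of recursing. The function name `build_tree_safe` and the `"<cycle to ...>"` marker string are made up for this example; how a back-reference should really be represented is a design decision.

```python
def build_tree_safe(node, pairs, path=()):
    """Build a nested dict, stopping when a node repeats on the current path."""
    path = path + (node,)
    subtree = {}
    for parent, child in pairs:
        if parent != node:
            continue
        if child in path:
            # Hypothetical marker: note the back-reference instead of recursing forever.
            subtree[child] = "<cycle to " + child + ">"
        else:
            subtree[child] = build_tree_safe(child, pairs, path)
    return subtree or None


cyclic = [("a", "b"), ("b", "c"), ("c", "a")]
print(build_tree_safe("a", cyclic))
# {'b': {'c': {'a': '<cycle to a>'}}}
```

Note that in a pure cycle like this one every node is also a child, so the set difference yields no roots and a starting node has to be picked some other way.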
