MapReduce implementation

The input data is a JSON file whose records have the following structure:

{id = x, h1 = 0.1, h2 = 0.3, h3 = 0.8, h4 = 0.7}.

The task is to implement MapReduce to get the "h" triplets that contain a peak. For the example above, the output is x->h2,h3,h4 because the h3 value is higher than both of its neighbours.

My idea is to implement a map that emits one entry per value, such as x->h1(0.1), x->h2(0.3), x->h3(0.8), ..., and then a reduce that extracts the peaks.
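To make the idea concrete, here is a minimal sketch in plain Python rather than on a real cluster. It assumes the record is available as valid JSON with the same fields as the example, and it simulates shuffle and sort with a dictionary. The names `map_record`, `shuffle` and `reduce_peaks` are hypothetical, not part of any framework.

```python
import json
from collections import defaultdict

def map_record(record):
    """Emit one (id, (position, value)) pair per h-field of a record."""
    for key, value in record.items():
        if key.startswith("h"):
            yield record["id"], (int(key[1:]), value)

def shuffle(pairs):
    """Simulate shuffle and sort: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_peaks(record_id, values):
    """Emit every triplet whose middle value is higher than both neighbours."""
    ordered = sorted(values)  # order by position: h1, h2, h3, ...
    for left, mid, right in zip(ordered, ordered[1:], ordered[2:]):
        if mid[1] > left[1] and mid[1] > right[1]:
            yield record_id, [f"h{pos}" for pos, _ in (left, mid, right)]

# Assumed JSON form of the example record from the question.
records = [json.loads('{"id": "x", "h1": 0.1, "h2": 0.3, "h3": 0.8, "h4": 0.7}')]
pairs = [pair for record in records for pair in map_record(record)]
results = [
    out
    for key, values in shuffle(pairs).items()
    for out in reduce_peaks(key, values)
]
print(results)  # [('x', ['h2', 'h3', 'h4'])]
```

The reduce has to re-sort the values by position, because the order in which values arrive for a key is not something I would rely on.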

Is this correct? The shuffle and sort step seems to make the map step useless, because it returns more or less the original structure. Does this introduce overhead? Or is it acceptable if you choose to use the MR paradigm?
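For comparison, here is a sketch of the same peak search done entirely inside the map. It assumes every record already holds all of its h values, as in the example, so nothing has to be grouped across records. In Hadoop terms this would probably be a job with zero reducers, which skips shuffle and sort altogether. `map_peaks` is a hypothetical name.

```python
def map_peaks(record):
    """Map-only peak detection: the whole record is already in one place."""
    # Collect (position, value) for the h-fields, ordered by position.
    values = sorted(
        (int(key[1:]), value)
        for key, value in record.items()
        if key.startswith("h")
    )
    for left, mid, right in zip(values, values[1:], values[2:]):
        if left[1] < mid[1] > right[1]:
            yield record["id"], [f"h{pos}" for pos, _ in (left, mid, right)]

record = {"id": "x", "h1": 0.1, "h2": 0.3, "h3": 0.8, "h4": 0.7}
peaks = list(map_peaks(record))
print(peaks)  # [('x', ['h2', 'h3', 'h4'])]
```

If this assumption holds, the per-value map plus reduce only adds the cost of serialising, moving and regrouping data that was already together.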
