Sorting in MapReduce Hadoop

I have a few basic questions about Hadoop MapReduce.

  • Suppose a job runs 100 mappers and zero reducers. Will it generate 100 files? Is each individual file sorted? Is the output sorted across all mappers?
  • Reducer input is Key → Values. For each key, are the values sorted?
  • Suppose a job runs 50 reducers. Will it generate 50 files? Is each individual file sorted? Is the output sorted across all reducers?

Is there a place where guaranteed sorting happens in MapReduce?

1 answer


1. Suppose a job runs 100 mappers and zero reducers. Will it generate 100 files?

Yes.

Is each individual file sorted?

No. If no reducers are used, the mapper output is not sorted. Sorting takes place only when the job has a reduce phase.

Is the output sorted across all mappers?

No, for the same reason as above.
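
A minimal sketch of the map-only case, written as a plain Python simulation rather than real Hadoop code. The `mapper` function, the `run_map_only_job` helper and the sample splits are all hypothetical. It only illustrates that one output file appears per mapper and that records stay in the order they were emitted.

```python
def mapper(line):
    # Hypothetical map function: emit (word, 1) for every word in the line.
    for word in line.split():
        yield (word, 1)

def run_map_only_job(splits):
    # With zero reducers there is no shuffle or sort step:
    # each mapper's records go straight to its own output file.
    return [[pair for line in split for pair in mapper(line)] for split in splits]

splits = [["pear apple", "fig"], ["zebra kiwi"]]  # two input splits -> two mappers
part_files = run_map_only_job(splits)

for i, records in enumerate(part_files):
    print(f"part-m-{i:05d}", records)
```

In this simulation the first file holds the keys in emission order (`pear`, `apple`, `fig`), which is not sorted order.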

2. Reducer input is Key → Values. For each key, are the values sorted?

No. However, the keys are sorted. After the shuffle phase, in which the reducer fetches the mapper output, it merges the sorted mapper outputs by key (they are sorted because the job has a reduce phase), so by the time it starts reducing, the keys are in sorted order.
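
The merge step can be imitated in plain Python. This is only a sketch under assumptions: `mapper_a`, `mapper_b` and `reducer_input` are made-up names, and the value order shown is just one possible arrival order, since Hadoop makes no promise about it.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical outputs of two mappers for the same reducer partition.
# Each run is sorted by key only; values within a key are in emission order.
mapper_a = [("apple", 5), ("apple", 1), ("cherry", 7)]
mapper_b = [("apple", 3), ("banana", 9), ("cherry", 2)]

def reducer_input(*sorted_runs):
    """Merge key-sorted runs and group the values per key."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    return [(key, [value for _, value in group])
            for key, group in groupby(merged, key=itemgetter(0))]

grouped = reducer_input(mapper_a, mapper_b)
for key, values in grouped:
    print(key, values)
```

The keys come out in sorted order, while the value list for `apple` is `[5, 1, 3]`, which is not sorted.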



3. Suppose a job runs 50 reducers. Will it generate 50 files?

Yes (unless you are using MultipleOutputs).

Is each individual file sorted?

No. Sorted input does not guarantee sorted output. The output depends on the algorithm you use in the reduce method.

Is the output sorted across all reducers?

No, for the same reason as above. However, if you use an identity reducer, i.e. you simply write out the reducer input as you receive it, the output will be sorted per reducer, not globally.
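
To illustrate the identity-reducer case, here is a small Python simulation (not Hadoop code). The `partition` function is a stand-in for a hash partitioner and assumes non-negative integer keys. `run_identity_job` and the sample records are hypothetical.

```python
def partition(key, num_reducers):
    # Stand-in for a hash partitioner; assumes non-negative integer keys.
    return key % num_reducers

def run_identity_job(records, num_reducers):
    parts = [[] for _ in range(num_reducers)]
    for key, value in records:
        parts[partition(key, num_reducers)].append((key, value))
    # Each reducer sees its keys in sorted order; an identity reducer
    # writes them out unchanged, so each file is sorted by key.
    return [sorted(part, key=lambda kv: kv[0]) for part in parts]

records = [(7, "g"), (2, "b"), (9, "i"), (4, "d"), (1, "a"), (6, "f")]
part_files = run_identity_job(records, num_reducers=2)

for i, part in enumerate(part_files):
    print(f"part-r-{i:05d}", [key for key, _ in part])
```

With two reducers the files hold keys `[2, 4, 6]` and `[1, 7, 9]`. Each file is sorted, but reading them one after the other gives `2, 4, 6, 1, 7, 9`, which is not a global order.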

Is there a place where guaranteed sorting happens in MapReduce?

Sorting occurs whenever the job has a reduce phase, and it is applied to the output keys of each mapper and the input keys of each reducer. If you want the reducer input to be sorted globally, you can either use a single reducer or a TotalOrderPartitioner, which is a bit tricky ...
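
The idea behind total-order partitioning can be sketched in plain Python: route keys to reducers by range instead of by hash, so that every key in reducer i is smaller than every key in reducer i+1. This is a simulation under assumptions, not the Hadoop API. The split points `[4, 7]` are hand-picked for the example (in a real job they are usually derived by sampling the input), and `range_partition` and `run_total_order_job` are hypothetical names.

```python
from bisect import bisect_right

def range_partition(key, split_points):
    # Keys below split_points[0] go to reducer 0, keys from split_points[0]
    # up to (but not including) split_points[1] go to reducer 1, and so on.
    return bisect_right(split_points, key)

def run_total_order_job(records, split_points):
    parts = [[] for _ in range(len(split_points) + 1)]
    for key, value in records:
        parts[range_partition(key, split_points)].append((key, value))
    # Each reducer still sorts only its own keys.
    return [sorted(part, key=lambda kv: kv[0]) for part in parts]

records = [(7, "g"), (2, "b"), (9, "i"), (4, "d"), (1, "a"), (6, "f")]
part_files = run_total_order_job(records, split_points=[4, 7])

for i, part in enumerate(part_files):
    print(f"part-r-{i:05d}", [key for key, _ in part])
```

Here the three files hold keys `[1, 2]`, `[4, 6]` and `[7, 9]`, so concatenating them in file order yields a globally sorted result. The tricky part in practice is choosing split points that balance the load across reducers.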
