What logic should be used in a custom partitioner in MapReduce to solve this problem?
SOLUTION 1: I think the way to go is a combiner, not a partitioner. The combiner will sum the local counts of words starting with the letter "A" and then emit a partial sum (rather than always the number 1) to the reducers.
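To make the combiner idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes the job counts words by their first letter; the class and method names (CombinerSketch, combine, reduce) are made up for illustration. In a real job the same summing logic would sit in the class registered as the combiner.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java simulation of combiner behaviour; not Hadoop code.
final class CombinerSketch {
    // Runs on the output of ONE map task: collapses many (letter, 1) pairs
    // into a single (letter, partialSum) pair per first letter.
    static Map<Character, Long> combine(List<String> wordsFromOneMapper) {
        Map<Character, Long> partial = new TreeMap<>();
        for (String word : wordsFromOneMapper) {
            if (word.isEmpty()) continue; // nothing to count
            partial.merge(Character.toUpperCase(word.charAt(0)), 1L, Long::sum);
        }
        return partial;
    }

    // Runs on the reducer side: adds up the partial sums from all map tasks.
    static Map<Character, Long> reduce(List<Map<Character, Long>> partials) {
        Map<Character, Long> totals = new TreeMap<>();
        for (Map<Character, Long> p : partials) {
            p.forEach((letter, sum) -> totals.merge(letter, sum, Long::sum));
        }
        return totals;
    }
}
```

With this in place, a map task that saw a million words starting with "A" ships one record for that letter instead of a million, so the skew mostly disappears before the shuffle.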
SOLUTION 2: However, if you insist on using a custom partitioner for this, you can simply handle words starting with the letter "A" in a separate reducer from all other words, i.e. dedicate one reducer only to words starting with the letter "A".
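The routing rule for such a partitioner could look roughly like the sketch below. It is plain Java rather than a Hadoop class, and the name FirstLetterPartitionerSketch is hypothetical; in a real job the method body would go into getPartition(key, value, numPartitions) of a Partitioner subclass. The choice of partition 0 for "A" and hashing for the rest is an assumption made for the example.

```java
// Hypothetical plain-Java sketch of the partitioning rule; not Hadoop code.
final class FirstLetterPartitionerSketch {
    // Returns the reducer index for a word: partition 0 is reserved for
    // words starting with "A", all other words are hashed over the rest.
    static int getPartition(String word, int numPartitions) {
        if (numPartitions <= 1) {
            return 0; // only one reducer, nothing to separate
        }
        boolean startsWithA =
                !word.isEmpty() && Character.toUpperCase(word.charAt(0)) == 'A';
        if (startsWithA) {
            return 0;
        }
        // Mask the sign bit so the modulo result is never negative.
        return 1 + (word.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);
    }
}
```

Note that this only isolates the heavy key; the dedicated reducer still receives every "A" record unless a combiner is used as well.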
SOLUTION 3: Also, if you don't mind "cheating" a bit, you can define a counter for words starting with the letter "A" and increment it during the map phase. Then just ignore those words (no need to send them over the network) and use the default partitioner for the other words. When the job ends, retrieve the counter value.
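As a rough illustration of the counter trick, the sketch below simulates the map-side logic in plain Java. An AtomicLong stands in for the job counter, and ACounterSketch and its members are invented names; in Hadoop the increment would go through the counter obtained from the mapper's context instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical plain-Java simulation of the counter approach; not Hadoop code.
final class ACounterSketch {
    // Stand-in for the job counter that tracks words starting with "A".
    final AtomicLong wordsStartingWithA = new AtomicLong();

    // Simulated map phase: "A" words only bump the counter and are dropped;
    // every other word is returned, i.e. would be emitted to the reducers.
    List<String> map(List<String> words) {
        List<String> emitted = new ArrayList<>();
        for (String word : words) {
            if (word.isEmpty()) continue;
            if (Character.toUpperCase(word.charAt(0)) == 'A') {
                wordsStartingWithA.incrementAndGet();
            } else {
                emitted.add(word);
            }
        }
        return emitted;
    }
}
```

After the simulated job, wordsStartingWithA.get() holds the count for "A", while the remaining words go through the normal shuffle.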
SOLUTION 4: If you don't mind "cheating" even more, define 26 counters, one for each letter, and simply increment them in the map phase according to the first letter of the current word. You then need no reducers at all (set the number of reducers to 0), which saves all the sorting and shuffling. When the job ends, retrieve the values of all the counters.
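The 26-counter variant can be sketched the same way. Here a long[26] array plays the role of the 26 job counters, and LetterCountersSketch is a hypothetical name; the sketch assumes only the ASCII letters A to Z are counted and that anything else is skipped.

```java
import java.util.List;

// Hypothetical plain-Java simulation of a map-only job with 26 counters.
final class LetterCountersSketch {
    // Returns one count per letter: index 0 is "A", index 25 is "Z".
    static long[] countByFirstLetter(List<String> words) {
        long[] counters = new long[26];
        for (String word : words) {
            if (word.isEmpty()) continue;
            char first = Character.toUpperCase(word.charAt(0));
            if (first >= 'A' && first <= 'Z') {
                counters[first - 'A']++; // nothing is emitted, only counted
            }
        }
        return counters;
    }
}
```

Since nothing is emitted, there is nothing to sort or shuffle, which is the reason the number of reducers can be set to 0.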