Hadoop mapper input with hexadecimal escape values

I have a list of tweets in HDFS as input and am trying to run a MapReduce job over them. This is my mapper implementation:

@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
  // tid and content are Text instance fields declared on the Mapper class
  String[] fields = value.toString().split("\t");
  StringBuilder sb = new StringBuilder();
  for (int i = 1; i < fields.length; i++) {
    if (i > 1) {
      sb.append("\t");
    }
    sb.append(fields[i]);
  }
  // key = the tweet id (first field), value = the rest of the line
  tid.set(fields[0]);
  content.set(sb.toString());
  context.write(tid, content);
}


As you can see, I tried to split the input on "\t", but when I print it out, the input (value.toString()) looks like this:

2014\x091880284777\x09argento_un\x090\x090\x09RT @topmusic619: #RETWEET THIS!!!!!\x5CnFOLLOW ME &amp; EVERYONE ELSE THAT RETWEETS THIS FOR 35+ FOLLOWERS\x5Cn#TeamFollowBack #Follow2BeFollowed #TajF\xE2\x80\xA6


Here's another example:

2014\x0934447260\x09RBEKP\x090\x090\x09\xE2\x80\x9C@LENEsipper: Wild lmfaooo RT @Yerrp08: L**o some n***a nutt up while gettin twerked
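To double-check what split actually sees, I reduced the first line to a short literal (the \x09 here is four ordinary characters, not a real tab):

    String line = "2014\\x091880284777\\x09argento_un"; // literal backslash, x, 0, 9
    System.out.println(line.split("\t").length);      // 1 -- no real tab, nothing to split
    System.out.println(line.split("\\\\x09").length); // 3 -- splitting on the literal escape works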


I noted that \x09 should be a tab character (ASCII 0x09), so I tried decoding it with Apache Commons Codec's Hex class:

    String tmp = value.toString();
    byte[] bytes = Hex.decodeHex(tmp.toCharArray());


But decodeHex returns null.

This is weird, since some of the characters are hex-encoded and others are not. How can I decode them?
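For reference, here is a minimal reproduction of the failure. As far as I can tell, Hex.decodeHex expects the entire char array to be hex digit pairs, so the literal '\' and 'x' characters make it fail (I believe current commons-codec versions actually throw a DecoderException rather than returning null):

    import org.apache.commons.codec.DecoderException;
    import org.apache.commons.codec.binary.Hex;

    // shortened copy of a real line; the '\' at index 4 is not a hex digit
    try {
        Hex.decodeHex("2014\\x0918".toCharArray());
    } catch (DecoderException e) {
        System.out.println(e.getMessage()); // something like "Illegal hexadecimal character \ at index 4"
    }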

Edit: Also note that, apart from tabs, emojis are also encoded as hex values.
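One direction I've been experimenting with (just a sketch with my own helper name, and it assumes the escapes really are stored as literal four-character \xNN sequences): collect each run of consecutive escapes, convert the run to bytes, and decode those bytes as UTF-8, so that multi-byte sequences like \xE2\x80\xA6 come back as a single character:

    import java.nio.charset.StandardCharsets;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Matches one or more consecutive literal \xNN escapes, e.g. "\x09" or "\xE2\x80\xA6".
    static final Pattern HEX_RUN = Pattern.compile("(?:\\\\x[0-9A-Fa-f]{2})+");

    static String unescapeHexRuns(String s) {
        Matcher m = HEX_RUN.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String run = m.group();
            byte[] bytes = new byte[run.length() / 4];
            for (int i = 0; i < bytes.length; i++) {
                // each escape is 4 chars: '\', 'x', and two hex digits
                bytes[i] = (byte) Integer.parseInt(run.substring(i * 4 + 2, i * 4 + 4), 16);
            }
            // decode the whole run at once so multi-byte UTF-8 sequences stay intact
            m.appendReplacement(out, Matcher.quoteReplacement(new String(bytes, StandardCharsets.UTF_8)));
        }
        m.appendTail(out);
        return out.toString();
    }

With something like this, value.toString() could be unescaped first and then split on a real "\t" in the mapper, but I'm not sure it is the right approach.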
