Hadoop mapper input contains hex-escaped values

I have a list of tweets stored in HDFS as input and am trying to run a MapReduce job over them. This is my mapper implementation:

@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
  try {
    String[] fields = value.toString().split("\t");
    StringBuilder sb = new StringBuilder();
    for (int i = 1; i < fields.length; i++) {
      if (i > 1) {
        sb.append("\t");
      }
      sb.append(fields[i]);
    }
    tid.set(fields[0]);
    content.set(sb.toString());
    context.write(tid, content);
  } catch(DecoderException e) {
    e.printStackTrace();
  }
}

      

As you can see, I tried to split the input on "\t", but the input (value.toString()) looks like this when I print it out:

2014\x091880284777\x09argento_un\x090\x090\x09RT @topmusic619: #RETWEET THIS!!!!!\x5CnFOLLOW ME &amp; EVERYONE ELSE THAT RETWEETS THIS FOR 35+ FOLLOWERS\x5Cn#TeamFollowBack #Follow2BeFollowed #TajF\xE2\x80\xA6

      

Here's another example:

2014\x0934447260\x09RBEKP\x090\x090\x09\xE2\x80\x9C@LENEsipper: Wild lmfaooo RT @Yerrp08: L**o some n***a nutt up while gettin twerked

      

I noticed that \x09 should be a tab character (ASCII 0x09), so I tried using the Hex class from Apache Commons Codec:

    String tmp = value.toString();
    byte[] bytes = Hex.decodeHex(tmp.toCharArray());

      

But the function decodeHex returns null.

This is odd, because some of the characters are hex-escaped and others are not, so the string as a whole is not a valid hex string. How can I decode only the escaped parts?

Edit: Also note that, apart from tabs, emojis and other non-ASCII characters are also encoded as hex values (for example \xE2\x80\xA6).
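For illustration, here is a minimal sketch of one way the mixed input could be decoded. It assumes every escape is a literal backslash, an x, and exactly two hex digits (\xHH), and that the escaped bytes form UTF-8 sequences, as \xE2\x80\xA6 (an ellipsis) suggests. The class name EscapeDecoder and the method unescape are hypothetical names made up for this example; they are not part of Hadoop or Commons Codec.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop or Commons Codec.
final class EscapeDecoder {

    // Replaces each literal "\xHH" with the byte it names, copies all other
    // characters through as UTF-8, then decodes the combined bytes as UTF-8.
    // The input is scanned once, so a backslash produced by "\x5C" is never
    // re-interpreted as the start of another escape.
    static String unescape(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            if (isHexEscape(s, i)) {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                int cp = s.codePointAt(i);
                byte[] raw = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                out.write(raw, 0, raw.length);
                i += Character.charCount(cp);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // True if s has a backslash, 'x', and two hex digits starting at index i.
    private static boolean isHexEscape(String s, int i) {
        return i + 3 < s.length()
                && s.charAt(i) == '\\'
                && s.charAt(i + 1) == 'x'
                && Character.digit(s.charAt(i + 2), 16) >= 0
                && Character.digit(s.charAt(i + 3), 16) >= 0;
    }
}
```

In the mapper, something like EscapeDecoder.unescape(value.toString()) could be called before split("\t"), so that the \x09 escapes have become real tabs by the time the line is split. Note that under these assumptions \x5Cn decodes to a backslash followed by the letter n, not to a newline; whether that should be unescaped further depends on how the data was produced.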

java string unicode utf-8 hadoop

