Writing a file in UTF-8 encoding with FSDataOutputStream (Hadoop)?

I am trying to write a CSV file from within a MapReduce reducer. Here is my code:

public class DataSet311Reducer extends Reducer<Text, Text, Text, Text> {

  @Override
  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(key.toString().toLowerCase() + ".csv");
    FSDataOutputStream os = fs.create(path);
    os.writeChars("KEY,DATE,AGENCY,DESCRIPTOR,LOCATIONTYPE,INCIDENTZIP,INCIDENTADDRESS,LATITUDE,LONGITUDE\n");
    StringBuilder sb = new StringBuilder();
    for (Text value : values) {
      sb.append(value.toString());
      sb.append("|");
      os.writeUTF(value.toString());
      os.writeUTF("\n");
    }
    os.close();
    context.write(key, new Text(sb.toString()));
  }
}

      

I need to save the file in UTF-8 encoding for use with CartoDB. When I check the output with the file command, it reports:

unspecified.csv: application/octet-stream; charset=binary

How can I write the content, including the header row, with the correct encoding?
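
A minimal sketch of one likely cause and a possible fix. It assumes only that FSDataOutputStream inherits writeChars and writeUTF from java.io.DataOutputStream, so a plain DataOutputStream over a byte array stands in for fs.create(path) here. writeChars emits two bytes per character, and writeUTF prefixes every string with a two-byte length, which would explain why the result is detected as binary. Wrapping the stream in an OutputStreamWriter with StandardCharsets.UTF_8 writes plain UTF-8 text instead. The class and method names below are hypothetical, chosen for illustration.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; DataOutputStream stands in for Hadoop's FSDataOutputStream.
class Utf8CsvSketch {

  // Mirrors the calls in the question: writeChars writes 2 bytes per char,
  // writeUTF writes a 2-byte length prefix followed by modified UTF-8.
  static byte[] writeLikeQuestion(String header, List<String> rows) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (DataOutputStream os = new DataOutputStream(sink)) {
      os.writeChars(header + "\n");
      for (String row : rows) {
        os.writeUTF(row);
        os.writeUTF("\n");
      }
    }
    return sink.toByteArray();
  }

  // Possible fix: wrap the stream in a Writer that encodes characters as UTF-8.
  static byte[] writeAsUtf8(String header, List<String> rows) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    DataOutputStream os = new DataOutputStream(sink); // assumption: fs.create(path) would go here
    try (Writer writer = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8))) {
      writer.write(header);
      writer.write('\n');
      for (String row : rows) {
        writer.write(row);
        writer.write('\n');
      }
    } // closing the writer flushes it and closes the underlying stream
    return sink.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    List<String> rows = Arrays.asList("1,caf\u00e9");
    byte[] binary = writeLikeQuestion("KEY,DATE", rows);
    byte[] text = writeAsUtf8("KEY,DATE", rows);
    System.out.println("question-style bytes: " + binary.length);
    System.out.println("UTF-8 writer bytes:   " + text.length);
    System.out.print(new String(text, StandardCharsets.UTF_8));
  }
}
```

In the reducer, the same Writer wrapper would presumably go around the stream returned by fs.create(path), replacing the writeChars and writeUTF calls with writer.write.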

java csv utf-8 mapreduce cartodb

