Writing to a file in UTF-8 encoding with FSDataOutputStream (Hadoop)?
I am trying to write a CSV file from a MapReduce reducer function. Here is my code:
public class DataSet311Reducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(key.toString().toLowerCase() + ".csv");
        FSDataOutputStream os = fs.create(path);
        os.writeChars("KEY,DATE,AGENCY,DESCRIPTOR,LOCATIONTYPE,INCIDENTZIP,INCIDENTADDRESS,LATITUDE,LONGITUDE\n");
        StringBuilder sb = new StringBuilder();
        for (Text value : values) {
            sb.append(value.toString());
            sb.append("|");
            os.writeUTF(value.toString());
            os.writeUTF("\n");
        }
        os.close();
        context.write(key, new Text(sb.toString()));
    }
}
I need to save the file in UTF-8 encoding for use with CartoDB. When I check the output with the file command, it shows:
unspecified.csv: application/octet-stream; charset=binary
How do I save the content with the correct encoding and include the header row?
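For context, FSDataOutputStream extends DataOutputStream, so writeChars() emits each character as two bytes (UTF-16 code units) and writeUTF() prefixes every string with a two-byte length field, which is why file reports the result as binary rather than text. Below is a minimal sketch of one way to write plain UTF-8 instead, keeping the same key-derived path, header row, and pipe-delimited summary as the code above; it assumes the stream can simply be wrapped in an OutputStreamWriter pinned to StandardCharsets.UTF_8:

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DataSet311Reducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Reuse the job configuration instead of constructing a new one
        FileSystem fs = FileSystem.get(context.getConfiguration());
        Path path = new Path(key.toString().toLowerCase() + ".csv");

        StringBuilder sb = new StringBuilder();
        // The OutputStreamWriter encodes everything as UTF-8 bytes;
        // try-with-resources closes the writer and the underlying FSDataOutputStream.
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(fs.create(path), StandardCharsets.UTF_8))) {
            writer.write("KEY,DATE,AGENCY,DESCRIPTOR,LOCATIONTYPE,INCIDENTZIP,INCIDENTADDRESS,LATITUDE,LONGITUDE\n");
            for (Text value : values) {
                sb.append(value.toString());
                sb.append("|");
                // Text stores UTF-8; toString() decodes it and the writer re-encodes it as UTF-8
                writer.write(value.toString());
                writer.write("\n");
            }
        }
        context.write(key, new Text(sb.toString()));
    }
}

Closing the writer flushes and closes the underlying stream, so no separate os.close() call is needed.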