Read and write file with windows-1252
I am trying to write a file with some German characters to disk and read it using Windows-1252
encoding. I don't understand why, but my output looks like this:
<title>W�hrend und im Anschluss an die Exkursion stehen Ihnen die Ansprechpartner f�r O-T�ne</title>
<p>Die Themen im �berblick</p>
Any thoughts? Here is my code. You will need spring-core and commons-io to get it running.
private static void write(String fileName, Charset charset) throws IOException {
    String html = "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
            "<head>" +
            "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=windows-1252\">" +
            "<title>Während und im Anschluss an die Exkursion stehen Ihnen die Ansprechpartner für O-Töne</title>" +
            "</head>" +
            "<body>" +
            "<p>Die Themen im Überblick</p>" +
            "</body>" +
            "</html>";
    byte[] bytes = html.getBytes(charset);
    FileOutputStream outputStream = new FileOutputStream(fileName);
    OutputStreamWriter writer = new OutputStreamWriter(outputStream, charset);
    IOUtils.write(bytes, writer);
    writer.close();
    outputStream.close();
}

private static void read(String file, Charset windowsCharset) throws IOException {
    ClassPathResource pathResource = new ClassPathResource(file);
    String string = IOUtils.toString(pathResource.getInputStream(), windowsCharset);
    System.out.println(string);
}

public static void main(String[] args) throws IOException {
    Charset windowsCharset = Charset.forName("windows-1252");
    String file = "test.txt";
    write(file, windowsCharset);
    read(file, windowsCharset);
}
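Your write method is wrong: you are using a Writer to write bytes. A Writer should be used to write characters or strings. IOUtils.write(bytes, writer) first decodes the bytes back into characters using the platform default charset, and the writer then re-encodes them, which corrupts the non-ASCII characters whenever the default charset is not windows-1252.
You have already encoded the string into bytes with the line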
byte[] bytes = html.getBytes(charset);
These bytes can simply be written to the output stream:
IOUtils.write(bytes, outputStream);
This makes the writer unnecessary (remove it), and you now get the correct output.
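For reference, here is a minimal sketch of the whole write/read round trip with the writer removed. It is an illustration, not the asker's exact code: it uses only the JDK, so plain OutputStream.write stands in for IOUtils.write(bytes, outputStream), the file is read back from disk rather than through ClassPathResource, and the class name Cp1252WriteSketch is made up for the example.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical class name, used only for this sketch.
public class Cp1252WriteSketch {

    // Encode the string once and write the raw bytes; no Writer is involved.
    static void write(String fileName, String text, Charset charset) throws IOException {
        byte[] bytes = text.getBytes(charset);
        try (OutputStream out = new FileOutputStream(fileName)) {
            out.write(bytes);
        }
    }

    // Read the raw bytes back and decode them with the same charset.
    static String read(String fileName, Charset charset) throws IOException {
        return new String(Files.readAllBytes(Paths.get(fileName)), charset);
    }

    public static void main(String[] args) throws IOException {
        Charset cp1252 = Charset.forName("windows-1252");
        // Unicode escapes keep the sample independent of the source file encoding.
        String text = "W\u00E4hrend f\u00FCr O-T\u00F6ne \u00DCberblick";
        write("test.txt", text, cp1252);
        // Compare strings instead of printing them, so the console encoding does not matter.
        System.out.println(text.equals(read("test.txt", cp1252)));
    }
}
```

Comparing the decoded string with the original, rather than eyeballing console output, separates file-encoding problems from console-encoding problems.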
First, make sure the compiler and the editor use the same encoding. You can verify this with the (ugly) \uXXXX escape; these two spellings should produce the same string:
während
w\u00E4hrend
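One way to run that check is to dump the code units the compiler actually produced. The sketch below is only an illustration: the class EscapeCheckSketch and the helper describe are made-up names, and the variable typed is initialised from the escaped form so the example compiles under any source encoding. To test your own setup, replace it with the word typed directly in your editor.

```java
// Hypothetical class name, used only for this sketch.
public class EscapeCheckSketch {

    // Render each UTF-16 code unit of the string as U+XXXX.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The escaped spelling is immune to source-encoding mismatches.
        String escaped = "w\u00E4hrend";
        // Assumption: swap this for the literal typed in your editor to test your setup.
        String typed = escaped;
        System.out.println(describe(typed));
        System.out.println(typed.equals(escaped));
    }
}
```

If the editor saves UTF-8 but the compiler reads windows-1252, the typed literal typically shows U+00C3 U+00A4 where U+00E4 is expected, and the comparison is false.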
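Then take the charset name in the meta tag from the charset itself, and write the encoded bytes directly:
"<meta http-equiv='Content-Type' content='text/html; charset="
+ charset.name() + "' />" +
...
byte[] bytes = html.getBytes(charset);
Files.write(Paths.get(fileName), bytes);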
Also check that the file really is in Windows-1252. A programmer's editor such as Notepad++ or jEdit lets you inspect and switch encodings.
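Put together, the fragments from this answer might look like the following sketch. The class name MetaCharsetSketch, the helper names, the shortened HTML, and the file name test.html are assumptions made for the example; only charset.name(), getBytes(charset), and Files.write come from the answer itself.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class name, used only for this sketch.
public class MetaCharsetSketch {

    // Build the page so the declared charset always matches the one used for encoding.
    static String buildHtml(Charset charset) {
        return "<html><head>"
                + "<meta http-equiv='Content-Type' content='text/html; charset="
                + charset.name() + "' />"
                + "<title>W\u00E4hrend</title>"
                + "</head><body><p>Die Themen im \u00DCberblick</p></body></html>";
    }

    // Encode, write the bytes directly, then read and decode with the same charset.
    static String roundTrip(Path path, Charset charset) throws IOException {
        byte[] bytes = buildHtml(charset).getBytes(charset);
        Files.write(path, bytes);
        return new String(Files.readAllBytes(path), charset);
    }

    public static void main(String[] args) throws IOException {
        Charset charset = Charset.forName("windows-1252");
        String back = roundTrip(Paths.get("test.html"), charset);
        System.out.println(back.equals(buildHtml(charset)));
    }
}
```

Deriving the meta tag from charset.name() means the declaration cannot drift from the encoding actually used if the charset is changed later.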