Creating a tar archive with national symbols in Java

Do you know of a library or technique in Java for creating a tar archive whose filenames are encoded in the appropriate Windows national code page (such as Cp1250)?

I tried a Java tar library; example code:

final TarEntry entry = new TarEntry( files[i] );
String filename = files[i].getPath().replaceAll( baseDir, "" );
entry.setName( new String( filename.getBytes(), "Cp1250" ) );
out.putNextEntry( entry );
...

      

This does not work. National characters are broken when I extract the tar on Windows. I also found a strange thing: under Linux, Polish national characters are displayed correctly only when I use ISO-8859-1:

entry.setName( new String( filename.getBytes(), "ISO-8859-1" ) );

      

Even though the correct Polish code page is ISO-8859-2, that doesn't work either. I also tried Cp852 for Windows, with no effect.

I know the limitations of the tar format, but changing it is not an option.

Thanks for any suggestions.

+2




2 answers


Officially, TAR does not support non-ASCII headers. However, I was able to use UTF-8 encoded names on Linux.

You could try this:



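String filename = files[i].getName();
// Encode the name in the Windows code page you want stored in the archive
byte[] bytes = filename.getBytes("Cp1250");
// ISO-8859-1 maps each byte to the char with the same value, so no byte is altered
entry.setName(new String(bytes, "ISO-8859-1"));
out.putNextEntry( entry );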

      

This at least keeps the bytes in Cp1250 in the TAR headers.
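Why the ISO-8859-1 step preserves the bytes can be checked in isolation. The following is only a sketch using the standard library, not the tar library from the question: the class and method names (TarNameBytes, toHeaderName, bytesWritten) are made up for illustration, and it assumes the tar library writes each char of the entry name as a single byte, which the thread does not confirm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class TarNameBytes {
    // Cp1250 is Java's alias for the windows-1250 code page.
    static final Charset CP1250 = Charset.forName("windows-1250");

    /** Encodes the name in Cp1250, then wraps each byte in one char (U+0000..U+00FF). */
    static String toHeaderName(String filename) {
        byte[] cp1250Bytes = filename.getBytes(CP1250);
        return new String(cp1250Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Simulates a writer that emits one byte per char (an assumption about the library). */
    static byte[] bytesWritten(String headerName) {
        return headerName.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "zażółć.txt", written with escapes so the source file encoding does not matter
        String original = "za\u017C\u00F3\u0142\u0107.txt";
        byte[] written = bytesWritten(toHeaderName(original));

        // The bytes that would reach the header are exactly the Cp1250 encoding
        System.out.println(Arrays.equals(written, original.getBytes(CP1250)));
        // A reader that decodes them as Cp1250 recovers the original name
        System.out.println(new String(written, CP1250).equals(original));
    }
}
```

If the library instead encodes the name with the platform default charset or UTF-8, the bytes in the header will differ and this trick will not help.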

+1




tar does not allow non-ASCII values in headers. If you try a different encoding, the result probably depends on what the target platform decides to do with those byte values. It looks like your target tar program is interpreting the bytes as ISO-8859-1, so it "works".

Have you looked at the pax extended attributes described in tar(5)? http://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5&manpath=FreeBSD+8-current



I'm not an expert here, but this seems to be the only official way to put any non-ASCII values in the header of a tar file.
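To give an idea of what such an extended attribute looks like on disk, here is a rough sketch of building a single pax record of the form "length key=value" followed by a newline, where the value is UTF-8 and the decimal length counts the whole record including its own digits. The class name PaxRecord and the method build are invented for this example; it only produces the record bytes and does not write the surrounding extended-header entry or the rest of the archive.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class PaxRecord {
    /** Builds one pax extended-header record: "<length> <key>=<value>\n" in UTF-8. */
    static byte[] build(String key, String value) {
        byte[] body = (" " + key + "=" + value + "\n").getBytes(StandardCharsets.UTF_8);

        // The length field counts its own digits too, so iterate until it is stable.
        int total = body.length;
        int next = body.length + String.valueOf(total).length();
        while (next != total) {
            total = next;
            next = body.length + String.valueOf(total).length();
        }

        byte[] prefix = String.valueOf(total).getBytes(StandardCharsets.US_ASCII);
        byte[] record = new byte[total];
        System.arraycopy(prefix, 0, record, 0, prefix.length);
        System.arraycopy(body, 0, record, prefix.length, body.length);
        return record;
    }

    public static void main(String[] args) {
        // "path" is the pax keyword that overrides the file name in the classic header
        byte[] record = build("path", "\u0142.txt"); // "ł.txt"
        System.out.print(new String(record, StandardCharsets.UTF_8));
    }
}
```

Whether the extracting program on Windows honours the path record depends on that program, so this is worth testing against the actual target tool.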

0








