How to manipulate Subversion files with Unicode names on Windows?

Let's say I am using Windows 7 with code page 950 (Big5, Traditional Chinese), and I want to use svn to manipulate some files with Unicode names, such as 简体中文文件.txt (GB2312, Simplified Chinese).

If I use chcp 950, then when I run:

svn add .\简体中文文件.txt

I am getting the error:

svn: warning: W155010: 'D:\path\to\work-dir\?体中文文件.txt' not found
svn: E200009: Could not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation

If I use chcp 65001 (UTF-8) I get an even worse error:

svn: warning: W155010: 'D:\path\to\work-dir\?ไฝ“svn: E200009: C
ould not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation

      

I would like to try chcp 1200 (UTF-16LE), but it says:

Invalid code page

It seems that TortoiseSVN can manage these files correctly. However, I need to write scripts that call svn to run multiple automated jobs. Is there any solution available?

1 answer


Programs such as svn that rely on the Microsoft C runtime's implementation of the standard library I/O functions cannot read command-line arguments or filenames containing characters outside the current code page. You would have to chcp to the appropriate code page for each file separately (e.g. 936 for Simplified Chinese).
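The per-file workaround above can be scripted. The following Python code is a minimal sketch under stated assumptions, not a tested tool: it reports which characters of a filename a given code page cannot encode, picks the first code page from a hypothetical candidate list (950, 936, 932) that can represent the whole name, and then runs chcp followed by svn add in a single cmd.exe call. The helper names and the candidate list are made up for illustration. Whether svn actually receives the right characters after chcp depends on how its C runtime converts the command line, so verify the behaviour on your own setup before relying on it.

```python
import subprocess
import sys

# Hypothetical candidate code pages, tried in order (adjust for your files):
# 950 = Big5 (Traditional Chinese), 936 = GBK (Simplified Chinese),
# 932 = Shift_JIS (Japanese).
CANDIDATE_CODEPAGES = (950, 936, 932)


def chars_outside_codepage(name: str, codepage: int) -> list:
    """Return the characters of `name` that code page `codepage` cannot encode."""
    bad = []
    for ch in name:
        try:
            # Python exposes Windows code pages as codecs named "cpNNN".
            ch.encode(f"cp{codepage}")
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


def pick_codepage(name: str, candidates=CANDIDATE_CODEPAGES):
    """Return the first candidate code page that can encode every character, or None."""
    for cp in candidates:
        if not chars_outside_codepage(name, cp):
            return cp
    return None


def svn_add_with_codepage(path: str) -> int:
    """Hypothetical helper: switch the console code page, then run `svn add` (Windows only)."""
    cp = pick_codepage(path)
    if cp is None:
        raise ValueError(f"no candidate code page can represent {path!r}")
    # chcp changes the code page of the console this process shares;
    # svn is then started by the same cmd.exe invocation.
    return subprocess.call(f'chcp {cp} >nul && svn add "{path}"', shell=True)


if __name__ == "__main__" and sys.platform == "win32":
    for f in sys.argv[1:]:
        svn_add_with_codepage(f)
```

Saved as, say, svn_add_cp.py and run as `python svn_add_cp.py 简体中文文件.txt`, it would try 950 first, find that 简 is not in Big5, and fall back to 936. A name that no single candidate code page covers (for example one containing an emoji) raises an error instead of being passed to svn in a mangled form.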

In theory, code page 65001 can represent every character, but unfortunately the MS C runtime has serious bugs that usually break applications when this code page is in use. Microsoft's persistent failure to fix this long-standing issue leaves UTF-8 a second-class citizen on Windows.



In the future, it looks like http://subversion.tigris.org/issues/show_bug.cgi?id=1537 may fix this issue by writing to the console with Win32 APIs directly instead of the C stdlib, although I can't find the associated code change to check whether console input and file access are also covered.
