How to manipulate subversion unicode-named files in Windows?
Let's say I am using Windows 7 with code page 950 (Big5 Traditional Chinese), and I want to manipulate files with Unicode names such as 简体中文文件.txt (GB2312 Simplified Chinese) using svn.
If I am using chcp 950 when I run:
svn add .\简体中文文件.txt
I am getting the error:
svn: warning: W155010: 'D:\path\to\work-dir\?体中文文件.txt' not found
svn: E200009: Could not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation
If I use chcp 65001 (UTF-8) I get an even worse error:
svn: warning: W155010: 'D:\path\to\work-dir\?ไฝsvn: E200009: C
ould not add all targets because some targets don't exist
svn: E200009: Illegal target for the requested operation
I would like to try chcp 1200 (UTF-16LE), but it says:
Invalid code page
It seems that TortoiseSVN can manage these files correctly. However, I need to write scripts that call svn to run multiple automated jobs. Is there any solution available?
Programs such as svn that use the MS implementation of the C standard library file I/O functions cannot read command-line input or filenames containing characters outside the current code page. You would have to chcp to the appropriate code page for each file individually (e.g. 936 for Simplified Chinese).
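For a script that has to pick a code page per file, a minimal Python sketch of the check could look like the following. The helper name `code_pages_for` and the candidate list are hypothetical, and Python's codecs are assumed to be close enough to the Windows code pages of the same number; the console's own best-fit conversion may differ in edge cases.

```python
# Hypothetical helper: the name and the candidate list are illustrative only.
# Maps a Windows code page number to the Python codec assumed to match it.
CANDIDATE_CODE_PAGES = {950: "cp950", 936: "cp936", 1252: "cp1252"}


def code_pages_for(filename):
    """Return the candidate code page numbers that can encode every character of filename."""
    usable = []
    for number, codec in CANDIDATE_CODE_PAGES.items():
        try:
            filename.encode(codec)
        except UnicodeEncodeError:
            # At least one character has no mapping in this code page.
            continue
        usable.append(number)
    return usable


# The Simplified Chinese file name from the question, written with escapes
# so the script itself does not depend on the source file encoding.
name = "\u7b80\u4f53\u4e2d\u6587\u6587\u4ef6.txt"
print(code_pages_for(name))
```

A script could run this check first and then issue chcp with one of the returned numbers before calling svn on that file. A name that mixes characters from several code pages would return an empty list, which is the case the per-file chcp workaround cannot handle.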
In theory, code page 65001 can cover every character, but unfortunately the MS C runtime has serious bugs that usually break applications when this code page is used. Microsoft's persistent failure to fix this long-standing issue leaves UTF-8 a second-class citizen on Windows.
In the future, it looks like http://subversion.tigris.org/issues/show_bug.cgi?id=1537 will fix this issue by using the Win32 APIs directly instead of the C stdlib to write to the console, although I can't find the associated code change to check whether console input and file access are also covered.