git-p4 message and author encoding
I am currently migrating some fairly old Perforce repositories to git. This mostly works well, but one thing caught my attention: the special characters in the commit messages, and even in the author names, are not encoded correctly.
So, I tried to figure out where the problem came from.
- first of all, the Perforce server is not Unicode-enabled, so setting P4CHARSET has no effect other than producing the error
Unicode clients require a unicode enabled server.
- then I checked the output of simple commands such as
p4 users
which is indeed ANSI (according to Notepad++) or ISO-8859-1 (according to file -bi on the redirected output)
- the command
locale
reports LANG=en_US.UTF-8 ...
In the end, my guess is that all p4 client output is ISO-8859-1, whereas git-p4 assumes UTF-8.
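To illustrate the suspected mismatch, here is a minimal sketch. The author name is a made-up example, not taken from my repository; it only shows that bytes which are valid ISO-8859-1 are not necessarily valid UTF-8.

```python
# Hypothetical author name as it might arrive from p4, encoded as ISO-8859-1.
raw = b"J\xfcrgen"  # 0xfc is "ü" in ISO-8859-1


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid for that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_latin1 = try_decode(raw, "iso-8859-1")  # decodes cleanly
as_utf8 = try_decode(raw, "utf-8")         # fails: 0xfc cannot start a UTF-8 sequence

print(repr(as_latin1), repr(as_utf8))
```

If git-p4 passes such bytes through unchanged and git later treats them as UTF-8, that would be consistent with the broken characters I am seeing.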
I have tried rewriting the commit messages with
git filter-branch --msg-filter 'iconv -f iso-8859-1 -t utf-8' -- --all
but that does not fix the problem, especially since a message filter is not meant to rewrite author names.
Can anyone suggest how to get the output to be translated to UTF-8 before git-p4 gets it?
Update:
I tried to "overwrite" the default p4 commands with a simple shell script I added to the PATH
/usr/bin/p4 "$@" | iconv -f iso-8859-1 -t utf-8
but this corrupts the marshalled Python objects that git-p4 evidently reads from p4:
File "/usr/local/bin/git-p4", line 2467, in getBranchMapping
for info in p4CmdList(command):
File "/usr/local/bin/git-p4", line 480, in p4CmdList
entry = marshal.load(p4.stdout)
ValueError: bad marshal data
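A minimal sketch of why the iconv wrapper probably cannot work: marshalled data stores a byte length in front of every string, and transcoding the whole stream changes the payload length without updating that prefix. The record below is hypothetical, and marshal version 2 is an assumption chosen only to get a simple, deterministic layout; the exact format p4 emits may differ.

```python
import marshal

# Hypothetical record with byte-string keys and values.
record = {b"desc": b"\xc4nderung"}  # 0xc4 is "Ä" in ISO-8859-1

# Serialize with length-prefixed strings (version 2 is an assumption, see above).
packed = marshal.dumps(record, 2)

# Emulate `iconv -f iso-8859-1 -t utf-8` applied to the whole stream,
# framing bytes included: 0xc4 becomes the two bytes 0xc3 0x84.
transcoded = packed.decode("iso-8859-1").encode("utf-8")


def loads_ok(data):
    """Return True only if the data unmarshals back into the original record."""
    try:
        return marshal.loads(data) == record
    except (ValueError, EOFError, TypeError):
        return False


print(len(packed), len(transcoded))
print(loads_ok(packed), loads_ok(transcoded))
```

So any re-encoding apparently has to happen after unmarshalling, on the individual strings, rather than on the raw pipe.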
Update 2:
As described in "Changing default encoding of Python?", I tried to set the Python I/O encoding to ascii:
export PYTHONIOENCODING="ascii"
python -c 'import sys; print(sys.stdin.encoding, sys.stdout.encoding)'
Output:
('ascii', 'ascii')
but all messages and author names were still migrated incorrectly.
Update 3:
Even patching the git-p4.py function def commit(self, details, files, branch, parent = "")
did not help. Changing
self.gitStream.write(details["desc"])
to one of these
self.gitStream.write(details["desc"].encode('utf8', 'replace'))
self.gitStream.write(unicode(details["desc"], 'utf8'))
just raised:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 29: ordinal not in range(128)
As I am not a Python developer, I have no idea what to try next.
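My current understanding of that error, as a hedged sketch: in Python 2, calling .encode() on a byte string first decodes it implicitly as ASCII, which fails on byte 0xc4. Decoding explicitly with the suspected source encoding avoids that step. The sketch below is written for Python 3, the helper name to_utf8 and the sample description are hypothetical, and I have not verified this inside git-p4 itself.

```python
def to_utf8(raw, source_encoding="iso-8859-1"):
    """Decode bytes with an explicit source encoding, then re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical commit description containing byte 0xc4 ("Ä" in ISO-8859-1).
desc = b"\xc4nderung am Build"

# Roughly what Python 2 does implicitly before str.encode(): an ASCII decode.
try:
    desc.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

fixed = to_utf8(desc)
print(ascii_failed, fixed)
```

Whether ISO-8859-1 is really the right source encoding for every string is still only my assumption from the file -bi check above.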