git-p4: encoding of commit messages and author names

Today I am migrating some fairly old Perforce repositories to Git. While this is really interesting, one thing caught my attention: the special characters in the commit messages, and even in the author names, are not correctly encoded.

So, I tried to figure out where the problem came from.

  • First of all, the Perforce server is not Unicode-enabled, so setting P4CHARSET has no effect other than the error "Unicode clients require a unicode enabled server."

  • Then I checked the output of simple commands such as p4 users, which really is in ANSI (according to Notepad++), or ISO-8859-1 according to file -bi on the redirected output.

  • The locale command reports LANG=en_US.UTF-8 ...

In the end, I think all p4 output is in ISO-8859-1, while git-p4 assumes UTF-8.
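The mismatch can be reproduced in isolation. The following is a minimal Python 3 sketch, not git-p4 code, and the sample author name is made up for illustration. It shows that ISO-8859-1 bytes such as 0xC4 ("Ä") are not valid UTF-8 on their own, which is why text read under the wrong assumption ends up garbled.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical author name "Jörg Ärmel", encoded the way the p4 output
# appears to be encoded (ISO-8859-1).
latin1_bytes = "J\u00f6rg \u00c4rmel".encode("iso-8859-1")
print(latin1_bytes)                 # b'J\xf6rg \xc4rmel'
print(is_valid_utf8(latin1_bytes))  # False

# Decoding with the real source encoding and re-encoding yields valid UTF-8.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")
print(is_valid_utf8(utf8_bytes))    # True
```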

I have tried rewriting the commit messages with

git filter-branch --msg-filter 'iconv -f iso-8859-1 -t utf-8' -- --all

      

but that doesn't fix the problem, especially since a message filter isn't designed to rewrite author names.

Can anyone suggest how to get the p4 output translated to UTF-8 before git-p4 reads it?

Update:

I tried to "override" the default p4 command with a simple shell script that I put on the PATH

/usr/bin/p4 "$@" | iconv -f iso-8859-1 -t utf-8

      

but this corrupts the Python marshal objects that git-p4 evidently reads from p4:

  File "/usr/local/bin/git-p4", line 2467, in getBranchMapping
    for info in p4CmdList(command):
  File "/usr/local/bin/git-p4", line 480, in p4CmdList
    entry = marshal.load(p4.stdout)
ValueError: bad marshal data
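A likely reason for this failure: the stack trace shows git-p4 reading the p4 output with marshal.load, so the stream is a binary serialization rather than plain text, and iconv rewrites bytes without adjusting the length fields stored in front of each string. The Python 3 sketch below imitates this with marshal.dumps and hypothetical records; it is an illustration under those assumptions, not the exact data p4 emits.

```python
import marshal


def transcode_like_iconv(stream: bytes) -> bytes:
    """Mimic `iconv -f iso-8859-1 -t utf-8` applied to a whole byte stream."""
    return stream.decode("iso-8859-1").encode("utf-8")


def survives_transcoding(record: dict) -> bool:
    """Return True if the record still loads unchanged after transcoding."""
    # Marshal format version 2 is an assumption, chosen to keep the stream simple.
    stream = marshal.dumps(record, 2)
    try:
        return marshal.loads(transcode_like_iconv(stream)) == record
    except (ValueError, EOFError, TypeError):
        # The stored string lengths no longer match the transcoded payload.
        return False


# Hypothetical records: one pure ASCII, one containing "Ä" (byte 0xC4).
ascii_record = {b"desc": b"Fix build"}
latin1_record = {b"desc": "\u00c4nderung".encode("iso-8859-1")}

print(survives_transcoding(ascii_record))   # True
print(survives_transcoding(latin1_record))  # False
```

This suggests that any conversion has to happen on the individual string fields after they have been unmarshalled, not on the raw stream.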

      

Update2:

As shown in "Changing the default encoding for Python?", I tried setting the Python I/O encoding to ascii:

export PYTHONIOENCODING="ascii"
python -c 'import sys; print(sys.stdin.encoding, sys.stdout.encoding)'

      

Output:

('ascii', 'ascii')

      

but all commit messages and author names were still migrated incorrectly.

Update 3:

Even patching the commit function in git-p4.py, def commit(self, details, files, branch, parent = ""), didn't help. Changing

self.gitStream.write(details["desc"])

      

to one of these

self.gitStream.write(details["desc"].encode('utf8', 'replace'))
self.gitStream.write(unicode(details["desc"],'utf8'))

      

just raised:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 29: ordinal not in range(128)

      

As I am not a Python developer, I have no idea what to try next.



1 answer


I suspect that details["desc"] is a byte string (str in Python 2). Therefore you need to decode it to Unicode before you encode it.

Use

print type(details["desc"])

to find out the type.

details["desc"].decode("iso-8859-1").encode("UTF-8")

      

can then convert it from ISO-8859-1 to UTF-8.
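As a self-contained illustration of the decode-then-encode step, here is a minimal sketch written for Python 3, where byte strings and text are separate types. The helper name latin1_to_utf8 and the sample description are hypothetical, and the source encoding is assumed to be ISO-8859-1 as suspected in the question. The explicit ASCII decode mirrors what Python 2 does implicitly when .encode() is called on a byte string, which would explain the UnicodeDecodeError in the question.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (decode first, then encode)."""
    return raw.decode("iso-8859-1").encode("utf-8")


# Hypothetical change description containing "Ä" (byte 0xC4 in ISO-8859-1).
desc = "Fix \u00c4nderung".encode("iso-8859-1")

# Decoding as ASCII fails on the first non-ASCII byte.
try:
    desc.decode("ascii")
except UnicodeDecodeError as err:
    print(err.reason, "at position", err.start)

converted = latin1_to_utf8(desc)
print(converted)  # b'Fix \xc3\x84nderung'
print(converted.decode("utf-8") == "Fix \u00c4nderung")  # True
```

The same conversion would presumably also have to be applied to the author fields, since those come from the same p4 output.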
