Git and renaming and replacing files

I usually have no trouble with renames in git, but I've run into a really tricky case that I'm trying to solve.

For various reasons, we have a file dir1/file. Due to some long-standing decisions, it is in completely the wrong place and needs to be moved to dir2/file.

However, a lot of code needs to change, and for various reasons we need the file to be available in both the new location and the old location for a while.

So a natural (ish) approach would be to do this:

git mv dir1/file dir2/file
git commit -a

      

So far so good:

> git diff master --name-status --find-renames
R100 dir1/file dir2/file

      

So then we do

ln -s ../dir2/file dir1/file
git add dir1/file
git commit -a

      

but then this happens:

> git diff master --name-status --find-renames
A    dir2/file
T    dir1/file

      

And if someone changes dir1/file on master and I try to pull that in, I am told there is a merge conflict in dir1/file, while dir2/file stays the same. From reading other posts I thought git tracked content, but it seems to track file names as well as content, and the fact that the content has moved is missed completely.

So how do I get git to know that I actually renamed the file and then added a new file that just has the same name as the old one?

Note: I would rather not do this in multiple steps. Several files are affected, the likelihood that someone changes one of them in parallel is quite high, and there is no guarantee that they will pull once to get the rename and then pull again to get the symlink ...

Adding an example: I was moving a function out of a Python module's __init__.py, where it should never have been (__init__.py should have been empty), into a new file. This also doesn't show up as a rename, even though the content of the new file is 99% identical to the original __init__.py and the content of the new __init__.py is 0% identical to the content of the old one. Everything is fine until I add a file with the same name as the old one.



1 answer


Git essentially tracks content, not (or rather, we should say "in addition to") names. The diff goes wrong because git diff (necessarily) tries to match up names and compare the contents of two separate commits (or one commit and the current working directory, or one commit and the current index, and so on, but these are all just variations on the "compare two commits" theme).

More specifically, when git diff compares trees [1] T1 and T2, by default it assumes that the only rename candidates are those where some file name exists in T1 but not in T2, and another (different) file name exists in T2 but not in T1.

So when you make the first commit, you have two commits (call them A and B) with two trees, where dir1/file "disappears" from A and dir2/file "reappears" in B. This is a candidate for rename detection, and since the file's content is 100% identical, git easily sees the rename and gives you the R100 diff output.

When you make the second commit, you add commit C with a third tree. Comparing B and C works fine: dir2/file appears in both, and the new symlink dir1/file appears only in C, so the diff output from that pair is good too. The problem arises when comparing A and C: now dir1/file appears in both, but dir2/file appears only in C, so git diff does not realize that there is a rename candidate.

There is a flag, --find-copies-harder (or you can specify -C more than once), which (rather unsurprisingly) makes the copy/rename detection code try harder. In this case, git will consider the possibility that a file that "appears unchanged" (has the same name in both trees) may have been copied or renamed to another file that "appears new" (exists in the second tree, but not in the first). This is not enabled by default because the fully general version is extremely computationally intensive.
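The difference can be reproduced in a throwaway repository. What follows is a minimal sketch, not output from the original poster's project: the temporary directory, the file contents, the tag names A and C, and the demo identity are all assumptions made for illustration. It compares the first and last commits once with default rename detection and once with --find-copies-harder. The exact status letters may vary with the git version, so none are claimed here.

```shell
#!/bin/sh
set -e

# Throwaway repository; path and identity are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit A: the file lives in the "wrong" place.
mkdir dir1 dir2
printf 'line %s\n' 1 2 3 4 5 6 7 8 > dir1/file
git add dir1/file
git commit -q -m "A: add dir1/file"
git tag A

# Commit B: a pure rename.
git mv dir1/file dir2/file
git commit -q -m "B: move file to dir2"

# Commit C: a symlink left behind at the old path.
mkdir -p dir1
ln -s ../dir2/file dir1/file
git add dir1/file
git commit -q -m "C: add compatibility symlink"
git tag C

# Compare A and C with default rename detection, then with the harder search.
plain=$(git diff --name-status --find-renames A C)
harder=$(git diff --name-status --find-copies-harder A C)

echo "default rename detection:"
echo "$plain"
echo "with --find-copies-harder:"
echo "$harder"
```

Running both forms side by side makes it easy to see whether the extra search lets git relate dir2/file back to the old content of dir1/file in your git version.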


Unfortunately, there is no way to control the diff options used when computing the change sets for git merge. The merge command sets some defaults (-M50%, etc.), runs a couple of diffs, and doesn't let you set --find-copies-harder. So even if it works for a manual git diff, it won't resolve your merge conflict.



Note that when you do a merge [2], git computes only two change sets: one from the merge base [3] to the current HEAD, and one from the merge base to the commit being merged (git merges a commit, not a branch: the fact that the result merges that branch, when that commit is the tip of the branch, is sort of a "deliberate coincidence"). So you can do the rename as one commit and the symlink as a second, but to get git merge to "see" the rename you have to do two separate git merges as well. Avoiding that would require making the git diff machinery smarter, so that it can at least recognize that a file type change gives a much better chance of finding a rename if it "finds copies/renames a little harder".

(Note that adding this to the diff engine would fix both problems: git diff not seeing the rename, and git merge not seeing the rename in a single step.)
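The two-step merge described above can be sketched as follows. This is an illustration under stated assumptions, not a recipe taken from the original poster's repository: the branch name colleague, the tags B and C, the file contents, and the demo identity are all hypothetical, and it assumes a git version whose default merge strategy performs rename detection. The parallel edit touches the old path, the rename commit is merged first, and the symlink commit is merged second.

```shell
#!/bin/sh
set -e

# Throwaway repository; path and identity are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit A: shared starting point.
mkdir dir1
printf 'line %s\n' 1 2 3 4 5 6 7 8 > dir1/file
git add dir1/file
git commit -q -m "A: add dir1/file"
git branch colleague    # hypothetical branch that will edit the old path

# Commit B (rename) and commit C (symlink), kept as separate commits.
mkdir dir2
git mv dir1/file dir2/file
git commit -q -m "B: move file to dir2"
git tag B
mkdir -p dir1
ln -s ../dir2/file dir1/file
git add dir1/file
git commit -q -m "C: add compatibility symlink"
git tag C

# A parallel change to the old path on the other branch.
git checkout -q colleague
echo 'line 9 (edited in parallel)' >> dir1/file
git commit -q -a -m "edit dir1/file in parallel"

# Step 1: merge only the rename commit, so git compares A with B.
git merge -q --no-edit B
# Step 2: merge the symlink commit, so git compares B with C.
git merge -q --no-edit C

ls -l dir1/file
cat dir2/file
```

Each merge only has to relate two trees that differ by one kind of change, which is why splitting the work into two commits helps only if the merges are split as well.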


[1] Trees here means complete file trees, not git tree objects.

[2] This is specifically the case for a two-parent merge. Octopus merges are handled differently. I have not dug into the internals of octopus merges and can't say anything more about them.

[3] The merge base depends on the two (or more) commits being merged, and to complicate matters, with the default (recursive) strategy, if there are multiple merge base candidates, git computes a "virtual merge base" that does not necessarily match any actual commit. The details are not something I can explain properly here: I know the general idea, but not the specifics inside git, and in any case it rarely matters and is not directly related to your problem. There's a pretty good example here if you want to read more, although this example uses some pretty clear-cut terminology.
