Find source code in binary

Let's say I have a project that I released under the GPL, with the sources available to everyone. Later I find a very similar product from someone else, distributed only as a closed-source binary.

Is there a good way to know that they are using my source code in their product?

If the solution is to reverse engineer the binary somehow, is there any way to automate it?

EDIT: Clarification. Bug hunting is one option, but it is not conclusive, especially if the project is a library and the binary has, for example, added its own GUI. I am interested in the situation where it is not obvious that the code has been lifted.

0




8 answers


Look up software birthmarks. This technique attempts to establish relationships between programs based on their binary code or dynamic behavior. Christian Collberg is an expert on software watermarks, from which birthmarks were derived. It is still at the research stage.
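As a rough illustration of the idea, here is a minimal sketch of one simple static birthmark: the set of opcode k-grams of a program, compared with a Jaccard index. It assumes you have already extracted the instruction mnemonics from each binary with a disassembler; the function names and the choice of `k` are hypothetical, and a real birthmark system would be considerably more robust than this.

```python
def kgrams(opcodes, k=3):
    """Return the set of all runs of k consecutive opcodes."""
    return {tuple(opcodes[i:i + k]) for i in range(len(opcodes) - k + 1)}


def similarity(opcodes_a, opcodes_b, k=3):
    """Jaccard index of the two k-gram sets (0.0 = disjoint, 1.0 = identical)."""
    grams_a = kgrams(opcodes_a, k)
    grams_b = kgrams(opcodes_b, k)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # Hypothetical mnemonic sequences taken from two disassembled functions.
    mine = ["push", "mov", "sub", "call", "add", "pop", "ret"]
    theirs = ["push", "mov", "xor", "call", "add", "pop", "ret"]
    print(similarity(mine, theirs))
```

A high score only suggests a relationship; a different compiler or optimization level can lower it substantially even for identical source.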



+2




Bugs.

If the closed-source release shares most of its bugs with your project, it has probably been lifted from it.



You can also try decompiling your own binary and comparing the result with the decompiled closed-source binary, although that probably won't be reliable.

+5




Obviously, if the suspect binary is not stripped, you can simply search for any symbols that have the same names as those in your code.
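A minimal sketch of that comparison, assuming you have saved the default `nm` output of your own binary and of the suspect binary as text. The helper names and the ignore list are hypothetical; only symbols that have an address (that is, symbols the binary itself defines) are compared, so shared imports such as `printf` do not count as matches.

```python
def defined_symbols(nm_output):
    """Return the names of symbols that have an address in `nm` output."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols look like "<address> <type> <name>";
        # undefined ones have no address and are skipped.
        if len(parts) == 3 and parts[1].upper() != "U":
            names.add(parts[2])
    return names


def shared_symbols(mine, theirs, ignore=("main", "_start", "_init", "_fini")):
    """Symbols defined in both binaries, minus names nearly every program has."""
    common = defined_symbols(mine) & defined_symbols(theirs)
    return sorted(common - set(ignore))


if __name__ == "__main__":
    # Hypothetical nm listings for illustration only.
    mine = "0000000000001139 T main\n0000000000001150 T frob_widget\n                 U printf\n"
    theirs = "0000000000002000 T main\n0000000000002040 T frob_widget\n0000000000002100 T gui_init\n                 U printf\n"
    print(shared_symbols(mine, theirs))
```

Distinctive names are the interesting ones; generic names like `init` or `parse` will match between unrelated programs.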

+3




There's a lot of work out there on decompilation and reverse engineering of binaries. The world expert is probably Cristina Cifuentes, who has done a lot of work on decompilation. It might also be worth writing to Alex Aiken to ask whether his MOSS tool (Measure Of Software Similarity) can be adapted to binaries.

+2




The obvious method is to search for strings. Run the Unix strings tool and see whether the binary contains any literal strings from your code, mainly things like error messages and other message text.
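If you want to automate the check, the following is a minimal sketch that mimics what `strings` does (runs of printable ASCII) and reports which of your own string literals appear in the binary. The function names, the minimum length of 6, and the sample literals are assumptions for illustration; you would supply the literals collected from your own source.

```python
import re


def printable_strings(data, min_len=6):
    """Return runs of at least min_len printable ASCII bytes, like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {match.decode("ascii") for match in pattern.findall(data)}


def matching_literals(binary_data, source_literals, min_len=6):
    """Return the source literals that occur inside some string of the binary."""
    found = printable_strings(binary_data, min_len)
    return sorted(
        literal
        for literal in source_literals
        if len(literal) >= min_len and any(literal in s for s in found)
    )


if __name__ == "__main__":
    # Hypothetical binary contents and literals, for illustration only.
    blob = b"\x00\x01frob: cannot open widget file\x00\x7fELF\x00usage: gui [options]\x00"
    literals = ["frob: cannot open widget file", "frob: out of memory", "ok"]
    print(matching_literals(blob, literals))
```

For a real file, read it with `open(path, "rb").read()` and pass the bytes as `binary_data`. Short or generic literals are skipped because they would match almost anything.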

+2




You can try disassembling both programs and comparing the assembly, but if they used a different compiler, the generated code may differ. There are some free disassemblers available, or you can step through the assembly in a debugger.

Other than that, there isn't really an easy way to find out.

+1




The surest way I can think of is similar to the word "esquivalience" in the Oxford dictionary, a fictitious entry planted to catch copying.
Just add a byte array with unique content somewhere in your code, and don't forget to make some simple use of it so that the linker doesn't optimize it away. You should probably obfuscate it somewhat, so that it is not obvious to a casual reader that it is redundant. Then open the compiled binary in a hex editor and search for it.
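Instead of a hex editor, the search step can be scripted. This is a minimal sketch, assuming the marker ends up in the binary byte for byte as you wrote it; the marker value and function name below are made up for illustration.

```python
def find_marker(data, marker):
    """Return every offset at which the marker bytes occur in data."""
    offsets = []
    start = data.find(marker)
    while start != -1:
        offsets.append(start)
        start = data.find(marker, start + 1)
    return offsets


if __name__ == "__main__":
    # Hypothetical 8-byte marker; use your own unique, longer sequence.
    marker = bytes([0x9E, 0x37, 0x79, 0xB9, 0x7F, 0x4A, 0x7C, 0x15])
    # Stand-in for the contents of the suspect binary.
    blob = b"\x00" * 10 + marker + b"\xff" * 5
    print(find_marker(blob, marker))
```

The longer and more random the marker, the less likely it is to appear by coincidence.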

0




Why not look at the symbol table using nm?

$ nm a.out
...

      

0

