How do I reliably find function definitions when preprocessing C/C++ code with sed or awk?

I want to instrument my code directly by preprocessing the source files with sed/awk. I cannot use other methods such as a debugger trace or the gcc option -finstrument-functions. In this latter case, the addresses are relocated in some way I cannot control, and I lose the correspondence with the symbol table. Other methods presented here (ptrace, etrace, callgraph, etc.) work for a simple example, but not in my real project.

The problem is that in large open-source projects, the conventions for writing functions differ not only between C and C++ files but often within the same file. The opening { can be at the end of the argument list or on its own line, and structures or assignments can also start a line with {, so a simple parser produces false matches.
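To make the false matches concrete, here is a small C sketch (an illustration, not code from the question) that applies the simplest textual rule, "a line beginning with { opens a function body", to a hypothetical snippet mixing the brace styles just described. The function `count_naive_entry_points` and the `sample` text are made up for this illustration.

```c
/* C rendering of the naive line-based rule: flag every line whose first
 * character is '{'. Returns the number of flagged lines. */
static int count_naive_entry_points(const char *src)
{
    int count = 0;
    int at_line_start = 1;
    for (const char *p = src; *p != '\0'; ++p) {
        if (at_line_start && *p == '{')
            ++count;
        at_line_start = (*p == '\n');
    }
    return count;
}

/* Hypothetical sample: one K&R-style function (brace at the end of the
 * argument list), one Allman-style function (brace on its own line), and a
 * struct initializer whose brace also starts a line. */
static const char sample[] =
    "int add(int a, int b) {\n"
    "    return a + b;\n"
    "}\n"
    "int sub(int a, int b)\n"
    "{\n"
    "    return a - b;\n"
    "}\n"
    "struct point origin =\n"
    "{\n"
    "    0, 0\n"
    "};\n";
```

On this sample the rule flags two lines: the body of `sub` and the struct initializer. The K&R-style `add` is missed entirely, so the count happens to equal the number of functions while being wrong about which lines they are.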

So the solution provided in the links above, which inserts the macro at the beginning of each function definition, does not work at all, and adjusting thousands of lines of code (KLOC) by hand is impractical.

sed 's/^{/{ENTRY/'
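For context, this substitution only produces valid code if `ENTRY` is a macro that expands to a complete statement, because sed inserts it right after the brace with no semicolon. A minimal C sketch of what such a macro might look like, assuming C99 `__func__` is available; `trace_enter` and `trace_log` are hypothetical names, since the question does not show its actual `ENTRY` definition. Recording the name through `__func__` also sidesteps the address-to-symbol mapping problem mentioned above, because the name is a string fixed at compile time.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical trace buffer; a real build would more likely write to a
 * file or to stderr. */
static char trace_log[256];

static void trace_enter(const char *func)
{
    /* Append "func;" so the order of entries can be inspected later. */
    size_t used = strlen(trace_log);
    snprintf(trace_log + used, sizeof trace_log - used, "%s;", func);
}

/* The macro carries its own semicolon because sed inserts a bare ENTRY. */
#define ENTRY trace_enter(__func__);

/* What Allman-style definitions look like after the sed pass has run. */
static int square(int x)
{ENTRY
    return x * x;
}

static int sum_of_squares(int a, int b)
{ENTRY
    return square(a) + square(b);
}
```

Calling `sum_of_squares(3, 4)` returns 25 and leaves `sum_of_squares;square;square;` in `trace_log`.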

      

So how can I robustly identify function definitions in C/C++ code with regular expressions usable in sed or awk? Perhaps by using some part of the GCC preprocessor? I'm looking for something that works, possibly offline.


2 answers


sed or awk (or any purely textual approach) are the wrong tools for reliably processing C code (and you should probably work on the preprocessed form).
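One reason to prefer the preprocessed form is that macros can generate function definitions that the raw text never spells out. A minimal sketch with a hypothetical `DEFINE_GETTER` macro: the source contains no line beginning with { and no literal `get_width` for a regular expression to find, yet after preprocessing (for example with `gcc -E`) there are two ordinary function definitions.

```c
/* Hypothetical helper of the kind common in large projects: each use
 * expands to a complete function definition. */
#define DEFINE_GETTER(type, field) \
    static type get_##field(const struct config *c) { return c->field; }

struct config {
    int width;
    int height;
};

/* After preprocessing these two lines become get_width and get_height;
 * neither identifier appears in the code before expansion. */
DEFINE_GETTER(int, width)
DEFINE_GETTER(int, height)
```

The generated getters behave like any hand-written function: with `struct config c = {640, 480};`, `get_width(&c)` returns 640.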

You want to work on some form of compiler AST. Of course, the internal representations of a compiler are compiler-specific (and perhaps even version-specific).

If you are using a recent GCC, you can customize it using MELT (and add your own passes to GCC) or with your own C++ plugin.



If you are using Clang/LLVM, you can also customize it by adding your own passes.

The Coccinelle tool might also be relevant.

Any such approach requires a significant amount of work (probably weeks), since you will need to understand in detail the internal representations of the particular compiler you are using. And C is complex enough to make this non-trivial.


You cannot do this with any tool that does not understand the specific dialect of C your code is written in (such as C++, ANSI C, or C99). As a trivial example: what does "//" mean in a "C function"? It is just a literal pair of slashes if it's inside a string, and outside a string it might be the start of a comment if the code is C++ or C99, but it is not the start of a comment in ANSI C. What if it's inside /* ... // ... */? If what looks like a function definition follows "//", is it really a function?
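These cases are easy to reproduce. The C sketch below (all names made up for illustration) collects them: "//" inside a string literal, "//" and a whole function definition inside a block comment, and the classic expression whose value depends on whether "//" is a comment at all. It assumes a C99 or later compiler; a strict C89/C90 compiler should compute 3 in `dialect_value` instead of 6.

```c
/* Inside a string literal, "//" is just two slash characters. */
static const char url[] = "http://example.com/";

/* Inside a block comment, // is plain text, and so is anything that looks
 * like a function definition; a line-based tool matching "^{" would still
 * "find" the body below.
int removed(void)
{
    return 0;
}
 */

/* Whether "//" starts a comment depends on the dialect. In C99 and later
 * it does, so the rest of the line is ignored and v is 6. In C89/C90 there
 * is no "//" comment: the line reads 6 / (empty comment) 2, so v is 3. */
static int dialect_value(void)
{
    int v = 6 //**/ 2
        ;
    return v;
}
```

A regular expression cannot tell these apart without tracking strings, comments, and the language standard, which is exactly the lexer's job.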



You don't say what you are actually trying to do ("preprocess the code" doesn't tell us anything), but you should look at what I wrote in "Remove multi-line comments" about using gcc to de-comment your code, then use a C beautifier like "indent" or "cb" to format your code consistently, and/or look at "cscope" or "ccalls" if you're just looking for a tool to enumerate functions.
