Conditionally change C expressions in a C source file

I have a C file whose logging infrastructure we are migrating, so

if ( logging_level >= LEVEL_FINE )
  printf("Value at %p is %d\n", p, i);

      

becomes

do_log2(LEVEL_FINE, "Value at %p is %d\n", _ptr(p), _num(i));

      

do_log2 means a log call with two arguments.

For this I need a C parsing and modification framework.

What tool can I use to accomplish this most easily?

Note: printf can also appear in the file as:

if ( logging_level >= LEVEL_FINE )
{
  printf("Value at %p is %d\n", 
  p, 
  i);
}

      

(split across lines and inside a block), so it would be difficult to do this with simple text processing in Perl.

EDIT: This is my final Perl code, which does what I want:

#!/usr/bin/perl -W 
$source=<<'END';
include etc
if ( logging_level >= LEVEL_DEBUG )
{
  printf("1:Value at %p is %d\n",
  p(1),
  i(2));
}
hello();
if ( logging_level >= LEVEL_FINE )
{
  printf("2:Value  is %d\n", i);
  printf("3:Value  is %d\n", i);
}
if ( logging_level >= LEVEL_FINE )
{
  printf("2:Value  is %d\n"
     "and other info", i);
}

other();
if(logging_level>=LEVEL_INFO){printf("4:Value at %p is %d %d\n",p(x),i,j);}
if(logging_level>=LEVEL_FINE) printf("5:Just sayin\"\n");
printf("not logging statement\n").
END

while( $source =~ m/\G(.*?\n)\s* if \s* \( \s* logging_level \s* >= \s* ([A-Z_0-9]+) \s* \) \s*(\{?)/sgxc )
{
    my $othercode = $1;
    my $loglevel=$2;
    my $inblock = $3;
    print("$othercode");

    while($source =~ m/\G\s*printf \( ([^;]*) \) \;/sgxc )
    {
        my $insideprint = $1;
        unless ($insideprint =~ /((\"([^\"\\]|\\.)*\")(\s*(\"([^\"\\]|\\.)*\"))*)/g) #fixing stackoverflow quote problem "
        {
            die "First arg not string literal";
        }
        my $formatstr = $1;
        my $remain = substr($insideprint, pos($insideprint));
        $remain =~ tr/\n \t//d;
        my @args = split(",", $remain);
        shift @args;

        my $numargs = @args;

        print "do_log${numargs}($loglevel, $formatstr";
        for (my $i=0; $i < $numargs; $i++)
        {
            unless ($formatstr =~ /%([a-z]+)/g)
            {
                die "Not enough format for args : $formatstr, args = ", join(",", @args), "\n";
            }
            my $lastchar = substr($1, length($1) -1);
            my $wrapper = "";
            if ($lastchar eq "u" || $lastchar eq  "d")
            { $wrapper = "_numeric";}
            elsif($lastchar eq "p"){ $wrapper = "_ptr";}
            elsif($lastchar eq "s"){ $wrapper = "_str";}
            else { die "Unknown format char %$lastchar in $formatstr"; }

            print ", ${wrapper}($args[$i])";
        }
        print ");";
        last unless ($inblock);
    }
    # eat the trailing } of a braced block, if present
    if ($inblock)
    {
        $source =~ m/\G \s* \} /sgxc;
    }
}
#whatever is left 
print substr($source, pos($source));

      

output:

include etc
do_log2(LEVEL_DEBUG, "1:Value at %p is %d\n", _ptr(p(1)), _numeric(i(2)));
hello();
do_log1(LEVEL_FINE, "2:Value  is %d\n", _numeric(i));
do_log1(LEVEL_FINE, "3:Value  is %d\n", _numeric(i));
do_log1(LEVEL_FINE, "2:Value  is %d\n"
         "and other info", _numeric(i));

other();
do_log3(LEVEL_INFO, "4:Value at %p is %d %d\n", _ptr(p(x)), _numeric(i), _numeric(j));
do_log0(LEVEL_FINE, "5:Just sayin\"\n");
printf("not logging statement\n").

      

Woohoo! Now let's apply to the actual source code.

+2




6 answers


You need a program transformation system that can parse C and apply transformations to the structure of the code (that is, to the compiler data structures that represent it), not to its text (so it is not confused by text layout, etc.). (Program transformation is a generalization of refactoring.)

The DMS Software Reengineering Toolkit is such a program transformation system, and it has a C parser that has been applied to very large C systems.

With DMS, your change can be written as:

domain C; -- work with C language syntax

rule change_logging(exp: p, exp: i, s: literal_string, c:literal_integer): stmt -> stmt
  "if ( logging_level >= \l )
      printf(\s, \p, \i);"
  ->  
  "do_log2(\l, \s, _ptr(\p), _num(\i));".

      

Each \k is either a meta-escape (a double quote that belongs to the C code must be written as \" inside the rule's quotes!) or a metavariable (\p, \i, \s) of the corresponding syntax type.



In practice, you write a set of interacting transformation rules to accomplish a more complex task (you probably also have do_log1 and do_log3 cases).

The pattern is parsed, just like C code, into equivalent compiler data structures. These are then matched against the compiler data structures of the parsed C code, so text formatting does not matter. When a match is found, it is replaced by the compiler data structures for the right-hand side of the rule (after the ->). After all the transformations have been applied, the resulting compiler data structures are used to regenerate the modified text by the inverse of parsing: prettyprinting. Voila, your change is done.

There are some complications with macros and preprocessor directives, but they are even worse if you use string-hacking techniques, which are often implemented in Perl.

There are also complications associated with reasoning about side effects, reaching definitions, pointer values, etc.; DMS provides support for all of these.

+4




You don't need to count the arguments:

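#include <stdio.h>
#include <stdarg.h>

/* One variadic function accepts any number of arguments,
   so there is no need for do_log0, do_log1, do_log2, ... */
void do_log( int level, const char *format, ... ){
  va_list ap;
  va_start( ap, format );
  printf( "level: %i ", level );
  vprintf( format, ap );   /* forward the variable arguments */
  puts("");                /* terminate the line */
  va_end(ap);
}

int main(void){
  do_log( 1, "zero" );
  do_log( 2, "one: %i", 1 );
  do_log( 3, "one: %i two: %i", 1, 2 );
  return 0;
}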

      

I would do the rewrite with Perl; I don't see why it would be difficult.

EDIT: I wrote some Perl code to rewrite parts of the logging code:



#!/usr/bin/perl -W

$source=<<'END';
if ( logging_level >= LEVEL_FINE )
{
  printf("1:Value at %p is %d\n",
  p(1),
  i(2));
}

if(logging_level>=LEVEL_FINE){printf("2:Value at %p is %d\n",p(x),i,j);}
END

$res = '';
$lastpos = 0;
while( $source =~ /\G(.*?)if\s*\(\s*logging_level\s*>=\s*([A-Z_]+)\s*\)\s*\{\s*printf\s*(\(((?:[^()]+|(?3))+)\))\s*;\s*\}/sg ){
  $lastpos = pos($source); $res .= $1; $l=$2; $p=$4;
  $p =~ s/\s*\n\s*//g;   # join continuation lines, keeping spaces inside the format string
  $c = $p =~ tr/,/,/;    # naive: one comma per argument after the format
  $res .= "do_log$c($l,$p);";
}
print $res, substr($source,$lastpos);

Result:

do_log2(LEVEL_FINE,"1:Value at %p is %d\n",p(1),i(2));

do_log3(LEVEL_FINE,"2:Value at %p is %d\n",p(x),i,j);

      

I added some simple argument-counting code. Hope this helps.
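The tr/,/,/ count above treats every comma as an argument separator, including commas inside the format string or inside a nested call such as f(a, b). Below is a minimal C sketch of a more careful count that honours only top-level commas and skips string and character literals. count_top_level_args is a hypothetical helper name. It returns the total number of items, format included, so the N in do_logN would be the result minus 1.

```c
/* Hypothetical helper: count the arguments in the text between printf's
   parentheses, treating only commas at nesting depth 0 as separators.
   Commas inside (), [], {} or inside string/character literals are skipped.
   Returns 0 for an empty (all-whitespace) argument list. */
int count_top_level_args(const char *s)
{
    int depth = 0;      /* current bracket nesting depth */
    int commas = 0;     /* separators found at depth 0 */
    int nonempty = 0;   /* saw any non-whitespace character */

    for (; *s != '\0'; s++) {
        char c = *s;

        if (c == '"' || c == '\'') {
            char quote = c;
            nonempty = 1;
            /* Skip to the matching quote, stepping over backslash escapes. */
            for (s++; *s != '\0' && *s != quote; s++) {
                if (*s == '\\' && s[1] != '\0')
                    s++;
            }
            if (*s == '\0')
                break;              /* unterminated literal: stop scanning */
            continue;               /* outer s++ moves past the closing quote */
        }

        if (c == '(' || c == '[' || c == '{')
            depth++;
        else if (c == ')' || c == ']' || c == '}')
            depth--;
        else if (c == ',' && depth == 0)
            commas++;

        if (c != ' ' && c != '\t' && c != '\n' && c != '\r')
            nonempty = 1;
    }
    return nonempty ? commas + 1 : 0;
}
```

This still assumes well-formed input. It does not expand macros, so a macro that hides a comma would be miscounted.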

+3




Coccinelle's solution would be:

@@
expression p, i;
@@

- if (logging_level >= LEVEL_FINE)
-   printf("Value at %p is %d\n", p, i);
+ do_log2(LEVEL_FINE, "Value at %p is %d\n", _ptr(p), _num(i));

There is no general way to solve the calculated_value problem (extra computations inside the if block, as in the variation shown in another answer), but you could find code that has this problem like this:

@@
expression p, i;
@@

* if (logging_level >= LEVEL_FINE)
  {...
* printf("Value at %p is %d\n", p, i);
  ...}

The result will look like a diff, but the minus signs in column 0 mark items of interest, not items to remove.

+2




Hooray for C99 and __VA_ARGS__!
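Here is a sketch of what a C99 variadic macro buys in this situation. All of the names in it are hypothetical. The question never shows them, so the enum values for LEVEL_*, their ordering (a larger logging_level enables more output), LOG, and this do_log are assumptions. The format string travels inside __VA_ARGS__, so a call with no extra arguments still expands correctly.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumed for illustration only: the question never shows these
   definitions. A larger logging_level enables more output. */
enum { LEVEL_INFO = 1, LEVEL_FINE = 2, LEVEL_DEBUG = 3 };
int logging_level = LEVEL_FINE;

/* A single variadic function stands in for do_log0, do_log1, ...
   Prints a level prefix plus the formatted message and returns the
   character count reported by printf/vprintf. */
int do_log(int level, const char *format, ...)
{
    va_list ap;
    int n = printf("[level %d] ", level);

    va_start(ap, format);
    n += vprintf(format, ap);
    va_end(ap);
    return n;
}

/* C99 variadic macro. The format string is part of __VA_ARGS__, so a
   call with no extra arguments leaves no dangling comma. The level test
   stays at the call site, so a filtered-out call evaluates none of its
   arguments (note that level itself is evaluated twice). */
#define LOG(level, ...) \
    (logging_level >= (level) ? do_log((level), __VA_ARGS__) : 0)
```

With such a macro, the argument count no longer needs to be encoded in the function name, and no _ptr/_num wrappers are required. The trade-off is that the arguments are no longer type-tagged.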

How rigid are your two example layouts? Specifically, do you ever have other statements (like a loop) inside the if (logging_level...) conditions, with braces? Or several printf() statements under the control of one if?

If there isn't much creativity in how the debug code is (ab)used, you can do it with an ad hoc Perl script: not pretty, but it's a one-time change (although it will probably be run over many files).

Handling the parameter decorations, as in _ptr(p) and _num(i), adds another layer of complexity. You will need to parse the format string (trusting that nobody passes anything but a string literal) to determine what types the arguments should be.
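Below is a minimal C sketch of that parsing step. wrapper_for and wrappers_for_format are hypothetical names. The wrapper names mirror the question's Perl script (_numeric, _ptr, _str). Mapping o, x and X to _numeric is an added assumption, since the script only maps d and u. Any other conversion, and any * width or precision (which consumes an extra argument), is reported as unsupported, where the script would die.

```c
#include <stddef.h>
#include <string.h>

/* Map a printf conversion character to a wrapper name (names taken from
   the question's script). NULL means "no wrapper known". */
static const char *wrapper_for(char conv)
{
    switch (conv) {
    case 'd': case 'i': case 'u': case 'o': case 'x': case 'X':
        return "_numeric";
    case 'p':
        return "_ptr";
    case 's':
        return "_str";
    default:
        return NULL;
    }
}

/* Fill out[] with one wrapper name per conversion in fmt. Returns the
   number of conversions, or -1 on an unsupported conversion, a '*'
   width/precision, or more than max conversions. "%%" is a literal
   percent sign and consumes no argument. */
int wrappers_for_format(const char *fmt, const char **out, int max)
{
    int n = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                         /* "%%" */
        p += strspn(p, "-+ #0");              /* flags */
        if (*p == '*')
            return -1;
        p += strspn(p, "0123456789");         /* field width */
        if (*p == '.') {
            p++;
            if (*p == '*')
                return -1;
            p += strspn(p, "0123456789");     /* precision */
        }
        p += strspn(p, "hljztL");             /* length modifiers */

        const char *w = wrapper_for(*p);      /* '\0' also yields NULL */
        if (w == NULL || n == max)
            return -1;
        out[n++] = w;
    }
    return n;
}
```

For "Value at %p is %d\n", this yields _ptr and _numeric, in that order, which is the pairing the question expects.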

Overall, this is not a trivial exercise, especially if the developers are creative. I would expect to write a script that handles 90% or more of the cases and then fix the exceptions by hand as they are found.

+1




You can also check out Coccinelle, which Linux kernel hackers use for large-scale automated code transformations via semantic patches.

Hope it helps.

+1




How many references to logging_level are there? The process is called refactoring. If the change is trivial, a good regular expression in your favorite editor will do. But often the code has many variations on the same theme. In that case, you can find them all through logging_level and remove them step by step, by marking logging_level as deprecated in the code (so every remaining use produces a compiler warning, but the code still builds). Or use an editor like Source Insight that can show you all the references in one go.

Some examples of variations (which are hard to find with a script):

if ( logging_level >= LEVEL_FINE )
  printf("Value at %p is %d\n", p, i);

if ( logging_level >= LEVEL_FINE ) {
  calculated_value = i*2/3;
  printf("Value at %p is %f\n", p, calculated_value);
}

      

(note the braces and the computed variable).

For each file with the old construct, you can search and replace:

Search for: if \(\s*logging_level\s*>=\s*(LEVEL_[a-zA-Z]+)

Replace with: do_log2(\1,)

You can include the printf in the pattern, but only if your editor supports multi-line patterns.
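To check what the editor's search pattern captures, here is a sketch of the same match using POSIX <regex.h>. find_log_level is a hypothetical helper. POSIX extended regular expressions have no \s, so [[:space:]] stands in for it, and \( matches a literal parenthesis as in the editor pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look for "if (logging_level >= LEVEL_xxx" in line
   and copy the captured level name into level (size bytes, truncated if
   needed). Returns 1 on a match, 0 if none, -1 on a bad buffer or regex. */
int find_log_level(const char *line, char *level, size_t size)
{
    static const char *pattern =
        "if[[:space:]]*\\([[:space:]]*logging_level[[:space:]]*>=[[:space:]]*"
        "(LEVEL_[a-zA-Z]+)";
    regex_t re;
    regmatch_t m[2];            /* m[0]: whole match, m[1]: level name */
    int found = 0;

    if (size == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, line, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= size)
            len = size - 1;
        memcpy(level, line + m[1].rm_so, len);
        level[len] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

Like the editor pattern, this finds only the if header. It says nothing about what follows it, such as the calculated_value case shown above.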

0








