Regular expression lookaround

I don't think this is possible with regular expressions alone, but I'm not an expert, so I thought it was worth asking.

I am trying to do a massive search and replace on C# code using .NET regex. What I want to do is find the lines of code where a specific function is called with a variable of type DateTime, e.g.:

axRecord.set_Field("CreatedDate", m_createdDate);

and I would know that this is a DateTime variable because earlier in the same code file there would be the line:

DateTime m_createdDate;

but it seems that I cannot use the named group inside the lookbehind, like this:

(?<=DateTime \k<1>.+?)axRecord.set_[^ ]+ (?<1>[^ )]+)

and if I try to match all the text between the variable declaration and the function call like this:

DateTime (?<1>[^;]+).+?axRecord.set.+?\k<1>

it finds the first match, based on the first declared variable, but then it can't find any other matches because the code is laid out like this:

DateTime m_First;
DateTime m_Second;
...
axRecord.set_Field("something", m_First);
axRecord.set_Field("somethingElse", m_Second);

and the first match includes the second variable declaration, so the search resumes after it and never sees that declaration again.
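Both behaviors can be reproduced with a small .NET program. This is a sketch: the sample text is shaped like the code above (the "..." line is left out), and RegexOptions.Singleline is an assumption, since without it .+? cannot cross line breaks at all.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample laid out like the question's code.
var code = string.Join("\n",
    "DateTime m_First;",
    "DateTime m_Second;",
    "axRecord.set_Field(\"something\", m_First);",
    "axRecord.set_Field(\"somethingElse\", m_Second);");

// Pattern 1: the lookbehind is evaluated before group 1 has captured anything,
// and in .NET a backreference to a group that has not captured fails to match.
var lookbehindFirst = Regex.Matches(code,
    @"(?<=DateTime \k<1>.+?)axRecord.set_[^ ]+ (?<1>[^ )]+)",
    RegexOptions.Singleline);

// Pattern 2: each match runs from a declaration to a call, so the first match
// swallows the second declaration and the next search starts after m_First.
var spanning = Regex.Matches(code,
    @"DateTime (?<1>[^;]+).+?axRecord.set.+?\k<1>",
    RegexOptions.Singleline);

Console.WriteLine(lookbehindFirst.Count);        // 0
Console.WriteLine(spanning.Count);               // 1
Console.WriteLine(spanning[0].Groups[1].Value);  // m_First
```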

Is there a good way to do this with just regular expressions, or do I need to resort to scripting the logic?

+1



5 answers


Try the following:

@"(?s)set_Field\(""[^""]*"",\s*(?<vname>\w+)(?<=\bDateTime\s+\k<vname>\b.+)"

By doing the lookbehind first, you force the regex to find the method calls in a particular order: the order in which the variables are declared. What you want to do is match a likely-looking method call first, and then use the lookbehind to check the type of the variable.

I only made a rough guess at the part that matches the method call. As others have said, whatever regex you use will have to be tailored to your code; there is no general solution.
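To see the call-first ordering at work, here is a minimal .NET sketch that runs the pattern above, unchanged, on a sample shaped like the question's code. The string m_Name line is made up for contrast, and nothing here has been checked against the asker's real files.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var code = string.Join("\n",
    "DateTime m_First;",
    "DateTime m_Second;",
    "string m_Name;",
    "axRecord.set_Field(\"something\", m_First);",
    "axRecord.set_Field(\"somethingElse\", m_Second);",
    "axRecord.set_Field(\"name\", m_Name);");

// Match the call first, then look back for "DateTime <same name>" anywhere
// earlier in the text. The lookbehind consumes nothing, so every call is a candidate.
var pattern = @"(?s)set_Field\(""[^""]*"",\s*(?<vname>\w+)(?<=\bDateTime\s+\k<vname>\b.+)";

var names = Regex.Matches(code, pattern)
    .Cast<Match>()
    .Select(m => m.Groups["vname"].Value)
    .ToList();

Console.WriteLine(string.Join(",", names)); // m_First,m_Second
```

Because the lookbehind is zero-width, matching the m_First call does not consume the m_Second declaration, which is why both calls are found.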

0



Take a look at my answer to the question "Get method content from C# file".

It links to pages that show how to use the built-in .NET parser to do this simply and reliably (i.e. not by matching "what looks like the usage I'm looking for", but by actually parsing the code with the VS Code Analysis Tools).



I know this is not a RegEx answer, but I don't think RegEx is the answer.

+5



It will be difficult to do this with a single regex. However, it can be done if you are willing to process the lines one at a time while keeping a little state.

Note: I can't tell exactly what you're trying to match on the axRecord line, so you'll probably need to adjust that regex accordingly.

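using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

void Process(List<string> lines) {
  var comp = StringComparer.Ordinal;
  // Names of the m_ variables currently known to be declared as DateTime
  var map = new HashSet<string>(comp);
  // Declaration such as: DateTime m_createdDate;
  var declRegex = new Regex(@"^\s*(?<type>\w+)\s+(?<name>m_\w+)\s*;");
  // Call such as: axRecord.set_Field("CreatedDate", m_createdDate);
  var toReplaceRegex = new Regex(@"^\s*axRecord\.set_(?<toReplace>.*(?<name>m_\w+).*)");

  for (var i = 0; i < lines.Count; i++) {
    var line = lines[i];
    var match = declRegex.Match(line);
    if (match.Success) {
      // Remember DateTime declarations; forget a name declared with another type
      if (comp.Equals(match.Groups["type"].Value, "DateTime")) {
        map.Add(match.Groups["name"].Value);
      } else {
        map.Remove(match.Groups["name"].Value);
      }
      continue;
    }

    match = toReplaceRegex.Match(line);
    if (match.Success && map.Contains(match.Groups["name"].Value)) {
      // Add your replace logic here
    }
  }
}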
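The Process method leaves the replacement step empty, so here is a self-contained sketch of the same line-by-line idea that returns the indices of the call lines it would touch instead of replacing anything. The name FindDateTimeCalls and the sample lines are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: tracks which m_ names are declared as DateTime and
// reports the axRecord.set_ lines whose argument is one of them.
static List<int> FindDateTimeCalls(IReadOnlyList<string> lines)
{
    var dateTimeNames = new HashSet<string>(StringComparer.Ordinal);
    var declRegex = new Regex(@"^\s*(?<type>\w+)\s+(?<name>m_\w+)\s*;");
    var callRegex = new Regex(@"^\s*axRecord\.set_(?<toReplace>.*(?<name>m_\w+).*)");
    var hits = new List<int>();

    for (var i = 0; i < lines.Count; i++)
    {
        var decl = declRegex.Match(lines[i]);
        if (decl.Success)
        {
            if (decl.Groups["type"].Value == "DateTime")
                dateTimeNames.Add(decl.Groups["name"].Value);
            else
                dateTimeNames.Remove(decl.Groups["name"].Value);
            continue;
        }

        var call = callRegex.Match(lines[i]);
        if (call.Success && dateTimeNames.Contains(call.Groups["name"].Value))
            hits.Add(i);
    }
    return hits;
}

var sample = new List<string>
{
    "DateTime m_First;",
    "DateTime m_Second;",
    "string m_Name;",
    "axRecord.set_Field(\"something\", m_First);",
    "axRecord.set_Field(\"somethingElse\", m_Second);",
    "axRecord.set_Field(\"name\", m_Name);",
};

Console.WriteLine(string.Join(",", FindDateTimeCalls(sample))); // 3,4
```

Because each line is examined on its own, one declaration can never swallow another, which is the problem the single-regex attempts in the question run into.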

+1



This cannot be done with regular expressions. First, the C# grammar is not regular; but more importantly, you are talking about matching things that are not lexically related. For this kind of thing, you need a complete semantic analysis: a lexer, a parser, name binding and finally a type checker. Once you have the annotated AST, you can find the field you want and just read its type.

I'm guessing this is a lot more work than you want, since it amounts to about half of a full-blown C# compiler.
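As a small illustration of the gap between lexical and semantic matching, the sketch below reuses the lookbehind regex from the first answer on made-up declarations. All three variables are DateTime values, but only the one declared in the literal "DateTime <name>" shape is found; the var and comma-list declarations are hypothetical examples, not taken from the asker's code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Every m_ variable here is a DateTime, but only m_A is declared as "DateTime m_A".
var code = string.Join("\n",
    "var m_Created = DateTime.Now;",
    "DateTime m_A, m_B;",
    "axRecord.set_Field(\"Created\", m_Created);",
    "axRecord.set_Field(\"A\", m_A);",
    "axRecord.set_Field(\"B\", m_B);");

var pattern = @"(?s)set_Field\(""[^""]*"",\s*(?<vname>\w+)(?<=\bDateTime\s+\k<vname>\b.+)";

var found = Regex.Matches(code, pattern)
    .Cast<Match>()
    .Select(m => m.Groups["vname"].Value)
    .ToList();

Console.WriteLine(string.Join(",", found)); // m_A
```

A type checker would report all three, because it works from the declared types rather than from how the declarations happen to be spelled.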

0



This is strange. I managed to write a regex that finds the call, but it only ever matches one of them.

(?<=private datetime (?<1>\b\w+\b).+?)set_field[^;]+?\k<1>

So even if I can't use a named group inside the lookbehind, I can at least set the named group in the lookbehind and use it in the match. But it looks like when it matches just the function call (which is exactly what I wanted), the current position moves to that line, so it can't find any new matches because it has already passed the declarations. Or maybe I don't understand how the engine works.

I guess I'm looking for a regex option that tells it to look inside matches for more matches. Come to think of it, this seems like it would be necessary for basic regex HTML parsing: you find an opening tag and then a closing tag, the entire page is wrapped in that match, and you won't find any other tags unless you recursively apply the pattern to each match.

Does anyone know anything about this or am I talking crazy?
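A quick experiment suggests a different reason for the single match than the current position. It assumes RegexOptions.IgnoreCase (the pattern is written in lowercase) and RegexOptions.Singleline (the lookbehind has to cross line breaks), plus a sample shaped like the question's code; treat it as a sketch of .NET behavior, not a definitive diagnosis of the original files.

```csharp
using System;
using System.Text.RegularExpressions;

var code = string.Join("\n",
    "private DateTime m_First;",
    "private DateTime m_Second;",
    "axRecord.set_Field(\"something\", m_First);",
    "axRecord.set_Field(\"somethingElse\", m_Second);");

var pattern = @"(?<=private datetime (?<1>\b\w+\b).+?)set_field[^;]+?\k<1>";
var matches = Regex.Matches(code, pattern,
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

// The lookbehind is matched right to left, so the lazy .+? stops at the
// declaration nearest to each call. For the m_First call that captures m_Second,
// the rest of the pattern then fails, and the engine does not go back into a
// lookbehind that has already succeeded to try an earlier declaration.
Console.WriteLine(matches.Count);               // 1
Console.WriteLine(matches[0].Groups[1].Value);  // m_Second
```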

0
