Help with regex tag removal

I have lines of the form "[user: fred] [priority: 3] Lorem ipsum dolor sit amet.", where each area enclosed in square brackets is a tag in the format [key: value]. I need to remove a specific tag, identified by its key, with the following extension method:

public static void RemoveTagWithKey(this string message, string tagKey) {
    if (message.ContainsTagWithKey(tagKey)) {
        var regex = new Regex(@"\[" + tagKey + @":[^\]]");
        message = regex.Replace(message , string.Empty);
    }
}
public static bool ContainsTagWithKey(this string message, string tagKey) {
    return message.Contains(string.Format("[{0}:", tagKey));
}

      

Only the tag with the specified key should be removed from the string. My regex doesn't work (regular expressions are not my strong point), and I need help writing it correctly. Alternatively, an implementation without regex is welcome.



4 answers


If you want to do it without Regex, it is not difficult. You're already looking for a specific tag key, so you can just search for "[" + tagKey + ":", then search from there for the closing "]" and remove everything between those two offsets, including the closing bracket itself. Something like...

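// Find where the tag starts: "[" + key + ":"
int posStart = message.IndexOf("[" + tagKey + ":");
if(posStart >= 0)
{
    // Find the closing bracket of that tag
    int posEnd = message.IndexOf("]", posStart);
    if(posEnd > posStart)
    {
        // +1 so that the closing "]" is removed as well
        message = message.Remove(posStart, posEnd - posStart + 1);
    }
}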

      



Is it better than a Regex solution? Since you are only looking for a specific key, I think it probably wins on simplicity. I love regexes, but they are not always the clearest answer.

Edit: Another reason the IndexOf() solution might be better is that it uses only one rule for finding the start of a tag, whereas your original code uses Contains() to look for something like "[tag:" and then uses a regular expression with a slightly different pattern for the replacement/removal. In theory, you could have text that meets one criterion but not the other.
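For comparison, here is a minimal sketch that wraps the IndexOf approach in an extension method. It returns the new string (strings are immutable, so assigning to the parameter would not reach the caller) and uses the same "[" + key + ":" rule both to detect and to locate the tag. The class name TagExtensions is hypothetical, and removing one trailing space after the tag is an assumption about how you want the remaining text to look.

```csharp
using System;

public static class TagExtensions
{
    // Sketch: removes the first "[tagKey: ...]" tag using IndexOf only (no regex).
    public static string RemoveTagWithKey(this string message, string tagKey)
    {
        // One rule for locating a tag: "[" + key + ":" (ordinal, not culture-sensitive)
        int posStart = message.IndexOf("[" + tagKey + ":", StringComparison.Ordinal);
        if (posStart < 0)
            return message; // no such tag

        int posEnd = message.IndexOf(']', posStart);
        if (posEnd < 0)
            return message; // unterminated tag: leave the text alone

        // Remove through the closing bracket (inclusive)...
        int length = posEnd - posStart + 1;
        // ...plus one trailing space, if present (assumption: tidier output)
        if (posEnd + 1 < message.Length && message[posEnd + 1] == ' ')
            length++;

        return message.Remove(posStart, length);
    }
}
```

Usage: message = message.RemoveTagWithKey("user"); turns "[user: fred] [priority: 3] Lorem ipsum dolor sit amet." into "[priority: 3] Lorem ipsum dolor sit amet.".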



I know there are many more feature-rich tools out there, but I love the simplicity and cleanliness of Code Architects Regex Tester (also known as YART: Yet Another Regex Tester). It shows groups and captures in a tree view, and it is quite fast, very small, and open source. It also generates code in C++, VB, and C#, and can automatically escape or unescape regular expressions for those languages. I put it in the VS tools folder (C:\Program Files\Microsoft Visual Studio 9.0\Common7\Tools) and added a menu item for it under Tools > External Tools so that I can launch it quickly from VS.

Regular expressions are sometimes very difficult to write, and I know it really helps to be able to validate the regular expressions and see the results as you go.



[Screenshot of the regex tester] (source: dotnet2themax.com)

Another really popular (but not free) option is RegexBuddy.



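Try this instead:

new Regex(@"\[" + tagKey + @":[^\]]+\]");

The key change is adding + after the character class [^\]] (your version was also missing the ] that closes the class), which means you are matching one or more characters that are not closing brackets. The final \] then consumes the closing bracket of the tag itself.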
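A sketch of the same pattern with the key passed through Regex.Escape, which matters because tagKey is concatenated straight into the pattern: a key containing characters such as '.' or '+' would otherwise be read as regex syntax. The TagRegex class and ForKey method are hypothetical names, and the optional trailing space at the end of the pattern is an assumption for tidier output.

```csharp
using System.Text.RegularExpressions;

public static class TagRegex
{
    // Builds a regex that matches one "[key: value]" tag for the given key.
    public static Regex ForKey(string tagKey)
    {
        // \[       literal opening bracket
        // key:     the escaped key, then a colon
        // [^\]]+   one or more characters that are not a closing bracket (the value)
        // \]       literal closing bracket
        // " ?"     an optional single trailing space (assumption)
        return new Regex(@"\[" + Regex.Escape(tagKey) + @":[^\]]+\] ?");
    }
}
```

Unlike the IndexOf version, TagRegex.ForKey("user").Replace(message, string.Empty) removes every matching tag, not just the first one.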



I think this is the regex you are looking for:

string regex = @"\[" + tagKey + @":[^\]]+\]";

      

Also, you don't need a separate check to see whether there are tags of this type. Just run the replacement; if there is no match, Regex.Replace returns the original string unchanged.

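// requires: using System.Text.RegularExpressions;
public static string RemoveTagWithKey(string message, string tagKey) {
    // "[" + key + ":", then one or more non-"]" characters, then the closing "]"
    string regex = @"\[" + Regex.Escape(tagKey) + @":[^\]]+\]";
    return Regex.Replace(message, regex, string.Empty);
}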

      

You seem to be writing an extension method, but I wrote it as a static utility method to keep things simple.
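If you do want the extension-method form, a minimal sketch follows. It returns the result instead of assigning to message: strings are immutable, so the assignment inside the original void method never reaches the caller. The class name MessageTags is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class MessageTags
{
    // Extension-method version of the static helper above.
    public static string RemoveTagWithKey(this string message, string tagKey)
    {
        // No ContainsTagWithKey() pre-check: if nothing matches,
        // Regex.Replace simply returns the input unchanged.
        string pattern = @"\[" + Regex.Escape(tagKey) + @":[^\]]+\]";
        return Regex.Replace(message, pattern, string.Empty);
    }
}
```

Call it as message = message.RemoveTagWithKey("priority");. The space that followed the removed tag is left in place, so "[user: fred] [priority: 3] Lorem ipsum dolor sit amet." becomes "[user: fred]  Lorem ipsum dolor sit amet." with two spaces; append " ?" to the pattern if you want to drop it.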
