C#: determining whether a string matches this pattern; possibly a regex

Consider a line that looks like this:

RR1 S5 C92

This is a rural route address used for mail delivery in the countryside: a rural route, a site, and a compartment. Each letter code is followed by a number and a space. The number is usually one to three digits, but you never know how many digits there could be! If the user is lazy, they may have entered zero, one, or more spaces.

Question: What regular expression would you use to determine if a given string matches this pattern?

Its use would be something like this:

string ruralPattern; //a regex pattern here
bool isRural = Regex.IsMatch(someString, ruralPattern);

      

Update: Thanks for your suggestions! Regarding performance and usage: this will live in a static method in an assembly called from a web service. Strings checked against this pattern will be no more than 50 characters long. The method will be called approximately once every 5 seconds. Any suggestions for saving it between calls? We really appreciate it!

+2




4 answers


This should work:

^[Rr][Rr]\d+ *[Ss]\d+ *[Cc]\d+$

      

or, as suggested in another comment:

^[Rr][Rr][0-9]+ *[Ss][0-9]+ *[Cc][0-9]+$

      



What all this means:

  • ^ - beginning of line
  • [Rr] - next char must be R or r
  • [Rr] - next char must be R or r
  • \d+ or [0-9]+ - the next part must be 1 or more digits
  • (space)* - allow 0 or more spaces
  • [Ss] - next char must be S or s
  • \d+ or [0-9]+ - the next part must be 1 or more digits
  • (space)* - allow 0 or more spaces
  • [Cc] - next char must be C or c
  • \d+ or [0-9]+ - the next part must be 1 or more digits
  • $ - end of line

There might be a more elegant solution out there, but it's pretty easy to read.
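For what it's worth, here is a minimal sketch of how that pattern could be wrapped in a helper. The class name RuralPatternDemo and the method name IsRural are made up for illustration; only Regex.IsMatch comes from .NET itself. Note the verbatim string (@"...") so the backslashes in \d reach the regex engine unchanged.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class RuralPatternDemo
{
    // The pattern from the answer above, unchanged.
    public const string RuralPattern = @"^[Rr][Rr]\d+ *[Ss]\d+ *[Cc]\d+$";

    // True when the whole input looks like "RR<digits> S<digits> C<digits>",
    // with zero or more spaces between the three parts.
    public static bool IsRural(string input)
    {
        return Regex.IsMatch(input, RuralPattern);
    }
}
```

With this sketch, IsRural("RR1 S5 C92") and IsRural("rr12s3c4") should both return true, while IsRural("RR S5 C92") (no digits after RR) and IsRural("RR1 S5 C92 extra") (trailing text) should return false.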

Edit: Updated to incorporate input from the comments.

+9




What about...

someString = someString.Trim(); // eliminate leading/trailing whitespace
bool isRural = Regex.IsMatch(
   someString,
   @"^rr\d+\s*s\d+\s*c\d+$",
   RegexOptions.IgnoreCase);

      



This removes the upper/lower case alternatives from the pattern (RegexOptions.IgnoreCase handles them instead) and uses \s to allow any whitespace character (e.g. tabs). If you only want spaces, change '\s' to ' '.
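To make the \s versus ' ' difference concrete, here is a small sketch that puts the two variants side by side. The class and method names are invented for this example; the second pattern is simply the first with each \s swapped for a literal space.

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison helper, for illustration only.
public static class WhitespaceDemo
{
    // \s accepts any whitespace character (space, tab, and so on).
    private const string AnyWhitespace = @"^rr\d+\s*s\d+\s*c\d+$";

    // A literal space accepts only the ordinary space character.
    private const string SpacesOnly = @"^rr\d+ *s\d+ *c\d+$";

    public static bool MatchesAnyWhitespace(string input)
    {
        return Regex.IsMatch(input.Trim(), AnyWhitespace, RegexOptions.IgnoreCase);
    }

    public static bool MatchesSpacesOnly(string input)
    {
        return Regex.IsMatch(input.Trim(), SpacesOnly, RegexOptions.IgnoreCase);
    }
}
```

A tab-separated input such as "RR1\tS5 C92" should pass MatchesAnyWhitespace but fail MatchesSpacesOnly; an input with ordinary spaces should pass both.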

+3




Let's state the following assumptions:

  • The line has three sections.
  • Section 1 always starts with an upper or lower case RR and ends with one or more decimal digits.
  • Section 2 always starts with an upper or lower case S and ends with one or more decimal digits.
  • Section 3 always starts with an upper or lower case C and ends with one or more decimal digits.

For simplicity, the following would be enough:

[Rr][Rr][0-9]+[ ]+[Ss][0-9]+[ ]+[Cc][0-9]+

      

  • [Rr] means exactly one letter R, upper or lower case.
  • [0-9] means exactly one decimal digit.
  • [0-9]+ means one or more decimal digits.
  • [ ]+ means one or more spaces.
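Two properties of this pattern are worth checking against your requirements, and the sketch below is one way to see them. Since it has no ^ or $ anchors, it matches anywhere inside a longer string, and since it uses [ ]+ it needs at least one space between sections, unlike the "zero, one, or more spaces" in the question. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the unanchored pattern above.
public static class UnanchoredDemo
{
    public const string Pattern = "[Rr][Rr][0-9]+[ ]+[Ss][0-9]+[ ]+[Cc][0-9]+";

    // True when the pattern occurs anywhere in the input.
    public static bool ContainsRural(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }
}
```

ContainsRural("deliver to RR1 S5 C92 please") should be true because the pattern is found inside the text, while ContainsRural("RR1S5C92") should be false because there are no spaces. If zero spaces must be accepted, [ ]* would be the change to make.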

However, to be useful, when you use a regular expression you generally also want to capture the separate sections, using the matching feature to assign the values of the individual sections to their own variables.

Thus, the following regex is more useful.

([Rr][Rr][0-9]+)[ ]+([Ss][0-9]+)[ ]+([Cc][0-9]+)

      

Let's apply this regex to the string

string inputstr = "Holy Cow Rr12 S53 C21";

      

This is what your regex matcher will tell you:

start pos=9, end pos=21
Group(0) = Rr12 S53 C21
Group(1) = Rr12
Group(2) = S53
Group(3) = C21

      

There are three pairs of round brackets / parentheses. Each pair encloses a section of the string, which the regex compiler calls a group.

The regex engine will report

  • the whole matched string as group 0
  • the rural route as group 1
  • the site as group 2 and
  • the compartment as group 3.

Naturally, groups 1, 2, and 3 will have matches if and only if group 0 has a match.

Hence your algorithm would use this along the lines of the following pseudocode:

string postalstr, rroute, site, compart;
if (match.group(0)!=null)
{
  int start = match.start(0);
  int end = match.end(0);
  postalstr = inputstr.substring(start, end);

  start = match.start(1);
  end = match.end(1);
  rroute = inputstr.substring(start, end);

  start = match.start(2);
  end = match.end(2);
  site = inputstr.substring(start, end);

  start = match.start(3);
  end = match.end(3);
  compart = inputstr.substring(start, end);
}
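
In C# itself, the start/end bookkeeping from the pseudocode is not needed, because each Group exposes its text through the Value property (and its position through Index and Length). A rough equivalent might look like this; the class name and the choice to return an array are assumptions made for the example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical C# counterpart of the pseudocode above.
public static class RuralGroupsDemo
{
    private static readonly Regex Rural =
        new Regex("([Rr][Rr][0-9]+)[ ]+([Ss][0-9]+)[ ]+([Cc][0-9]+)");

    // Returns null when nothing matches; otherwise
    // { whole match, rural route, site, compartment }.
    public static string[] Split(string input)
    {
        Match match = Rural.Match(input);
        if (!match.Success)
        {
            return null;
        }

        return new[]
        {
            match.Groups[0].Value, // group 0: the whole matched text
            match.Groups[1].Value, // group 1: rural route, e.g. "RR7"
            match.Groups[2].Value, // group 2: site, e.g. "S8"
            match.Groups[3].Value  // group 3: compartment, e.g. "C9"
        };
    }
}
```

For an input such as "x RR7 S8 C9", Split should return "RR7 S8 C9", "RR7", "S8", and "C9", in that order.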

      

Also, you may want to populate a database table with columns rr, site, and compartment, storing only the numbers without the letters "rr", "s", or "c". A regex with nested groups would be the one to use.

([Rr][Rr]([0-9]+))[ ]+([Ss]([0-9]+))[ ]+([Cc]([0-9]+))

      

And the matcher will tell you the following when a match is found for group 0:

start=9, end=21
Group(0) = Rr12 S53 C21
Group(1) = Rr12
Group(2) = 12
Group(3) = S53
Group(4) = 53
Group(5) = C21
Group(6) = 21
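
As an alternative to counting nested group numbers, .NET also supports named groups, which may be easier to maintain. This is a different technique from the numbered groups shown above, not the answer's own method. The group names (route, site, compartment) and the TryParse helper are invented for this sketch.

```csharp
using System.Text.RegularExpressions;

// Hypothetical parser that captures only the digits, using named groups.
public static class RuralNumbersDemo
{
    private static readonly Regex Rural = new Regex(
        @"rr(?<route>[0-9]+)[ ]+s(?<site>[0-9]+)[ ]+c(?<compartment>[0-9]+)",
        RegexOptions.IgnoreCase);

    // Returns false when there is no match or a number does not fit in an int.
    public static bool TryParse(string input, out int route, out int site, out int compartment)
    {
        route = 0;
        site = 0;
        compartment = 0;

        Match match = Rural.Match(input);
        return match.Success
            && int.TryParse(match.Groups["route"].Value, out route)
            && int.TryParse(match.Groups["site"].Value, out site)
            && int.TryParse(match.Groups["compartment"].Value, out compartment);
    }
}
```

Calling TryParse on "x RR7 S8 C9" should return true with route 7, site 8, and compartment 9, ready to be written to the three columns.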

      

+1




FYI: If you are going to use this regex to test a lot of data, your best bet is to tell .NET to precompile it. It will then be compiled to IL and give a performance boost, rather than the pattern being interpreted every time. Declare it as a static member of whichever class contains your method, for example:

private static Regex re = new Regex("pattern", RegexOptions.Compiled | RegexOptions.IgnoreCase);

      

... and a method for checking whether a string matches a pattern ...

bool matchesString = re.IsMatch("string");
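
Putting the two pieces together for the web-service scenario in the question might look roughly like the sketch below. The class name RuralAddress and the null/Trim handling are assumptions; the pattern is borrowed from an earlier answer. At one call every 5 seconds on strings of 50 characters or fewer, the gain from RegexOptions.Compiled is probably too small to notice, so treat it as optional; the static Regex.IsMatch method also caches recently used patterns on its own.

```csharp
using System.Text.RegularExpressions;

// Hypothetical static helper for the assembly called by the web service.
public static class RuralAddress
{
    // Constructed once when the class is first used, then reused by every call.
    private static readonly Regex Rural = new Regex(
        @"^rr\d+ *s\d+ *c\d+$",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Regex instances are safe to share across threads for matching.
    public static bool IsRural(string input)
    {
        return input != null && Rural.IsMatch(input.Trim());
    }
}
```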

      

Good luck.

0

