C#: Determining whether a string matches this pattern; possible regex?

Consider a line that looks like this:

RR1 S5 C92

This is a rural mail-delivery address: a rural route, a site, and a compartment. Each letter is followed by a number and a space. Usually the number has one to three digits, but you never know how many there might be! If the user is lazy, they may enter zero, one, or more spaces.

Question: What regular expression would you use to determine if a given string matches this pattern?

It would be used something like this:

```
string ruralPattern = "..."; // a regex pattern goes here
bool isRural = Regex.IsMatch(someString, ruralPattern);
```

Update: Thanks for your suggestions! As for performance and usage: this will live in a static method in an assembly that is called from a web service. The strings checked against this pattern will be no more than 50 characters, and the method will be called approximately once every 5 seconds. Any suggestions for making it faster? We really appreciate it!

This should work:

```
^[Rr][Rr]\d+ *[Ss]\d+ *[Cc]\d+$
```

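or, as suggested in another comment (in .NET, `\d` matches any Unicode decimal digit, whereas `[0-9]` matches only the ASCII digits 0-9):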

```
^[Rr][Rr][0-9]+ *[Ss][0-9]+ *[Cc][0-9]+$
```

What it all means:

• `^` - start of the string
• `[Rr]` - next char must be R or r
• `[Rr]` - next char must be R or r
• `\d+` or `[0-9]+` - the next part must be one or more digits
• ` *` (a space followed by `*`) - allow zero or more spaces
• `[Ss]` - next char must be S or s
• `\d+` or `[0-9]+` - the next part must be one or more digits
• ` *` (a space followed by `*`) - allow zero or more spaces
• `[Cc]` - next char must be C or c
• `\d+` or `[0-9]+` - the next part must be one or more digits
• `$` - end of the string

There might be a more elegant solution out there, but it's pretty easy to read.
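
A minimal sketch of the anchored pattern in use, assuming C# 9 or later so the top-level statements run as a standalone `Program.cs`; the helper name `IsRural` and the sample inputs are illustrative, not from the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// The anchored pattern from this answer: ^ and $ force the whole string to match.
const string RuralPattern = @"^[Rr][Rr]\d+ *[Ss]\d+ *[Cc]\d+$";

// Hypothetical helper: true when the entire input looks like "RR<digits> S<digits> C<digits>".
bool IsRural(string input) => Regex.IsMatch(input, RuralPattern);

string[] samples = { "RR1 S5 C92", "rr1s5c92", "RR123   S45 C6", "RR S5 C92", "RR1 S5 C92 x", " RR1 S5 C92" };
foreach (string s in samples)
{
    Console.WriteLine($"\"{s}\" -> {IsRural(s)}");
}
```

The first three samples print `True`; the last three print `False` (missing digits, trailing text, and a leading space that `^` does not allow).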

Edit: Updated to incorporate input from the comments:

```
someString = someString.Trim(); // eliminate leading/trailing whitespace
bool isRural = Regex.IsMatch(
    someString,
    @"^rr\d+\s*s\d+\s*c\d+$",
    RegexOptions.IgnoreCase);
```

This moves case-insensitivity out of the pattern (via `RegexOptions.IgnoreCase`) and uses `\s` to allow any whitespace character (e.g. tabs). If you only want spaces, change `\s` to a literal space (` `).
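
A small sketch contrasting the two whitespace choices, assuming C# 9+ top-level statements; the names `AnyWhitespace`, `SpacesOnly`, and `Matches` are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// \s* accepts any whitespace (spaces, tabs, ...); IgnoreCase lets the pattern be written in lower case.
const string AnyWhitespace = @"^rr\d+\s*s\d+\s*c\d+$";
// Variant that accepts only literal spaces between the sections.
const string SpacesOnly = @"^rr\d+ *s\d+ *c\d+$";

// Trim() drops leading/trailing whitespace before the anchored match is attempted.
bool Matches(string input, string pattern) =>
    Regex.IsMatch(input.Trim(), pattern, RegexOptions.IgnoreCase);

string tabbed = "RR1\tS5\tC92";
Console.WriteLine(Matches(tabbed, AnyWhitespace));        // True: \s matches the tabs
Console.WriteLine(Matches(tabbed, SpacesOnly));           // False: a literal space does not match a tab
Console.WriteLine(Matches("  rr1 s5 c92  ", SpacesOnly)); // True: the outer spaces are trimmed away
```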

Let's state the following assumptions:

• The string has three sections.
• Section 1 always starts with an upper- or lower-case RR and ends with one or more decimal digits.
• Section 2 always starts with an upper- or lower-case S and ends with one or more decimal digits.
• Section 3 always starts with an upper- or lower-case C and ends with one or more decimal digits.

For simplicity, this would be enough:

```
[Rr][Rr][0-9]+[ ]+[Ss][0-9]+[ ]+[Cc][0-9]+
```

• `[Rr]` means exactly one letter R, upper or lower case.
• `[0-9]` means exactly one decimal digit.
• `[0-9]+` means one or more decimal digits.
• `[ ]+` means one or more spaces (use `[ ]*` instead if, as in the question, the spaces may be omitted).
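
A quick check of this unanchored pattern, assuming C# 9+ top-level statements; the variable name `rural` is illustrative. Because the pattern has no `^`/`$`, it also finds an address embedded in a longer string, and because of `[ ]+` it rejects input with no spaces.

```csharp
using System;
using System.Text.RegularExpressions;

// Unanchored pattern from this answer: [ ]+ requires at least one space between sections.
var rural = new Regex("[Rr][Rr][0-9]+[ ]+[Ss][0-9]+[ ]+[Cc][0-9]+");

Console.WriteLine(rural.IsMatch("RR1 S5 C92"));            // True
Console.WriteLine(rural.IsMatch("RR1S5C92"));              // False: no spaces between the sections
Console.WriteLine(rural.IsMatch("Holy Cow RR12 S53 C21")); // True: a match inside a longer string counts
```
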

In practice, though, when we use a regular expression we usually also want the individual sections, so that the match can assign each section's value to its own variable.

So the following regex, which wraps each section in parentheses, is more useful:

```
([Rr][Rr][0-9]+)[ ]+([Ss][0-9]+)[ ]+([Cc][0-9]+)
```

Let's apply this regex to the string:

```
string inputstr = "Holy Cow RR12 S53 C21";
```

This is what a regex testing tool will report:

```
start pos=9, end pos=21
Group(0) = RR12 S53 C21
Group(1) = RR12
Group(2) = S53
Group(3) = C21
```
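
The same report can be reproduced in C# with `Regex.Match`; this is a sketch assuming C# 9+ top-level statements. `Match.Index` is the start position, and `Index + Length` gives the exclusive end position shown above.

```csharp
using System;
using System.Text.RegularExpressions;

string inputstr = "Holy Cow RR12 S53 C21";
Match match = Regex.Match(inputstr, "([Rr][Rr][0-9]+)[ ]+([Ss][0-9]+)[ ]+([Cc][0-9]+)");

if (match.Success)
{
    // Index is the start position; Index + Length is the (exclusive) end position.
    Console.WriteLine($"start pos={match.Index}, end pos={match.Index + match.Length}");
    // Groups[0] is the whole match; Groups[1..3] are the parenthesized sections.
    for (int i = 0; i < match.Groups.Count; i++)
    {
        Console.WriteLine($"Group({i}) = {match.Groups[i].Value}");
    }
}
```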

There are three pairs of parentheses. Each pair marks a section of the string that the regex engine captures as a group.

A successful match yields:

• the whole matched string as group 0
• the rural route as group 1
• the site as group 2, and
• the compartment as group 3.

Naturally, groups 1, 2, and 3 have values if and only if group 0 has a match.

Your code can then pull out each section like this:

```
using System.Text.RegularExpressions;

Match match = Regex.Match(inputstr, "([Rr][Rr][0-9]+)[ ]+([Ss][0-9]+)[ ]+([Cc][0-9]+)");
string postalstr = null, rroute = null, site = null, compart = null;
if (match.Success)
{
    // Group.Value is the same as inputstr.Substring(group.Index, group.Length);
    // note that C#'s Substring takes (start, length), not (start, end).
    postalstr = match.Groups[0].Value; // whole match, e.g. "RR12 S53 C21"
    rroute = match.Groups[1].Value;    // rural route, e.g. "RR12"
    site = match.Groups[2].Value;      // site, e.g. "S53"
    compart = match.Groups[3].Value;   // compartment, e.g. "C21"
}
```

Also, you may want to insert the values into a database table with columns rr, site, and compartment, storing only the numbers without the letters "rr", "s", or "c". In that case, use a regex with nested groups:

```
([Rr][Rr]([0-9]+))[ ]+([Ss]([0-9]+))[ ]+([Cc]([0-9]+))
```

When group 0 matches, the tool will report the following:

```
start=9, end=21
Group(0) = RR12 S53 C21
Group(1) = RR12
Group(2) = 12
Group(3) = S53
Group(4) = 53
Group(5) = C21
Group(6) = 21
```

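
A sketch of pulling just the numbers out of the nested groups, assuming C# 9+ top-level statements; the variable names are illustrative and the database insert itself is not shown. Groups are numbered by the position of their opening parenthesis, so the digit-only groups are 2, 4, and 6.

```csharp
using System;
using System.Text.RegularExpressions;

string inputstr = "Holy Cow RR12 S53 C21";
// Outer groups capture "RR12", "S53", "C21"; the inner groups capture only the digits.
Match m = Regex.Match(inputstr, "([Rr][Rr]([0-9]+))[ ]+([Ss]([0-9]+))[ ]+([Cc]([0-9]+))");

int rr = 0, site = 0, compartment = 0;
if (m.Success)
{
    // Digit-only groups are 2, 4 and 6 (numbered by opening parenthesis).
    rr = int.Parse(m.Groups[2].Value);
    site = int.Parse(m.Groups[4].Value);
    compartment = int.Parse(m.Groups[6].Value);
    // These values could then be passed as parameters to a database INSERT (not shown).
    Console.WriteLine($"rr={rr}, site={site}, compartment={compartment}");
}
```

Since the question allows any number of digits, `int.Parse` would throw on a value too large for `int`; `long.Parse` or `int.TryParse` may be safer if that is a concern.
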
FYI: If you are going to use this regex to test a lot of data, your best bet is to tell .NET to compile it with `RegexOptions.Compiled`. The pattern is then compiled to IL, which gives a performance boost over interpreting it on every match. Declare it as a static member of the class that contains your method, for example:

```
private static readonly Regex re = new Regex("pattern", RegexOptions.Compiled | RegexOptions.IgnoreCase);
```

... and then check whether a string matches the pattern:

```
bool matchesString = re.IsMatch("string");
```
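
Putting the pieces together, here is a sketch that builds the compiled regex once and reuses it, assuming C# 9+ top-level statements. The pattern is the `IgnoreCase`/`\s` version from earlier in the thread, and `IsRuralAddress` is a hypothetical name; in a real assembly it would be a static method next to a `static readonly` field.

```csharp
using System;
using System.Text.RegularExpressions;

// Built once and reused; Compiled trades slower construction for faster matching.
var re = new Regex(@"^rr\d+\s*s\d+\s*c\d+$", RegexOptions.Compiled | RegexOptions.IgnoreCase);

// Hypothetical helper: null-safe, trims the input, then tests the whole string.
bool IsRuralAddress(string input) => input != null && re.IsMatch(input.Trim());

Console.WriteLine(IsRuralAddress("RR1 S5 C92")); // True
Console.WriteLine(IsRuralAddress("Rr1 s5 c92")); // True
Console.WriteLine(IsRuralAddress("RR1 S5"));     // False: the compartment is missing
```

For strings of at most 50 characters checked about once every 5 seconds, as described in the question's update, the match itself is cheap either way; the main point is to construct the `Regex` once rather than on every call.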

Good luck.
