Regex to find 3 or more occurrences of a repeated substring

Is it possible to use a regex to match any substring, longer than 1 character, that is repeated 3 or more times within a string?

For example:

The string ABCD ABC AB ACD DE contains the substring AB 3 times.

The letter D also appears 3 times, but it is only one character, so it does not need to be matched.

I suspect this is beyond what regex can do, but I am asking anyway in case someone has a clever approach.

What I am trying to do is reduce the size of encrypted query strings by replacing repeated character sequences with a single character. The string must then carry a "legend", attached or appended to it, that indicates what has been replaced.

For example:

The string ABCD ABC AB ACD DE would be changed to something like .CD .C . ACD DE, and something needs to be added to it to record that . now stands for AB, for example .AB- where - acts as a terminator. Replacing a sequence that occurs fewer than 3 times would therefore actually increase the size of the whole string.
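To make the scheme concrete, here is a minimal Python sketch of the legend idea. It is only an illustration under stated assumptions: the function names encode and decode are hypothetical, the legend is placed in front of the string (it could just as well be appended), and the placeholder and terminator characters are assumed never to occur in the original string.

```python
def encode(text, repeated, placeholder=".", terminator="-"):
    """Replace `repeated` with `placeholder` and prepend a legend."""
    # Assumption: the reserved characters never occur in the input.
    for reserved in (placeholder, terminator):
        if reserved in text:
            raise ValueError(f"{reserved!r} already occurs in the input")
    legend = placeholder + repeated + terminator  # e.g. ".AB-"
    return legend + text.replace(repeated, placeholder)


def decode(encoded, terminator="-"):
    """Read the legend, then expand the placeholder in the remainder."""
    placeholder = encoded[0]
    end = encoded.index(terminator, 1)  # position where the legend ends
    repeated = encoded[1:end]
    body = encoded[end + 1:]
    return body.replace(placeholder, repeated)


packed = encode("ABCD ABC AB ACD DE", "AB")
print(packed)
print(decode(packed))
```

The remaining problem, and the actual question, is how to find a sequence such as AB automatically.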

I am using Classic ASP, but I would be quite happy with a C# solution, which I could put in a DLL and use that way.



1 answer


There are significant limitations here, because a regex is greedy and consumes characters as it walks along the string. In particular, you will only get one result: the one that best satisfies the greediness of the expression.

(..+).*\1.*\1

      

The pattern above captures a substring of at least two characters and matches if the same substring occurs at least twice more later in the string. It works for both of these:



ABCD ABC AB ACD DE
ABCD ABCD AB ABCD DE

      

However, in the second case only ABCD is captured; AB is not, even though it also occurs 3 or more times.
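The behaviour can be checked with a short Python sketch. Python is used here only for illustration, and the helper name first_repeat is hypothetical; other backtracking engines that support backreferences should behave similarly.

```python
import re

# The pattern from the answer: a group of 2+ characters that appears
# at least twice more later in the string.
PATTERN = re.compile(r"(..+).*\1.*\1")


def first_repeat(text):
    """Return the single substring the pattern captures, or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


for sample in ("ABCD ABC AB ACD DE", "ABCD ABCD AB ABCD DE"):
    print(repr(first_repeat(sample)))
```

For the first string the group captures AB. For the second, the greedy group captures ABCD together with its trailing space, and AB is never reported.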

If that doesn't work for you, I would recommend a solution other than regex, such as splitting the string on spaces and checking each word. It will probably be more efficient than a regular expression as well.
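One possible non-regex approach, sketched in Python, is a brute-force count of every candidate substring. This is an assumption about what such a solution could look like, not a tuned implementation: the function name repeated_substrings is hypothetical, candidates are limited to sequences inside a single word (no spaces), and occurrences are counted without overlaps.

```python
def repeated_substrings(text, min_len=2, min_count=3):
    """Map each qualifying substring to its non-overlapping occurrence count."""
    found = {}
    n = len(text)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            candidate = text[start:end]
            # Skip candidates already recorded or spanning a word boundary.
            if candidate in found or " " in candidate:
                continue
            count = text.count(candidate)  # str.count ignores overlaps
            if count >= min_count:
                found[candidate] = count
    return found


print(repeated_substrings("ABCD ABC AB ACD DE"))
print(repeated_substrings("ABCD ABCD AB ABCD DE"))
```

Unlike the regex, this reports every qualifying substring, so for the second string both AB and ABCD appear in the result. The nested loops make it roughly cubic in the length of the string, which should be acceptable for short query strings.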
