Do regex implementations really need the split() function?

Is there any application of the regex operation split() that cannot be done with a single match() (or search(), findall(), etc.)?

For example, instead of doing

subject.split('[|]')

you can get the same result (apart from some extra empty matches) by calling

subject.findall('[^|]*')
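A quick JavaScript check of this example may help; the sample string is an assumption chosen only for illustration. Because the pattern can match the empty string, the match-based version appears to return an extra empty string after each field, so the two results agree only once those are accounted for.

```javascript
// Sample input, assumed for illustration only.
const subject = "a|b|c";

// Splitting on a literal pipe.
const viaSplit = subject.split(/[|]/);

// Matching runs of non-pipe characters. The pattern can match the empty
// string, so an empty match follows every non-empty field.
const viaMatch = subject.match(/[^|]*/g);

console.log(viaSplit); // [ 'a', 'b', 'c' ]
console.log(viaMatch); // [ 'a', '', 'b', '', 'c', '' ]
```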

      

And in almost all regex engines (except .NET and JGSoft), split() can't do things like "split on | unless it is escaped as \|", because that requires unbounded repetition inside a lookbehind.

So, instead of writing something completely unreadable (a nested lookbehind!)

splitArray = Regex.Split(subjectString, @"(?<=(?<!\\)(?:\\\\)*)\|");

you can just do the following (even in JavaScript, which had no lookbehind support of any kind before ES2018)

result = subject.match(/(?:\\.|[^|])*/g);
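As a rough illustration of how that pattern behaves, here is a small JavaScript sketch. The input string is hypothetical; its second pipe is escaped with a backslash.

```javascript
// Sample input, assumed for illustration: the second pipe is escaped.
const subject = String.raw`a|b\|c|d`;

// Each match is a run of "backslash plus any character" or "any non-pipe
// character", so an escaped pipe stays inside its field.
const fields = subject.match(/(?:\\.|[^|])*/g);

console.log(fields); // [ 'a', '', 'b\\|c', '', 'd', '' ]
```

The escaped pipe is kept inside its field, but empty matches show up as well. Filtering them out would also discard fields that are genuinely empty, so the result is close to, but not identical to, what a split would return.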

      

This made me wonder: is there anything I can do with split() that cannot be achieved with a single match() / findall()? I'm willing to bet there isn't, but I'm probably missing something.

(I define "regex" in the modern, non-regular sense, that is, using everything modern regex engines offer, such as backreferences and lookarounds.)


1 answer


The purpose of regular expressions is to describe the syntax of a language. These regular expressions can then be used to search for strings that match the syntax of that language. That's all there is to it.

What you actually do with the matches depends on your needs. If you are looking for all matches, repeat the search process and collect the matches. If you want to split the string, repeat the search process and split the input string at the positions where the matches were found.

Basically, regex libraries can only do one thing: search for a match. Everything else is just extensions.
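The two loops described above can be sketched in JavaScript on top of nothing but exec(). The helper names findAll, splitOn and withGlobal are hypothetical, and this is only a minimal sketch: it ignores capture groups, limits, and the other details a real split() has to handle.

```javascript
// Collect every match by calling exec() repeatedly.
function findAll(pattern, input) {
  // Work on a global copy so lastIndex advances between calls.
  const re = new RegExp(pattern.source, withGlobal(pattern.flags));
  const matches = [];
  let m;
  while ((m = re.exec(input)) !== null) {
    matches.push(m[0]);
    // Step past empty matches to avoid looping forever.
    if (m[0] === "") re.lastIndex++;
  }
  return matches;
}

// Split by cutting the input at the position of each match.
function splitOn(pattern, input) {
  const re = new RegExp(pattern.source, withGlobal(pattern.flags));
  const pieces = [];
  let start = 0;
  let m;
  while ((m = re.exec(input)) !== null) {
    // This sketch simply ignores empty separators.
    if (m[0] === "") { re.lastIndex++; continue; }
    pieces.push(input.slice(start, m.index));
    start = m.index + m[0].length;
  }
  pieces.push(input.slice(start));
  return pieces;
}

// Ensure the g flag is present so exec() resumes from lastIndex.
function withGlobal(flags) {
  return flags.includes("g") ? flags : flags + "g";
}

console.log(findAll(/\d+/, "a1b22c333")); // [ '1', '22', '333' ]
console.log(splitOn(/[|]/, "a|b|c"));     // [ 'a', 'b', 'c' ]
```

Both functions share the same search loop; they differ only in what they keep, the matched text or the text between matches.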



A good example of this is JavaScript, where RegExp.prototype.exec is what actually performs the matching. Any other method that takes a regular expression (e.g., RegExp.prototype.test, String.prototype.match, String.prototype.search) just builds on the basic functionality of RegExp.prototype.exec:

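// pseudo-implementations (non-global case)
RegExp.prototype.test = function(str) {
    // test() only reports whether a match exists
    return RegExp(this).exec(str) !== null;
};
String.prototype.match = function(pattern) {
    // without the g flag, match() returns the same array as exec()
    return RegExp(pattern).exec(this);
};
String.prototype.search = function(pattern) {
    // search() returns the index of the first match, or -1 if there is none
    var m = RegExp(pattern).exec(this);
    return m ? m.index : -1;
};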

      
