JavaScript: split on a character but ignore double-escaped characters

I am trying to do something similar to the question below, but cannot get it to work.

How to split a comma separated string while ignoring escaped commas?

I tried to adapt it, but it didn't seem to work.

I would like to split the string on : but not on the escaped \\: (my escape sequence is a double backslash).

Input: dtet:du\\,eduh ei\\:di:e,j

Expected output: ["dtet", "du\\,eduh ei\\:di", "e,j"]

regex link: https://regex101.com/r/12j6er/1/

3 answers


See the function called splitOnNonEscapedDelimeter(), which takes the string to split and the delimeter to split on, which in this case is :. The usage is inside the function onChange().

Note that you must escape the delimeter you pass to splitOnNonEscapedDelimeter() so that it is not interpreted as a special character in the regex.

function nonEscapedDelimeter(delimeter) {
  return new RegExp(String.raw`[^${delimeter}]*?(?:\\\\${delimeter}[^${delimeter}]*?)*(?:${delimeter}|$)`, 'g')
}

function nonEscapedDelimeterAtEnd(delimeter) {
  return new RegExp(String.raw`([^\\].|.[^\\]|^.?)${delimeter}$`)
}

function splitOnNonEscapedDelimeter(string, delimeter) {
  const reMatch = nonEscapedDelimeter(delimeter)
  const reReplace = nonEscapedDelimeterAtEnd(delimeter)

  return string.match(reMatch).slice(0, -1).map(section => {
    return section.replace(reReplace, '$1')
  })
}

function onChange() {
  console.log(splitOnNonEscapedDelimeter(i.value, ':'))
}

i.addEventListener('change', onChange)

onChange()
      

<textarea id=i>dtet:du\\,eduh ei\\:di:e,j</textarea>

Requirements

This solution makes use of the ES2015 features String.raw() and template literals for convenience, although they are not required. See the documentation for those features to understand how they work, and use a polyfill if your target environment doesn't support them.

Explanation

new RegExp(String.raw`[^${delimeter}]*?(?:\\\\${delimeter}[^${delimeter}]*?)*(?:${delimeter}|$)`, 'g')

The function nonEscapedDelimeter() creates a regex that does almost what is required, except for a few quirks that need to be adjusted with some post-processing.

string.match(reMatch)

The regexp used in String#match() splits the string into sections that end either with an unescaped delimeter or at the end of the line. This also has the side effect of matching a zero-width section at the end of the line, so we need

.slice(0, -1)

to remove this match in post-processing.

new RegExp(String.raw`([^\\].|.[^\\]|^.?)${delimeter}$`)

...

.map(section => {
  return section.replace(reReplace, '$1')
})

Since every section except the last one (which ends at the end of the line) currently ends with delimeter, we have to .map() over the array of matches and remove the unescaped delimeter if there is one (which is why nonEscapedDelimeterAtEnd() is so complex).
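To see why the post-processing is needed, it can help to look at what String#match() returns before .slice() and .map() are applied. This is a minimal sketch on a shortened sample input; the names sample and rawSections are illustrative and not part of the answer.

```javascript
// Same pattern that nonEscapedDelimeter(':') builds, written as a literal.
const reMatch = /[^:]*?(?:\\\\:[^:]*?)*(?::|$)/g;

// String.raw keeps the backslashes literal, so the input really contains
// two backslashes in front of the escaped colon.
const sample = String.raw`a\\:b:c`;

const rawSections = sample.match(reMatch);

// Three sections: the first still carries its trailing delimiter,
// and the last is the zero-width match at the end of the line.
console.log(rawSections);
```

The first section still ends with the delimiter and the last entry is empty, which is what the .slice(0, -1) and .replace() steps clean up.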

This is a somewhat long approach, but it works. JavaScript regular expressions did not support lookbehind when this was written. You can get the same effect by reversing the original string and splitting it with a negative lookahead. Then reverse the array and every string in it, and you get your result.



function reverse(s) {
  var o = '';
  for (var i = s.length - 1; i >= 0; i--)
    o += s[i];
  return o;
}


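// Note: inside a string literal "\\" is a single backslash, so this
// string holds one backslash before each escaped character.
var str = "dtet:du\\,eduh ei\\:di:e,j";
var res = reverse(str);
// In the reversed string the escape follows the colon, so a negative
// lookahead can stand in for the missing lookbehind.
var result = res.split(/:(?!\\)/);
result = result.reverse();
for (var i = 0; i < result.length; i++) {
  result[i] = reverse(result[i]);
}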

console.log(result);
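Engines that implement ES2018 do support lookbehind assertions, so where those are available the reversal step may be unnecessary. The following is a minimal sketch under that assumption, using an input that contains two literal backslashes per escape; the names input and parts are illustrative.

```javascript
// Runtime value: dtet:du\\,eduh ei\\:di:e,j (two backslashes per escape).
const input = String.raw`dtet:du\\,eduh ei\\:di:e,j`;

// (?<!\\\\) is a negative lookbehind: the colon must not be directly
// preceded by two backslashes. Requires an ES2018-capable engine.
const parts = input.split(/(?<!\\\\):/);

console.log(parts);
```

On engines without lookbehind the regex literal is a syntax error, so the reverse-and-lookahead approach above remains the portable option.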

I can suggest two solutions. One is based on fixing up the contents of an array, and the other uses a regular expression.

Solution 1:

Approach: split on :, then copy the elements into a new array, gluing back together the ones that should not have been split.

function splitcolon(input) {
    var inparts = input.split(":");
    var outparts = [];
    var splitwaslegit = true;
    inparts.forEach(function(part) {
        if (splitwaslegit) {
            outparts.push(part);
        } else { // the split was not justified, so reverse it by gluing this part to the previous one
            outparts[outparts.length-1] += ":" + part;
        }
        // the next split was legit if this part doesn't end on \\
        splitwaslegit = (part.substring(part.length-2) !== "\\\\");
    });
    return outparts;
}

      

Tested in Chrome:

splitcolon("dtet:du\\\\,eduh ei\\\\:di:e,j")
(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]

      

Note:
You can of course also use a for loop or Underscore's each instead of forEach.
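The same gluing idea can be written with a for...of loop and with the delimiter and escape sequence passed in as parameters. This is only a sketch: splitUnescaped, escapeSeq and glued are hypothetical names, and it assumes the escape sequence is a non-empty string.

```javascript
// Hypothetical generalisation of splitcolon(): the delimiter and the
// escape sequence are parameters instead of being hard-coded.
function splitUnescaped(input, delimiter, escape) {
  const outparts = [];
  let splitWasLegit = true;
  for (const part of input.split(delimiter)) {
    if (splitWasLegit) {
      outparts.push(part);
    } else {
      // Undo the split: glue this part back onto the previous one.
      outparts[outparts.length - 1] += delimiter + part;
    }
    // The next split is legitimate unless this part ends with the escape.
    splitWasLegit = !part.endsWith(escape);
  }
  return outparts;
}

const escapeSeq = "\\\\"; // two literal backslashes at runtime
const glued = splitUnescaped(String.raw`dtet:du\\,eduh ei\\:di:e,j`, ":", escapeSeq);
console.log(glued);
```

Because no regex is built from the delimiter, it does not need to be regex-escaped here.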

Solution 2:

Approach: if there is a char or string that you can be 100% sure will never appear in the input, you can use it as a temporary delimiter, inserted by a regex like:

var tmpdelim = "\x00"; // must *never* ever occur in input string

var input = "dtet:du\\\\,eduh ei\\\\:di:e,j";
input.replace(/(^.?|[^\\].|.[^\\]):/g, "$1" + tmpdelim).split(tmpdelim);

      

Result:

(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]

      

Explanation of the regex /(^.?|[^\\].|.[^\\]):/g:

/ - regex start
( - start of match group 1
^.? - we are at the start of the input, or a single char away from it (an escape takes 2 chars)
| - or
[^\\]. - any char that is not \, followed by any other char
| - or
.[^\\] - any char, followed by anything other than \
) - end of match group 1
: - the match group (which therefore cannot be \\) must be followed by :
/ - regex end
g - regex global modifier (matches all occurrences, not just the first)

Each match is replaced with $1 + tmpdelim, so that everything in match group 1 is followed by our special delimiter (instead of :), which we can then split on.
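It may help to look at the intermediate string before the final split, and at one case where this regex appears to fall short. The sketch below uses made-up inputs; the names re, marked and shortParts are illustrative.

```javascript
const tmpdelim = "\x00"; // assumed never to occur in the input
const re = /(^.?|[^\\].|.[^\\]):/g;

// Intermediate string: each unescaped colon has become the temporary
// delimiter, while the escaped colon (after two backslashes) is untouched.
const marked = String.raw`ab:c\\:d:ef`.replace(re, "$1" + tmpdelim);
console.log(JSON.stringify(marked));

// Each match consumes the two characters in front of the colon, so a
// colon that follows a one-character segment (other than the very first)
// cannot be matched: here only the first colon is replaced.
const shortParts = "a:b:c".replace(re, "$1" + tmpdelim).split(tmpdelim);
console.log(shortParts);
```

If one-character segments can occur in the input, Solution 1 may be the safer choice, since it does not depend on how many characters precede each colon.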

Bonus solution

Manjo Verma's answer as a one-liner:

input.split("").reverse().join("").split(/:(?!\\\\)/).reverse().map(x => x.split("").reverse().join(""));

      

Result:

(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]
