JavaScript: split on a char but ignore double-escaped characters
I am trying to do something similar to this question, but cannot get it to work:
How to split a comma-separated string while ignoring escaped commas?
I tried to figure it out, but it didn't seem to work.
I would like to split the string on :, but not on an escaped \\: (my escape char is a double backslash).
input: dtet:du\\,eduh ei\\:di:e,j
expected output: ["dtet"] ["du\\,eduh ei\\:di"] ["e,j"]
regex link: https://regex101.com/r/12j6er/1/
See the function called splitOnNonEscapedDelimeter(), which takes the string to split and the delimeter to split on, which in this case is :. The usage is inside the function onChange().
Note that you must escape the delimeter you pass to splitOnNonEscapedDelimeter() so that it is not interpreted as a special character in the regex.
function nonEscapedDelimeter(delimeter) {
  return new RegExp(String.raw`[^${delimeter}]*?(?:\\\\${delimeter}[^${delimeter}]*?)*(?:${delimeter}|$)`, 'g')
}
function nonEscapedDelimeterAtEnd(delimeter) {
  return new RegExp(String.raw`([^\\].|.[^\\]|^.?)${delimeter}$`)
}
function splitOnNonEscapedDelimeter(string, delimeter) {
  const reMatch = nonEscapedDelimeter(delimeter)
  const reReplace = nonEscapedDelimeterAtEnd(delimeter)
  return string.match(reMatch).slice(0, -1).map(section => {
    return section.replace(reReplace, '$1')
  })
}
function onChange() {
  console.log(splitOnNonEscapedDelimeter(i.value, ':'))
}
i.addEventListener('change', onChange)
onChange()
<textarea id=i>dtet:du\\,eduh ei\\:di:e,j</textarea>
Requirements
This solution makes use of the ES2015 features String.raw() and template literals for convenience, although they are not required. See their documentation to understand how they work, and use a polyfill if your target environment doesn't support those features.
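Coming back to the note above about escaping the delimeter, here is a minimal sketch of one way to do it. escapeRegExp() is a hypothetical helper that is not part of this answer, the sample input with | is made up for illustration, and the three splitting functions are simply repeated from above so that the snippet runs on its own.

```javascript
// Hypothetical helper (an assumption, not part of the answer):
// backslash-escape every character that is special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Repeated from the answer so that this sketch is self-contained.
function nonEscapedDelimeter(delimeter) {
  return new RegExp(String.raw`[^${delimeter}]*?(?:\\\\${delimeter}[^${delimeter}]*?)*(?:${delimeter}|$)`, 'g');
}
function nonEscapedDelimeterAtEnd(delimeter) {
  return new RegExp(String.raw`([^\\].|.[^\\]|^.?)${delimeter}$`);
}
function splitOnNonEscapedDelimeter(string, delimeter) {
  const reMatch = nonEscapedDelimeter(delimeter);
  const reReplace = nonEscapedDelimeterAtEnd(delimeter);
  return string.match(reMatch).slice(0, -1).map(section => section.replace(reReplace, '$1'));
}

// Made-up input: | is the delimiter, and \\| is an escaped delimiter.
const pipeInput = String.raw`a|b\\|c|d`;

// | means "or" in a regex, so it is escaped before being passed in.
const pipeParts = splitOnNonEscapedDelimeter(pipeInput, escapeRegExp('|'));
console.log(pipeParts); // three sections; the middle one keeps its escaped |
```

A delimiter such as : has no special meaning in a regex, which is why the original call can pass it in unescaped.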
Explanation
new RegExp(String.raw`[^${delimeter}]*?(?:\\\\${delimeter}[^${delimeter}]*?)*(?:${delimeter}|$)`, 'g')
The function nonEscapedDelimeter() creates a regex that does almost what is required, except for a few quirks that need to be adjusted with some post-processing.
string.match(reMatch)
The regexp used in String#match() splits the string into sections that end either with an unescaped delimeter or at the end of the line. This also has the side effect of matching a section of width 0 at the end of the line, so we need .slice(0, -1) to remove this match in post-processing.
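To make the intermediate result visible, here is a small sketch that runs only the matching step on the sample input from the question. The regex is the one nonEscapedDelimeter(':') builds, written out as a literal; the variable names are chosen for illustration.

```javascript
// Same pattern that nonEscapedDelimeter(':') produces, as a regex literal.
const reMatch = /[^:]*?(?:\\\\:[^:]*?)*(?::|$)/g;

// String.raw keeps the double backslashes exactly as typed.
const input = String.raw`dtet:du\\,eduh ei\\:di:e,j`;

// Every section still carries its trailing delimiter, and the last
// entry is the zero-width match at the end of the line.
const sections = input.match(reMatch);
console.log(sections.length);                     // 4
console.log(sections[sections.length - 1] === ''); // true

// Dropping the empty match leaves the three real sections.
const withoutEmptyMatch = sections.slice(0, -1);
console.log(withoutEmptyMatch.length);            // 3
```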
new RegExp(String.raw`([^\\].|.[^\\]|^.?)${delimeter}$`)
...
.map(section => {
  return section.replace(reReplace, '$1')
})
Since every section except the last one (which ends at the end of the line) currently ends with the delimeter, we have to .map() over the array of matches and remove the trailing unescaped delimeter if there is one (which is why nonEscapedDelimeterAtEnd() is so complex).
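As a rough illustration of why the pattern needs its three alternatives, the sketch below applies the regex that nonEscapedDelimeterAtEnd(':') builds to a few hand-picked sections. The sample sections are assumptions chosen to cover the different cases, not output taken from the answer.

```javascript
// Same pattern that nonEscapedDelimeterAtEnd(':') produces, as a regex literal.
const reReplace = /([^\\].|.[^\\]|^.?):$/;

const sampleSections = [
  'dtet:',             // ordinary section: the trailing : is removed
  String.raw`ei\\:`,   // ends in an escaped \\: which must be kept
  ':',                 // empty section: only the delimiter is present
  'e,j'                // last section: no trailing delimiter at all
];

// $1 puts back the characters that were only matched in order to
// check that the delimiter is not preceded by a double backslash.
const cleaned = sampleSections.map(section => section.replace(reReplace, '$1'));
console.log(cleaned);
```

The ^.? alternative is what handles sections shorter than two characters, where there is no room for a double backslash in front of the delimiter.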
This is a bit of a long approach, but it works. Older JavaScript engines do not support lookbehinds in regular expressions (they were only added in ES2018). But you can get the same effect by reversing the original string and splitting it with a lookahead. Then reverse the array and every string in it, and you get your result.
function reverse(s) {
  var o = '';
  for (var i = s.length - 1; i >= 0; i--)
    o += s[i];
  return o;
}
var str = "dtet:du\\,eduh ei\\:di:e,j";
var res = reverse(str);
var result = res.split(/:(?!\\)/g);
result = result.reverse();
for (var i = 0; i < result.length; i++) {
  result[i] = reverse(result[i]);
}
console.log(result);
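For comparison, on engines that implement ES2018 a negative lookbehind can express the same idea without reversing anything. This is a sketch of an alternative technique, not the approach of this answer, and it assumes the runtime supports lookbehind assertions and that the escape sequence is a literal double backslash.

```javascript
// String.raw keeps the double backslashes exactly as typed.
const input = String.raw`dtet:du\\,eduh ei\\:di:e,j`;

// Split on every : that is NOT immediately preceded by two backslashes.
// (?<!...) is a negative lookbehind; it throws a SyntaxError on engines
// that predate ES2018.
const parts = input.split(/(?<!\\\\):/);

console.log(parts.length); // 3
console.log(parts);
```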
I can suggest two solutions: one is based on adjusting the contents of an array, and one uses a regular expression.
Solution 1:
Approach: Split on :, then push the elements into a new array, gluing back together the ones that should not have been split.
function splitcolon(input) {
  var inparts = input.split(":");
  var outparts = [];
  var splitwaslegit = true;
  inparts.forEach(function(part) {
    if (splitwaslegit) {
      outparts.push(part);
    } else { // the split was not justified, so reverse it by gluing this part to the previous one
      outparts[outparts.length-1] += ":" + part;
    }
    // the next split was legit if this part doesn't end on \\
    splitwaslegit = (part.substring(part.length-2) !== "\\\\");
  });
  return outparts;
}
Tested in chrome:
splitcolon("dtet:du\\\\,eduh ei\\\\:di:e,j")
(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]
Note: You can of course also use a for loop or underscore's each instead of forEach.
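As a sketch of the for-loop variant mentioned in the note: the function name splitColonWithLoop is made up for this illustration, and endsWith() (ES2015) stands in for the substring comparison, but the logic is otherwise the same as in splitcolon().

```javascript
// Hypothetical for-loop variant of splitcolon(); same gluing logic.
function splitColonWithLoop(input) {
  const inparts = input.split(':');
  const outparts = [];
  let splitWasLegit = true;
  for (let index = 0; index < inparts.length; index++) {
    const part = inparts[index];
    if (splitWasLegit) {
      outparts.push(part);
    } else {
      // The previous part ended in \\ so this split is undone.
      outparts[outparts.length - 1] += ':' + part;
    }
    // The next split is legit only if this part does not end in \\.
    splitWasLegit = !part.endsWith('\\\\');
  }
  return outparts;
}

const loopResult = splitColonWithLoop(String.raw`dtet:du\\,eduh ei\\:di:e,j`);
console.log(loopResult.length); // 3
```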
Solution 2:
Approach: if there is a char or string that you can be 100% sure will never be in the input, you can use that char / string as a temporary delimiter, inserted by a regex like:
var tmpdelim = "\x00"; // must *never* ever occur in input string
var input = "dtet:du\\\\,eduh ei\\\\:di:e,j";
input.replace(/(^.?|[^\\].|.[^\\]):/g, "$1" + tmpdelim).split(tmpdelim);
Result:
(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]
Explanation of the regex /(^.?|[^\\].|.[^\\]):/g
/ - regex start
( - start of match group 1
^.? - we are at the beginning of the input, optionally followed by any one char (the escape sequence needs 2 chars)
| - or
[^\\]. - any char that is not \, followed by any other char
| - or
.[^\\] - any char, followed by anything other than \
) - end of match group 1
: - match group 1 (which cannot be \\) must be followed by :
/ - regex end
g - global modifier (matches all occurrences, not just the first)
Each match is replaced with $1 + tmpdelim, so that everything in match group 1 is followed by our special delimiter (instead of :), which we can then split on.
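Because this approach silently gives wrong results if the temporary delimiter does occur in the input, one possible safeguard is to check for it first. The sketch below wraps the same regex in a function; the name splitWithTempDelimiter and the decision to throw an error are assumptions made for this illustration.

```javascript
// Hypothetical wrapper around the replace-then-split approach.
function splitWithTempDelimiter(input, tmpdelim) {
  // Guard for the "must never occur in the input" assumption.
  if (input.includes(tmpdelim)) {
    throw new Error('temporary delimiter already occurs in the input');
  }
  // A replacer function is used instead of "$1" + tmpdelim so that a
  // $ inside tmpdelim is never treated as a replacement pattern.
  const marked = input.replace(/(^.?|[^\\].|.[^\\]):/g, (match, before) => before + tmpdelim);
  return marked.split(tmpdelim);
}

const tmpParts = splitWithTempDelimiter(String.raw`dtet:du\\,eduh ei\\:di:e,j`, '\x00');
console.log(tmpParts.length); // 3
```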
Bonus solution
Manjo Verma's answer as a one-liner:
input.split("").reverse().join("").split(/:(?!\\\\)/).reverse().map(x => x.split("").reverse().join(""));
Result:
(3) ["dtet", "du\\,eduh ei\\:di", "e,j"]