RegEx: remove everything from the last underscore up to (but not including) the file extension
I want to remove the ISO code and its leading underscore from every element in an array while keeping the file extension. The ISO code always appears immediately before the file extension.
Source array:
var SrcFiles = [
"File_with_nr1_EN.txt",
"File_has_NR_3_ZHHK.txt",
"File_yy_nr_2_DE.pdf"
];
I want it to look like this:
var SrcFiles = [
"File_with_nr1.txt",
"File_has_NR_3.txt",
"File_yy_nr_2.pdf"
];
How should I do it? Possibly with a regular expression, but how? I found a good regex that matches just the extension at the end of the file name, but I don't know how it can help me.
const re = /(?:\.([^.]+))?$/;
You can capture everything up to the last _, match _ followed by one or more uppercase letters, and then capture a dot and the one or more non-dot characters that follow it, up to the end of the string:
/^(.*)_[A-Z]+(\.[^.]+)$/
and replace with $1$2, where $1 is a backreference to group 1 and $2 refers to the value in group 2.
[A-Z]+ can be tightened to [A-Z]{2,} (since ISO codes are usually at least 2 characters long), and if there might be a hyphen in there, use _[A-Z-]{2,}.
See JS demo:
var SrcFiles = [
"File_with_nr1_EN.txt",
"File_has_NR_3_ZHHK.txt",
"File_yy_nr_2_DE.pdf"
];
var res = SrcFiles.map(x => x.replace(/^(.*)_[A-Z]+(\.[^.]+)$/, '$1$2'));
// ES5
//var res = SrcFiles.map(function(x) {return x.replace(/^(.*)_[A-Z]+(\.[^.]+)$/, '$1$2'); });
console.log(res);
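The stricter variant mentioned above can be sketched the same way. The hyphenated file name below is made up for illustration (it is not in the question's data), and the sketch assumes such codes are written in uppercase:

```javascript
// Hypothetical input: the last entry uses a hyphenated code, which is
// not part of the original question's data.
const files = [
  "File_with_nr1_EN.txt",
  "File_has_NR_3_ZHHK.txt",
  "File_yy_nr_2_ZH-HK.pdf"
];

// _[A-Z-]{2,} requires at least two characters after the underscore,
// each of them an uppercase letter or a hyphen.
const stricter = /^(.*)_[A-Z-]{2,}(\.[^.]+)$/;

const cleaned = files.map(name => name.replace(stricter, "$1$2"));
console.log(cleaned);
// [ 'File_with_nr1.txt', 'File_has_NR_3.txt', 'File_yy_nr_2.pdf' ]
```

Names that do not end in such a code before the extension are left unchanged, because the pattern is anchored at both ends.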
Look for _, followed by anything that is not _ ([^_]), and then capture: a ., followed by anything that is not _, at the end of the string ($). The captured part (the dot and what follows it) should be written back as $1.
var SrcFiles = [
"File_with_nr1_EN.txt",
"File_has_NR_3_ZHHK.txt",
"File_yy_nr_2_DE.pdf"
];
var re = /_[^_]+(\.[^_]+)$/;
console.log(SrcFiles.map(f => f.replace(re, "$1")));
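One thing to keep in mind: this pattern does not check that the removed segment is an ISO code, so it strips whatever follows the last underscore. A small sketch of the difference, using a made-up file name that has no ISO code and, for comparison, an uppercase-only pattern:

```javascript
// Hypothetical input: a name whose last underscore segment is not an ISO code.
const plainName = "File_nr_2.pdf";

// Pattern from this answer: removes any final underscore segment.
const anySegment = /_[^_]+(\.[^_]+)$/;
// Pattern that only accepts uppercase letters after the last underscore.
const upperOnly = /^(.*)_[A-Z]+(\.[^.]+)$/;

const withAnySegment = plainName.replace(anySegment, "$1");
const withUpperOnly = plainName.replace(upperOnly, "$1$2");

console.log(withAnySegment); // File_nr.pdf   (the "_2" was removed too)
console.log(withUpperOnly);  // File_nr_2.pdf (no match, left unchanged)
```

If every file is guaranteed to carry an ISO code, as the question states, both patterns give the same result.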
Regex:
("^_)*_[A-Z]+(\.[^.]+",?)
Replacement:
$1$2
Check out https://regex101.com/r/h0gukN/2
I joined the part in front of the ISO code and the rest of the line together. Hope this helps :P
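Because the pattern includes the closing quote and an optional comma, it seems intended for the array's source text (for example in an editor's find and replace) rather than for the strings themselves. A minimal JavaScript sketch under that assumption, with made-up variable names:

```javascript
// Assumption: the regex is run over the lines of the array literal's
// source text, quotes and commas included.
const sourceLines = [
  '"File_with_nr1_EN.txt",',
  '"File_has_NR_3_ZHHK.txt",',
  '"File_yy_nr_2_DE.pdf"'
];

// Pattern copied as posted. As written, group 1 does not appear to match
// anything on these lines, so $1 contributes an empty string in JavaScript.
const pattern = /("^_)*_[A-Z]+(\.[^.]+",?)/;

const rewrittenLines = sourceLines.map(line => line.replace(pattern, "$1$2"));
console.log(rewrittenLines.join("\n"));
```

To apply it to the strings in SrcFiles directly, the quote and comma would have to be dropped from the pattern.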
The pattern is not that complicated. Take a look:
1_EN.txt
The 1_EN. part matches the pattern \d+_\S+\., where _\S+ is what you want to delete. You can achieve this with the following substitution:
s/(\d+)_\S+\./$1./g
The first group, followed by a period, is what you want to keep in the text. The g flag means the replacement is applied to every match in the text, not only the first.
Result details:
- 1_EN. is replaced by 1. (group 1 captures 1)
- 3_ZHHK. is replaced by 3. (group 1 captures 3)
- 2_DE. is replaced by 2. (group 1 captures 2)
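The substitution above is written in sed/Perl-style s/// syntax. A rough JavaScript equivalent might look like the sketch below; it assumes, as in the three sample names, that a digit sits directly before the underscore of the ISO code, and the variable names are made up:

```javascript
const names = [
  "File_with_nr1_EN.txt",
  "File_has_NR_3_ZHHK.txt",
  "File_yy_nr_2_DE.pdf"
];

// JavaScript form of s/(\d+)_\S+\./$1./g
// Assumption: the ISO code is always preceded by digits and an underscore.
const digitAnchored = /(\d+)_\S+\./g;

const stripped = names.map(name => name.replace(digitAnchored, "$1."));
console.log(stripped);
// [ 'File_with_nr1.txt', 'File_has_NR_3.txt', 'File_yy_nr_2.pdf' ]
```

Names without a digit in that position would not match and would be left as they are.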