RegEx: remove everything from the last underscore up to the file extension

I want to remove the ISO code and the underscore in front of it from every element of an array while keeping the file extension. The ISO code always appears directly before the file extension.

Source array:

var SrcFiles = [
"File_with_nr1_EN.txt",
"File_has_NR_3_ZHHK.txt",
"File_yy_nr_2_DE.pdf"
];

      

I want it to look like this:

var SrcFiles = [
"File_with_nr1.txt",
"File_has_NR_3.txt",
"File_yy_nr_2.pdf"
];

      

How should I do it? Possibly with a regular expression, but how? I found a good regex just to match the end of the file, but I don't know how it can help me.

const re = /(?:\.([^.]+))?$/;

      



4 answers


You can capture everything up to the last _, match the _ and one or more uppercase letters, and then capture the dot and the one or more non-dot characters that follow it up to the end of the string:

/^(.*)_[A-Z]+(\.[^.]+)$/

      

and replace with $1$2, where $1 is a backreference to group 1 and $2 refers to the value in group 2.

[A-Z]+ can be tightened to [A-Z]{2,} (since ISO codes are usually at least 2 characters long), and if the code might contain a hyphen, use _[A-Z-]{2,}.
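The tightened variant can be sketched as follows. This is only an illustration: the helper name stripIsoCode and the two extra sample names (one with a hyphenated code, one with a single-letter suffix) are assumptions, not part of the question.

```javascript
// Hypothetical helper: removes "_CODE" directly before the extension,
// where CODE is at least two uppercase letters or hyphens.
const stripIsoCode = name => name.replace(/^(.*)_[A-Z-]{2,}(\.[^.]+)$/, "$1$2");

// The first name is from the question; the other two are made-up samples.
const samples = [
  "File_with_nr1_EN.txt",
  "File_has_NR_3_ZH-HK.txt", // hyphenated code
  "File_v_A.txt"             // single letter, too short to count as a code
];

const cleaned = samples.map(stripIsoCode);
console.log(cleaned);
// [ 'File_with_nr1.txt', 'File_has_NR_3.txt', 'File_v_A.txt' ]
```

Names that do not end in an underscore plus such a code are returned unchanged, because replace leaves the string alone when the pattern does not match.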



See JS demo:

var SrcFiles = [
"File_with_nr1_EN.txt",
"File_has_NR_3_ZHHK.txt",
"File_yy_nr_2_DE.pdf"
];

var res = SrcFiles.map(x => x.replace(/^(.*)_[A-Z]+(\.[^.]+)$/, '$1$2'));
// ES5
//var res = SrcFiles.map(function(x) {return x.replace(/^(.*)_[A-Z]+(\.[^.]+)$/, '$1$2'); });
console.log(res);
      



Look for a _, followed by anything that is not a _ ([^_]+), and then a ., followed by anything that is not a _, at the end of the string ($). The second part (the dot and what follows it) is captured and written back as $1.



var SrcFiles = [
  "File_with_nr1_EN.txt",
  "File_has_NR_3_ZHHK.txt",
  "File_yy_nr_2_DE.pdf"
];

var re = /_[^_]+(\.[^_]+)$/;

console.log(SrcFiles.map(f => f.replace(re, "$1")));
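
One thing worth checking before relying on this pattern: it does not look at what the last underscore segment contains, so a name that has no ISO code still loses its last segment. A minimal sketch of that behaviour, compared with the uppercase-only pattern from the thread; the second sample name and the variable names are made up for illustration.

```javascript
// "loose" is the pattern from this answer; "strict" only accepts uppercase letters.
const loose = /_[^_]+(\.[^_]+)$/;
const strict = /^(.*)_[A-Z]+(\.[^.]+)$/;

// The second name has no ISO code (hypothetical sample).
const names = ["File_with_nr1_EN.txt", "File_with_nr1.txt"];

const looseResult = names.map(n => n.replace(loose, "$1"));
const strictResult = names.map(n => n.replace(strict, "$1$2"));

console.log(looseResult);  // [ 'File_with_nr1.txt', 'File_with.txt' ]
console.log(strictResult); // [ 'File_with_nr1.txt', 'File_with_nr1.txt' ]
```

If every file name is guaranteed to carry a code, as in the question, both patterns give the same result.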
      



Regex:

("^_)*_[A-Z]+(\.[^.]+",?)

      

Replacement:

$1$2

      

Check out https://regex101.com/r/h0gukN/2

I joined the part in front of the ISO code and the rest of the string together. Hope this helps :P



The pattern is not that complicated. Take a look at this fragment:

1_EN.txt

      

It fits the template \d+_\S+\. (digits, an underscore, non-whitespace characters, then a literal dot), where _\S+ is the part you want to delete. You can achieve this with the following substitution:

s/(\d+)_\S+\./$1./g

      

The first group, followed by a period, is what you want to keep in the text. The g flag means the replacement is applied to every match in the text, not only the first one.
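
The substitution above is written in sed/Perl notation. In JavaScript the same idea might look like the sketch below; the variable names are assumptions, and the array is the one from the question.

```javascript
// Array from the question.
const files = [
  "File_with_nr1_EN.txt",
  "File_has_NR_3_ZHHK.txt",
  "File_yy_nr_2_DE.pdf"
];

// (\d+) keeps the digits, _\S+ is dropped, and the literal dot is written back.
const result = files.map(f => f.replace(/(\d+)_\S+\./g, "$1."));

console.log(result);
// [ 'File_with_nr1.txt', 'File_has_NR_3.txt', 'File_yy_nr_2.pdf' ]
```

Note that this approach relies on a digit appearing directly before the underscore of the ISO code, which holds for the three sample names but is not stated as a rule in the question.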

Result details:

  • 1_EN. is replaced by 1. (group 1: 1)

  • 3_ZHHK. is replaced by 3. (group 1: 3)

  • 2_DE. is replaced by 2. (group 1: 2)
