Bash: how to extract numbers preceded by _ and followed by .
I have the following format for filenames: filename_1234.svg
How can I get the number that is preceded by an underscore and followed by a period, i.e. the digits between the _ and the .svg?
I tried:
width=${fileName//[^0-9]/}
but if the filename also contains a number elsewhere, this returns every digit in the filename, e.g.
file6name_1234.svg
I found solutions for two underscores (and split them into an array), but I'm looking for a way to check both the underscore and the period.
You can use parameter expansion with substring removal: first trim from the right everything from the last '.' onward, then trim from the left everything up to and including the last '_', leaving the number you want. For example:
$ width=filename_1234.svg; val="${width%.*}"; val="${val##*_}"; echo $val
1234
Note: # trims from the left through the first occurrence of the pattern (shortest match), and ## trims through the last occurrence (longest match). % and %% work the same way from the right side.
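To make the difference between #, ##, % and %% concrete, here is a small sketch. The sample name file_6_1234.tar.gz and the variable names are made up for illustration; they are not from the question.

```bash
# Hypothetical sample name with two underscores and two periods.
name="file_6_1234.tar.gz"

short_left="${name#*_}"     # shortest match of '*_' from the left  -> 6_1234.tar.gz
long_left="${name##*_}"     # longest match of '*_' from the left   -> 1234.tar.gz
short_right="${name%.*}"    # shortest match of '.*' from the right -> file_6_1234.tar
long_right="${name%%.*}"    # longest match of '.*' from the right  -> file_6_1234

printf '%s\n' "$short_left" "$long_left" "$short_right" "$long_right"
```

For the filename in the question, %.* followed by ##*_ is the combination that isolates the number next to the extension.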
Clarifications:
- width=filename_1234.svg - width contains your filename
- val="${width%.*}" - val contains filename_1234
- val="${val##*_}" - finally, val contains 1234
Of course, there is no need for a temporary variable such as val if you want width itself to end up holding the width. I only used a temporary to keep the original contents of width from being modified. If you want the final number in width, just replace val with width everywhere above and operate directly on width.
Note 2: using shell built-ins like parameter expansion avoids the subshell and the separate process that are created when you call a utility like sed, grep or awk (or anything else that is not part of the shell, for that matter).
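If you need this in several places, the two expansions can be wrapped in a function. The following is only a sketch: the name extract_number and the validation rules (the name must contain an underscore, and the result must be all digits) are my own assumptions, not part of the question.

```bash
# Hypothetical helper: print the digits between the last '_' and the last '.'.
# Returns 1 if there is no underscore or the extracted part is not purely numeric.
extract_number() {
    local name="$1"
    local stem="${name%.*}"      # drop the extension: filename_1234
    case "$stem" in
        *_*) ;;                  # an underscore is present, carry on
        *)   return 1 ;;         # nothing to split on
    esac
    local num="${stem##*_}"      # drop everything through the last '_': 1234
    case "$num" in
        ''|*[!0-9]*) return 1 ;; # empty, or contains a non-digit
    esac
    printf '%s\n' "$num"
}

width=$(extract_number "file6name_1234.svg") && echo "$width"
```

The check matters because ${stem##*_} leaves the string unchanged when it contains no underscore, so without it a name like 1234.svg would be accepted.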
Try using the following code:
filename="filename_6_1234.svg"
if [[ "$filename" =~ ^(.*)_([^.]*)\..*$ ]];
then
echo "${BASH_REMATCH[0]}" #will display 'filename_6_1234.svg'
echo "${BASH_REMATCH[1]}" #will display 'filename_6'
echo "${BASH_REMATCH[2]}" #will display '1234'
fi
Explanation:
- =~ : the bash operator for regular expression matching
- ^(.*)_([^.]*)\..*$ : we are looking for any characters, followed by an underscore, followed by any characters other than a period, followed by a period and an extension. This creates two capture groups, one for the part before the last underscore and one for the part after it
- BASH_REMATCH : an array containing the whole match (index 0) followed by the captured groups (index 1 and up)
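The pattern above accepts any text between the underscore and the period. If you only want digits, a stricter variant could look like the sketch below. The function name number_before_extension and the pattern are my own assumptions. The regex is kept in a variable so the shell does not reinterpret it.

```bash
# Hypothetical helper: print the run of digits that sits between an underscore
# and the final ".extension" of the name; return 1 if there is no such run.
number_before_extension() {
    local re='_([0-9]+)\.[^.]+$'
    if [[ "$1" =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

number_before_extension "file6name_1234.svg"     # prints 1234
number_before_extension "filename_6_1234.svg"    # prints 1234
```

Because the pattern is anchored only at the end, digits elsewhere in the name (the 6 in file6name, or the _6 in filename_6_1234) are ignored.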
There is also a solution using cut:
name="file6name_1234.svg"
num=$(echo "$name" | cut -d '_' -f 2 | cut -d '.' -f 1)
echo "$num"
-d specifies the delimiter.
-f selects the desired field.
I don't know anything about performance, but it's simple to understand and easy to maintain.
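One caveat: -f 2 always picks the second underscore-separated field, so it assumes the name contains exactly one underscore. The sketch below shows what happens with a second underscore. It also shows one possible workaround with awk, which is my own suggestion and assumes a single extension: it splits on both _ and . and takes the next-to-last field.

```bash
# Hypothetical name with two underscores.
name="filename_6_1234.svg"

# Field 2 is now "6", not the number next to the extension.
with_cut=$(echo "$name" | cut -d '_' -f 2 | cut -d '.' -f 1)

# Split on '_' or '.', then take the field just before the extension.
with_awk=$(echo "$name" | awk -F'[_.]' '{ print $(NF-1) }')

echo "$with_cut"   # prints 6
echo "$with_awk"   # prints 1234
```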