Extracting substring from variable using bash script
I have a bash variable with a value something like this:
10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
There are no spaces within the value, which can be very long or very short. It consists of pairs such as 65:3.0. I know the number in the first part of a pair, say 65, and I want to extract either the number 3.0 or the whole pair 65:3.0. I don't know the position (offset) of 65 in the string.
I would be grateful for a bash script that can do this kind of extraction. Thanks.
Here's a clean Bash solution:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
while read -r -d, i; do
    [[ $i = 65:* ]] || continue
    echo "$i"
done <<< "$var,"
You can add break after echo "$i" if there is only one 65:... in var, or if you only want the first match.
To get just the value 3.0, use echo "${i#*:}" instead.
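As a minimal sketch, the loop above can be wrapped in a function that takes the key as an argument and stops at the first match. The name lookup_pair and its argument order are hypothetical, chosen only for illustration:

```bash
#!/usr/bin/env bash
# lookup_pair KEY LIST: print the first "KEY:value" pair found in LIST.
# (lookup_pair is a hypothetical helper name, not part of the original answer.)
lookup_pair() {
    local key=$1 list=$2 item
    # The appended comma lets read see the last field as a complete record.
    while read -r -d, item; do
        if [[ $item == "$key":* ]]; then
            printf '%s\n' "$item"
            return 0          # stop at the first match
        fi
    done <<< "$list,"
    return 1                  # key not found
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
if pair=$(lookup_pair 65 "$var"); then
    echo "pair:  $pair"
    echo "value: ${pair#*:}"
fi
```

The non-zero return status lets the caller tell a missing key apart from an empty value.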
Another pure Bash approach, using parameter expansion instead of splitting the string. I am assuming that you only want the first 65 in the line and that it is present in the line:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
value=,$var            # leading comma so that a 65 in the first field is found too
value=${value#*,65:}
value=${value%%,*}
echo "$value"
It will be very slow for long lines!
Same as above, but outputs all values that match 65 (or none if not present):
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
tmpvar=,$var
while [[ $tmpvar = *,65:* ]]; do
    tmpvar=${tmpvar#*,65:}
    echo "${tmpvar%%,*}"
done
Same thing, it will be slow for long lines!
The fastest I can get in pure Bash is my original answer (and that's ok with 10,000 fields):
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
IFS=, read -ra ary <<< "$var"
for i in "${ary[@]}"; do
    [[ $i = 65:* ]] || continue
    echo "$i"
done
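If several keys need to be looked up in the same string, one possible variation on the array loop above is to split once and store every pair in an associative array. This is only a sketch: it assumes Bash 4 or later, the array name value_of is made up for illustration, and a repeated key keeps its last value:

```bash
#!/usr/bin/env bash
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

declare -A value_of               # associative arrays need Bash 4+
IFS=, read -ra ary <<< "$var"     # split on commas, once
for i in "${ary[@]}"; do
    # Text before the first colon is the key, text after it is the value.
    value_of[${i%%:*}]=${i#*:}
done

echo "${value_of[65]}"
echo "${value_of[2312]}"
```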
Actually no, the fastest I can get in pure Bash is with this regex:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
[[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"
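The regex test can also be parameterised on the key. The sketch below assumes a hypothetical helper named get_value. Quoting "$key" inside the pattern makes Bash match it literally, and the surrounding commas keep a key such as 11 from matching 111:

```bash
#!/usr/bin/env bash
# get_value KEY LIST: print the value paired with KEY, or return 1 if absent.
# (get_value is a hypothetical helper name used only for this example.)
get_value() {
    local key=$1 list=$2
    # Wrapping the list in commas means every pair, including the first
    # and the last, is delimited by a comma on both sides.
    [[ ,$list, =~ ,"$key":([^,]+), ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
get_value 65 "$var"
get_value 11 "$var" || echo "11 not found"
```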
Checking this against awk:
- where the 65:... pair is at the end:
printf -v var '%s:3.0,' {100..11000}
var+=65:42.0
time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
shows 0m0.020s (average), whereas:
time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
shows 0m0.008s (rough average too).
- where 65:3.0 is not at the end:
printf -v var '%s:3.0,' {1..10000}
time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
shows 0m0.020s (average), and with an early exit:
time awk -F: -v RS=',' '$1==65{print $2;exit}' <<< "$var"
shows 0m0.010s (average), whereas:
time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
shows 0m0.002s (rough average).
Using sed
sed -e 's/^.*,\(65:[0-9.]*\),.*$/\1/' <<<",$var,"
output:
65:3.0
There are two edge cases to guard against: 65:3.0 may be first on the line or last on the line. Above, commas are added around the variable so that the pair is always surrounded by commas, wherever it appears. Below, the GNU extension \? is used to indicate zero or one occurrence of the comma.
sed -e 's/^.*,\?\(65:[0-9.]*\),\?.*$/\1/' <<<$var
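Both commands match 65:3.0 regardless of where it appears on the line. Note that with the optional comma the second form is no longer anchored at a field boundary, so it can also match the tail of a longer key such as 165.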
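One thing to be aware of is that a plain s/// prints the whole input unchanged when the key is missing. A possible refinement of the first command, sketched below, uses sed's -n option with the p flag so that nothing is printed without a match. The function name extract_pair is hypothetical, the key is assumed to be purely numeric, and because .* is greedy the last matching pair wins if the key occurs more than once:

```bash
#!/usr/bin/env bash
# extract_pair KEY LIST: print "KEY:value" via sed, or nothing if KEY is absent.
# (extract_pair is a hypothetical helper; KEY is assumed to be numeric.)
extract_pair() {
    local key=$1 list=$2
    # -n suppresses the default output; the trailing p prints the line
    # only when the substitution actually happened.
    sed -n -e "s/^.*,\(${key}:[0-9.]*\),.*\$/\1/p" <<< ",$list,"
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
extract_pair 65 "$var"
extract_pair 10 "$var"
```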