Extracting substring from variable using bash script
I have a bash variable with a value something like this:
10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
There are no spaces in the value, and it can be very long or very short. It is made up of pairs such as 65:3.0. I know the number in the first part of a pair, say 65, and I want to extract the number 3.0 or the whole pair 65:3.0. I don't know the position (offset) of 65 in the string.
I would be grateful for a bash script that can do this kind of extraction. Thanks.
Awk is probably the simplest approach:
$ awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
3.0
Or get a pair:
$ awk -F: -v RS=',' '$1==65' <<< "$var"
65:3.0
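If the key is held in a shell variable rather than hard-coded, it can be handed to awk with -v. This is only a minimal sketch: the function name lookup_awk is made up for illustration and is not part of the answer above. It pipes the string through printf instead of using a here-string, so the last record does not pick up the trailing newline that <<< appends.

```shell
# Hypothetical helper: print the value paired with a given key.
lookup_awk() {
    # $1 = comma-separated list of key:value pairs, $2 = key to look up
    printf '%s' "$1" |
        awk -F: -v RS=',' -v key="$2" '$1 == key { print $2; exit }'
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
lookup_awk "$var" 65
```

The exit stops awk at the first match. Drop it if the same key may occur more than once and every value is wanted.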
Here's a clean Bash solution:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
while read -r -d, i; do
[[ $i = 65:* ]] || continue
echo "$i"
done <<< "$var,"
You can add break after echo "$i" if there is only one 65:... in var, or if you only want the first match.
To get just the value 3.0, use echo "${i#*:}".
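Putting those pieces together, the loop could be wrapped in a function that stops at the first match and prints only the value. This is a sketch under my own naming (lookup_first, list, key and pair are all illustrative), not something from the answer itself:

```shell
# Hypothetical helper: print the value of the first pair whose key is $2.
lookup_first() {
    local list=$1 key=$2 pair
    while IFS= read -r -d , pair; do
        if [[ $pair == "$key":* ]]; then
            printf '%s\n' "${pair#*:}"   # strip everything up to the first colon
            return 0                     # stop at the first match
        fi
    done <<< "$list,"
    return 1                             # key not found
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
lookup_first "$var" 65
```

The non-zero return status lets a caller tell "not found" apart from an empty value.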
Another (pure Bash) approach, without parsing the string. I am assuming that you are only looking for the first 65 in the line and that it is present in the line:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
# Prepend a comma so that a 65: pair at the very start is matched too
value=,$var
value=${value#*,65:}
value=${value%%,*}
echo "$value"
It will be very slow for long lines!
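For reference, # removes the shortest matching prefix and %% removes the longest matching suffix. The sketch below shows the same two expansions with the key as a parameter and a check for the case where the key is missing. The function name lookup_expand and its variables are my own, for illustration only.

```shell
# Hypothetical helper built on the two expansions above.
lookup_expand() {
    local padded=,$1 key=$2 rest
    rest=${padded#*,"$key":}     # drop the shortest prefix ending in ",KEY:"
    # If nothing was removed, the key is not in the list.
    [[ $rest != "$padded" ]] || return 1
    printf '%s\n' "${rest%%,*}"  # drop the longest suffix, starting at the first comma
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
lookup_expand "$var" 65
```

The leading comma in padded means a pair at the very start of the string is found as well.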
Same as above, but outputs all values that match 65 (or none if not present):
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
tmpvar=,$var
while [[ $tmpvar = *,65:* ]]; do
tmpvar=${tmpvar#*,65:}
echo "${tmpvar%%,*}"
done
As above, this will be slow for long lines!
The fastest I can get in pure Bash is my original answer (and that's ok with 10,000 fields):
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
IFS=, read -ra ary <<< "$var"
for i in "${ary[@]}"; do
[[ $i = 65:* ]] || continue
echo "$i"
done
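If several keys are going to be looked up in the same string, the split array could be loaded into an associative array once and queried afterwards. This goes a little beyond the answer and assumes Bash 4 or later. The names map and pair are illustrative.

```shell
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

declare -A map=()                 # associative array, needs Bash 4 or later
IFS=, read -ra ary <<< "$var"
for pair in "${ary[@]}"; do
    map[${pair%%:*}]=${pair#*:}   # key is the text before the colon, value the text after
done

echo "${map[65]}"
```

If a key occurs more than once, the last occurrence wins with this approach.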
Actually no, the fastest I can get in pure Bash is with this regex:
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
[[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"
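The same regex can take the key from a variable. In this sketch (the function name lookup_regex is hypothetical) the key is quoted inside the pattern, so Bash treats it as literal text rather than as part of the regex:

```shell
# Hypothetical helper: regex lookup with the key as a parameter.
lookup_regex() {
    local list=$1 key=$2
    # The quoted "$key" is matched literally; the rest is an ordinary regex.
    [[ ,$list, =~ ,"$key":([^,]+), ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
lookup_regex "$var" 65
```

The commas added around the list are what keep a key such as 165 from matching when 65 is requested.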
Checking this against awk:
- where the 65:... pair is at the end:
printf -v var '%s:3.0,' {100..11000}
var+=65:42.0
time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
shows 0m0.020s (average), whereas:
time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
shows 0m0.008s (average too).
- where the 65:... pair is not at the end:
printf -v var '%s:3.0,' {1..10000}
time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
shows 0m0.020s (average) and with an early exit:
time awk -F: -v RS=',' '$1==65{print $2;exit}' <<< "$var"
shows 0m0.010s (average), whereas:
time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
shows 0m0.002s (rough average).
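Single runs this short are noisy, so one way to get a rough average is to time a loop and divide by the number of iterations. A sketch for the first case only: the repeat count of 20 is arbitrary, and the variables runs, awk_out and re_out are my own names.

```shell
# Same test string as the first case: 65 is the last of roughly 10,900 pairs.
printf -v var '%s:3.0,' {100..11000}
var+=65:42.0

runs=20   # arbitrary repeat count, chosen only for illustration

echo "awk:"
time {
    for ((n = 0; n < runs; n++)); do
        awk_out=$(awk -F: -v RS=',' '$1==65{print $2}' <<< "$var")
    done
}

echo "regex:"
time {
    for ((n = 0; n < runs; n++)); do
        [[ ,$var, =~ ,65:([^,]+), ]] && re_out=${BASH_REMATCH[1]}
    done
}
```

Divide each reported time by runs to compare per-call cost. The awk loop also pays for a command substitution on every iteration, which the regex loop does not.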
Try:
echo "$var" | tr , '\n' | awk '/65/'
where
- tr , '\n' turns each comma into a newline
- awk '/65/' selects the lines containing 65 (this also matches lines such as 165:2.0 or 10:65.0)
or
echo "$var" | tr , '\n' | awk -F: '$1 == 65 {print $2}'
where
- -F: uses : as the field separator
- $1 == 65 selects the lines with 65 as the first field
- {print $2} prints the second field
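The difference between a substring match and a field comparison shows up as soon as 65 appears somewhere other than as a key. A small check with made-up data (sample, loose and strict are illustrative names):

```shell
sample=165:9.0,10:65.0,65:3.0    # made-up data with 65 in three different positions

# Substring match: every line contains "65", so all three pairs are selected.
loose=$(printf '%s\n' "$sample" | tr , '\n' | awk '/65/')

# Field match: only the pair whose first field equals 65 is selected.
strict=$(printf '%s\n' "$sample" | tr , '\n' | awk -F: '$1 == 65')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

With this input, loose holds all three pairs and strict holds only 65:3.0.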
Here is a GNU awk version:
awk -vRS="(^|,)65:" -F, 'NR>1{print $1}' <<< "$var"
3.0
With grep:
grep -o '\b65\b[^,]*' <<<"$var"
65:3.0
Or
grep -oP '\b65\b:\K[^,]*' <<<"$var"
3.0
The \K escape discards everything matched before it, so only the text after it is printed. It is part of grep's Perl-compatible mode (-P).
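One thing to watch: \b65\b matches 65 at any word boundary, so the first command would also report 65.0 from a pair such as 10:65.0. A possible variant, assuming a grep built with PCRE support, anchors the key to the start of the string or to a preceding comma with a lookbehind. The function name lookup_grep is hypothetical.

```shell
# Hypothetical helper: print the value for key $2 in list $1.
lookup_grep() {
    # (?<![^,])  the key must be at the start or directly after a comma
    # \Q...\E    treats the key as literal text
    # \K         drops the key and colon from the printed match
    printf '%s\n' "$1" |
        grep -oP '(?<![^,])\Q'"$2"'\E:\K[^,]*'
}

sample=10:65.0,165:9.0,65:3.0    # made-up data where 65 also appears as a value
lookup_grep "$sample" 65
```

Unlike the pure Bash versions, this prints every matching pair if the key is repeated.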
Using sed
sed -e 's/^.*,\(65:[0-9.]*\),.*$/\1/' <<<",$var,"
output:
65:3.0
There are two ways to guard against 65:3.0 being first or last in the line. Above, commas are added around the variable so that the pair is always surrounded by commas. Below, the GNU extension \? is used to mean zero or one occurrence of the preceding comma.
sed -e 's/^.*,\?\(65:[0-9.]*\),\?.*$/\1/' <<<"$var"
Both handle 65:3.0 regardless of where it appears in the line.
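Note that when nothing matches, a plain s/// leaves the line untouched and sed prints the whole input. A sketch of the comma-padded variant that prints only on a successful substitution (-n plus the p flag), with the key as a parameter. The name lookup_sed is made up, and the key is assumed to be a plain number because it is spliced into the regex as is.

```shell
# Hypothetical helper: print KEY:VALUE for key $2, or nothing if it is absent.
lookup_sed() {
    # Commas are added around the list so the pair is always comma-delimited.
    printf '%s\n' ",$1," |
        sed -n "s/^.*,\($2:[0-9.]*\),.*\$/\1/p"
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
lookup_sed "$var" 65
```

This relies only on basic regular expressions, so it does not need the GNU \? extension.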
Try egrep (grep -E), as below:
echo "$myvar" | grep -E -o '\b65:[0-9]+\.[0-9]+'