Extracting substring from variable using bash script

I have a bash variable with a value like this:

10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

There are no spaces within the value, and it can be very long or very short. It consists of pairs like 65:3.0. I know the meaning of the first number of a pair, say 65, and I want to extract the number 3.0 or the pair 65:3.0. I don't know the position (offset) of 65 in the string.

I would be grateful for a bash script that can do this kind of extraction. Thanks.

Awk is probably the simplest approach:

$ awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
3.0

Or get a pair:

$ awk -F: -v RS=',' '$1==65' <<< "$var"
65:3.0
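
The awk one-liner hard-codes 65 in the program text. As a minimal sketch (the function name lookup_awk is made up for this example), the key can be passed in with -v instead. printf '%s' is used rather than <<< so that the last pair does not carry the here-string's trailing newline into $2:

```shell
# Sketch: the awk answer wrapped in a function so the key is a parameter.
# lookup_awk is a hypothetical helper name, not part of the original answer.
lookup_awk() {
    local key=$1 data=$2
    # RS=',' makes every key:value pair its own record; -F: splits it into
    # $1 (key) and $2 (value). printf '%s' adds no trailing newline.
    printf '%s' "$data" | awk -F: -v RS=',' -v key="$key" '$1 == key { print $2 }'
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
lookup_awk 65 "$var"     # prints 3.0
lookup_awk 2312 "$var"   # prints 1.0 (last pair)
lookup_awk 7 "$var"      # prints nothing
```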

Here's a clean Bash solution:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

while read -r -d, i; do
    [[ $i = 65:* ]] || continue
    echo "$i"
done <<< "$var,"

You can add break after echo "$i" if var contains only one 65:... pair, or if you only want the first match.

To get just the value 3.0, use echo "${i#*:}" instead.
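
Putting those two remarks together, here is a minimal sketch of the same loop as a function that stops at the first match and prints only the value. The name first_value and its return status of 1 for a missing key are choices made for this example, not part of the answer:

```shell
# Sketch: read -d, loop that returns the value of the first KEY:... pair.
# first_value is a hypothetical helper name.
first_value() {
    local key=$1 data=$2 pair
    # The appended comma terminates the last pair too, so read succeeds on it.
    while read -r -d, pair; do
        if [[ $pair == "$key":* ]]; then
            echo "${pair#*:}"   # strip "KEY:" and print the value
            return 0            # stop at the first match (like break)
        fi
    done <<< "$data,"
    return 1                    # key not found
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
first_value 65 "$var"                           # prints 3.0
first_value 7 "$var" || echo "key 7 not found"
```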


Another (pure Bash) approach, without splitting the string into fields. I am assuming that you are only looking for the first 65 in the line and that it is present in the line:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

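value=,$var             # prepend a comma so a 65 in the first pair is found too
value=${value#*,65:}    # remove everything up to and including the first ",65:"
value=${value%%,*}      # remove everything from the next comma on
echo "$value"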

It will be very slow for long lines!
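
If the key might be missing, the "it is present" assumption can be checked first so that a wrong pair is never returned. A minimal sketch, using the hypothetical helper name expand_lookup; wrapping the string in commas lets the first and last pairs match like any other:

```shell
# Sketch: parameter-expansion lookup with a presence check.
# expand_lookup is a hypothetical helper name.
expand_lookup() {
    local key=$1 tmp=",$2," value
    # Fail instead of returning the wrong pair when KEY is absent.
    [[ $tmp == *,"$key":* ]] || return 1
    value=${tmp#*,"$key":}   # remove everything up to and including ",KEY:"
    value=${value%%,*}       # remove everything from the next comma on
    echo "$value"
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
expand_lookup 65 "$var"                            # prints 3.0
expand_lookup 5 "$var" || echo "key 5 not present" # "65:" does not count as key 5
```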


Same as above, but outputs all values that match 65 (or none if not present):

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

tmpvar=,$var
while [[ $tmpvar = *,65:* ]]; do
    tmpvar=${tmpvar#*,65:}
    echo "${tmpvar%%,*}"
done

Again, this will be slow for long lines!


The fastest I could get in pure Bash was my original answer (it is fine with 10,000 fields):

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

IFS=, read -ra ary <<< "$var"
for i in "${ary[@]}"; do
    [[ $i = 65:* ]] || continue
    echo "$i"
done
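
When many keys have to be looked up in the same string, one option (a swapped-in technique, not part of this answer) is to split the string once into a Bash 4+ associative array. A sketch, assuming that keys are unique or that the last occurrence of a repeated key should win:

```shell
# Sketch: split once, index by key, then look up any number of keys.
# Requires Bash 4+ for declare -A. A repeated key keeps its last value.
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

declare -A map
IFS=, read -ra ary <<< "$var"
for pair in "${ary[@]}"; do
    map[${pair%%:*}]=${pair#*:}   # key: text before the colon, value: text after it
done

echo "${map[65]}"      # prints 3.0
echo "${map[2312]}"    # prints 1.0
[[ ${map[7]+set} ]] || echo "key 7 not present"
```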

Actually no, the fastest I can get in pure Bash is with this regex:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

[[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"
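
A sketch of the same regex test with the key as a parameter; regex_lookup is a hypothetical name. The key is assumed to contain only digits, because any regex metacharacters in it would be interpreted rather than matched literally:

```shell
# Sketch: =~ lookup with the key as the first argument.
# regex_lookup is a hypothetical helper name.
regex_lookup() {
    # Keep the regex in a variable: quoting part of the right-hand side
    # of =~ would make that part a literal string match.
    local re=",$1:([^,]+),"
    [[ ,$2, =~ $re ]] && echo "${BASH_REMATCH[1]}"
}

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
regex_lookup 65 "$var"                            # prints 3.0
regex_lookup 5 "$var" || echo "key 5 not present"
```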

Checking this against awk:

  • with the 65 pair at the end:

    printf -v var '%s:3.0,' {100..11000}
    var+=65:42.0
    time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"

    shows 0m0.020s (average), whereas:

    time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }

    shows 0m0.008s (also an average).

  • with the 65 pair not at the end:

    printf -v var '%s:3.0,' {1..10000}
    time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"

    shows 0m0.020s (average), and with an early exit:

    time awk -F: -v RS=',' '$1==65{print $2;exit}' <<< "$var"

    shows 0m0.010s (average), whereas:

    time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }

    shows 0m0.002s (rough average).

Try

echo "$var" | tr , '\n' | awk '/^65:/'

where

  • tr , '\n' replaces each comma with a newline, so every pair is on its own line
  • awk '/^65:/' selects the line that starts with 65: (a bare /65/ would also match lines such as 165:...)

or

echo "$var" | tr , '\n' | awk -F: '$1 == 65 {print $2}'

where

  • -F: uses : as the field separator
  • $1 == 65 selects the line whose first field is 65
  • {print $2} prints the second field

Here is a GNU awk version:

awk -vRS="(^|,)65:" -F, 'NR>1{print $1}' <<< "$var"
3.0

With grep:

grep -o '\b65:[^,]*' <<<"$var"
65:3.0

Or

grep -oP '\b65\b:\K[^,]*' <<<"$var"
3.0

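\K discards everything matched before it from the reported match, so only the text after 65: is printed. \K is a Perl-compatible regular expression feature, which is why grep needs the -P option here; -P is a GNU grep option and may not be available in other grep implementations.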
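
Where grep -P is not available, a sketch that needs only tr, grep and cut, assuming the key is made of digits; grep_pair and grep_value are hypothetical helper names:

```shell
# Sketch: one pair per line, anchor KEY at the start of the line,
# then cut away the key. grep_pair/grep_value are hypothetical names.
grep_pair()  { printf '%s\n' "$2" | tr ',' '\n' | grep "^$1:"; }
grep_value() { grep_pair "$1" "$2" | cut -d: -f2; }

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
grep_pair 65 "$var"    # prints 65:3.0
grep_value 65 "$var"   # prints 3.0
```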

Using sed:

sed -e 's/^.*,\(65:[0-9.]*\),.*$/\1/' <<<",$var,"

output:

65:3.0

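There are two different ways to handle a 65:3.0 pair that is first or last in the line. Above, commas are added around the variable so that every pair, including the first and last, is surrounded by commas. Below, the GNU extension \? is used to mean zero or one occurrence of the comma:

sed -e 's/^.*,\?\(65:[0-9.]*\),\?.*$/\1/' <<<"$var"

Both commands find 65:3.0 regardless of where it appears on the line. Because the commas are optional in the second form, it would also match the tail of a key such as 165. If there is no 65 pair at all, both commands print their input line unchanged.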
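
To make sed print nothing when the key is absent, a common variant is sed -n with the p flag, which prints only lines where the substitution succeeded. A sketch using the comma-wrapped input from the first command:

```shell
# Sketch: -n suppresses automatic printing; the p flag prints only
# when the substitution matched, so a missing key gives no output.
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

sed -n 's/^.*,\(65:[0-9.]*\),.*$/\1/p' <<< ",$var,"   # prints 65:3.0
sed -n 's/^.*,65:\([0-9.]*\),.*$/\1/p' <<< ",$var,"   # prints 3.0
sed -n 's/^.*,\(7:[0-9.]*\),.*$/\1/p' <<< ",$var,"    # prints nothing
```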

Try grep -E (equivalent to egrep) like this:

echo "$var" | grep -Eo '\b65:[0-9]+\.[0-9]+'