Extracting substring from variable using bash script

Question

I have a bash variable with a value something like this:

10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

There are no spaces in the value, and it can be very long or very short. It consists of pairs such as 65:3.0. I know the first number of a pair, say 65, and I want to extract either the number 3.0 or the whole pair 65:3.0. I don't know the position (offset) of 65 within the string.

I would be grateful for a bash script that can do this kind of extraction. Thanks.

+3

bash awk

user3282777 Dec 20. '14 at 8:19

7 replies

user000001 · Answer 1 · 2014-12-20T08:26:24+0000

Awk is probably the simplest approach:

$ awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
3.0

Or get a pair:

$ awk -F: -v RS=',' '$1==65' <<< "$var"
65:3.0
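
If the key is not always 65, it can be passed to awk with -v instead of being hard-coded. A minimal sketch, assuming a helper named get_value (a hypothetical name, not part of the answer above):

```bash
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

# get_value KEY STRING: print the value paired with KEY.
# (hypothetical helper name, chosen for illustration)
get_value() {
    # exit after the first match so awk does not scan the rest of the string
    awk -F: -v RS=',' -v key="$1" '$1 == key { print $2; exit }' <<< "$2"
}

value=$(get_value 65 "$var")
echo "$value"
```

Capturing the result with $(...) also strips the extra newline that the here-string leaves attached to the last pair.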

gniourf_gniourf · Answer 2 · 2014-12-20T08:39:33+0000

Here's a clean Bash solution:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

while read -r -d, i; do
    [[ $i = 65:* ]] || continue
    echo "$i"
done <<< "$var,"

You can add break after echo "$i" if there is only one 65:... pair in var, or if you only want the first.

To get just the value 3.0, use echo "${i#*:}" instead.
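
Putting those two remarks together, the loop might look like the sketch below. The variable names key, pair and value are placeholders chosen for illustration:

```bash
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0
key=65      # the first number of the pair we are looking for
value=

# the trailing comma makes read see the last pair as a complete record
while read -r -d, pair; do
    [[ $pair = "$key":* ]] || continue
    value=${pair#*:}    # strip everything up to and including the first colon
    break               # stop at the first match
done <<< "$var,"

echo "$value"
```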

Another (pure Bash) approach, without parsing the string. I am assuming that you are only looking for the first 65 in the line and that it is present in the line:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

value=,$var                 # leading comma so a 65 in first position is found too
value=${value#*,65:}
value=${value%%,*}
echo "$value"

It will be very slow for long lines!

Same as above, but outputs all values that match 65 (or none if not present):

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

tmpvar=,$var
while [[ $tmpvar = *,65:* ]]; do
    tmpvar=${tmpvar#*,65:}
    echo "${tmpvar%%,*}"
done

Likewise, this will be slow for long lines!
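
The single-value parameter-expansion snippet further up assumes that the key is present. One way to drop that assumption is to test for the key first. This is only a sketch, and the function name extract_first is hypothetical:

```bash
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

# extract_first KEY STRING: print the first value paired with KEY,
# or return 1 without printing anything if KEY is absent.
extract_first() {
    local key=$1 str=,$2            # leading comma: KEY may be the first field
    [[ $str == *,"$key":* ]] || return 1
    str=${str#*,"$key":}            # drop everything up to and including ",KEY:"
    printf '%s\n' "${str%%,*}"      # keep what precedes the next comma
}

extract_first 65 "$var"
extract_first 99 "$var" || echo "99 not found"
```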

The fastest I can get in pure Bash is my original answer (and that's ok with 10,000 fields):

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

IFS=, read -ra ary <<< "$var"
for i in "${ary[@]}"; do
    [[ $i = 65:* ]] || continue
    echo "$i"
done
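
If several keys have to be looked up in the same string, the array split above could feed an associative array so that the string is parsed only once. This is a variation rather than part of the original answer; it assumes Bash 4 or later, and the names pairs and map are illustrative:

```bash
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

declare -A map                      # associative arrays need Bash 4+
IFS=, read -ra pairs <<< "$var"
for pair in "${pairs[@]}"; do
    # key is the text before the first colon, value the text after it
    map[${pair%%:*}]=${pair#*:}
done

echo "${map[65]}"
echo "${map[2312]}"
```

If a key occurs more than once, the last occurrence wins in this sketch.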

Actually no, the fastest I can get in pure Bash is with this regex:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

[[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"
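
The same regex can be built from a variable when the key changes. A sketch, assuming the key contains no regex metacharacters; get_value_regex is a hypothetical name:

```bash
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

# get_value_regex KEY STRING: print the value paired with KEY, if any.
get_value_regex() {
    local key=$1 re
    re=",${key}:([^,]+),"           # store the pattern in a variable, use it unquoted
    # surrounding commas let the first and last pairs match as well
    [[ ,$2, =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

get_value_regex 65 "$var"
```

Keeping the pattern in a variable and expanding it unquoted on the right of =~ is what makes Bash treat it as a regex rather than a literal string.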

Timing this against awk, with the 65 pair (here 65:42.0) at the end:

printf -v var '%s:3.0,' {100..11000}
var+=65:42.0
time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"

shows 0m0.020s (average), whereas:

time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }

shows 0m0.008s (also an average).

With 65:3.0 not at the end:

printf -v var '%s:3.0,' {1..10000}
time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"

shows 0m0.020s (average) and with an early exit:

time awk -F: -v RS=',' '$1==65{print $2;exit}' <<< "$var"

shows 0m0.010s (average), whereas:

time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }

shows 0m0.002s (rough average).
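
To reproduce this kind of comparison, each command can be repeated in a loop so that a single noisy run matters less. The sketch below first checks that both methods agree on the first test string, then times them; the repeat count runs=20 is arbitrary and the timings will differ from machine to machine:

```bash
printf -v var '%s:3.0,' {100..11000}
var+=65:42.0

# sanity check: both methods should extract the same value
awk_result=$(awk -F: -v RS=',' '$1==65{print $2}' <<< "$var")
regex_result=
[[ ,$var, =~ ,65:([^,]+), ]] && regex_result=${BASH_REMATCH[1]}
echo "awk: $awk_result, regex: $regex_result"

runs=20     # arbitrary repeat count
time for ((n = 0; n < runs; n++)); do
    awk -F: -v RS=',' '$1==65{print $2}' <<< "$var" > /dev/null
done
time for ((n = 0; n < runs; n++)); do
    [[ ,$var, =~ ,65:([^,]+), ]] && : "${BASH_REMATCH[1]}"
done
```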

Archemar · Answer 3 · 2014-12-20T08:23:57+0000

try

echo "$var" | tr , '\n' | awk '/65/'

Where

tr , '\n' replaces each comma with a newline
awk '/65/' selects the lines containing 65

or

echo "$var" | tr , '\n' | awk -F: '$1 == 65 {print $2}'

Where

-F: uses : as the field separator
$1 == 65 selects the lines whose first field is 65
{print $2} prints the second field
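
The two commands behave differently when 65 also appears elsewhere in the data, since /65/ matches anywhere in the line. A small sketch with a made-up sample string (not the one from the question):

```bash
# hypothetical data in which 65 also appears as a value and inside another key
sample=10:65.0,65:3.0,165:9.0

# pattern match: every line that contains "65" anywhere
loose=$(echo "$sample" | tr , '\n' | awk '/65/')

# field comparison: only the line whose first field equals 65
strict=$(echo "$sample" | tr , '\n' | awk -F: '$1 == 65 {print $2}')

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```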

Jotne · Answer 4 · 2014-12-20T08:26:05+0000

Here is a GNU awk solution:

awk -vRS="(^|,)65:" -F, 'NR>1{print $1}' <<< "$var"
3.0

αғsιη · Answer 5 · 2014-12-20T09:42:04+0000

With grep:

grep -o '\b65\b[^,]*' <<<"$var"
65:3.0

Or

grep -oP '\b65\b:\K[^,]*' <<<"$var"
3.0

The \K escape discards everything matched before it, so only the text that follows is printed. It requires Perl-compatible regular expressions (the -P option of grep).
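
Note that \b65\b matches 65 after any non-word character, so it would also pick up a value such as 65.0. Where that matters, the key can be anchored to the start of the string or to a preceding comma instead. A sketch using only -o and -E, with a made-up sample string:

```bash
# hypothetical data in which 65 also appears as a value and inside another key
sample=10:65.0,65:3.0,165:9.0

# (^|,) anchors the key to the start of the string or to a comma;
# cut then keeps the part after the colon
result=$(grep -oE '(^|,)65:[^,]*' <<< "$sample" | cut -d: -f2)
echo "$result"
```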

David C. Rankin · Answer 6 · 2014-12-20T08:34:04+0000

Using sed

sed -e 's/^.*,\(65:[0-9.]*\),.*$/\1/' <<<",$var,"

output:

65:3.0

There are two ways to handle 65:3.0 appearing first or last in the line. Above, commas are added around the variable so that the pair is always delimited by commas, wherever it sits. Below, the GNU extension \? is used instead to make each surrounding comma optional (zero or one occurrence).

sed -e 's/^.*,\?\(65:[0-9.]*\),\?.*$/\1/' <<<"$var"

Both extract 65:3.0 regardless of where it appears in the line. Note that, because the comma is optional, the second form would also match a key that merely ends in 65, such as 165.
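
When the key is absent, a plain s/// leaves the input unchanged, so sed prints the whole line. Combining -n with the p flag prints only when the substitution succeeds. A sketch based on the comma-wrapped form; the key 99 is simply one that does not occur in the data:

```bash
var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

# -n suppresses automatic printing; the p flag prints only after a successful substitution
pair=$(sed -n 's/^.*,\(65:[0-9.]*\),.*$/\1/p' <<< ",$var,")
missing=$(sed -n 's/^.*,\(99:[0-9.]*\),.*$/\1/p' <<< ",$var,")

echo "pair: $pair"
echo "missing: ${missing:-<none>}"
```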

SMA · Answer 7 · 2014-12-20T08:27:08+0000

Try egrep like this:

echo "$myvar" | egrep -o '\b65:[0-9]+\.[0-9]+'
