Extract lines corresponding to the minimum value in the last column

Question

I need help extracting all lines of a file that have the minimum value in the last column, i.e. 7 in this case.

Example file:

File-1.txt

VALID_PATH :  [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH :  [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH :  [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH :  [102, 112, 37, 56, 23, 125, 111]  7
VALID_PATH :  [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH :  [102, 80, 112, 37, 56, 23, 125, 111]  8
VALID_PATH :  [102, 80, 112, 109, 23, 125, 110, 111]    8
VALID_PATH :  [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH :  [102, 80, 127, 88, 112, 109, 23, 125, 111]    9
VALID_PATH :  [102, 80, 112, 37, 109, 23, 125, 110, 111]    9
VALID_PATH :  [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH :  [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH :  [102, 80, 127, 6, 112, 37, 56, 23, 125, 111]  10
VALID_PATH :  [102, 80, 127, 6, 112, 109, 23, 125, 110, 111]    10

Here I want to extract all rows with 7, which is the minimum value in the last column, and save the output to another file, File-2.txt, keeping only the values enclosed in [] as shown below.

File-2.txt

102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111

I could use awk to get the smallest value as "7" from the last column using the following code:

awk 'BEGIN{getline;min=max=$NF}
NF{
    max=(max>$NF)?max:$NF
    min=(min>$NF)?$NF:min
}
END{print min,max}' File-1.txt

and print only the values in square brackets [] using awk code like below:

awk 'NR > 1 {print $1}' RS='[' FS=']' File-1.txt

but I'm stuck on passing the smallest value obtained from the first awk script, i.e. 7 in this case, to the second one so that it extracts only the corresponding numbers enclosed in [], as shown in File-2.txt.

Any help with this issue would be appreciated.

+3

awk

Asha 03 Apr 17 at 10:44

5 answers

Using sort as a helper to get clean code:

$ sort -t\] -nk 2 your_file |awk '$NF!=L && L{exit}{L=$NF;print $2}' FS='[][]'
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
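
Note that the output above is not in the original row order. If the tied rows should keep their input order, a stable sort can be used. This is only a sketch under assumptions: min_rows_sorted is a hypothetical helper name, the input contains no blank lines, and sort supports the -s (stable) option, which GNU and BSD sort provide but POSIX does not require.

```shell
#!/bin/sh
# min_rows_sorted is a hypothetical helper name, not part of the original answer.
# Usage: min_rows_sorted INPUT_FILE
min_rows_sorted() {
    # -t']'  : split each line on ], so key 2 is the trailing count
    # -k2,2n : compare that key numerically (leading blanks are skipped)
    # -s     : stable sort, tied rows keep their input order (assumed GNU/BSD sort)
    sort -s -t']' -k2,2n "$1" |
        awk -F'[][]' '
            NR == 1 { min = $3 + 0 }      # first sorted row holds the minimum
            ($3 + 0) != min { exit }      # stop at the first larger count
            { print $2 }                  # text between [ and ]
        '
}
```

Calling min_rows_sorted File-1.txt > File-2.txt should then write the rows in the order they appear in File-1.txt.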

+2

klashxx 03 Apr 17 at 11:31

Read the input only once (e.g. when it is streamed or piped), with minimal memory usage:

awk -F'[][]' '
   # init counter
   NR == 1 { m = $3 + 1 }

   # add or replace content into the buffer if counter is lower or equal
   $3 <= m { b = ( $3 == m ? b "\n" : "" ) $2; m = $3 }

   # at the end, print buffer
   END { print b }
   ' YourFile
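
Because this approach reads its input once, it can also sit at the end of a pipeline. The following is a hedged sketch of the same buffering idea, not the answer's exact code: min_rows_stream is a hypothetical name, the count is forced to a number with + 0, and lines without a bracketed list are skipped.

```shell
#!/bin/sh
# min_rows_stream is a hypothetical helper name; it reads standard input.
min_rows_stream() {
    awk -F'[][]' '
        NF < 3 { next }                   # skip blank or malformed lines
        { v = $3 + 0 }                    # force the count to a number
        (!seen) || (v < min) {            # first row or new minimum: restart buffer
            min = v; buf = $2; seen = 1
            next
        }
        v == min { buf = buf "\n" $2 }    # tie with current minimum: append
        END { if (seen) print buf }       # print buffered rows, if any
    '
}
```

For example, some_producer | min_rows_stream > File-2.txt, where some_producer stands for whatever command emits the VALID_PATH lines. The rows do not need to be sorted, and only the rows tied for the current minimum are held in memory.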

+2

NeronLeVelu 03 Apr 17 at 11:41

Reading the same file twice instead of using an array

In practice this is a bit slower because the file is read twice, but no rows have to be kept in memory.

awk -F'[][]' 'FNR==NR{if(min > $NF || min==""){ min=$NF} next }
     $NF==min{ print $2 }' file file

Explanation

awk -F'[][]' 'FNR==NR{                           # First pass over the file (FNR==NR):
                                                 # find the minimum of the last field
             if(min > $NF || min==""){ 
                min=$NF                          # NF is the number of fields, so $NF is the last field; store it in min
             } 
              next                               # skip the rule below during the first pass
     }
     $NF==min{                                   # Second pass: if the last field equals the minimum,
              print $2                           # print the list between [ and ]
     }' file file

Input

$ cat file
VALID_PATH :  [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH :  [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH :  [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH :  [102, 112, 37, 56, 23, 125, 111]  7
VALID_PATH :  [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH :  [102, 80, 112, 37, 56, 23, 125, 111]  8
VALID_PATH :  [102, 80, 112, 109, 23, 125, 110, 111]    8
VALID_PATH :  [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH :  [102, 80, 127, 88, 112, 109, 23, 125, 111]    9
VALID_PATH :  [102, 80, 112, 37, 109, 23, 125, 110, 111]    9
VALID_PATH :  [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH :  [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH :  [102, 80, 127, 6, 112, 37, 56, 23, 125, 111]  10
VALID_PATH :  [102, 80, 127, 6, 112, 109, 23, 125, 110, 111]    10

Output

$ awk -F'[][]' 'FNR==NR{ if(min > $NF || min==""){ min=$NF } next }
       $NF==min{ print $2 }' file file
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
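
The question's original plan, computing the minimum with one awk call and then filtering with another, can also be made to work by passing the value through a shell variable with -v. This is a minimal sketch under assumptions: extract_min_paths is a hypothetical helper name, and the input follows the File-1.txt layout shown above.

```shell
#!/bin/sh
# extract_min_paths is a hypothetical helper name, not from the original answers.
# Usage: extract_min_paths INPUT_FILE > OUTPUT_FILE
extract_min_paths() {
    # Step 1: numeric minimum of the last whitespace-separated field.
    min=$(awk 'NF { v = $NF + 0
                    if (!seen || v < m) { m = v; seen = 1 } }
               END { if (seen) print m }' "$1")

    # Nothing to print for an empty file.
    [ -n "$min" ] || return 0

    # Step 2: split on [ or ], so $2 is the bracketed list and $3 the count.
    awk -F'[][]' -v min="$min" \
        'NF >= 3 && ($3 + 0) == (min + 0) { print $2 }' "$1"
}
```

With the example data, extract_min_paths File-1.txt > File-2.txt should produce the four rows listed in the question.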

+1

Akshay hegde 03 Apr 17 at 11:07

$ awk -F'[][]' -vmin=99999 '$NF<=min{min=$NF;print $2}'

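-F'[][]' sets FS to the regexp [][], which means "[ or ]", so each input line is split into 3 fields.

-vmin=99999 sets the variable min to 99999. This variable stores the minimum value of the last field.

$NF<=min{min=$NF;print $2}: if the current last field is less than or equal to min, update min and output what we need.

Note that this only gives the right result when the rows are already in ascending order of the last column, as in the example; otherwise rows that held an earlier, larger minimum are printed too. The command as written names no input file, so it reads standard input.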

+1

komar 03 Apr 17 at 11:13

RavinderSingh13 · Accepted Answer · 2017-04-03T10:53:48+0000

@Asha: try:

awk '{Q=$NF;gsub(/.*\[|\]/,"");$NF="";A[Q]=A[Q]?A[Q] ORS $0:$0;MIN=MIN<Q?(MIN?MIN:Q):Q} END{print A[MIN]}' Input_file

Will add a description shortly.

EDIT: Below is the description too.

awk '{
Q=$NF;                    ##### Save the last field (the count) in variable Q.
gsub(/.*\[|\]/,"");       ##### Global substitution: remove everything up to and including [ and also the ], so only the list and the count remain.
$NF="";                   ##### Blank out the last field, since the count is not wanted in the output (this leaves a trailing space).
A[Q]=A[Q]?A[Q] ORS $0:$0; ##### Append the line to array element A[Q], so A collects all lines sharing the same count Q, separated by ORS.
MIN=MIN<Q?(MIN?MIN:Q):Q}  ##### Keep the smallest count seen so far in MIN; on the first line MIN is unset, so it takes the value of Q.
END{print A[MIN]}         ##### In the END block, print the lines stored under the minimum count.
' Input_file              ##### Input_file is the file to read.
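
Two details of the command above are worth knowing: $NF="" leaves a trailing space on every output line, and the MIN?MIN:Q test treats a stored minimum of 0 as unset. The following variant is only a sketch of the same grouping idea, not the accepted answer's code: group_min_rows is a hypothetical name, and it splits on [ or ] instead of editing the record.

```shell
#!/bin/sh
# group_min_rows is a hypothetical helper name, not from the accepted answer.
# Usage: group_min_rows INPUT_FILE
group_min_rows() {
    awk -F'[][]' '
        NF >= 3 {
            k = $3 + 0                            # count as a number
            if (k in rows) rows[k] = rows[k] ORS $2   # append to existing group
            else           rows[k] = $2               # start a new group
            if (!seen || k < min) { min = k; seen = 1 }
        }
        END { if (seen) print rows[min] }         # rows sharing the minimum count
    ' "$1"
}
```

Like the accepted answer, this keeps every row in memory until the end, so the single-pass buffer or the two-pass approach may suit very large files better.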
