Extract lines corresponding to the minimum value in the last column

I need help extracting all lines from a file with the minimum number in the last column i.e. 7 in this case.

Example file:

File-1.txt

VALID_PATH :  [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH :  [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH :  [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH :  [102, 112, 37, 56, 23, 125, 111]  7
VALID_PATH :  [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH :  [102, 80, 112, 37, 56, 23, 125, 111]  8
VALID_PATH :  [102, 80, 112, 109, 23, 125, 110, 111]    8
VALID_PATH :  [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH :  [102, 80, 127, 88, 112, 109, 23, 125, 111]    9
VALID_PATH :  [102, 80, 112, 37, 109, 23, 125, 110, 111]    9
VALID_PATH :  [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH :  [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH :  [102, 80, 127, 6, 112, 37, 56, 23, 125, 111]  10
VALID_PATH :  [102, 80, 127, 6, 112, 109, 23, 125, 110, 111]    10

      

Here I want to extract all rows with 7, which is the smallest value (minimum value) in the last column, and save the output to another File-2.txt file, only extracting the values ​​enclosed in [] as shown below.

File-2.txt

102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111

      

I could use awk to get the smallest value as "7" from the last column using the following code:

awk 'BEGIN{getline;min=max=$NF}
NF{
    max=(max>$NF)?max:$NF
    min=(min>$NF)?$NF:min
}
END{print min,max}' File-1.txt

      

and print only the values ​​in square brackets [] using awk code like below:

awk 'NR > 1 {print $1}' RS='[' FS=']' File-1.txt

      

but I'm stuck with assigning the smallest value obtained from the first awk script, ie 7 in this case, to extract the corresponding numbers enclosed in [] as shown in file-2.txt.

Any help with this issue would be appreciated.

+3


source to share


5 answers


@Asha: @try:

awk '{Q=$NF;gsub(/.*\[|\]/,"");$NF="";A[Q]=A[Q]?A[Q] ORS $0:$0;MIN=MIN<Q?(MIN?MIN:Q):Q} END{print A[MIN]}' Input_file

      

Will add a description shortly.



EDIT: Below is the description too.

awk '{
Q=$NF;                    ##### Making last field of Input_file as NULL.
gsub(/.*\[|\]/,"");       ##### Using global substitution functionality of awk to remove everything till [ and then remove ] from the line as per your required output.
$NF="";                   ##### Nullifying the last column of each line as you don't need them in your output.
A[Q]=A[Q]?A[Q] ORS $0:$0; ##### creating an array named A whose index is Q variable(whose value is already assigned previously to last column), creating array A with index Q and concatenating it value in itself.
MIN=MIN<Q?(MIN?MIN:Q):Q}  ##### Creating a variable named MIN(to get the minimum last value of each line) and comparing it value to each line last field and keeping the minimum value in it as per requirement.
END{print A[MIN]}         ##### In end block of code printing the value of array A whose index is variable MIN to print all the lines whose index is variable named MIN.
' Input_file              ##### Mentioning the Input_file here.

      

+1


source


Using sort

as a helper to get clean code:



$ sort -t\] -nk 2 your_file |awk '$NF!=L && L{exit}{L=$NF;print $2}' FS='[][]'
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111

      

+2


source


read once (ex: for streaming / channel info) with minimal memory usage

awk -F'[][]' '
   # init counter
   NR == 1 { m = $3 + 1 }

   # add or replace content into the buffer if counter is lower or equal
   $3 <= m { b = ( $3 == m ? b "\n" : "" ) $2; m = $3 }

   # at the end, print buffer
   END { print b }
   ' YourFile

      

+2


source


Reading the same file twice, instead of using array

practically a bit slower as we read the file 2 times, but zero memory overhead.

awk -F'[][]' 'FNR==NR{if(min > $NF || min==""){ min=$NF} next }
     $NF==min{ print $2 }' file file

      

Explanation

awk -F'[][]' 'FNR==NR{                           # This block we read file 
                                                 # and will find whats minimum                                                 
             if(min > $NF || min==""){ 
                min=$NF                          # NF gives no of fields, assign the value of $NF to variable min 
             } 
              next 
     }
     $NF==min{                                   # Here we read file 2nd time, if last field value is equal to minimum
              print $2 
     }' file file

      

Input

$ cat file
VALID_PATH :  [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH :  [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH :  [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH :  [102, 112, 37, 56, 23, 125, 111]  7
VALID_PATH :  [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH :  [102, 80, 112, 37, 56, 23, 125, 111]  8
VALID_PATH :  [102, 80, 112, 109, 23, 125, 110, 111]    8
VALID_PATH :  [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH :  [102, 80, 127, 88, 112, 109, 23, 125, 111]    9
VALID_PATH :  [102, 80, 112, 37, 109, 23, 125, 110, 111]    9
VALID_PATH :  [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH :  [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH :  [102, 80, 127, 6, 112, 37, 56, 23, 125, 111]  10
VALID_PATH :  [102, 80, 127, 6, 112, 109, 23, 125, 110, 111]    10

      

Output

$ awk -F'[][]' 'FNR==NR{ if(min > $NF || min==""){ min=$NF } next }
       $NF==min{ print $2 }' file file
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111

      

+1


source


$ awk -F'[][]' -vmin=99999 '$NF<=min{min=$NF;print $2}'

      

  • -F'[][]'

    set FS to regexp [][]

    which means "or [or]", meaning your input string will be split into 3 fields.
  • -vmin=99999

    set the min variable to 99999

    . This variable will store the minimum value of the last field
  • $NF <= min {min = $NF; print $2}

    If the current last field is less than or equal, then it is stored in a variable min

    , then update min

    and output what we need.
+1


source







All Articles