Extract lines corresponding to the minimum value in the last column
I need help extracting all lines from a file with the minimum number in the last column i.e. 7 in this case.
Example file:
File-1.txt
VALID_PATH : [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 56, 23, 125, 111] 7
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 109, 23, 125, 110, 111] 8
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 127, 88, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 37, 56, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 110, 111] 10
Here I want to extract all rows with 7, which is the smallest value (minimum value) in the last column, and save the output to another File-2.txt file, only extracting the values enclosed in [] as shown below.
File-2.txt
102, 80, 112, 109, 23, 125, 111 102, 81, 112, 109, 23, 125, 111 102, 112, 37, 109, 23, 125, 111 102, 112, 37, 56, 23, 125, 111
I could use awk to get the smallest value as "7" from the last column using the following code:
awk 'BEGIN{getline;min=max=$NF}
NF{
max=(max>$NF)?max:$NF
min=(min>$NF)?$NF:min
}
END{print min,max}' File-1.txt
and print only the values in square brackets [] using awk code like below:
awk 'NR > 1 {print $1}' RS='[' FS=']' File-1.txt
but I'm stuck with assigning the smallest value obtained from the first awk script, ie 7 in this case, to extract the corresponding numbers enclosed in [] as shown in file-2.txt.
Any help with this issue would be appreciated.
source to share
@Asha: @try:
awk '{Q=$NF;gsub(/.*\[|\]/,"");$NF="";A[Q]=A[Q]?A[Q] ORS $0:$0;MIN=MIN<Q?(MIN?MIN:Q):Q} END{print A[MIN]}' Input_file
Will add a description shortly.
EDIT: Below is the description too.
awk '{
Q=$NF; ##### Making last field of Input_file as NULL.
gsub(/.*\[|\]/,""); ##### Using global substitution functionality of awk to remove everything till [ and then remove ] from the line as per your required output.
$NF=""; ##### Nullifying the last column of each line as you don't need them in your output.
A[Q]=A[Q]?A[Q] ORS $0:$0; ##### creating an array named A whose index is Q variable(whose value is already assigned previously to last column), creating array A with index Q and concatenating it value in itself.
MIN=MIN<Q?(MIN?MIN:Q):Q} ##### Creating a variable named MIN(to get the minimum last value of each line) and comparing it value to each line last field and keeping the minimum value in it as per requirement.
END{print A[MIN]} ##### In end block of code printing the value of array A whose index is variable MIN to print all the lines whose index is variable named MIN.
' Input_file ##### Mentioning the Input_file here.
source to share
read once (ex: for streaming / channel info) with minimal memory usage
awk -F'[][]' '
# init counter
NR == 1 { m = $3 + 1 }
# add or replace content into the buffer if counter is lower or equal
$3 <= m { b = ( $3 == m ? b "\n" : "" ) $2; m = $3 }
# at the end, print buffer
END { print b }
' YourFile
source to share
Reading the same file twice, instead of using array
practically a bit slower as we read the file 2 times, but zero memory overhead.
awk -F'[][]' 'FNR==NR{if(min > $NF || min==""){ min=$NF} next }
$NF==min{ print $2 }' file file
Explanation
awk -F'[][]' 'FNR==NR{ # This block we read file
# and will find whats minimum
if(min > $NF || min==""){
min=$NF # NF gives no of fields, assign the value of $NF to variable min
}
next
}
$NF==min{ # Here we read file 2nd time, if last field value is equal to minimum
print $2
}' file file
Input
$ cat file
VALID_PATH : [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 56, 23, 125, 111] 7
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 109, 23, 125, 110, 111] 8
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 127, 88, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 37, 56, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 110, 111] 10
Output
$ awk -F'[][]' 'FNR==NR{ if(min > $NF || min==""){ min=$NF } next }
$NF==min{ print $2 }' file file
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
source to share
$ awk -F'[][]' -vmin=99999 '$NF<=min{min=$NF;print $2}'
-
-F'[][]'
set FS to regexp[][]
which means "or [or]", meaning your input string will be split into 3 fields. -
-vmin=99999
set the min variable to99999
. This variable will store the minimum value of the last field -
$NF <= min {min = $NF; print $2}
If the current last field is less than or equal, then it is stored in a variablemin
, then updatemin
and output what we need.
source to share