Invalid Numeric Comparisons in awk

I recently found a script on the site:

bash, find closest next value, forward and backward

It's relatively old, and commenting requires 50 reputation, which I don't have. I don't know awk syntax very well, but I'm trying to get the script to work. In my test file, I am using:

 -3.793  0.9804E+00  0.3000E+02
 -3.560  0.1924E-01  0.3000E+02
 -3.327  0.3051E-04  0.3000E+02
 -3.093  0.3567E-08  0.3000E+02
 -2.860  0.3765E-06  0.3000E+02
 -2.627  0.1119E-02  0.3000E+02
 -2.394  0.2520E+00  0.3006E+02

      

and here is the script:

{
if ($fld > tgt) {
    del = $fld - tgt
    if ( (del < minGtDel) || (++gtHit == 1) ) {
        minGtDel = del
        minGtVal = $fld
    }
}
else if ($fld < tgt) {
    del = tgt - $fld
    if ( (del < minLtDel) || (++ltHit == 1) ) {
        minLtDel = del
        minLtVal = $fld
    }
}
else {
    minEqVal = $fld
}
}
END {
print (minGtVal == "" ? "NaN" : minGtVal)
print (minLtVal == "" ? "NaN" : minLtVal)
}

      

which when run like this:

$ awk -v fld=1 -v tgt=-3 -f awk DOSCAR

      

produces:

 -2.860
 NaN

      

There is a value below the target in the file, though, so the second line should not be NaN, and I'm not really sure how to fix this. There were no negative numbers in the original post, so it didn't have this issue. Any help is appreciated.



1 answer


You have a blank line in your input file, which triggers a classic awk gotcha.

The main problem is the curious behavior of awk's comparison operators, which don't require you to specify whether you want a numeric or a string comparison. (<opinion> This is why automatic comparison operators are a bad idea. </opinion>)

In short, awk has three scalar types: numbers, strings, and "numeric strings". Literals in a program are numbers or strings, arithmetic operators always result in a number, and string concatenation always results in a string. But the values you are comparing, $fld and tgt, are potentially "numeric strings" because they come from user input.

A "numeric string" is a string obtained from user input that "looks like" a number. In general, the definition of "looks like a number" is not surprising, except for one detail: the empty string does not count.

If you compare two numbers, the comparison is numeric. If you compare two strings, the comparison is lexicographic. But if one (or both) of the things you are comparing is potentially a "numeric string", then the type of comparison depends on whether it actually is one. If it is a "numeric string", it is converted to a number and the comparison is numeric; otherwise, the other value is converted to a string and the comparison is lexicographic.
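A small sketch of that rule, assuming a POSIX-style awk is on the PATH (the input "10 9" is made up for illustration). The same two fields compare differently depending on whether they are still numeric strings or have been forced to plain strings by concatenation:

```shell
# Fields "10" and "9" look like numbers, so they are numeric strings:
# the comparison is numeric, and 10 < 9 is false.
numeric=$(echo "10 9" | awk '{ r = ($1 < $2); print r }')

# Concatenating "" turns each field into a plain string, so the
# comparison is lexicographic, and "10" sorts before "9".
lexical=$(echo "10 9" | awk '{ r = (($1 "") < ($2 "")); print r }')

echo "numeric comparison: $numeric"   # 0
echo "string comparison:  $lexical"   # 1
```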



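Therefore, if $fld is an empty string, then comparing it with tgt will be a string comparison, not a numeric comparison. An empty string is the smallest possible string in a string comparison, so it will compare as smaller than tgt. However, when you then compute tgt - $fld, $fld will be cast to a number, in which case the empty string turns into 0. With tgt=-3, that makes del equal to -3, which is smaller than any genuine difference, so the empty string is recorded as minLtVal and the END block prints NaN.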
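To see the mismatch in isolation, here is a minimal sketch that feeds awk a single blank line (the variable names lt and diff are only for illustration). The comparison treats the empty field as a string, while the subtraction treats it as 0:

```shell
# A blank input line: $1 is the empty string, tgt is the numeric string "-3".
result=$(printf '\n' | awk -v tgt=-3 '{
    lt = ($1 < tgt)     # string comparison: "" sorts before "-3", so this is 1
    diff = tgt - $1     # arithmetic: "" becomes 0, so this is -3 - 0
    print lt, diff
}')
echo "$result"   # 1 -3
```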

So there are two possibilities. The simplest one is to force $fld to a number; then the comparison and the subtraction will at least agree:

{
    val = $fld + 0
    if (val > tgt) {
        del = val - tgt
        if ( (del < minGtDel) || (++gtHit == 1) ) {
            minGtDel = del
            minGtVal = val
        }
    }
    else if (val < tgt) {
        del = tgt - val
        if ( (del < minLtDel) || (++ltHit == 1) ) {
            minLtDel = del
            minLtVal = val
        }
    }
    else {
        minEqVal = val
    }  
}
END {
    print (minGtVal == "" ? "NaN" : minGtVal)
    print (minLtVal == "" ? "NaN" : minLtVal)
}
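One consequence worth knowing: with this version a blank line is still processed, just consistently as 0. A stripped-down sketch (hypothetical two-line input, and tgt=-1 chosen so that 0 lies above the target) shows the blank line being classified as a value above tgt:

```shell
# One real value followed by a blank line.
out=$(printf ' -3.793\n\n' | awk -v fld=1 -v tgt=-1 '{
    val = $fld + 0                       # the blank line becomes 0 here
    if (val > tgt)      print "above:", val
    else if (val < tgt) print "below:", val
}')
echo "$out"
# below: -3.793
# above: 0
```

If a spurious 0 could end up closer to the target than the real data, skipping non-numeric lines is the safer choice.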

      

Another way is to exclude lines where the specified field cannot be a number. A simple and generally reliable test for numeric values is to compare the value with itself coerced to a number:

(val = $fld + 0) == $fld {
    if (val > tgt) {
        del = val - tgt
        if ( (del < minGtDel) || (++gtHit == 1) ) {
            minGtDel = del
            minGtVal = val
        }
    }
    else if (val < tgt) {
        del = tgt - val
        if ( (del < minLtDel) || (++ltHit == 1) ) {
            minLtDel = del
            minLtVal = val
        }
    }
    else {
        minEqVal = val
    }  
}
END {
    print (minGtVal == "" ? "NaN" : minGtVal)
    print (minLtVal == "" ? "NaN" : minLtVal)
}
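The pattern on the first line can be tried on its own. This is a rough sketch with made-up input lines (a decimal, a blank line, a word, and a value in exponent notation); only the lines whose first field is a numeric string should pass:

```shell
# Print only the lines whose first field equals itself coerced to a number.
kept=$(printf '%s\n' '1.5' '' 'abc' '0.98E+00' |
       awk '($1 + 0) == $1 { print "numeric:", $1 }')
echo "$kept"
# numeric: 1.5
# numeric: 0.98E+00
```

For the blank line and for abc, the field is a plain string, so the number 0 is converted to the string "0" and the string comparison fails.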

      
