Invalid Numeric Comparisons in awk
I recently found a script on the site:
bash, find closest next value, forward and backward
It's relatively old, and commenting requires 50 reputation, which I don't have. I don't know awk syntax very well, but I'm trying to get the script to work. In my test file, I am using:
-3.793 0.9804E+00 0.3000E+02
-3.560 0.1924E-01 0.3000E+02
-3.327 0.3051E-04 0.3000E+02
-3.093 0.3567E-08 0.3000E+02
-2.860 0.3765E-06 0.3000E+02
-2.627 0.1119E-02 0.3000E+02
-2.394 0.2520E+00 0.3006E+02
and here is the script:
{
    if ($fld > tgt) {
        del = $fld - tgt
        if ( (del < minGtDel) || (++gtHit == 1) ) {
            minGtDel = del
            minGtVal = $fld
        }
    }
    else if ($fld < tgt) {
        del = tgt - $fld
        if ( (del < minLtDel) || (++ltHit == 1) ) {
            minLtDel = del
            minLtVal = $fld
        }
    }
    else {
        minEqVal = $fld
    }
}
END {
    print (minGtVal == "" ? "NaN" : minGtVal)
    print (minLtVal == "" ? "NaN" : minLtVal)
}
which when run like this:
$ awk -v fld=1 -v tgt=-3 -f awk DOSCAR
produces:
-2.860
NaN
There is a lower value in the file, though (-3.093), so the second line should not be NaN, and I'm not really sure how to fix this. There were no negative numbers in the original post, so they didn't have this issue. Any help is appreciated.
You have a blank line in your input file, which triggers a classic awk gotcha.
The main problem is the curious behavior of awk's comparison operators, which don't require you to specify whether you want a numeric or string comparison. (<opinion> This is why automatic comparison operators are a bad idea. </opinion>)
In short, awk has three scalar types: numbers, strings, and "numeric strings". Literals in a program are numbers or strings; arithmetic operators always produce a number, and string concatenation always produces a string. But the values you are comparing, $fld and tgt, are potentially "numeric strings" because they come from user input.
A "numeric string" is a string obtained from user input that "looks like" a number. In general, the definition of "looks like a number" is not surprising, except for one detail: the empty string does not count.
If you compare two numbers, the comparison is numeric. If you compare two strings, the comparison is lexicographic. But if one (or both) of the values you are comparing is potentially a "numeric string", then the type of comparison depends on whether it actually looks like a number. If it does, it is converted to a number and compared numerically; otherwise, the other value is converted to a string and the comparison is lexicographic.
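A minimal sketch of these rules, run from a shell. The values 10 and 9 and the function name compare_demo are made up for illustration; the pair is chosen because it orders differently as numbers and as strings.

```shell
compare_demo() {
    # Literals: numbers compare numerically, strings lexicographically.
    awk 'BEGIN { n = (10 < 9); s = ("10" < "9"); print "literals:", n, s }'

    # Fields that look like numbers are "numeric strings" and compare
    # numerically; concatenating "" turns them into plain strings.
    echo '10 9' | awk '{ n = ($1 < $2); s = (($1 "") < ($2 "")); print "fields:", n, s }'
}

compare_demo
# prints:
#   literals: 0 1
#   fields: 0 1
```

In both lines the first result is the numeric comparison (10 is not less than 9) and the second is the string comparison ("10" sorts before "9").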
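Therefore, if $fld is an empty string, comparing it with tgt will be a string comparison, not a numeric comparison. The empty string is the smallest possible string in a string comparison, so it will compare as less than tgt. However, when you then compute tgt - $fld, $fld is converted to a number, and the empty string becomes 0. With tgt=-3 that makes del equal to -3, which is smaller than any real distance, so the blank line wins and minLtVal is set to the empty string, which the END block prints as NaN.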
So there are two possibilities. The simplest one is to force $fld to be a number; the comparison and the subtraction will then at least agree:
{
    val = $fld + 0
    if (val > tgt) {
        del = val - tgt
        if ( (del < minGtDel) || (++gtHit == 1) ) {
            minGtDel = del
            minGtVal = val
        }
    }
    else if (val < tgt) {
        del = tgt - val
        if ( (del < minLtDel) || (++ltHit == 1) ) {
            minLtDel = del
            minLtVal = val
        }
    }
    else {
        minEqVal = val
    }
}
END {
    print (minGtVal == "" ? "NaN" : minGtVal)
    print (minLtVal == "" ? "NaN" : minLtVal)
}
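To try this version without a separate script file, it can be wrapped in a shell function. This is only a sketch: the name nearest_coerced is hypothetical, the input is a shortened copy of the test data, and the position of the blank line is an assumption, since the question does not show where it sits in the file.

```shell
nearest_coerced() {
    # $1 = field number, $2 = target value; data is read from stdin.
    awk -v fld="$1" -v tgt="$2" '
    {
        val = $fld + 0                  # force a number; a blank line becomes 0
        if (val > tgt) {
            del = val - tgt
            if ((del < minGtDel) || (++gtHit == 1)) { minGtDel = del; minGtVal = val }
        } else if (val < tgt) {
            del = tgt - val
            if ((del < minLtDel) || (++ltHit == 1)) { minLtDel = del; minLtVal = val }
        }
    }
    END {
        print (minGtVal == "" ? "NaN" : minGtVal)
        print (minLtVal == "" ? "NaN" : minLtVal)
    }'
}

printf '%s\n' '-3.793 0.9804E+00' '-3.093 0.3567E-08' '' '-2.860 0.3765E-06' |
    nearest_coerced 1 -3
# prints:
#   -2.86
#   -3.093
```

Two side effects are worth knowing. The result is printed as -2.86 rather than -2.860, because val is a number and no longer the original field text. The blank line is also still processed, as the value 0, so with a target close to 0 it could be reported as the nearest value.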
Another way is to exclude lines where the specified field is not a number. A simple and generally reliable test for numeric values is to compare the value with itself converted to a number:
(val = $fld + 0) == $fld {
    if (val > tgt) {
        del = val - tgt
        if ( (del < minGtDel) || (++gtHit == 1) ) {
            minGtDel = del
            minGtVal = val
        }
    }
    else if (val < tgt) {
        del = tgt - val
        if ( (del < minLtDel) || (++ltHit == 1) ) {
            minLtDel = del
            minLtVal = val
        }
    }
    else {
        minEqVal = val
    }
}
END {
    print (minGtVal == "" ? "NaN" : minGtVal)
    print (minLtVal == "" ? "NaN" : minLtVal)
}
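The numeric test used as the pattern can be checked on its own. The sketch below applies it to a few made-up inputs, two taken from the test data plus a blank line and two non-numeric strings; the function name numeric_filter_demo and the labels "kept" and "skipped" are only for illustration.

```shell
numeric_filter_demo() {
    printf '%s\n' '-3.793' '0.3567E-08' '' 'abc' '12abc' |
    awk '{
        # Same test as the pattern above: a field that looks like a number
        # equals itself plus 0; anything else falls back to a string
        # comparison against the converted value and fails.
        ok = (($1 + 0) == $1)
        printf "[%s] %s\n", $1, (ok ? "kept" : "skipped")
    }'
}

numeric_filter_demo
# prints:
#   [-3.793] kept
#   [0.3567E-08] kept
#   [] skipped
#   [abc] skipped
#   [12abc] skipped
```

The blank line is skipped because 0 converted to a string is "0", which is not equal to the empty string.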