Propagation of NaN through calculations

Usually NaN (not a number) is propagated through calculations, so I don't have to check for NaN at every step. This almost always works, but apparently there are exceptions. For example:

>>> nan = float('nan')
>>> pow(nan, 0)
1.0
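For comparison, here are a few other places where a float NaN quietly disappears in CPython 3. The dictionary only puts the cases side by side; each comment states what CPython returns for that expression.

```python
import math

nan = math.nan

# Each entry maps an expression to the value CPython 3 produces for it.
cases = {
    "nan ** 0": nan ** 0,            # 1.0: CPython returns 1.0 for x ** 0, NaN included
    "1.0 ** nan": 1.0 ** nan,        # 1.0: 1 to any power is 1.0, even a NaN power
    "max(nan, 1.0)": max(nan, 1.0),  # nan: max() keeps the first item it saw
    "max(1.0, nan)": max(1.0, nan),  # 1.0: every comparison with NaN is False
    "nan > 5": nan > 5,              # False, so an if/else silently takes 'else'
}

for expr, value in cases.items():
    print(f"{expr:15} -> {value!r}")
```

The max() pair is the nastiest for missing data: the result depends only on argument order.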

      

I found the following comment on this:

Propagating quiet NaNs through arithmetic operations allows errors to be detected at the end of a sequence of operations without extensive testing in between. Note, however, that depending on the language and the function, NaNs can be silently removed in expressions that would give a constant result for all other floating-point values, e.g. NaN ^ 0, which may be defined as 1. So, in general, a later test for a set INVALID flag is needed to detect all cases where NaNs are introduced.

To satisfy those wishing for a stricter interpretation of how a power function should behave, the 2008 standard defines two additional power functions: pown(x, n), where the exponent must be an integer, and powr(x, y), which returns NaN whenever a parameter is NaN or the exponentiation would be undefined.
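To make the powr() semantics more concrete, here is a rough Python sketch. The special cases are an assumption based on the standard describing powr(x, y) as exp(y * log(x)); it has not been checked against a conforming implementation. Because Python floats expose no INVALID flag, the sketch returns NaN where the standard would signal INVALID, and overflow still raises OverflowError from math.pow.

```python
import math

def powr(x: float, y: float) -> float:
    """Hypothetical powr(x, y) sketch: exp(y * log(x)) with NaN for invalid cases."""
    if math.isnan(x) or math.isnan(y):
        return math.nan          # NaN in, NaN out, even when y == 0
    if x < 0:
        return math.nan          # log(x) is undefined for negative x (-0.0 is not < 0)
    # y * log(x) would be 0 * inf in each of these, so the result is undefined.
    if (x == 0 and y == 0) or (math.isinf(x) and y == 0) or (x == 1 and math.isinf(y)):
        return math.nan
    if x == 0 and y < 0:
        return math.inf          # divide-by-zero case; math.pow would raise ValueError
    return math.pow(x, y)        # ordinary cases defer to the C library pow()

print(powr(math.nan, 0))         # nan, unlike pow(nan, 0)
print(powr(2.0, 3.0))            # 8.0
```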

Is there a way to check the INVALID flag mentioned above from Python? Alternatively, is there any other approach for detecting cases where NaN does not propagate?

Motivation: I decided to use NaN for missing data. In my application, missing inputs should lead to a missing result. This works great, except for the case I've described.
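One blunt alternative for this missing-data use case is to check the inputs once, at the function boundary, instead of relying on NaN to propagate through every step. A minimal sketch; nan_if_missing and model are hypothetical names:

```python
import functools
import math

def nan_if_missing(func):
    """Hypothetical decorator: return NaN if any float argument is NaN."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for value in (*args, *kwargs.values()):
            if isinstance(value, float) and math.isnan(value):
                return math.nan  # a missing input means a missing result
        return func(*args, **kwargs)
    return wrapper

@nan_if_missing
def model(a, b):
    # Without the decorator, a NaN in `a` would vanish here, since nan ** 0 == 1.0.
    return a ** 0 + b

print(model(float("nan"), 2.0))  # nan
print(model(3.0, 2.0))           # 3.0
```

This only guards the inputs of decorated functions; NaNs created in the middle of a calculation (e.g. inf - inf) are not caught.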



4 answers


I realize it has been a month since this was asked, but I ran into a similar problem (i.e. pow(float('nan'), 1) throws an exception in some Python implementations, such as Jython 2.52b2), and I found that the answers given weren't quite what I was looking for.

Using a MissingData type as suggested by 6502 seems to be the way to go, but I needed a concrete example. I tried Ethan Furman's NullType class, but found that it didn't work with any arithmetic operations because it doesn't coerce data types (see below), and I also didn't like that it explicitly names every arithmetic method it overrides...

Starting from Ethan's example and some code I found here, I arrived at the class below. Although the class is heavily commented, you can see that it actually contains only a few lines of functional code.

The key points are:
1. Use the __coerce__() method to return two NoData objects for mixed-type arithmetic operations (e.g. NoData + float), and two strings for string-based operations (e.g. concatenation).
2. Use __getattr__() to return a NoData() object for all other attribute and method accesses.
3. Use __call__() to implement all other methods of the NoData() object, by returning a NoData() object.

Here are some examples of its use.



>>> nd = NoData()
>>> nd + 5
NoData()
>>> pow(nd, 1)
NoData()
>>> math.pow(NoData(), 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: nb_float should return float object
>>> nd > 5
NoData()
>>> if nd > 5:
...     print "Yes"
... else:
...     print "No"
... 
No
>>> "The answer is " + nd
'The answer is NoData()'
>>> "The answer is %f" % (nd)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: float argument required, not instance
>>> "The answer is %s" % (nd)
'The answer is '
>>> nd.f = 5
>>> nd.f
NoData()
>>> nd.f()
NoData()

      

I noticed that calling the built-in pow() with NoData() uses the ** operator and therefore works with NoData, but math.pow() does not, because it first tries to convert the NoData() object to a float. I'm happy to use the non-math pow(); hopefully 6502 and the others were using math.pow() when they had problems with pow in their comments above.

Another problem I can't think of a way to solve is using NoData with the % format operator (%f). In this case, no NoData methods are called; the operator simply fails if you don't provide a float. Anyway, here is the class itself:

class NoData():
    """NoData object - any interaction returns NoData()"""
    # Python 2 old-style class: operators are resolved through __coerce__ and
    # __getattr__, which is what makes the catch-all behaviour below work.

    def __str__(self):
        # I want '' returned as it represents no data in my output (e.g. csv) files
        return ''

    def __unicode__(self):
        return ''

    def __repr__(self):
        return 'NoData()'

    def __coerce__(self, other_object):
        if isinstance(other_object, (str, unicode)):
            # Return string objects when coerced with another string object.
            # This ensures that e.g. concatenation operations produce strings.
            return repr(self), other_object
        else:
            # Otherwise return two NoData objects - these will then be passed to the appropriate
            # operator method for NoData, which should then return a NoData object
            return self, self

    def __nonzero__(self):
        # __nonzero__ is called whenever e.g. "if NoData:" occurs.
        # As all operations involving NoData return NoData, whenever a
        # NoData object propagates to a test in a branch statement, the test is False.
        return False

    def __hash__(self):
        # prevent NoData() from being used as a key for a dict or used in a set
        raise TypeError("Unhashable type: " + repr(self))

    def __setattr__(self, name, value):
        # Overridden to prevent any attributes from being created on NoData when e.g. "NoData().f = x" is called
        return None

    def __call__(self, *args, **kwargs):
        # if a NoData object is called (i.e. used as a method), return a NoData object
        return self

    def __getattr__(self, name):
        # For all other attribute accesses or method accesses, return a NoData object.
        # Remember that the NoData object can be called (__call__), so if a method is called,
        # a NoData object is first returned and then called. This works for operators,
        # so e.g. NoData() + 5 will:
        # - call NoData().__coerce__, which returns a (NoData, NoData) tuple
        # - call __getattr__, which returns a NoData object
        # - call the returned NoData object with args (self, NoData)
        # - this call (i.e. __call__) returns a NoData object

        # For attribute accesses NoData will be returned, and that's it.

        # print name  # (uncomment this line for debugging, i.e. to see which attribute was accessed / method was called)
        return self
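On the %f problem mentioned above: %-formatting never consults the object, but str.format() and f-strings delegate to the object's __format__ method, so a missing-data class can decide what '{:f}' means. A minimal Python 3 sketch, assuming you can switch away from % formatting; NoDataFmt is a hypothetical cut-down class, not the full NoData above:

```python
class NoDataFmt:
    """Hypothetical, cut-down missing-data class showing only the formatting hooks."""

    def __repr__(self):
        return "NoData()"

    def __str__(self):
        return ""

    def __format__(self, spec):
        # str.format() and f-strings call this with spec 'f', '.2f', ...;
        # ignore the spec and render missing data as an empty field.
        return ""

nd = NoDataFmt()
print(repr("The answer is {:f}".format(nd)))  # 'The answer is '
print(repr(f"{nd:.2f}"))                      # ''
```

"%f" % nd still raises TypeError, so this only helps where the formatting calls can be changed.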

      



If it's just pow() that's giving you headaches, you can easily redefine it to return NaN in whatever circumstances you like.

def pow(x, y):
    # x == x is False only when x is a float NaN, so a NaN base always gives NaN
    return x ** y if x == x else float("NaN")

      

If NaN can also show up as an exponent, you should check for that too; this raises a ValueError exception unless the base is 1 (presumably on the theory that 1 to any power, even one that is not a number, is 1).

(And of course pow() actually accepts three operands, the third one optional; I'll leave that omission as an exercise...)
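A sketch that covers both gaps, a NaN exponent and the optional third argument, while delegating everything else to the built-in. The name nan_pow is hypothetical; unlike the redefinition above, it does not shadow the built-in pow:

```python
import builtins
import math

def nan_pow(x, y, z=None):
    """Hypothetical pow() replacement that never drops a NaN base or exponent."""
    for value in (x, y):
        if isinstance(value, float) and math.isnan(value):
            return math.nan  # covers nan ** 0 and 1.0 ** nan
    if z is None:
        return builtins.pow(x, y)
    # Three-argument pow() accepts only integers, so a float z makes it raise TypeError.
    return builtins.pow(x, y, z)

print(nan_pow(float("nan"), 0))  # nan
print(nan_pow(2, 10, 1000))      # 24
```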



Unfortunately, the ** operator has the same behavior, and there is no way to override it for the built-in numeric types. One way to catch this is to write a subclass of float that implements __pow__() and __rpow__(), and to use that class for your NaN values.
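A minimal Python 3 sketch of that float-subclass idea; StrictNaN and MISSING are hypothetical names. Because StrictNaN subclasses float and overrides __rpow__, Python tries its __rpow__ before float.__pow__ when a StrictNaN appears on the right of **:

```python
import math

class StrictNaN(float):
    """Hypothetical float subclass for missing values: ** never drops it."""

    def __new__(cls):
        return super().__new__(cls, "nan")

    def __pow__(self, other, modulo=None):
        return StrictNaN()  # MISSING ** y, including y == 0

    def __rpow__(self, other, modulo=None):
        return StrictNaN()  # x ** MISSING, including x == 1.0

MISSING = StrictNaN()
print(math.isnan(MISSING ** 0))    # True
print(math.isnan(1.0 ** MISSING))  # True
```

The catch is that only ** is overridden: MISSING + 1 goes through float.__add__ and returns a plain float NaN, and (MISSING + 1) ** 0 is back to 1.0. Every operator whose result may later reach ** would need the same treatment.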

Python does not seem to provide access to any flags set by calculations; even if it did, it's something you would need to check after every single operation.

In fact, on further reflection, I think the best solution might be to simply use an instance of a dummy class for missing values. Python will choke on any operation you try to do with these values by raising an exception, and you can catch the exception and return a default or whatever. There is no reason to continue with the rest of the calculation if a required value is missing, so an exception should be fine.
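A sketch of the dummy-class approach in Python 3; Missing, compute and compute_or_default are hypothetical names. The sentinel defines no arithmetic, so any arithmetic on it raises TypeError, which is caught at a single point:

```python
class Missing:
    """Hypothetical sentinel for a missing input; it defines no arithmetic."""

    def __repr__(self):
        return "MISSING"

MISSING = Missing()

def compute(a, b):
    return (a + b) ** 0  # stands in for the real calculation

def compute_or_default(a, b, default=None):
    try:
        return compute(a, b)
    except TypeError:    # e.g. MISSING + 2: unsupported operand types
        return default

print(compute_or_default(MISSING, 2))  # None
print(compute_or_default(1, 2))        # 1
```

Two caveats: catching TypeError this broadly can also hide genuine bugs inside compute, and == / != with MISSING do not raise (they fall back to identity), whereas < and > do raise TypeError in Python 3.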



Why use NaN, which already has different semantics, instead of an instance of a MissingData class that you define yourself?

Defining the operations on MissingData instances so that they propagate should be easy...
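A minimal Python 3 sketch of such a MissingData class, assuming only the arithmetic operators are needed. The operator methods are attached in a loop rather than written out one by one; MissingData and MISSING are hypothetical names:

```python
class MissingData:
    """Hypothetical missing-value marker: arithmetic with it yields MISSING."""

    _instance = None

    def __new__(cls):
        # One shared instance, so `x is MISSING` works as a test.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "MISSING"

    def __bool__(self):
        return False

def _propagate(self, *args):
    return self  # whatever the operands, the result is still missing

# Attach the same method to the binary (and reflected) and unary operators.
for _name in ("add", "sub", "mul", "truediv", "floordiv", "mod", "pow"):
    setattr(MissingData, f"__{_name}__", _propagate)
    setattr(MissingData, f"__r{_name}__", _propagate)
for _name in ("neg", "pos", "abs"):
    setattr(MissingData, f"__{_name}__", _propagate)

MISSING = MissingData()
print(MISSING ** 0, 2 ** MISSING, (1.0 + MISSING) * 3)  # MISSING MISSING MISSING
```

Comparisons are deliberately left undefined, so MISSING > 5 raises TypeError in Python 3 instead of silently choosing a branch; add "lt", "gt" and friends to the loop if you would rather have them propagate.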



To answer your question: no, there is no way to check the flags with regular floats. You can, however, use the Decimal class, which gives you much more control... but is somewhat slower.
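A small sketch of what that control looks like with the standard decimal module in Python 3: a quiet Decimal NaN survives ** 0, a signalling NaN raises as soon as it is used, and the context records InvalidOperation as a flag when its trap is disabled. with_invalid_flag is a hypothetical helper name:

```python
from decimal import Decimal, InvalidOperation, localcontext

def with_invalid_flag(func, *args):
    """Hypothetical helper: run func(*args) and report whether InvalidOperation was flagged."""
    with localcontext() as ctx:
        ctx.traps[InvalidOperation] = False  # record the condition instead of raising
        ctx.clear_flags()
        result = func(*args)
        invalid = bool(ctx.flags[InvalidOperation])
    return result, invalid

# A quiet Decimal NaN is not swallowed by ** 0, unlike float('nan') ** 0.
print(Decimal("NaN") ** 0)  # NaN

# A signalling NaN raises on first use (InvalidOperation is trapped by default).
try:
    Decimal("sNaN") ** 0
except InvalidOperation:
    print("missing input detected")

# inf - inf is an invalid operation: the result is NaN and the flag is set.
print(with_invalid_flag(lambda a, b: a - b, Decimal("Infinity"), Decimal("Infinity")))
```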

Another option is to use an EmptyData or Null class, such as this one:

import sys

class NullType(object):
    "Null object -- any interaction returns Null"
    def _null(self, *args, **kwargs):
        return self
    __eq__ = __ne__ = __ge__ = __gt__ = __le__ = __lt__ = _null
    __add__ = __iadd__ = __radd__ = _null
    __sub__ = __isub__ = __rsub__ = _null
    __mul__ = __imul__ = __rmul__ = _null
    __div__ = __idiv__ = __rdiv__ = _null
    __mod__ = __imod__ = __rmod__ = _null
    __pow__ = __ipow__ = __rpow__ = _null
    __and__ = __iand__ = __rand__ = _null
    __xor__ = __ixor__ = __rxor__ = _null
    __or__ = __ior__ = __ror__ = _null
    __truediv__ = __itruediv__ = __rtruediv__ = _null
    __floordiv__ = __ifloordiv__ = __rfloordiv__ = _null
    __lshift__ = __ilshift__ = __rlshift__ = _null
    __rshift__ = __irshift__ = __rrshift__ = _null
    __neg__ = __pos__ = __abs__ = __invert__ = _null
    __call__ = __getattr__ = _null

    def __divmod__(self, other):
        return self, self
    __rdivmod__ = __divmod__

    if sys.version_info[:2] >= (2, 6):
        __hash__ = None
    else:
        def __hash__(yo):
            raise TypeError("unhashable type: 'Null'")

    def __new__(cls):
        return cls.null
    def __nonzero__(yo):
        return False
    def __repr__(yo):
        return '<null>'
    def __setattr__(yo, name, value):
        return None
    def __setitem__(yo, index, value):
        return None
    def __str__(yo):
        return ''
NullType.null = object.__new__(NullType)
Null = NullType()

      

You may want to change the __repr__ and __str__ methods. Also, keep in mind that Null cannot be used as a dictionary key or stored in a set.
