How do I set up a class that has all the methods and functions of a built-in like float, but holds additional data?

I am working with 2 datasets of order ~ 100,000 values. These 2 datasets are just lists. Each item in the list is a small class.

class Datum(object):
    def __init__(self, value, dtype, source, index1=None, index2=None):
        self.value = value
        self.dtype = dtype
        self.source = source
        self.index1 = index1
        self.index2 = index2

      

For every item in one list, there is a corresponding item in the other list with the same dtype, source, index1, and index2, which I use to sort the two datasets so that they align. I then do various work with the values of the matching data points, which are always floats.

Currently, if I want to determine the relative values of the floats in one dataset, I do something like this.

minimum = min([x.value for x in data])
for datum in data:
    datum.value -= minimum

      

However, it would be nice if my own class inherited from float and could act like this.

minimum = min(data)
data = [x - minimum for x in data]

      

I tried the following.

class Datum(float):
    def __new__(cls, value, dtype, source, index1=None, index2=None):
        new = float.__new__(cls, value)
        new.dtype = dtype
        new.source = source
        new.index1 = index1
        new.index2 = index2
        return new

      

However, doing

data = [x - minimum for x in data]

      

removes all additional attributes (dtype, source, index1, index2).
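
A minimal check of this behaviour, reusing the Datum(float) class from the attempt above. The sample values are made up for illustration. Arithmetic inherited from float builds a plain float, so the extra attributes are not carried over:

```python
class Datum(float):
    """float subclass carrying metadata, as in the attempt above."""
    def __new__(cls, value, dtype, source, index1=None, index2=None):
        new = float.__new__(cls, value)
        new.dtype = dtype
        new.source = source
        new.index1 = index1
        new.index2 = index2
        return new

# Illustrative sample data (not from the original datasets)
data = [Datum(3.0, "a", "s1"), Datum(1.0, "a", "s2")]

minimum = min(data)                        # min() hands back one of the original objects
shifted = [x - minimum for x in data]      # float.__sub__ builds the results

min_type = type(minimum).__name__          # name of the type returned by min()
shifted_type = type(shifted[0]).__name__   # name of the type returned by subtraction
has_dtype = hasattr(shifted[0], "dtype")   # whether the metadata survived
print(min_type, shifted_type, has_dtype)
```

Note that min() and max() are unaffected, since they return an existing element rather than computing a new value.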

How do I set up a class that works like a float but contains additional data that I create for it?

UPDATE: I do many kinds of math operations besides subtraction, so rewriting every method that works with floats would be very troublesome, and to be honest, I'm not sure I could rewrite them all correctly.

2 answers


I suggest subclassing float and using a couple of decorators to "capture" the float output from any method (except __new__, of course) and return a Datum object instead of a float object.

First, we write a method decorator (which isn't actually used as a decorator below; it is just a function that modifies the output of another function, AKA a wrapper):

def mydecorator(f,cls):
    #f is the method being modified, cls is its class (in this case, Datum)
    def func_wrapper(*args,**kwargs):
        #*args and **kwargs are all the arguments that were passed to f
        newvalue = f(*args,**kwargs)
        #newvalue now contains the output float would normally produce
        ##Now get cls instance provided as part of args (we need one
        ##if we're going to reattach instance information later):
        try:
            self = args[0]
            ##Now check to make sure new value is an instance of some numerical 
            ##type, but NOT a bool or a cls type (which might lead to recursion)
            ##Including ints so things like modulo and round will work right
            if isinstance(newvalue,(float,int)) and not isinstance(newvalue,bool) and type(newvalue) != cls:
                ##If newvalue is a float or int, now we make a new cls instance using the
                ##newvalue for value and using the previous self instance information (args[0])
                ##for the other fields
                return cls(newvalue,self.dtype,self.source,self.index1,self.index2)
        #IndexError raised if no args provided, AttributeError raised if self isn't a cls instance
        except (IndexError, AttributeError): 
            pass
        ##If newvalue isn't numerical, or we don't have a self, just return what
        ##float would normally return
        return newvalue
    #the function has now been modified and we return the modified version
    #to be used instead of the original version, f
    return func_wrapper

      

The first decorator only applies to the method it is attached to. But we want it to decorate all (in fact, almost all) of the methods inherited from float (well, those that appear in float's __dict__, anyway). This second decorator will apply our first decorator to all of the methods in the float subclass except those listed as exceptions ( see this answer ):

def for_all_methods_in_float(decorator,*exceptions):
    def decorate(cls):
        for attr in float.__dict__:
            if callable(getattr(float, attr)) and attr not in exceptions:
                setattr(cls, attr, decorator(getattr(float, attr),cls))
        return cls
    return decorate
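
To see which names the loop above would touch, the same filter can be run on its own. This is only an inspection sketch: the variable names are illustrative, and the exact contents of float.__dict__ can differ between Python versions, so no full listing is shown.

```python
# Same filter as the loop above, with '__new__' as the only exception
exceptions = ("__new__",)
touched = sorted(
    attr for attr in float.__dict__
    if callable(getattr(float, attr)) and attr not in exceptions
)

print(len(touched), "methods would be wrapped")
print("__add__" in touched)   # arithmetic operators are in the list
print("__hash__" in touched)  # so are non-arithmetic special methods
print("real" in touched)      # properties are not callable, so they are skipped
```

Non-arithmetic special methods such as __hash__ pass the filter too, which is worth keeping in mind when choosing the exceptions.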

      

We now write the subclass the same way as before, but decorated, with __new__ excluded from decoration (I guess we could exclude __init__ as well, but __init__ doesn't return anything anyway):



@for_all_methods_in_float(mydecorator,'__new__')
class Datum(float):
    def __new__(klass, value, dtype="dtype", source="source", index1="index1", index2="index2"):
        return super(Datum,klass).__new__(klass,value)
    def __init__(self, value, dtype="dtype", source="source", index1="index1", index2="index2"):
        self.value = value
        self.dtype = dtype
        self.source = source
        self.index1 = index1
        self.index2 = index2
        super(Datum,self).__init__()

      

Here are our test cases; iteration works correctly:

d1 = Datum(1.5)
d2 = Datum(3.2)
d3 = d1+d2
assert d3.source == 'source'
L=[d1,d2,d3]
d4=max(L)
assert d4.source == 'source'
L = [i for i in L]
assert L[0].source == 'source'
assert type(L[0]) == Datum
minimum = min(L)
assert [x - minimum for x in L][0].source == 'source'

      

Notes:

  • I am using Python 3. Not sure if this will affect you.
  • This approach effectively overrides every float method other than the exceptions, even those whose result is unchanged. There may be side effects to this (subclassing a built-in and then overriding all of its methods), e.g. a performance hit or something else; I really don't know.
  • This will also decorate nested classes.
  • This same approach can also be implemented using a metaclass.
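
Regarding the side effects mentioned in the notes: because the class decorator wraps every callable in float.__dict__, it also wraps methods such as __hash__ and __int__, whose callers generally expect a plain int back. A more conservative sketch, assuming only arithmetic results need to keep their metadata, wraps an explicit list of operator methods instead. The names ARITHMETIC and keep_metadata are hypothetical, and the operator list is deliberately not exhaustive.

```python
# Hypothetical, non-exhaustive list of binary operators to wrap
ARITHMETIC = ("add", "sub", "mul", "truediv")

def keep_metadata(cls):
    """Class decorator (hypothetical name): wrap only the listed float operators."""
    def wrap(name):
        float_method = getattr(float, name)
        def method(self, other):
            result = float_method(self, other)
            if result is NotImplemented:   # let Python try the other operand
                return result
            # Rebuild a cls instance, copying the metadata from self
            return cls(result, self.dtype, self.source, self.index1, self.index2)
        method.__name__ = name
        return method
    for op in ARITHMETIC:
        # Wrap both the normal and the reflected form, e.g. __sub__ and __rsub__
        for name in ("__%s__" % op, "__r%s__" % op):
            setattr(cls, name, wrap(name))
    return cls

@keep_metadata
class Datum(float):
    def __new__(cls, value, dtype, source, index1=None, index2=None):
        new = float.__new__(cls, value)
        new.dtype, new.source = dtype, source
        new.index1, new.index2 = index1, index2
        return new

# Illustrative sample data
data = [Datum(3.0, "a", "s1", 0), Datum(1.0, "a", "s2", 1)]
minimum = min(data)
shifted = [x - minimum for x in data]   # results are rebuilt as Datum
reflected = 5 - data[1]                 # goes through the wrapped __rsub__
```

The trade-off is that any operator missing from the list silently falls back to returning a plain float, so the list has to cover every operation actually used.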


The problem is that when you do:

x - minimum

      

in terms of types, you are doing:

datum - float, or datum - integer

      

Python doesn't know how to do either of these directly, so it looks at the parent classes of the arguments if it can. Since Datum is a subtype of float, it can simply use float's version, and the calculation ends up as

float - float 

      

which will obviously result in a float; Python has no way of knowing how to construct a Datum object unless you tell it how.

To solve this problem, you either need to implement the math operators so that Python knows how to do Datum - float, or come up with a different design.



Assuming dtype, source, index1 and index2 should stay the same after a calculation, your class needs, for example:

def __sub__(self, other):
    # float(self) is the numeric value of this Datum (a float subclass)
    return Datum(float(self) - other, self.dtype, self.source, self.index1, self.index2)

      

This should work (not tested), and it will then allow you to do:

d = Datum(23.0, dtype="float", source="me", index1=1)
e = d - 16
print(float(e), e.dtype, e.source, e.index1, e.index2)

      

which should result in:

7.0 float me 1 None
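
A self-contained sketch of this hand-written approach, assuming the Datum(float) class from the question. The helper _with_value is a hypothetical name, and __rsub__ is added here to cover the reflected case (a plain number minus a Datum), which the snippet above does not handle.

```python
class Datum(float):
    def __new__(cls, value, dtype, source, index1=None, index2=None):
        new = float.__new__(cls, value)
        new.dtype = dtype
        new.source = source
        new.index1 = index1
        new.index2 = index2
        return new

    def _with_value(self, value):
        # Hypothetical helper: same metadata, new numeric value
        return Datum(value, self.dtype, self.source, self.index1, self.index2)

    def __sub__(self, other):
        result = float.__sub__(self, other)
        return result if result is NotImplemented else self._with_value(result)

    def __rsub__(self, other):
        # Reflected form, used for `other - self`
        result = float.__rsub__(self, other)
        return result if result is NotImplemented else self._with_value(result)

d = Datum(23.0, dtype="float", source="me", index1=1)
e = d - 16      # handled by Datum.__sub__
f = 30 - d      # handled by Datum.__rsub__
g = d + 1       # __add__ is not overridden, so float.__add__ is used
print(type(e).__name__, type(f).__name__, type(g).__name__)
```

Every operator that is not overridden, such as the addition on the last line, still goes through float's own method, which is why this approach needs one pair of methods per operation.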

      
