Version list instead of immutable list?

Update:

  • Since CPython 3.6, dictionaries have a version (thanks to pylang for showing this to me).
  • If they added the same version to the list and made it public, all 3 statements from my original post will go through! This would definitely suit my needs. Their implementation is different from what I imagined, but I like it.
  • However, I don't feel like I can use the dictionary version:
    • This is not public. Jake Vanderplace shows you how to expose it in a post, but he cautions: definitely not code that you should use for any purpose other than just fun use. I agree with his views.
    • In all my use cases, the data is conceptual arrays of elements, each with the same structure. The list of tuples is a natural fit. Using a dictionary will make the code less natural and probably more cumbersome.
  • Does anyone know if there are any plans to add a version to the list?
  • Are there plans to make this public?

If you plan to add a version to the list and make it publicly available, I would be uncomfortable exposing an incompatible VersionedList now. I would just meet the minimum I need and get through.


Original post below


It turns out that in many times when I wanted to have an immutable list, VersionedList would work much the same (sometimes even better).

  • Has anyone implemented a version list?
  • Is there a better Pythonic concept that suits my needs? (See Motivation below.)

What do I mean by version:

  • A class that behaves like a list
  • Any change to an instance or items in an instance results in an update instance.version()

    . So, if it alist

    is a regular list:

    a = VersionedList(alist)
    a_version = a.version()
    change(a)
    assert a_version != a.version()
    reverse_last_change(a)  
    
          

  • If the list was hashable, hash () would achieve the above and satisfy all the needs outlined in the motivation below. We need to define version () in such a way that it doesn't have all the same problems as hash ().

    If identical data in the two lists is unlikely to ever happen, other than on initialization, we have no reason to check for deep equality. From ( https://docs.python.org/3.5/reference/datamodel.html#object. Hash ) The only property required is that objects that compare the same have the same hash value. If we do not impose this requirement on "version ()", it seems likely that "version ()" will not have all the same problems as lists not available for the schedule. Thus, unlike a hash, identical content does not mean the same version

    #contents of 'a' are now identical to original, but...
    assert a_version != a.version()
    
    b = VersionedList(alist)
    c = VersionedList(alist)
    assert b.version() != c.version()
    
          

  • For VersionList, it would be nice if any attempt to change the result would __get__

    automatically result in a copy instead of changing the underlying implementation data. I think the only other option would be to __get__

    always return a copy of the elements and that would be very inefficient for all use cases I can think of. I think we need to restrict the elements to immutable objects (deeply immutable, for example: exclude tuples with list items). I can think of 3 ways to achieve this:

    • Allow only elements that cannot contain mutable elements (int, str, etc., are exact but exclude tuples). (This limits my affairs too much)
    • Add code to __init__

      , __set__

      etc. to traverse the input to deeply check volatile subelements. (expensive, any way to avoid this?)
    • Also allows for more complex elements, but requires them to be deeply immutable. They might want them to display an attribute deeply_immutable

      . (This turns out to be lightweight for all use cases I have)

Motivation:

  • If I parse a dataset, I often have to go through multiple steps that return large datasets (note that because the dataset is ordered, it is best represented as a list, not a set).

    If at the end of a few steps (ex: 5) it turns out that I need to do another analysis (ex: back in step 4), I want to know that the dataset from step 3 hasn't been accidentally changed, so I can start from step 4 instead of repeating steps 1-3.

  • I have functions (breakpoints, first derivative, second derivative, offset, contour, etc.) that depend on and return objects with an array (in the sense of linear algebra). The underlying "array" knots

    .

    control-points()

    depends on: knots

    , algorithm_enum


    first-derivative()

    depends on: control-points()

    , knots


    offset()

    depends on: first-derivative()

    , control-points()

    , knots

    , offset_distance


    outline()

    it depends on: offset()

    ,end_type_enum

    If offset_distance

    changes, I want to avoid re-evaluating first-derivatives () and breakpoints (). To avoid recalculation, I need to know that nothing has accidentally changed the resulting "arrays".

    If the "swirls" change, I need to recalculate everything and not depend on the previous resulting "arrays".

    To achieve this, knots

    all objects with an array can be VersionedList.

FYI: I was hoping to use an efficient class like numpy.ndarray. In most use cases, elements are logically structured. Keeping track of multidimensional indexes mentally meant implementing and debugging algorithms is many times more difficult with ndarray. An implementation based on lists of named sets of named sets has proven to be much more robust.

+3


source to share


1 answer


Private dicts in 3.6

In Python 3.6, dictionaries are now closed ( PEP 509 ) and compact ( issue 27350 ), which track versions and preserve order accordingly. These features are currently correct using the CPython 3.6 implementation. Despite the challenge, Jake Vanderplace demonstrates on his blog a detailed demo of demonstrating this CPython versioning functionality within regular Python. We can use his approach to:

  • determine when the dictionary is updated
  • save order

Example

import numpy as np

d = {"a": np.array([1,2,3]),
     "c": np.array([1,2,3]),
     "b": np.array([8,9,10]),
    }

for i in range(3):
    print(d.get_version())                                 # monkey-patch
# 524938
# 524938
# 524938

      

Please note that the version number does not change until the dictionary is updated, as shown below:

d.update({"c": np.array([10, 11, 12])})
d.get_version()
# 534448

      



Also, the order of insertion is preserved (the following has been tested in restarted Python 3.5 and 3.6 sessions):

list(d.keys())
# ['a', 'c', 'b']

      

You might be able to take advantage of this new behavior in the dictionary, saving you the trouble of implementing a new data type.


More details

For those interested, the latter get_version()

is a headless method for any vocabulary implemented in Python 3.6 using the following modified code derived from Jake VanderPlace's blog post. This code was run before callingget_version().

import types
import ctypes
import sys
assert (3, 6) <= sys.version_info < (3, 7)                 # valid only in Python 3.6

py_ssize_t = ctypes.c_ssize_t  

# Emulate the PyObjectStruct from CPython
class PyObjectStruct(ctypes.Structure):
    _fields_ = [('ob_refcnt', py_ssize_t),
                ('ob_type', ctypes.c_void_p)]


# Create a DictStruct class to wrap existing dictionaries
class DictStruct(PyObjectStruct):
    _fields_ = [("ma_used", py_ssize_t),
                ("ma_version_tag", ctypes.c_uint64),
                ("ma_keys", ctypes.c_void_p),
                ("ma_values", ctypes.c_void_p),
               ]

    def __repr__(self):
        return (f"DictStruct(size={self.ma_used}, "
                f"refcount={self.ob_refcnt}, "
                f"version={self.ma_version_tag})")

    @classmethod
    def wrap(cls, obj):
        assert isinstance(obj, dict)
        return cls.from_address(id(obj))

assert object.__basicsize__ == ctypes.sizeof(PyObjectStruct)
assert dict.__basicsize__ == ctypes.sizeof(DictStruct)


# Code for monkey-patching existing dictionaries
class MappingProxyStruct(PyObjectStruct):
    _fields_ = [("mapping", ctypes.POINTER(DictStruct))]

    @classmethod
    def wrap(cls, D):
        assert isinstance(D, types.MappingProxyType)
        return cls.from_address(id(D))

assert types.MappingProxyType.__basicsize__ == ctypes.sizeof(MappingProxyStruct)


def mappingproxy_setitem(obj, key, val):
    """Set an item in a read-only mapping proxy"""
    proxy = MappingProxyStruct.wrap(obj)
    ctypes.pythonapi.PyDict_SetItem(proxy.mapping,
                                    ctypes.py_object(key),
                                    ctypes.py_object(val))

mappingproxy_setitem(dict.__dict__,
                     'get_version',
                     lambda self: DictStruct.wrap(self).ma_version_tag)

      

+1


source







All Articles