Use Python 2 Dict Comparison in Python 3

I'm trying to port some code from Python 2 to Python 3. It's ugly code, but I'm trying to keep the Python 3 results as identical as possible to the Python 2 results. I have code similar to this:

import json

# Read a list of json dictionaries by line from file.

objs = []
with open('data.txt') as fptr:
    for line in fptr:
        objs.append(json.loads(line))

# Give the dictionaries a reliable order.

objs = sorted(objs)

# Do something externally visible with each dictionary:

for obj in objs:
    do_stuff(obj)

      

When I port this code from Python 2 to Python 3, I get the error:

TypeError: unorderable types: dict() < dict()

      

So I changed the sorted line:

objs = sorted(objs, key=id)

      

But the order of the dictionaries was still changing between Python 2 and Python 3.

Is there a way to replicate the Python 2 comparison logic in Python 3? Or is it just that id was used before and is not reliable between Python versions?



4 answers


If you want the same behavior as earlier versions of Python 2.x in both 2.7 (which uses an arbitrary sort order instead) and 3.x (which refuses to sort dicts), Ned Batchelder's answer to a question about how sorting dicts works gets you part of the way there, but not all the way.


First, it gives you an old-style cmp function, not a new-style key function. Fortunately, both 2.7 and 3.x have functools.cmp_to_key to solve that problem. (Of course, you could rewrite the code as a key function instead, but that might make it harder to see the differences between the posted code and your code...)
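As a rough illustration of what functools.cmp_to_key does, the sketch below wraps an old-style comparison function so that sorted can use it as a key. The function by_size and the sample data are made up for this example; they are not from the linked answer.

```python
from functools import cmp_to_key

def by_size(a, b):
    """Old-style comparison: negative, zero, or positive, like cmp(len(a), len(b))."""
    return (len(a) > len(b)) - (len(a) < len(b))

dicts = [{'a': 1, 'b': 2}, {'c': 3}, {}]
# cmp_to_key turns the two-argument comparison into a key function.
ordered = sorted(dicts, key=cmp_to_key(by_size))
print(ordered)  # [{}, {'c': 3}, {'a': 1, 'b': 2}]
```

The same wrapping works in 2.7 and 3.x, which is why only the comparison function itself needs attention.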


More importantly, not only will it not do the same thing in 2.7 and 3.x, it won't even work in 2.7 or 3.x. To see why, take a look at the code:

def smallest_diff_key(A, B):
    """return the smallest key adiff in A such that A[adiff] != B[bdiff]"""
    diff_keys = [k for k in A if A.get(k) != B.get(k)]
    return min(diff_keys)

def dict_cmp(A, B):
    if len(A) != len(B):
        return cmp(len(A), len(B))
    adiff = smallest_diff_key(A, B)
    bdiff = smallest_diff_key(B, A)
    if adiff != bdiff:
        return cmp(adiff, bdiff)
    return cmp(A[adiff], B[bdiff])

      

Note that it calls cmp on the non-matching values.

If the dicts can contain other dicts, that relies on cmp(d1, d2) ending up calling this function... which is obviously not true in newer Python.

Also, in 3.x cmp no longer exists.

Also, it depends on the fact that any value can be compared to any other value: you may get arbitrary results, but you won't get an exception. This was true (except in a few rare cases) in 2.x, but it is not true in 3.x. This might not be a problem for you if you don't need to compare dicts with non-comparable values (for example, if it's OK for {1: 2} < {1: 'b'} to raise an exception), but otherwise it is.
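To make the 3.x behavior concrete, here is a small sketch (the helper name try_less_than is invented for illustration) that compares two values and reports what happens. It assumes Python 3; under 2.x the mixed-type comparison would return an arbitrary but consistent result instead of raising.

```python
def try_less_than(a, b):
    """Return the result of a < b, or the string 'TypeError' if 3.x refuses to order them."""
    try:
        return a < b
    except TypeError:
        return 'TypeError'

print(try_less_than(2, 3))    # True
print(try_less_than(2, 'b'))  # 'TypeError' on Python 3
```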

And of course, if you don't want arbitrary results for dict comparisons, do you really want arbitrary results for value comparisons?

The solution to all three problems is simple: you have to replace cmp instead of calling it. So, something like this:

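def mycmp(A, B):
    if isinstance(A, dict) and isinstance(B, dict):
        return dict_cmp(A, B)
    try:
        # Return -1, 0, or 1 like the old cmp (A < B alone would yield a bool).
        return (A > B) - (A < B)
    except TypeError:
        # What goes here depends on how far you want to go for consistency.
        # Re-raising is only a placeholder that keeps the 3.x behavior.
        raise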

      

If you need the exact rules that 2.7 used for comparing objects of different types, they are documented, so you can implement them. But if you don't need that much detail, you can write something simpler here (or maybe just not trap the TypeError at all, if the exception mentioned above is acceptable).
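Putting the pieces together, a minimal Python 3 sketch might look like the following. It assumes dict keys are mutually comparable (true for JSON-loaded data, where keys are strings) and simply lets TypeError propagate for unorderable values. The names py3_cmp and _MISSING are invented here, and the equal-dicts check is an addition that the posted code lacks.

```python
from functools import cmp_to_key

_MISSING = object()  # hypothetical sentinel meaning "no differing key"

def py3_cmp(a, b):
    """Stand-in for cmp: recurse into dicts, otherwise rely on < and >."""
    if isinstance(a, dict) and isinstance(b, dict):
        return dict_cmp(a, b)
    return (a > b) - (a < b)  # raises TypeError for unorderable types

def smallest_diff_key(a, b):
    """Smallest key of a that is absent from b or maps to a different value."""
    diff_keys = [k for k in a if b.get(k, _MISSING) != a[k]]
    return min(diff_keys) if diff_keys else _MISSING

def dict_cmp(a, b):
    if len(a) != len(b):
        return py3_cmp(len(a), len(b))
    adiff = smallest_diff_key(a, b)
    if adiff is _MISSING:
        return 0  # same length and no differing key: the dicts are equal
    bdiff = smallest_diff_key(b, a)
    if adiff != bdiff:
        return py3_cmp(adiff, bdiff)
    return py3_cmp(a[adiff], b[bdiff])

data = [{'a': 2}, {'a': 1, 'b': 0}, {'b': 1}, {'a': 1}]
ordered = sorted(data, key=cmp_to_key(dict_cmp))
print(ordered)  # [{'a': 1}, {'a': 2}, {'b': 1}, {'a': 1, 'b': 0}]
```

Shorter dicts sort first, then the smallest differing key decides, then the values under that key. Whether this matches what your 2.x code actually produced is something to verify against real data.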



Is there a way to replicate the Python 2 comparison logic in Python 3? Or is it just that id was used before and is not reliable between Python versions?

id is never "reliable". The id you get for any given object is a completely arbitrary value; it can differ from one run to the next, even on the same machine and Python version.

Python 2.x doesn't actually document that it sorts by id. All it says is:

Outcomes other than equality are resolved consistently, but are not otherwise defined.

But that only makes the point even better: the order is explicitly defined to be arbitrary (except for being consistent within any given run). This is exactly the same guarantee you get by sorting with key=id in Python 3.x, whether or not it actually works the same way.*

So you are doing the same thing in 3.x. The fact that the two arbitrary orders are different just means that arbitrary is arbitrary.




If you want some kind of repeatable ordering for a dict based on what it contains, you just have to decide what that order is, and then you can build it. For example, you can sort the items and then compare them (passing the same key function recursively if the items are or contain dicts).**
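One possible shape for such a recursive key function is sketched below. It is an assumption-laden example, not a reconstruction of Python 2's rules: the name canonical_key and the type tags are invented, and tagging each value with a type label is just one way to keep values of different types from being compared directly in Python 3.

```python
def canonical_key(value):
    """Build a nested tuple that orders JSON-like values by their content."""
    if isinstance(value, dict):
        # Sort the (key, value) pairs so insertion order does not matter.
        items = sorted((canonical_key(k), canonical_key(v)) for k, v in value.items())
        return ('dict', tuple(items))
    if isinstance(value, (list, tuple)):
        return ('list', tuple(canonical_key(v) for v in value))
    if value is None:
        return ('none', 0)
    if isinstance(value, (int, float)):  # bool is a subclass of int
        return ('number', value)
    # Fallback: tag with the type name so only same-typed values are compared.
    return (type(value).__name__, value)

objs = [{'b': 1}, {'a': 2}, {'a': 1, 'c': {'x': None}}, {'a': 1}]
ordered = sorted(objs, key=canonical_key)
print(ordered)  # [{'a': 1}, {'a': 1, 'c': {'x': None}}, {'a': 2}, {'b': 1}]
```

Because the key depends only on the contents, the input order of objs does not change the result.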

And once you design and implement some sane, non-arbitrary order, it will of course work the same way in 2.7 and 3.x.


* Note that this is not equivalent for equality comparisons, only for ordering comparisons (two equal dicts still have different ids). If you only use it for sorted, the effect is that your sort is no longer stable. But since the order is arbitrary anyway, that hardly matters.

** Note that Python 2.x used to use a rule like this. From a footnote to the passage quoted above: "Earlier versions of Python used lexicographic comparison of the sorted (key, value) lists, but this was very expensive for the common case of comparing for equality." So that tells you it's a sane rule, as long as it really is the rule you want and you don't mind the performance cost.



The logic in CPython 2.x is somewhat complicated, since the behavior is dictated by dict.__cmp__. A Python implementation can be found here.

However, if you really want a reliable order, you need to sort on a better key than id. You can use functools.cmp_to_key to convert the comparison function from the linked answer into a key function, but that is actually not a very good order either, since it is completely arbitrary.

Your best bet is to sort all of the dictionaries by the value of a field (or of multiple fields). operator.itemgetter works nicely for this purpose. Using it as the key function should give you consistent results for any somewhat modern Python implementation and version.
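For instance, assuming every record has "name" and "age" fields (hypothetical field names chosen only for this sketch), sorting by those fields could look like this:

```python
from operator import itemgetter

records = [
    {'name': 'bob', 'age': 30},
    {'name': 'alice', 'age': 25},
    {'name': 'alice', 'age': 20},
]

# itemgetter('name', 'age') returns a (name, age) tuple for each dict,
# so records are ordered by name first and by age within the same name.
ordered = sorted(records, key=itemgetter('name', 'age'))
print([(r['name'], r['age']) for r in ordered])
# [('alice', 20), ('alice', 25), ('bob', 30)]
```

This raises KeyError if a record lacks one of the fields, so it only fits data where the chosen fields are always present and mutually comparable.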



If you just want an order that is consistent across multiple Python runs on potentially different platforms, but you don't really care about the actual order, then a simple solution is to dump the dicts to JSON before sorting them:

import json

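def sort_as_json(dicts):
    # sort_keys=True makes the dumped text independent of each dict's key order.
    return sorted(dicts, key=lambda d: json.dumps(d, sort_keys=True))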

print(list(sort_as_json([{'foo': 'bar'}, {1: 2}])))
# Prints [{1: 2}, {'foo': 'bar'}]

      

Obviously this only works if your dicts are JSON-representable, but since you are loading them from JSON anyway, that shouldn't be a problem. In your case, you can achieve the same result by simply sorting the lines of the file you are loading the objects from before deserializing the JSON.
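A sketch of that last idea, sorting the raw lines as text before parsing them, is shown below. The helper name load_sorted is made up, and io.StringIO stands in for the real file. Note that this orders by the text exactly as written, so it is only repeatable if the file's whitespace and key order do not change between runs.

```python
import io
import json

def load_sorted(fptr):
    """Sort the raw JSON lines as text, then deserialize them in that order."""
    lines = sorted(line.strip() for line in fptr if line.strip())
    return [json.loads(line) for line in lines]

# io.StringIO stands in for open('data.txt') so the sketch is self-contained.
fake_file = io.StringIO('{"foo": "bar"}\n{"a": 1}\n')
objs = load_sorted(fake_file)
print(objs)  # [{'a': 1}, {'foo': 'bar'}]
```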
