Recursive function in Python: the best way to get a list of specific nested elements

I have a tree of nested dictionaries. This is a small extract to give you an idea:

db = {
    'compatibility': {
        'style': {
            'path_to_file': 'compatibility/render/style.py',
            'checksum': {
                '0.0.3':'AAA55d796c25ad867bbcb8e0da4e48d17826e6f9fce',
                '0.0.2': '55d796c25ad867bbcb8e0da4e48d17826e6f9fe606',}}},
    'developer': {
        'render': {
            'installation': {
                'path_to_file': 'developer/render/installation.py',
                'checksum': {
                    '0.0.1': 'c1c0d4080e72292710ac1ce942cf59ce0e26319cf3'}},
            'tests': {
                'path_to_file': 'developer/render/test.py',
                'checksum': {
                    '0.0.1': 'e71173ac43ecd949fdb96cfb835abadb877a5233a36b115'}}}}}

      

I want to get a list of all dictionary modules nested in a tree. That way I could drill through the list and check the checksum of each file (note that modules can be at different levels, like in the example above).

I wrote the following recursive function for this. I know that every module has keys "path_to_file" and "checksum", so I use that to check if a dict is a module. Note that I had to wrap the recursive function inside another function that contains the list so that the list is not overwritten every time the recursive function is executed.

def _get_modules_from_db(dictionary):
    def recursive_find(inner_dictionary):
        for k, v in inner_dictionary.iteritems():
            if (isinstance(v, dict) and
                    not sorted(v.keys()) == ['path_to_file', 'sha512sum']):
                recursive_find(v)
            else:
                leaves.append(v)
    leaves = []
    recursive_find(dictionary)
    return leaves

      

This approach works, however to wrap a function seems very ugly to me. So my question for the pros on Stack Overflow is:

Is there a simpler (or better) approach that you would recommend achieving, without the need to wrap the function?

+3


source to share


3 answers


First, the only reason you need to wrap this function is because you are creating a recursive_find

mutation of the closing cell leaves

instead return

. Sometimes it's a useful performance optimization (although just as often it's a pessimization), and sometimes it's just not clear how to do it otherwise, but this time it's trivial:

def _get_modules_from_db(dictionary):
    leaves = []
    for k, v in dictionary.iteritems():
        if (isinstance(v, dict) and
            not sorted(v.keys()) == ['path_to_file', 'sha512sum']):
            leaves.extend(_get_modules_from_db(v))
        else:
            leaves.append(v)
    return leaves

      


For additional improvements: I would probably turn this into a generator (at least in 3.3+, s yield from

; in 2.7 I might think twice). And, while we are doing this, I would compare the key view (in 3.x) or set(v)

(in 2.x) with a set, rather than do unnecessary sorting (and no reason for .keys()

using set

or sorted

) and use !=

instead of not

and ==

. And, if there is no good reason to actually accept subclassing dict

and dict

, I would either tone it down or use it collections.[abc.]Mapping

. So:

def _get_modules_from_db(dictionary):
    for k, v in dictionary.items():
        if isinstance(v, Mapping) and v.keys() != {'path_to_file', 'sha512sum'}:
            yield from _get_modules_from_db(v)
        else:
            yield v

      

Or, alternatively, pull out the base files, so you can call it directly on the line:



def _get_modules_from_db(d):
    if isinstance(d, Mapping) and d.keys() != {'path_to_file', 'sha512sum'}:
        for v in d.values():
            yield from _get_modules_from_db(v)
    else:
        yield d

      

I think it's a little more readable than yours and it's 6 lines instead of 11 (although version 2.x will be 7 lines). But I don't see anything wrong with your version.


If you don't know how to turn this 3.3+ code into 2.7 / 3.2 code:

  • Rewrite yield from eggs

    as for egg in eggs: yield egg

    .
  • Mapping

    is in collections

    , not collections.abc

    .
  • Use set(v)

    instead v.keys()

    .
  • Maybe use itervalues

    instead values

    (2.x only).
+6


source


I see no problem with this approach. You need a recursive function that manages some kind of global state - this is a pretty sane way to do it (internal functions in Python are not that unusual).

However, you can add a default argument if you want to avoid the nested function:



def _get_modules_from_db(db, leaves=None):
    if leaves is None:
        leaves = []
    if not isinstance(db, dict):
        return leaves

    # Use 'in' check to avoid sorting keys and doing a list compare
    if 'path_to_file' in db and 'checksum' in db:
        leaves.append(db)
    else:
        for v in db.values():
            _get_modules_from_db(v, leaves)

    return leaves

      

+2


source


In my personal opinion, nested functions are good, but here's a shorter version, nevertheless

from operator import add

def _get_modules_from_db(db):
  if 'path_to_file' in db and 'sha512sum' in db:
    return [db]
  return reduce(add, (_get_modules_from_db(db[m]) for m in db))

      

+1


source







All Articles