Recursive function in Python: the best way to get a list of specific nested elements
I have a tree of nested dictionaries. This is a small extract to give you an idea:
db = {
'compatibility': {
'style': {
'path_to_file': 'compatibility/render/style.py',
'checksum': {
'0.0.3':'AAA55d796c25ad867bbcb8e0da4e48d17826e6f9fce',
'0.0.2': '55d796c25ad867bbcb8e0da4e48d17826e6f9fe606',}}},
'developer': {
'render': {
'installation': {
'path_to_file': 'developer/render/installation.py',
'checksum': {
'0.0.1': 'c1c0d4080e72292710ac1ce942cf59ce0e26319cf3'}},
'tests': {
'path_to_file': 'developer/render/test.py',
'checksum': {
'0.0.1': 'e71173ac43ecd949fdb96cfb835abadb877a5233a36b115'}}}}}
I want to get a list of all dictionary modules nested in a tree. That way I could drill through the list and check the checksum of each file (note that modules can be at different levels, like in the example above).
I wrote the following recursive function for this. I know that every module has keys "path_to_file" and "checksum", so I use that to check if a dict is a module. Note that I had to wrap the recursive function inside another function that contains the list so that the list is not overwritten every time the recursive function is executed.
def _get_modules_from_db(dictionary):
def recursive_find(inner_dictionary):
for k, v in inner_dictionary.iteritems():
if (isinstance(v, dict) and
not sorted(v.keys()) == ['path_to_file', 'sha512sum']):
recursive_find(v)
else:
leaves.append(v)
leaves = []
recursive_find(dictionary)
return leaves
This approach works, however to wrap a function seems very ugly to me. So my question for the pros on Stack Overflow is:
Is there a simpler (or better) approach that you would recommend achieving, without the need to wrap the function?
source to share
First, the only reason you need to wrap this function is because you are creating a recursive_find
mutation of the closing cell leaves
instead return
. Sometimes it's a useful performance optimization (although just as often it's a pessimization), and sometimes it's just not clear how to do it otherwise, but this time it's trivial:
def _get_modules_from_db(dictionary):
leaves = []
for k, v in dictionary.iteritems():
if (isinstance(v, dict) and
not sorted(v.keys()) == ['path_to_file', 'sha512sum']):
leaves.extend(_get_modules_from_db(v))
else:
leaves.append(v)
return leaves
For additional improvements: I would probably turn this into a generator (at least in 3.3+, s yield from
; in 2.7 I might think twice). And, while we are doing this, I would compare the key view (in 3.x) or set(v)
(in 2.x) with a set, rather than do unnecessary sorting (and no reason for .keys()
using set
or sorted
) and use !=
instead of not
and ==
. And, if there is no good reason to actually accept subclassing dict
and dict
, I would either tone it down or use it collections.[abc.]Mapping
. So:
def _get_modules_from_db(dictionary):
for k, v in dictionary.items():
if isinstance(v, Mapping) and v.keys() != {'path_to_file', 'sha512sum'}:
yield from _get_modules_from_db(v)
else:
yield v
Or, alternatively, pull out the base files, so you can call it directly on the line:
def _get_modules_from_db(d):
if isinstance(d, Mapping) and d.keys() != {'path_to_file', 'sha512sum'}:
for v in d.values():
yield from _get_modules_from_db(v)
else:
yield d
I think it's a little more readable than yours and it's 6 lines instead of 11 (although version 2.x will be 7 lines). But I don't see anything wrong with your version.
If you don't know how to turn this 3.3+ code into 2.7 / 3.2 code:
- Rewrite
yield from eggs
asfor egg in eggs: yield egg
. -
Mapping
is incollections
, notcollections.abc
. - Use
set(v)
insteadv.keys()
. - Maybe use
itervalues
insteadvalues
(2.x only).
source to share
I see no problem with this approach. You need a recursive function that manages some kind of global state - this is a pretty sane way to do it (internal functions in Python are not that unusual).
However, you can add a default argument if you want to avoid the nested function:
def _get_modules_from_db(db, leaves=None):
if leaves is None:
leaves = []
if not isinstance(db, dict):
return leaves
# Use 'in' check to avoid sorting keys and doing a list compare
if 'path_to_file' in db and 'checksum' in db:
leaves.append(db)
else:
for v in db.values():
_get_modules_from_db(v, leaves)
return leaves
source to share