Flatten and sum values from three-level nested dictionaries in a one-liner (or two-liner)

I was answering another OP's question on how to add up items in nested dictionaries, and I came up with three nested for loops that sum the items of a 3-level dictionary. It works, but meeeh ... I'm sure it can be done more succinctly.

To provide a real use case: let's say I have some data from a camera at a store entrance that counts how many people walk in and out of the store every 10 minutes:

data = {
        "2014/01/01": {
            "15:00:00" : {
                "ins": 7,
                "outs": 5,
            },
            "15:10:00" : {
                "ins": 24,
                "outs": 10,
            },
            "15:20:00" : {
                "ins": 10,
                "outs": 20,
            },
        },
        "2014/01/02": {
            "15:00:00" : {
                "ins": 10,
                "outs": 10,
            },
            "15:10:00" : {
                "ins": 12,
                "outs": 5,
            },
            "15:20:00" : {
                "ins": 5,
                "outs": 10,
            },
        },
}

      

I would like to flatten these dictionaries and sum the ins and outs, grouping them by time of day regardless of the date. In other words, I want to know "how many people got in and out of my place for each bucket of time, regardless of the date" or, with another wording, "how many total entries and exits there have been since the beginning of time for each bucket of time".

The result would be the sum of the ins and outs over all the dicts keyed by a time in the data dict (ignoring the first level, the date). With the example data above, that would be:

"15:00:00": {
    "ins": 17,  # (7 + 10)
    "outs": 15, # (5 + 10)
},
"15:10:00": {
    "ins": 36,  # (24 + 12)
    "outs": 15, # (10 + 5)
},
"15:20:00": {
    "ins": 15,  # (10 + 5)
    "outs": 30, # (20 + 10)
}

      

Is there a way ... somehow (I'm guessing via itertools, but I don't know which tools would be the right ones) to start with data and end up with the result shown above in one line (or two)?

I've been messing around with the answers found in

But I cannot figure out how to get what I want. I either get a list of Counter objects (and then I don't know what to do with them), or I get an error because I am trying to add two dicts ...

I know it doesn't really matter (three for loops do the job), but I'm wondering whether this is possible and how much I could shorten my code (and probably learn something about itertools, which is about time ...)
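For reference, the three nested for loops mentioned above might look roughly like the sketch below. This is not the original code from the other question; the name `totals` and the compact form of `data` are assumptions made for illustration.

```python
# Same sample data as above, written compactly.
data = {
    "2014/01/01": {
        "15:00:00": {"ins": 7, "outs": 5},
        "15:10:00": {"ins": 24, "outs": 10},
        "15:20:00": {"ins": 10, "outs": 20},
    },
    "2014/01/02": {
        "15:00:00": {"ins": 10, "outs": 10},
        "15:10:00": {"ins": 12, "outs": 5},
        "15:20:00": {"ins": 5, "outs": 10},
    },
}

totals = {}  # hypothetical name: maps time -> {"ins": ..., "outs": ...}
for day in data.values():                    # level 1: dates (keys ignored)
    for time, counters in day.items():       # level 2: time buckets
        bucket = totals.setdefault(time, {})
        for key, count in counters.items():  # level 3: "ins" / "outs"
            bucket[key] = bucket.get(key, 0) + count

print(totals)
```

This is the baseline that any shorter version has to reproduce.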

Thanks in advance.

+3




3 answers


Yes, it can be done as a one-liner. I split it into two lines, and even so it is unreadable.

import itertools
flattened = sorted((time,key,count) for day in data.values() for time,counters in day.items() for key,count in counters.items())
{time:{key:sum(datum[2] for datum in counters) for key,counters in itertools.groupby(group, lambda x:x[1])} for time,group in itertools.groupby(flattened, lambda x:x[0])}

{'15:20:00': {'outs': 30, 'ins': 15}, '15:00:00': {'outs': 15, 'ins': 17}, '15:10:00': {'outs': 15, 'ins': 36}}

      



Just because something can be done doesn't mean it should be done. I would go with the clearest solution, and this is not it.
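To show what the one-liner is doing, here is the same groupby idea unrolled into steps. It is only a sketch, assuming the sample data from the question; the names `by_time`, `by_key` and `result` are mine. The sort matters because itertools.groupby only groups consecutive items with equal keys.

```python
from itertools import groupby
from operator import itemgetter

# Sample data from the question, written compactly.
data = {
    "2014/01/01": {
        "15:00:00": {"ins": 7, "outs": 5},
        "15:10:00": {"ins": 24, "outs": 10},
        "15:20:00": {"ins": 10, "outs": 20},
    },
    "2014/01/02": {
        "15:00:00": {"ins": 10, "outs": 10},
        "15:10:00": {"ins": 12, "outs": 5},
        "15:20:00": {"ins": 5, "outs": 10},
    },
}

# Step 1: flatten to (time, key, count) tuples and sort them, so that equal
# times (and, within a time, equal keys) end up next to each other.
flattened = sorted(
    (time, key, count)
    for day in data.values()
    for time, counters in day.items()
    for key, count in counters.items()
)

# Step 2: group by time, then by key within each time, and sum the counts.
result = {}
for time, by_time in groupby(flattened, key=itemgetter(0)):
    result[time] = {}
    for key, by_key in groupby(by_time, key=itemgetter(1)):
        result[time][key] = sum(count for _, _, count in by_key)

print(result)
```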

+1




It's a little longer than two lines, but:

from collections import Counter, defaultdict

# Yield (time, entries) pairs from every day; .values()/.items() work on Python 2 and 3
flattened = (item for day in data.values() for item in day.items())
sums = defaultdict(Counter)

for time, entries in flattened:
    sums[time] += Counter(entries)

      



which gives:

In [116]: dict(sums)
Out[116]: 
{'15:00:00': Counter({'ins': 17, 'outs': 15}),
 '15:10:00': Counter({'ins': 36, 'outs': 15}),
 '15:20:00': Counter({'outs': 30, 'ins': 15})}
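If plain dicts are wanted, as in the expected output of the question, the Counter values can be converted at the end. A minimal sketch, assuming the sample data from the question; the name `plain` is mine, and Counter.update is used here instead of += (with a mapping argument it adds the counts rather than replacing them).

```python
from collections import Counter, defaultdict

# Sample data from the question, written compactly.
data = {
    "2014/01/01": {
        "15:00:00": {"ins": 7, "outs": 5},
        "15:10:00": {"ins": 24, "outs": 10},
        "15:20:00": {"ins": 10, "outs": 20},
    },
    "2014/01/02": {
        "15:00:00": {"ins": 10, "outs": 10},
        "15:10:00": {"ins": 12, "outs": 5},
        "15:20:00": {"ins": 5, "outs": 10},
    },
}

sums = defaultdict(Counter)
for day in data.values():
    for time, entries in day.items():
        sums[time].update(entries)  # adds the counts of entries to the running totals

# Convert the defaultdict of Counters into ordinary nested dicts.
plain = {time: dict(counter) for time, counter in sums.items()}

print(plain)
```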

      

+2




You can use pandas DataFrames: fooobar.com/questions/275852 / ...

It would take two lines: one to create the DataFrame (see the answers to the previous questions) and one for a simple sum with the grouping you want, which can also be chained onto the end of the first line to make it a (pretty long) one-liner.

UPDATE: here is the code, not obfuscated ...

# Create data frame
>>> import pandas as pd
>>> table = pd.DataFrame([[c2, d2['ins'], d2['outs']] for d1 in data.values() for c2, d2 in d1.items()])
>>> table
          0   1   2
0  15:20:00   5  10
1  15:00:00  10  10
2  15:10:00  12   5
3  15:20:00  10  20
4  15:00:00   7   5
5  15:10:00  24  10

[6 rows x 3 columns]

      

Here column 1 is ins and column 2 is outs.

>>> table.groupby(0).sum()
           1   2
0               
15:00:00  17  15
15:10:00  36  15
15:20:00  15  30

      

+1








