Limit pandas missing-data filling to one index level in a multi-indexed DataFrame
As an example, let's say I have a df with columns for 'year', 'quarter' (numbered 1 to 4 within each year), a variable ('var') and a value ('value'):
year quarter var value
2015 1 A 0.1
2015 2 A 0.5
2015 3 A 0.6
2015 4 A 1.0
2015 1 B 0.1
2015 4 B 0.5
2015 2 C 0.0
2015 3 C 0.7
2015 4 C 1.2
But sometimes data are missing (for example, there is no row for 2015, quarter 2, "B"). It is not much of a stretch to insert NaN for the missing rows by reindexing, which gives this:
year quarter var value
2015 1 A 0.1
2015 2 A 0.5
2015 3 A 0.6
2015 4 A 1.0
2015 1 B 0.1
2015 2 B NaN
2015 3 B NaN
2015 4 B 0.5
2015 1 C NaN
2015 2 C 0.0
2015 3 C 0.7
2015 4 C 1.2
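The reindexing step could look roughly like this. It is a sketch, not the asker's actual code, and it assumes every year should have quarters 1 to 4 and that the set of variables is just whatever appears in the 'var' column:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "year": [2015] * 9,
    "quarter": [1, 2, 3, 4, 1, 4, 2, 3, 4],
    "var": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "value": [0.1, 0.5, 0.6, 1.0, 0.1, 0.5, 0.0, 0.7, 1.2],
})

# Every (year, quarter, var) combination; assumes quarters always run 1-4.
full_index = pd.MultiIndex.from_product(
    [df["year"].unique(), range(1, 5), df["var"].unique()],
    names=["year", "quarter", "var"],
)

# Reindexing adds a NaN row for each combination that is missing from the data,
# then the rows are put back into var/year/quarter order.
padded = (
    df.set_index(["year", "quarter", "var"])
    .reindex(full_index)
    .reset_index()
    .sort_values(["var", "year", "quarter"], ignore_index=True)
)
print(padded)
```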
What I would like to do, though, is fill in the "missing" data by forward filling to propagate the values, i.e. df.ffill(), and then fill the remaining values with zero, i.e. df.fillna(0), so that I get something like this:
year quarter var value
2015 1 A 0.1
2015 2 A 0.5
2015 3 A 0.6
2015 4 A 1.0
2015 1 B 0.1
2015 2 B 0.1
2015 3 B 0.1
2015 4 B 0.5
2015 1 C 0.0
2015 2 C 0.0
2015 3 C 0.7
2015 4 C 1.2
However, when I use df.ffill(), I haven't found a way to restrict the fill to each 'var' (or to each 'year'), so values leak from one variable into the next.
My first idea was to convert the data to a pivot table:
pd.pivot_table(df, values='value', index=['year', 'quarter'], columns='var', aggfunc='sum')
and then forward fill, but I can't figure out how to limit it to a single year (or how to turn the pivot table back into its original long form).
Any help is appreciated!
You basically want your data in a table with time (year, quarter) on the row index and everything else in the columns. You can use a pivot table or stack/unstack:
df2 = df.set_index(['year', 'quarter', 'var']).unstack('var')
>>> df2
value
var A B C
year quarter
2015 1 0.1 0.1 NaN
2 0.5 NaN 0.0
3 0.6 NaN 0.7
4 1.0 0.5 1.2
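For comparison, the pivot_table call from the question gives the same wide table, just without the extra 'value' column level. A quick check, with the sample data rebuilt so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2015] * 9,
    "quarter": [1, 2, 3, 4, 1, 4, 2, 3, 4],
    "var": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "value": [0.1, 0.5, 0.6, 1.0, 0.1, 0.5, 0.0, 0.7, 1.2],
})

# stack/unstack route; selecting "value" drops the top column level.
wide_unstack = df.set_index(["year", "quarter", "var"]).unstack("var")["value"]

# pivot_table route; "sum" only matters if a (year, quarter, var) key repeats.
wide_pivot = pd.pivot_table(
    df, values="value", index=["year", "quarter"], columns="var", aggfunc="sum"
)

print(wide_pivot.equals(wide_unstack))
```

The unstack route needs unique (year, quarter, var) keys, while pivot_table aggregates duplicates, which is the main reason to prefer one over the other.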
Once the data is in this form, forward fill down each column and then fill whatever is still missing with zero:
df2 = df2.ffill().fillna(0)
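A plain ffill() on the wide table runs down the whole (year, quarter) axis, so with several years it would carry Q4 values into the next year's Q1. If the fill should stay within a year, one option (a swapped-in technique, not part of the original answer) is to forward fill per index level with groupby(level='year'). The 2016 row below is hypothetical, added only to show the year boundary:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2015] * 9 + [2016],  # the 2016 row is hypothetical
    "quarter": [1, 2, 3, 4, 1, 4, 2, 3, 4, 1],
    "var": ["A", "A", "A", "A", "B", "B", "C", "C", "C", "A"],
    "value": [0.1, 0.5, 0.6, 1.0, 0.1, 0.5, 0.0, 0.7, 1.2, 2.0],
})

df2 = df.set_index(["year", "quarter", "var"]).unstack("var")

# Forward fill separately within each year, then zero-fill what is still missing.
per_year = df2.groupby(level="year").ffill().fillna(0)

# For contrast: an unrestricted ffill carries 2015 Q4 values into 2016 Q1.
across_years = df2.ffill().fillna(0)

print(per_year.loc[(2016, 1)])
print(across_years.loc[(2016, 1)])
```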
Finally, stack the data back into long form, sort it, and reset the index if you like:
>>> df2.stack('var').sort_index(level=['var', 'year', 'quarter']).reset_index()
year quarter var value
0 2015 1 A 0.1
1 2015 2 A 0.5
2 2015 3 A 0.6
3 2015 4 A 1.0
4 2015 1 B 0.1
5 2015 2 B 0.1
6 2015 3 B 0.1
7 2015 4 B 0.5
8 2015 1 C 0.0
9 2015 2 C 0.0
10 2015 3 C 0.7
11 2015 4 C 1.2
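If you would rather not reshape at all, another option (again a different technique from the answer above) is to forward fill the padded long-form data within groups using groupby. This sketch assumes the data has already been padded with NaN rows and sorted by var, year and quarter, as in the question; grouping by both 'year' and 'var' keeps the fill from crossing either boundary:

```python
import pandas as pd

# Padded data from the question (NaN where a quarter is missing), sorted by var/year/quarter.
padded = pd.DataFrame({
    "year": [2015] * 12,
    "quarter": [1, 2, 3, 4] * 3,
    "var": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "value": [0.1, 0.5, 0.6, 1.0, 0.1, None, None, 0.5, None, 0.0, 0.7, 1.2],
})

# ffill runs in row order within each (year, var) group, so rows must be sorted by quarter.
filled = padded.copy()
filled["value"] = filled.groupby(["year", "var"])["value"].ffill().fillna(0)
print(filled)
```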