Add additional column as cumulative temporary difference
How do I add an extra column that is the cumulative value of temporary differences for each course? For example, the original table:
id_A course weight ts_A value
id1 cotton 3.5 2017-04-27 01:35:30 150.000000
id1 cotton 3.5 2017-04-27 01:36:00 416.666667
id1 cotton 3.5 2017-04-27 01:36:30 700.000000
id1 cotton 3.5 2017-04-27 01:37:00 950.000000
id2 cotton blue 5.0 2017-04-27 02:35:30 150.000000
id2 cotton blue 5.0 2017-04-27 02:36:00 450.000000
id2 cotton blue 5.0 2017-04-27 02:36:30 520.666667
id2 cotton blue 5.0 2017-04-27 02:37:00 610.000000
Expected Result:
id_A course weight ts_A value cum_delta_sec
id1 cotton 3.5 2017-04-27 01:35:30 150.000000 0
id1 cotton 3.5 2017-04-27 01:36:00 416.666667 30
id1 cotton 3.5 2017-04-27 01:36:30 700.000000 60
id1 cotton 3.5 2017-04-27 01:37:00 950.000000 90
id2 cotton blue 5.0 2017-04-27 02:35:30 150.000000 0
id2 cotton blue 5.0 2017-04-27 02:36:00 450.000000 30
id2 cotton blue 5.0 2017-04-27 02:36:30 520.666667 60
id2 cotton blue 5.0 2017-04-27 02:37:00 610.000000 90
+3
source to share
3 answers
You can bind the method diff
with cumsum
:
# convert ts_A to datetime type
df.ts_A = pd.to_datetime(df.ts_A)
# convert ts_A to seconds, group by id and then use transform to calculate the cumulative difference
df['cum_delta_sec'] = df.ts_A.astype(int).div(10**9).groupby(df.id_A).transform(lambda x: x.diff().fillna(0).cumsum())
df
+3
source to share
Use groupby
, transform
and .iloc
:
df['ts_A'] = pd.to_datetime(df.ts_A)
df['cum_delta_sec'] = (df.groupby('id_A')['ts_A']
.transform(lambda x: (x - x.iloc[0]).dt.total_seconds()))
Output:
id_A course weight ts_A value cum_delta_sec
0 id1 cotton 3.5 2017-04-27 01:35:30 150.000000 0
1 id1 cotton 3.5 2017-04-27 01:36:00 416.666667 30
2 id1 cotton 3.5 2017-04-27 01:36:30 700.000000 60
3 id1 cotton 3.5 2017-04-27 01:37:00 950.000000 90
4 id2 cotton blue 5.0 2017-04-27 02:35:30 150.000000 0
5 id2 cotton blue 5.0 2017-04-27 02:36:00 450.000000 30
6 id2 cotton blue 5.0 2017-04-27 02:36:30 520.666667 60
7 id2 cotton blue 5.0 2017-04-27 02:37:00 610.000000 90
In the group, subtract the current value from the first value and use an .dt
accessor to convert to seconds.
+2
source to share
import csv
import datetime as dt
with open('path/to/input') as fin, open('path/to/output', 'w') as fout:
infile = csv.DictReader(fin, delimiter='\t')
outfile = csv.DictWriter(fout, delimiter='\t', fieldnames=infile.fieldnames + ['cum_delta_sec'])
cdt = 0
last = None
for row in infile:
if last is None:
last = dt.strptime(row['ts_A'], "%Y-%m-%d %H:%M:%S")
row['cum_delta_sec'] = 0
outfile.writerow(row)
continue
cdt += (last - dt.strptime(row['ts_A'], "%Y-%m-%d %H:%M:%S")).total_seconds()
row['cum_delta_sec'] = cdt
outfile.writerow(row)
0
source to share