Formatting the timedelta64 string output
In a similar vein to this question, I have a numpy.timedelta64 column in a pandas DataFrame. According to this answer to the above question, there is a function pandas.tslib.repr_timedelta64 that displays a timedelta nicely as days, hours:minutes:seconds. I would like to format them only as days and hours.
So, I have the following:
def silly_format(hours):
    (days, hours) = divmod(hours, 24)
    if days > 0 and hours > 0:
        str_time = "{0:.0f} d, {1:.0f} h".format(days, hours)
    elif days > 0:
        str_time = "{0:.0f} d".format(days)
    else:
        str_time = "{0:.0f} h".format(hours)
    return str_time

df["time"].astype("timedelta64[h]").map(silly_format)
which gets me the output I want, but I was wondering if there is a function in numpy or pandas, similar to datetime.strftime, that can format a numpy.timedelta64 according to a provided format string?
I tried to adapt @Jeff's solution further, but it is slower than my answer. Here it is:
days = time_delta.astype("timedelta64[D]").astype(int)
hours = time_delta.astype("timedelta64[h]").astype(int) % 24
result = days.astype(str)
mask = (days > 0) & (hours > 0)
result[mask] = days.astype(str) + ' d, ' + hours.astype(str) + ' h'
result[(hours > 0) & ~mask] = hours.astype(str) + ' h'
result[(days > 0) & ~mask] = days.astype(str) + ' d'
While the answers provided by @sebix and @Jeff show a good way to convert timedeltas to days and hours, and @Jeff's solution in particular preserves the index of the Series, they lack flexibility in the final formatting of the string. I am now using the following solution:
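from itertools import izip  # Python 2 only; on Python 3 the built-in zip does the same job

def delta_format(days, hours):
    if days > 0 and hours > 0:
        return "{0:.0f} d, {1:.0f} h".format(days, hours)
    elif days > 0:
        return "{0:.0f} d".format(days)
    else:
        return "{0:.0f} h".format(hours)

# The lines below form the body of an enclosing function that receives time_delta.
days = time_delta.astype("timedelta64[D]")
hours = time_delta.astype("timedelta64[h]") % 24
return [delta_format(d, h) for (d, h) in izip(days, hours)]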
which suits me, and I get the index back by inserting that list into the original DataFrame.
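As far as I know, neither numpy nor pandas ships a strftime-style formatter for timedeltas, so a small helper is one way to get format-string flexibility. The following is only a sketch: the name `format_timedelta` and its template fields are my own invention, and it assumes that `pd.Timedelta.components` exposes the `days`, `hours`, `minutes` and `seconds` fields. Missing values (NaT) are not handled.

```python
import pandas as pd

def format_timedelta(td, fmt="{days} d, {hours} h"):
    """Hypothetical helper: fill a str.format template from a timedelta's parts."""
    # components is a namedtuple with days, hours, minutes, seconds, ...
    parts = pd.Timedelta(td).components
    return fmt.format(**parts._asdict())

s = pd.Series(pd.to_timedelta([3, 27, 48], unit="h"))
formatted = s.map(format_timedelta)  # map keeps the index of the Series
print(formatted.tolist())
```

Because the template is an ordinary format string, something like `format_timedelta(td, "{hours:02d}:{minutes:02d}")` should work as well. Note that this calls Python once per element, so it is not faster than the list comprehension above.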
Here's how to do it in a vectorized way.
In [28]: s = pd.to_timedelta(range(5),unit='d') + pd.offsets.Hour(3)
In [29]: s
Out[29]:
0 0 days, 03:00:00
1 1 days, 03:00:00
2 2 days, 03:00:00
3 3 days, 03:00:00
4 4 days, 03:00:00
dtype: timedelta64[ns]
In [30]: days = s.astype('timedelta64[D]').astype(int)
In [31]: hours = s.astype('timedelta64[h]').astype(int)-days*24
In [32]: days
Out[32]:
0 0
1 1
2 2
3 3
4 4
dtype: int64
In [33]: hours
Out[33]:
0 3
1 3
2 3
3 3
4 3
dtype: int64
In [34]: days.astype(str) + ' d, ' + hours.astype(str) + ' h'
Out[34]:
0 0 d, 3 h
1 1 d, 3 h
2 2 d, 3 h
3 3 d, 3 h
4 4 d, 3 h
dtype: object
If you want exactly the same as OP:
In [4]: result = days.astype(str) + ' d, ' + hours.astype(str) + ' h'
In [5]: result[days==0] = hours.astype(str) + ' h'
In [6]: result
Out[6]:
0 3 h
1 1 d, 3 h
2 2 d, 3 h
3 3 d, 3 h
4 4 d, 3 h
dtype: object
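On recent pandas versions, the `astype('timedelta64[D]')` call may no longer be accepted, so here is a sketch of the same idea built on the `.dt.components` accessor instead. This is a variation rather than the exact code above. It assumes `s` is a Series of timedeltas, the sample data is made up, and it also covers the "days only" case from the question.

```python
import pandas as pd

# Sample data (made up): 3 h, 27 h, 48 h and 0 h
s = pd.Series(pd.to_timedelta([3, 27, 48, 0], unit="h"))

parts = s.dt.components  # DataFrame with days, hours, minutes, ... columns
days, hours = parts["days"], parts["hours"]

result = days.astype(str) + " d, " + hours.astype(str) + " h"
# No leftover hours: show the days only.
result = result.where(hours > 0, days.astype(str) + " d")
# No whole days: show the hours only.
result = result.where(days > 0, hours.astype(str) + " h")
print(result.tolist())
```

`Series.where` keeps the values where the condition holds and takes the replacement elsewhere, so the index of `s` is preserved throughout.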
I don't know how this is done in pandas, but here's my numpy-only method for your problem:
import numpy as np
t = np.array([200487900000000,180787000000000,400287000000000,188487000000000], dtype='timedelta64[ns]')
days = t.astype('timedelta64[D]').astype(np.int32) # gives: array([2, 2, 4, 2], dtype=int32)
hours = t.astype('timedelta64[h]').astype(np.int32)%24 # gives: array([ 7, 2, 15, 4], dtype=int32)
So I just convert the raw data to the desired unit (letting numpy do the work), which leaves two integer arrays that can be used however we like. To group them in pairs, just do:
>>> np.array([days, hours]).T
array([[ 2, 7],
[ 2, 2],
[ 4, 15],
[ 2, 4]], dtype=int32)
For example:
d = np.array([days, hours]).T
for row in d:
    print('%dd %dh' % tuple(row))
gives:
2d 7h
2d 2h
4d 15h
2d 4h
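The same split can also be done in a single step with `np.divmod` on the whole-hour counts, which avoids converting to days separately. This is just a sketch using the sample array from above; the variable names are mine.

```python
import numpy as np

t = np.array([200487900000000, 180787000000000, 400287000000000, 188487000000000],
             dtype="timedelta64[ns]")

# Truncate to whole hours, then split into (days, remaining hours).
total_hours = t.astype("timedelta64[h]").astype(np.int64)
days, hours = np.divmod(total_hours, 24)

labels = ["%dd %dh" % (d, h) for d, h in zip(days, hours)]
print(labels)
```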