Determine the number of overlapping time slots in a data frame

Question

I have a list of contracts with start and end dates.

How can I calculate the number of overlapping contracts over the duration of the contracts?

df = pd.DataFrame({
    'contract': pd.Series(['A1', 'A2', 'A3', 'A4']),
    'start': pd.Series(['01/01/2015', '03/02/2015', '15/01/2015', '10/01/2015']),
    'end': pd.Series(['16/01/2015', '10/02/2015', '18/01/2015', '12/01/2015'])
})

which gives:

  contract         end       start
0       A1  16/01/2015  01/01/2015
1       A2  10/02/2015  03/02/2015
2       A3  18/01/2015  15/01/2015
3       A4  12/01/2015  10/01/2015

A1 overlaps with A3 and A4, so overlaps = 2. A2 overlaps with no other contract, so overlaps = 0. A3 overlaps with A1, so overlaps = 1. A4 overlaps with A1, so overlaps = 1.

I could just compare each time span (from start to end) against every other one, but that is O(n**2).

Is there a better idea?

I have a feeling it could be improved by sorting and then looping through the sorted ranges.

+3

python pandas

NoIdeaHowToFixThis May 04 '15 at 14:29

1 answer

Primer · Answer 1 · 2015-05-04T20:01:50+0000

Here's how to do it:

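import pandas as pd

df = pd.DataFrame({
    'contract': pd.Series(['A1', 'A2', 'A3', 'A4']),
    'start': pd.Series(['01/01/2015', '03/02/2015', '15/01/2015', '10/01/2015']),
    'end': pd.Series(['16/01/2015', '10/02/2015', '18/01/2015', '12/01/2015'])
})
# Parse the day-first date strings into timestamps
df['start'] = pd.to_datetime(df.start, dayfirst=True)
df['end'] = pd.to_datetime(df.end, dayfirst=True)

# Expand each contract into the daily dates it covers (wrapped in a 1-tuple so it stays in one cell)
periods = df[['start', 'end']].apply(lambda x: (pd.date_range(x['start'], x['end']),), axis=1)
# Pairwise comparison: True where two contracts share at least one day (this is still O(n**2))
overlap = periods.apply(lambda col: periods.apply(lambda col_: col[0].isin(col_[0]).any()))
# Count the True entries per row, minus 1 for each contract's match with itself
df['overlap_count'] = overlap[overlap].apply(lambda x: x.count() - 1, axis=1)
print(df)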

Which gives:

  contract        end      start  overlap_count
0       A1 2015-01-16 2015-01-01              2
1       A2 2015-02-10 2015-02-03              0
2       A3 2015-01-18 2015-01-15              1
3       A4 2015-01-12 2015-01-10              1

I updated the code to output the number of matches, not the overlap in days.
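
The question also asks whether sorting can avoid the pairwise comparison. The following is only a sketch of one way that might look, not part of the original answer: it assumes both endpoints are inclusive (as in the expected counts above) and swaps in numpy's searchsorted for the explicit loop. For each contract, it counts how many contracts start on or before its end, subtracts how many end strictly before its start, and subtracts 1 for the contract itself. The variable names started_by_end and ended_before_start are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'contract': ['A1', 'A2', 'A3', 'A4'],
    'start': ['01/01/2015', '03/02/2015', '15/01/2015', '10/01/2015'],
    'end': ['16/01/2015', '10/02/2015', '18/01/2015', '12/01/2015'],
})
df['start'] = pd.to_datetime(df['start'], dayfirst=True)
df['end'] = pd.to_datetime(df['end'], dayfirst=True)

# Sort the start and end dates independently (O(n log n))
starts = np.sort(df['start'].to_numpy())
ends = np.sort(df['end'].to_numpy())

# Number of contracts whose start is <= this contract's end
started_by_end = np.searchsorted(starts, df['end'].to_numpy(), side='right')
# Number of contracts whose end is strictly < this contract's start
ended_before_start = np.searchsorted(ends, df['start'].to_numpy(), side='left')

# Everything that has started but not yet ended overlaps; subtract 1 for the contract itself
df['overlap_count'] = started_by_end - ended_before_start - 1
print(df)
```

This works because any contract that ends before a given start must also have started before it, so the two counts can be subtracted directly. The sort and the binary searches are each O(n log n), compared with O(n**2) for comparing every pair.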
