SAS Date Difference by Group
Consider the dataset test
as follows:
Group Date
1 05JAN2014
1 08JAN2014
1 14JAN2014
2 05JAN2013
2 10FEB2015
2 27FEB2015
I want to calculate the difference in dates by group. The code below takes into account the difference between every two dates:
data test;
datediff = dif(Date);
run;
How can I only distinguish between dates in one group? Also, is there a way to change the difference between the last and first dates in each group?
source to share
Starting from this:
data test;
datediff = dif(Date);
run;
Let me address your numbers one at a time. Firstly, by adding an operator set
and an operator by
, we can also add first
and last
so that you can determine where you are in the group. This assumes that it is already sorted by group
.
data test2;
set test;
by group;
datediff=dif(date);
run;
It doesn't work any differently (assuming you had the set assignment originally, anyway). But now you have new possibilities.
First, if you can use dif
, I recommend the method for that retain
. You can more easily see what it does and avoid some common pitfalls: in particular, lag
and dif
not actually being compared to the previous entry - they create a queue and are compared to what can lead to complications when using conditional statements.
data test2;
set test;
by group;
retain last_date;
if first.group then last_date=0;
datediff = date - last_date;
output;
last_date = date;
run;
This does the same thing as before - it compares the previous value to the current value, but makes it easier to see, and we add an option to reset the variable last_date
when first.group
true - that means we are in the first line of the new group value. I won't discard any of these intermediate variables, but in production code, you can and should. retain
means the value will be stored across lines (instead of resetting every time you get a new line).
Now that you have one variable keeping track of the previous string value date
, it's pretty easy to hope to see how we can do this for the first and last difference.
data test2;
set test;
by group;
retain last_date orig_date;
if first.group then do;
last_date=0;
orig_date=date; **new;
end;
datediff = date - last_date;
if last.group then group_datediff = date-orig_date; **new;
output;
last_date = date;
run;
Now we have done the same as before, but we restart orig_date every time we see first.group
and calculate the group_datediff when we click last.group
.
source to share
Here is a method using the lag
new feature ifn
as well ( ifn
included in SAS 9.2 or later). Be careful with the delay function as it can sometimes lead to unexpected results. See this article for details .
*Data must be sorted to use BY groups;
proc sort data=have; by group date;run;
data want;
set have;
by group;
dateDiff = ifn(first.group, . , dif(date));
retain firstDate;
if first.group then firstDate = date;
if last.group then dateDiff_all = date - firstDate;
run;
source to share
Here is a solution using proc sql and a group statement using the min and max pivot functions.
This will give you the difference in each group for the first and last dates.
Technically this will give you the difference in the smallest and largest dates, but you are date in chronological order, so this will work for that data.
proc sql;
create table want as select
group,
max(date) - min(date) as datediff
from have
group by group;
quit;
source to share