SAS Date Difference by Group

Consider the dataset test

as follows:

 Group   Date
  1      05JAN2014
  1      08JAN2014
  1      14JAN2014
  2      05JAN2013
  2      10FEB2015
  2      27FEB2015

      

I want to calculate the difference in dates by group. The code below takes into account the difference between every two dates:

  data test;
  datediff = dif(Date);
  run;

      

How can I only distinguish between dates in one group? Also, is there a way to change the difference between the last and first dates in each group?

+3


source to share


3 answers


Starting from this:

 data test;
  datediff = dif(Date);
  run;

      

Let me address your numbers one at a time. Firstly, by adding an operator set

and an operator by

, we can also add first

and last

so that you can determine where you are in the group. This assumes that it is already sorted by group

.

data test2;
  set test;
  by group;
  datediff=dif(date);
run;

      

It doesn't work any differently (assuming you had the set assignment originally, anyway). But now you have new possibilities.

First, if you can use dif

, I recommend the method for that retain

. You can more easily see what it does and avoid some common pitfalls: in particular, lag

and dif

not actually being compared to the previous entry - they create a queue and are compared to what can lead to complications when using conditional statements.



data test2;
  set test;
  by group;
  retain last_date;
  if first.group then last_date=0;
  datediff = date - last_date;
  output;
  last_date = date;
run;

      

This does the same thing as before - it compares the previous value to the current value, but makes it easier to see, and we add an option to reset the variable last_date

when first.group

true - that means we are in the first line of the new group value. I won't discard any of these intermediate variables, but in production code, you can and should. retain

means the value will be stored across lines (instead of resetting every time you get a new line).

Now that you have one variable keeping track of the previous string value date

, it's pretty easy to hope to see how we can do this for the first and last difference.

data test2;
  set test;
  by group;
  retain last_date orig_date;
  if first.group then do;
    last_date=0;
    orig_date=date;  **new;
  end;
  datediff = date - last_date; 
  if last.group then group_datediff = date-orig_date;  **new;
  output;
  last_date = date;
run;

      

Now we have done the same as before, but we restart orig_date every time we see first.group

and calculate the group_datediff when we click last.group

.

+4


source


Here is a method using the lag

new feature ifn

as well ( ifn

included in SAS 9.2 or later). Be careful with the delay function as it can sometimes lead to unexpected results. See this article for details .



*Data must be sorted to use BY groups;
proc sort data=have; by group date;run;

data want;
    set have;
    by group;
    dateDiff = ifn(first.group, . , dif(date));

    retain firstDate;
    if first.group then firstDate = date;
    if last.group then dateDiff_all = date - firstDate;
run;

      

+3


source


Here is a solution using proc sql and a group statement using the min and max pivot functions.

This will give you the difference in each group for the first and last dates.

Technically this will give you the difference in the smallest and largest dates, but you are date in chronological order, so this will work for that data.

proc sql;
    create table want as select
    group,
    max(date) - min(date) as datediff
    from have 
    group by group;
quit;

      

+2


source







All Articles