Fill in missing values ​​with back-to-back delay in SAS

Suppose you have a table with username, counter, and count for each counter.

data have;
input user $  counter  score;
cards;
A 1 .
A 2 .
A 3 40
A 4 .
A 5 20
A 6 .
B 1 30
B 2 .
C 1 .
C 2 .
C 3 .
;
run;

      

Some points are missing in some counters and you want to place the same score as the previous counter. The result will look like this:

A 1 40
A 2 40
A 3 40
A 4 40
A 5 20
A 6 20
B 1 30
B 2 30
C 1 .
C 2 .
C 3 .

      

I was able to fill in the missing score values ​​forward using a function lag

like below:

data result1a;
  set have(keep=user);
  by user;

  *Look ahead;
    merge have have(firstobs=2 keep=score rename=(score=_NextScore));

    if first.user then do;
        if score= . then score=_NextScore;
        end;
    else do;
        _PrevScore = lag(score);
        if score= . then score=_PrevScore;
    end;
    output;
run;

      

Then I sorted the table back using descending

funtion on counter

as shown below:

proc sort data = result1a out= result1b; 
by user descending counter ;
run;

      

Then finally I would fill in the missing values ​​in the raaranged table (backwards according to the initial table) using a function lag

as shown below.

I used the function lag

in do-loop

because I wanted to update the previous value at each step (for example, the value 40 will carry over from the first point to the last point in the group completely).

However, I am getting a weird result. All missing values ​​have no real value. Any idea about fixing the last data step?

data result1c;
set result1b;
by user;

   if first.user then do;
        if score= . then score=_NextScore;
        else score = score;

        end;
   else do;
        _PrevScore = lag(score);
        if score= . then 
        score=_PrevScore;
        else score = score;
   end;
   output;
run;

      

+3


source to share


2 answers


Don't need to use lag

, use retain

(or equivalent). Here's a dual DoW system solution that does it in one datastep (and, in fact, one reads - it buffers the read, so it's as efficient as one read).

We first loop through the dataset to get the first score, so we can grab it for the initial prev_score value. Then install that and re-loop the lines for that user and output them. There is no actual here retain

as I am doing the loop myself, but it looks like if there was retain prev_score;

and it was a normal data step loop. I actually don't retain

, since I want it to go missing when a new user meets.



data want;
  do _n_ = 1 by 1 until (last.user);
    set have;
    by user;
    if missing(first_score) and not missing(score) then 
      first_score = score;

  end;
  prev_score = first_score;
  do _n_ = 1 by 1 until (last.user);
    set have;
    by user;
    if missing(score) then
      score = prev_score;
    prev_score = score;
    output;
  end;
run;

      

+2


source


lag()

is a commonly misunderstood function. The name implies that when you call it, SAS looks back at the previous line and grabs the value, but this is not the case at all.

In fact, it lag<n>()

is a function that creates a "queue" with n values. When you call lag<n>(x)

, it pushes the current x value into that queue and reads the previous value from it (of course, the push happens only once per line). Therefore, if you have a lag<n>()

condition inside, the push only happens when that condition is met.

To fix your problem, you need the lag () function to run on every line and execute after the score has been fixed:



data result1c;
set result1b;
by user;
if first.user then do;
    if score= . then score=_NextScore;
    else score = score;
end;
else do;
    if score= . then 
    score=_PrevScore;
    else score = score;
end;
_PrevScore = lag(score);
output;
run;

      

EDIT: I have hung up on misuse of lag and have not provided a working alternative. Since you are changing the score, it is a bad idea to use lag. Save here will be:

data result1c;
set result1b;
by user;
retain _PrevScore;
if first.user then do;
    if score= . then score=_NextScore;
    else score = score;
end;
else do;
    if score= . then 
    score=_PrevScore;
    else score = score;
end;
_PrevScore = score;
output;
run;

      

+1


source







All Articles