Fill in missing values with back-to-back delay in SAS
Suppose you have a table with username, counter, and count for each counter.
data have;
input user $ counter score;
cards;
A 1 .
A 2 .
A 3 40
A 4 .
A 5 20
A 6 .
B 1 30
B 2 .
C 1 .
C 2 .
C 3 .
;
run;
Some points are missing in some counters and you want to place the same score as the previous counter. The result will look like this:
A 1 40
A 2 40
A 3 40
A 4 40
A 5 20
A 6 20
B 1 30
B 2 30
C 1 .
C 2 .
C 3 .
I was able to fill in the missing score values forward using a function lag
like below:
data result1a;
set have(keep=user);
by user;
*Look ahead;
merge have have(firstobs=2 keep=score rename=(score=_NextScore));
if first.user then do;
if score= . then score=_NextScore;
end;
else do;
_PrevScore = lag(score);
if score= . then score=_PrevScore;
end;
output;
run;
Then I sorted the table back using descending
funtion on counter
as shown below:
proc sort data = result1a out= result1b;
by user descending counter ;
run;
Then finally I would fill in the missing values in the raaranged table (backwards according to the initial table) using a function lag
as shown below.
I used the function lag
in do-loop
because I wanted to update the previous value at each step (for example, the value 40 will carry over from the first point to the last point in the group completely).
However, I am getting a weird result. All missing values have no real value. Any idea about fixing the last data step?
data result1c;
set result1b;
by user;
if first.user then do;
if score= . then score=_NextScore;
else score = score;
end;
else do;
_PrevScore = lag(score);
if score= . then
score=_PrevScore;
else score = score;
end;
output;
run;
source to share
Don't need to use lag
, use retain
(or equivalent). Here's a dual DoW system solution that does it in one datastep (and, in fact, one reads - it buffers the read, so it's as efficient as one read).
We first loop through the dataset to get the first score, so we can grab it for the initial prev_score value. Then install that and re-loop the lines for that user and output them. There is no actual here retain
as I am doing the loop myself, but it looks like if there was retain prev_score;
and it was a normal data step loop. I actually don't retain
, since I want it to go missing when a new user meets.
data want;
do _n_ = 1 by 1 until (last.user);
set have;
by user;
if missing(first_score) and not missing(score) then
first_score = score;
end;
prev_score = first_score;
do _n_ = 1 by 1 until (last.user);
set have;
by user;
if missing(score) then
score = prev_score;
prev_score = score;
output;
end;
run;
source to share
lag()
is a commonly misunderstood function. The name implies that when you call it, SAS looks back at the previous line and grabs the value, but this is not the case at all.
In fact, it lag<n>()
is a function that creates a "queue" with n values. When you call lag<n>(x)
, it pushes the current x value into that queue and reads the previous value from it (of course, the push happens only once per line). Therefore, if you have a lag<n>()
condition inside, the push only happens when that condition is met.
To fix your problem, you need the lag () function to run on every line and execute after the score has been fixed:
data result1c;
set result1b;
by user;
if first.user then do;
if score= . then score=_NextScore;
else score = score;
end;
else do;
if score= . then
score=_PrevScore;
else score = score;
end;
_PrevScore = lag(score);
output;
run;
EDIT: I have hung up on misuse of lag and have not provided a working alternative. Since you are changing the score, it is a bad idea to use lag. Save here will be:
data result1c;
set result1b;
by user;
retain _PrevScore;
if first.user then do;
if score= . then score=_NextScore;
else score = score;
end;
else do;
if score= . then
score=_PrevScore;
else score = score;
end;
_PrevScore = score;
output;
run;
source to share