SAS: sum all but one value
I work at SAS and I am trying to summarize all observations, leaving them each time. For example, if I have:
Count Name Grade
1 Sam 90
2 Adam 100
3 John 80
4 Max 60
5 Andrea 70
I want to output the value for Sam, which is the sum of all ratings except his own, and the value for Adam, which is the sum of all ratings except his own - and so on.
Any ideas? Thank!
source to share
You can only do this in one proc sql using computed keyword:
data have;
input Count Name $ Grade;
datalines;
1 Sam 90
2 Adam 100
3 John 80
4 Max 60
5 Andrea 70
;;;;
run;
proc sql;
create table want as
select *, sum(grade) as all_grades, calculated all_grades-grade as minus_grade
from have;
quit;
source to share
proc sql;
create table temp as select
sum(grade) as all_grades
from orig_data;
quit;
proc sql;
create table temp2 as select
a.count,
a.name,
a.grade,
(b.all_grades-a.grade) as sum_other_grades
from orig_data a
left join temp b;
quit;
Haven't tested but the above should work. It creates a new temporary database that has the sum of all grades and merges it to create a new table with the sum of all grades less than the students' current grades like sum_other_grades.
source to share
Here is a one pass solution (it will be about the same as a one pass if the dataset fits into the read buffer). I am actually calculating the average here, not just the sum, as I feel like the more interesting result (and sum, of course, means no division).
data have;
input Count Name $ Grade;
datalines;
1 Sam 90
2 Adam 100
3 John 80
4 Max 60
5 Andrea 70
;;;;
run;
data want;
retain grademean;
if _n_=1 then do;
do _n_ = 1 to nobs_have;
set have(keep=grade) point=_n_ nobs=nobs_have;
gradesum+grade;
end;
grademean=gradesum/nobs_have;
end;
set have;
grade_noti = ((grademean*nobs_have)-grade)/(nobs_have-1);
run;
Calculate the average, then for each record, subtract the portion that the record contributed to the average. This is a super useful technique for testing statistics when you want to compare a record to the rest of the population and you have a complex combination of classes where you would rather do the mean first. In those cases, you first use PROC MEANS
and then concatenate it, then do that subtraction.
source to share
This solution does each observation on your original dataset and then iterates over the same dataset, summing the score values for any records with different names, so starting with "Sam" we only add the variable oth_g
when we find the names that NOT 'Sam':
data want; set have; oth_g = 0; do i = 1 to n; set have (keep = name grade rename = (name = name_loop grade = grade_loop)) nobs = n point = i; if name ^ = name_loop then oth_g + grade_loop; end; drop grade_loop name_loop in; run;
source to share
This is a small change in @ Reese's answer above.
proc sql;
create table want as
select *,
(select sum(grade) from have) as all_grades,
calculated all_grades - grade as minus_grade
from have;
quit;
I rearranged it in such a way as to prevent the below message from being printed in the log:
NOTE: The query requires remerging summary statistics back with the original data.
If you see the above message, it almost always means that you made a mistake. If you actually wanted to re-display the summary statistics with the original data, you must do so explicitly (for example, I did it by refactoring the @reese query.
Personally, I think the refactored version is also understandable.
source to share