SAS: sum all but one value

I work at SAS and I am trying to summarize all observations, leaving them each time. For example, if I have:

Count    Name      Grade
1        Sam        90
2        Adam       100
3        John       80
4        Max        60
5        Andrea     70

      

I want to output the value for Sam, which is the sum of all ratings except his own, and the value for Adam, which is the sum of all ratings except his own - and so on.

Any ideas? Thank!

+3


source to share


5 answers


You can only do this in one proc sql using computed keyword:



data have;
input Count    Name  $    Grade;
datalines;
1        Sam        90
2        Adam       100
3        John       80
4        Max        60
5        Andrea     70
;;;;
run;

proc sql;
    create table want as
    select *, sum(grade) as all_grades, calculated all_grades-grade as minus_grade
    from have;
quit;

      

+4


source


proc sql;
create table temp as select
sum(grade) as all_grades
from orig_data;
quit;

proc sql;
create table temp2 as select
a.count,
a.name,
a.grade,
(b.all_grades-a.grade) as sum_other_grades
from orig_data a
left join temp b;
quit;

      



Haven't tested but the above should work. It creates a new temporary database that has the sum of all grades and merges it to create a new table with the sum of all grades less than the students' current grades like sum_other_grades.

+3


source


Here is a one pass solution (it will be about the same as a one pass if the dataset fits into the read buffer). I am actually calculating the average here, not just the sum, as I feel like the more interesting result (and sum, of course, means no division).

data have;
input Count    Name  $    Grade;
datalines;
1        Sam        90
2        Adam       100
3        John       80
4        Max        60
5        Andrea     70
;;;;
run;

data want;
  retain grademean;
  if _n_=1 then do;
      do _n_ = 1 to nobs_have;
        set have(keep=grade) point=_n_ nobs=nobs_have;
        gradesum+grade;
      end;
      grademean=gradesum/nobs_have;
  end;
  set have;
  grade_noti = ((grademean*nobs_have)-grade)/(nobs_have-1);
run;

      

Calculate the average, then for each record, subtract the portion that the record contributed to the average. This is a super useful technique for testing statistics when you want to compare a record to the rest of the population and you have a complex combination of classes where you would rather do the mean first. In those cases, you first use PROC MEANS

and then concatenate it, then do that subtraction.

+3


source


This solution does each observation on your original dataset and then iterates over the same dataset, summing the score values ​​for any records with different names, so starting with "Sam" we only add the variable oth_g

when we find the names that NOT 'Sam':

data want;
  set have;
  oth_g = 0;
  do i = 1 to n;
    set have 
      (keep = name grade rename = (name = name_loop grade = grade_loop)) 
      nobs = n point = i;
    if name ^ = name_loop then oth_g + grade_loop;
  end;
  drop grade_loop name_loop in;
run;
+1


source


This is a small change in @ Reese's answer above.

proc sql;
    create table want as
    select *,
           (select sum(grade) from have) as all_grades,
           calculated all_grades - grade as minus_grade
    from have;
quit;

      

I rearranged it in such a way as to prevent the below message from being printed in the log:

NOTE: The query requires remerging summary statistics back with the original data.

      

If you see the above message, it almost always means that you made a mistake. If you actually wanted to re-display the summary statistics with the original data, you must do so explicitly (for example, I did it by refactoring the @reese query.

Personally, I think the refactored version is also understandable.

0


source







All Articles