R: Calculate the average of a variable over the unique values of another variable in a data frame?
I'm new to R. I have a dataframe that looks like this:
Pupil ID State GPA
1 FL 3.9
2 TX 3.2
3 NY 2.2
4 AK 3.0
5 CO 2.4
... etc. I would like to create a new dataframe that looks like this:
State Mean GPA Number of pupils
AL 2.91 23
AK 3.23 24
etc. In other words, I would like to find the unique values for the state and calculate the average GPA for each one and the number of students for each one.
Is this possible in R? I know that I can use table(data$State)
to get the unique states and the pupil count for each, but I don't know how to calculate the average GPA for each unique state value.
One of so many ways to do this:
x <- read.table(header=TRUE, text="Pupil.ID State GPA
1 FL 3.9
2 TX 3.2
3 NY 2.2
4 AK 3.0
5 CO 2.4")
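# FUN returns both statistics per state; the argument is named v so it does not shadow the data frame x
aggregate(GPA ~ State, data = x, FUN = function(v) c(mean = mean(v), count = length(v)))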
## State GPA.mean GPA.count
## 1 AK 3.0 1.0
## 2 CO 2.4 1.0
## 3 FL 3.9 1.0
## 4 NY 2.2 1.0
## 5 TX 3.2 1.0
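One caveat about the output above: because FUN returns two values, aggregate() stores them as a single matrix column named GPA, so the result has two columns rather than three. A minimal sketch of flattening it into ordinary columns follows. The sample data `pupils` and the names `agg` and `flat` are made up for illustration and are not from the original post.

```r
# Hypothetical sample data with repeated states so the counts are not all 1
pupils <- data.frame(
  Pupil.ID = 1:6,
  State    = c("FL", "TX", "FL", "AK", "TX", "FL"),
  GPA      = c(3.9, 3.2, 3.1, 3.0, 2.8, 3.5)
)

# Same kind of aggregate() call as above; GPA comes back as a two-column matrix
agg <- aggregate(GPA ~ State, data = pupils,
                 FUN = function(v) c(mean = mean(v), count = length(v)))

# do.call(data.frame, ...) splits the matrix column into GPA.mean and GPA.count
flat <- do.call(data.frame, agg)
print(flat)
```

After flattening, flat$GPA.mean and flat$GPA.count can be used like any other column.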
One convenient way to do this is to use group_by() together with summarise() from the dplyr package. If df is your data frame,
library(dplyr)

df %>%
  group_by(State) %>%
  summarise(mean_GPA = mean(GPA),
            number_of_pupils = n())
will give you the mean GPA for each unique state, as well as the pupil count (the number of rows per state).
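To try this end to end, here is a small self-contained sketch. It assumes the dplyr package is installed; the sample rows and the name `by_state` are invented for illustration, with states repeated so the counts are more informative than in the five-row example from the question.

```r
library(dplyr)  # assumes the dplyr package is installed

# Hypothetical sample data; `df` matches the name used in the answer
df <- data.frame(
  Pupil.ID = 1:6,
  State    = c("FL", "TX", "FL", "AK", "TX", "FL"),
  GPA      = c(3.9, 3.2, 3.1, 3.0, 2.8, 3.5)
)

by_state <- df %>%
  group_by(State) %>%
  summarise(mean_GPA = mean(GPA),       # average GPA within each state
            number_of_pupils = n())     # number of rows (pupils) per state

print(by_state)
```

The result has one row per state, with the column names chosen inside summarise().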