Create a grouping value column based on another column
I'm sure this has been asked before, but I don't know what to search for, so I apologize in advance.
Let's say I have the following dataframe:
grades <- data.frame(a = 1:40, b = sample(45:100, 40))
Using dplyr, I want to create a new variable that gives each student's grade based on the following criteria: 90-100 = excellent, 80-89 = very good, etc.
I thought I could get this result by nesting ifelse() inside mutate():
grades %>%
mutate(ifelse(b >= 90, "excellent"),
ifelse(b >= 80 & b < 90, "very_good"),
ifelse(b >= 70 & b < 80, "fair"),
ifelse(b >= 60 & b < 70, "poor", "fail"))
It doesn't work; I get the error: argument "no" is missing, with no default. I thought the final "fail" would act as the "no" value, but obviously I misunderstand the syntax.
I can get this if I first filter the original data separately and then call ifelse like this:
a <- grades %>%
filter( b >= 90) %>%
mutate(final = ifelse(b >= 90, "excellent"))
and then rbind a, b, c, etc. Obviously this is not how I want to do it, but I wanted to understand the syntax of ifelse(). I assume the filtered version works because there are no values that fail the condition, but I still cannot figure out how to make it work with more than one ifelse().
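A small base-R sketch that checks the assumption above: ifelse() only evaluates its "no" argument when at least one element of the test is FALSE, so omitting "no" goes unnoticed after filtering but fails on mixed data. The vectors all_high and mixed are made-up example values, not the question's data.

```r
# ifelse() evaluates `no` only if some element of `test` is FALSE,
# so leaving `no` out is harmless when every test is TRUE.
all_high <- c(95, 92, 100)
only_yes <- ifelse(all_high >= 90, "excellent")  # runs without error

# With at least one FALSE, `no` is needed and the call errors.
mixed <- c(95, 72)
err_msg <- tryCatch(ifelse(mixed >= 90, "excellent"),
                    error = function(e) conditionMessage(e))

print(only_yes)
print(err_msg)  # the "argument ... is missing" message from the question
```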
Define vectors with the break points (levels) and the labels, and then use cut on column b:
library(dplyr)
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")
grades %>% mutate(x = cut(b, levels, labels = labels))
a b x
1 1 66 Poor
2 2 78 fair
3 3 97 excellent
4 4 46 Fail
5 5 89 very good
6 6 57 Fail
7 7 80 fair
8 8 98 excellent
9 9 100 excellent
10 10 93 excellent
11 11 59 Fail
12 12 51 Fail
13 13 69 Poor
14 14 75 fair
15 15 72 fair
16 16 48 Fail
17 17 74 fair
18 18 54 Fail
19 19 62 Poor
20 20 64 Poor
21 21 88 very good
22 22 70 Poor
23 23 85 very good
24 24 58 Fail
25 25 95 excellent
26 26 56 Fail
27 27 65 Poor
28 28 68 Poor
29 29 91 excellent
30 30 76 fair
31 31 82 very good
32 32 55 Fail
33 33 96 excellent
34 34 83 very good
35 35 61 Poor
36 36 60 Fail
37 37 77 fair
38 38 47 Fail
39 39 73 fair
40 40 71 fair
Or using data.table:
library(data.table)
setDT(grades)[, x := cut(b, levels, labels)]
Or in base R:
grades$x <- cut(grades$b, levels, labels)
Note
After taking a closer look at your initial approach, I noticed that you need to include right = FALSE in the call to cut, because, for example, a score of 90 should be "excellent", not "very good". The right argument determines whether each interval is closed on the right, e.g. (80, 90], which is the default, or on the left, e.g. [80, 90), which matches your b >= 80 & b < 90 conditions. So in dplyr it would be:
grades %>% mutate(x = cut(b, levels, labels, right = FALSE))
and likewise in the data.table and base R versions.
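To see the effect of right at the boundaries, here is a minimal sketch that runs cut on a few hand-picked scores (the scores vector is an illustrative assumption) with the same levels and labels as above, once with the default and once with right = FALSE:

```r
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")
scores <- c(59, 60, 70, 80, 90, 100)

# right = TRUE (default): intervals are (a, b], so 90 falls in (80, 90]
default_cut <- cut(scores, levels, labels)
# right = FALSE: intervals are [a, b), so 90 falls in [90, Inf)
left_closed <- cut(scores, levels, labels, right = FALSE)

data.frame(scores, default_cut, left_closed)
```

Only the boundary scores (60, 70, 80, 90) change label; 59 and 100 are graded the same either way.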
grades$c <- NA_character_ # create an empty character column
# fill in the grades, always testing the numeric column b
# (testing grades$c would compare strings once the first labels are assigned)
grades$c[grades$b >= 90] <- "excellent"
grades$c[grades$b >= 80 & grades$b < 90] <- "very good"
grades$c[grades$b >= 70 & grades$b < 80] <- "fair"
grades$c[grades$b >= 60 & grades$b < 70] <- "poor"
grades$c[grades$b < 60] <- "fail"
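Since the question asked specifically about ifelse() syntax, here is a sketch of the nesting it was aiming for: each further ifelse() goes in the "no" slot of the previous one, and the final "fail" is the "no" of the innermost call. The five-row grades data frame is a small fixed stand-in for the random data in the question.

```r
# Small fixed example instead of the random sample() data
grades <- data.frame(a = 1:5, b = c(97, 85, 72, 64, 50))

# Each condition is only reached when all earlier ones were FALSE,
# so the upper bounds (b < 90, etc.) are implied and need not be repeated.
grades$c <- ifelse(grades$b >= 90, "excellent",
            ifelse(grades$b >= 80, "very good",
            ifelse(grades$b >= 70, "fair",
            ifelse(grades$b >= 60, "poor", "fail"))))

grades
```

The same nested expression should also work inside dplyr, e.g. grades %>% mutate(c = ifelse(b >= 90, "excellent", ifelse(...))), since mutate evaluates it column-wise in the same way.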