Create a grouping value column based on another column

I'm sure this has been asked before, but I don't know what to look for, so I apologize in advance.

Let's say I have the following dataframe:

grades <- data.frame(a = 1:40, b = sample(45:100, 40))

Using dplyr, I want to create a new variable that indicates each student's grade based on the following criteria: 90-100 = excellent, 80-89 = very good, and so on.

I thought I could get this result by nesting ifelse() inside mutate():

grades %>%
mutate(ifelse(b >= 90, "excellent"), 
       ifelse(b >= 80 & b < 90, "very_good"),
       ifelse(b >= 70 & b < 80, "fair"),
       ifelse(b >= 60 & b < 70, "poor", "fail"))

It doesn't work: I get the error argument "no" is missing, with no default. I thought "fail" at the end would serve as the "no" value, but obviously I misunderstand the syntax.

I can get this if I first filter the original data separately and then call ifelse like this:

a <- grades %>%
     filter( b >= 90) %>%
     mutate(final = ifelse(b >= 90, "excellent"))

and then rbind a, b, c, etc. Obviously this is not how I want to do it, but I wanted to understand the syntax of ifelse(). I assume the latter works because there are no values that fail the criterion, but I still cannot figure out how to make it work when there is more than one ifelse.
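The guess in the last sentence is correct: ifelse() evaluates its no argument only when at least one element of the test is FALSE, so after filtering, the missing no is never touched. A minimal base R sketch with made-up scores shows both cases:

```r
# Every score passes the test, so `no` is never needed and leaving it out is harmless
all_pass <- ifelse(c(95, 92) >= 90, "excellent")
print(all_pass)

# One score fails the test, so ifelse() has to evaluate `no`, and the call errors
err <- tryCatch(
  ifelse(c(95, 72) >= 90, "excellent"),
  error = function(e) conditionMessage(e)
)
print(err)
```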

3 answers


Define vectors with the break points and labels, and then use cut on column b:

library(dplyr)
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")
grades %>% mutate(x = cut(b, levels, labels = labels))
    a   b         x
1   1  66      Poor
2   2  78      fair
3   3  97 excellent
4   4  46      Fail
5   5  89 very good
6   6  57      Fail
7   7  80      fair
8   8  98 excellent
9   9 100 excellent
10 10  93 excellent
11 11  59      Fail
12 12  51      Fail
13 13  69      Poor
14 14  75      fair
15 15  72      fair
16 16  48      Fail
17 17  74      fair
18 18  54      Fail
19 19  62      Poor
20 20  64      Poor
21 21  88 very good
22 22  70      Poor
23 23  85 very good
24 24  58      Fail
25 25  95 excellent
26 26  56      Fail
27 27  65      Poor
28 28  68      Poor
29 29  91 excellent
30 30  76      fair
31 31  82 very good
32 32  55      Fail
33 33  96 excellent
34 34  83 very good
35 35  61      Poor
36 36  60      Fail
37 37  77      fair
38 38  47      Fail
39 39  73      fair
40 40  71      fair

Or using data.table:

library(data.table)
setDT(grades)[, x := cut(b, levels, labels)]

Or in base R:

grades$x <- cut(grades$b, levels, labels)

Note

After taking a closer look at your initial approach, I noticed that you need to include right = FALSE in the call to cut because, for example, 90 points should be "excellent", not "very good". The right argument determines whether each interval is closed on the left or on the right; the default (right = TRUE) closes it on the right, which differs slightly from the OP's original approach. So in dplyr it would be:

grades %>% mutate(x = cut(b, levels, labels, right = FALSE))

and similarly in the data.table and base R versions.
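To see what right changes, it helps to run cut on scores that sit exactly on the break points. This base R sketch reuses the levels and labels vectors from the answer; the score vector b is made up for illustration:

```r
# Same breaks and labels as in the answer
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")

# Scores on, or next to, a boundary
b <- c(59, 60, 70, 80, 90, 95)

# Default right = TRUE: intervals are (80, 90], (90, Inf], ... so 90 falls in "very good"
closed_right <- as.character(cut(b, levels, labels))
# right = FALSE: intervals are [80, 90), [90, Inf), ... so 90 falls in "excellent"
closed_left <- as.character(cut(b, levels, labels, right = FALSE))

print(data.frame(b, closed_right, closed_left))
```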

Each ifelse has to be nested inside the previous one, as its "no" argument. Try the following:

grades %>%
  mutate(final = ifelse(b >= 90, "excellent",
                 ifelse(b >= 80 & b < 90, "very_good",
                 ifelse(b >= 70 & b < 80, "fair",
                 ifelse(b >= 60 & b < 70, "poor", "fail")))))
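The nesting can be checked without dplyr by wrapping the same chain in a plain function and running it on a few boundary scores. grade_label() is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: the nested ifelse() chain applied to a numeric vector of scores
grade_label <- function(b) {
  # Each inner ifelse() is the `no` argument of the one outside it,
  # so a score only reaches a lower band after failing every higher test
  ifelse(b >= 90, "excellent",
  ifelse(b >= 80, "very_good",
  ifelse(b >= 70, "fair",
  ifelse(b >= 60, "poor", "fail"))))
}

print(grade_label(c(100, 90, 89, 80, 70, 60, 59)))
```

Because a score only reaches an inner ifelse() after failing every outer test, upper-bound checks such as b < 90 are redundant inside the chain, which is why this sketch omits them.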

grades$c = NA_character_ # creating a new character column
# and filling in the grades by testing the original numeric column b
grades$c[grades$b >= 90] = "excellent"
grades$c[grades$b < 90 & grades$b >= 80] = "very good"
grades$c[grades$b < 80 & grades$b >= 70] = "fair"
grades$c[grades$b < 70 & grades$b >= 60] = "poor"
grades$c[grades$b < 60] = "fail"
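One pitfall with this pattern: assigning a string into a numeric column converts the whole column to character, so any later comparison against that same column is a text comparison, not a numeric one. Comparing against the untouched numeric column grades$b avoids this. A small base R illustration with made-up values:

```r
x <- c(95, 85, 9)          # made-up numeric scores
x[x >= 90] <- "excellent"  # a single string assignment...
print(class(x))            # ...and the whole vector is now "character"
print(x)                   # "excellent", "85", "9"

# The number 80 is coerced to the string "80" and compared as text:
# "9" sorts after "80", so this is TRUE even though 9 < 80 numerically
print(x[3] >= 80)
```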
