Replacing strings using a lookup table with dplyr

I am trying to create a lookup table in R to get my data into the same format that the company I work for uses.

It maps the different education categories that I want to merge, using dplyr.

    library(dplyr)

    # Create data
    education <- c("Mechanichal Engineering","Electric Engineering","Political Science","Economics")

    data <- data.frame(X1=replicate(1,sample(education,1000,rep=TRUE)))

    tbl_df(data)

    # Create lookup table
    lut <- c("Mechanichal Engineering" = "Engineering",
             "Electric Engineering" = "Engineering",
             "Political Science" = "Social Science",
             "Economics" = "Social Science")

    # Assign lookup table
    data$X1 <- lut[data$X1]

      

But in my output, the old values are replaced with the wrong ones, i.e. not the ones I specified in the lookup table. Instead, it looks as if the lookup values are assigned at random.


2 answers


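    library(reshape2)  # provides melt()

    education <- c("Mechanichal Engineering","Electric Engineering","Political Science","Economics")
    # Lookup table as a named list: names are the old values, elements the new ones
    lut <- list("Mechanichal Engineering" = "Engineering",
                "Electric Engineering" = "Engineering",
                "Political Science" = "Social Science",
                "Economics" = "Social Science")
    # melt() turns the list into a data frame with columns `value` (new) and `L1` (old)
    lut2 <- melt(lut)
    data1 <- data.frame(X1 = sample(education, 1000, replace = TRUE))
    # match() finds the lookup row for each X1; that row's `value` is the new category
    data1$new <- lut2[match(data1$X1, lut2$L1), "value"]
    head(data1)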


=======================  ==============
X1                       new           
=======================  ==============
Political Science        Social Science
Political Science        Social Science
Mechanichal Engineering  Engineering   
Mechanichal Engineering  Engineering   
Political Science        Social Science
Political Science        Social Science
=======================  ==============
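A likely explanation for the symptom in the question, assuming X1 ended up as a factor (the default for data.frame() before R 4.0.0): indexing a named vector with a factor uses the factor's integer codes, not its labels, so the lookup follows the alphabetical level order instead of the names. The sketch below forces that situation with stringsAsFactors = TRUE and shows that converting to character first restores the lookup by name. The column name X2 and the seed are only illustrative.

```r
education <- c("Mechanichal Engineering", "Electric Engineering",
               "Political Science", "Economics")

lut <- c("Mechanichal Engineering" = "Engineering",
         "Electric Engineering"    = "Engineering",
         "Political Science"       = "Social Science",
         "Economics"               = "Social Science")

set.seed(1)  # illustrative seed, only for reproducibility
# Assumption: X1 is a factor, as it was by default before R 4.0.0
data <- data.frame(X1 = sample(education, 1000, replace = TRUE),
                   stringsAsFactors = TRUE)

# Indexing with a factor uses its integer codes (alphabetical level order),
# so "Economics" (code 1) picks up the first element of lut
wrong <- unname(lut[data$X1])

# Indexing with a character vector looks values up by name
right <- unname(lut[as.character(data$X1)])

data$X2 <- right
head(data)
```

With the factor index, "Economics" and "Mechanichal Engineering" get the wrong category, while the other two happen to come out right, which may be why the result looks random.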

      





I found that the best way to do this is to use recode() from the car package.

    # dplyr also has a recode() function, so load car after dplyr;
    # car's recode() then masks dplyr's
    require(dplyr)
    require(car)

      

The data consist of samples drawn from four education categories.

    education <- c("Mechanichal Engineering",
                   "Electric Engineering","Political Science","Economics")

    data <- data.frame(ID = c(1:1000), X1 = replicate(1,sample(education,1000,rep=TRUE)))

      

Using recode() on the data, I will recode the categories:



    lut <- data.frame(ID = c(1:1000), X2 = recode(data$X1, '"Economics" = "Social Science";
                             "Electric Engineering" = "Engineering";
                             "Political Science" = "Social Science";
                             "Mechanichal Engineering" = "Engineering"'))

      

To check that the recoding is correct, join the original data with the recoded data:

    data <- full_join(data, lut, by = "ID")

    head(data)

   ID                     X1             X2
1  1       Political Science Social Science
2  2               Economics Social Science
3  3    Electric Engineering    Engineering
4  4       Political Science Social Science
5  5               Economics Social Science
6  6 Mechanichal Engineering    Engineering

      

With recode(), you don't need to sort the data before recoding it.
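Since the question asks about dplyr, the same mapping can also be sketched as a join against a small lookup data frame. This is an alternative to the car::recode() approach above, not part of it; the names lut_df and recoded are hypothetical, and stringsAsFactors = FALSE is set explicitly so the join keys are plain character vectors on any R version.

```r
library(dplyr)

education <- c("Mechanichal Engineering", "Electric Engineering",
               "Political Science", "Economics")

# Lookup table as a data frame: one row per old value (X1) and its new value (X2)
lut_df <- data.frame(
  X1 = education,
  X2 = c("Engineering", "Engineering", "Social Science", "Social Science"),
  stringsAsFactors = FALSE
)

set.seed(1)  # illustrative seed, only for reproducibility
data <- data.frame(ID = 1:1000,
                   X1 = sample(education, 1000, replace = TRUE),
                   stringsAsFactors = FALSE)

# left_join() keeps every row of data, in its original order,
# and adds the matching X2 from the lookup table
recoded <- left_join(data, lut_df, by = "X1")

head(recoded)
```

Any value of X1 that is missing from the lookup table would get NA in X2, which makes unmapped categories easy to spot.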
