Add column using rows of another column

I want to add a column to my data frame using the transform function. One of my columns contains character strings. I want to match on the values in that column and add a new column based on them.

UNIT.NO. USAGE..kWh.month.
     A1               863
     A1              1339
     D3              1058
     D1               782
     L1              1339
     L7              1058
     L1               782

      

I want to add another column to classify data category and get the following result:

UNIT.NO. USAGE..kWh.month.   Category
     A1               863       A
     A1              1339       A
     D3              1058       D
     D1               782       D
     L1              1339       L
     L7              1058       L
     L1               782       L

      

I used the following code but it doesn't work.

dataset.1<-transform(
  dataset.1,
  Category=
    if(grepl("A",dataset.1$UNIT.NO.)==T){
      "A"
    } else 
      if(grepl("D",dataset.1$UNIT.NO.)==T){
        "D"
      } else 
        if(grepl("L",dataset.1$UNIT.NO.)==T){
          "L"
        }else{
              "Other"
            }
)

      

Warning in R: In if (grepl("A", dataset.1$UNIT.NO.) == T) {: the condition has length > 1 and only the first element will be used

As a result, all of my Category values are now A, instead of following the letter in each unit number. What is the best way to add such a column?

I need these categories to perform nonparametric analysis. Thank you in advance.



3 answers


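One simple option:

# strip the digits, leaving only the letter prefix
indx <- gsub("[0-9]", "", df1$UNIT.NO.)
keep <- indx %in% c("A", "D", "L")
df1$Category <- "Other"
# assign only the matching prefixes, so the lengths agree
df1$Category[keep] <- indx[keep]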
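To see what the digit-stripping step produces, here is a minimal self-contained sketch. The `unit_no` vector is made up for illustration (it includes a hypothetical two-letter code `XX1`) and is not taken from the real data.

```r
# Hypothetical unit codes, for illustration only
unit_no <- c("A1", "D3", "L7", "XX1")

# Remove every digit, leaving only the letter prefix
indx <- gsub("[0-9]", "", unit_no)

# Keep the prefix only where it is one of the known categories
known <- c("A", "D", "L")
category <- ifelse(indx %in% known, indx, "Other")

print(data.frame(unit_no, indx, category))
```

Note that a multi-letter prefix is kept whole by `gsub`, so it falls into "Other" unless it is listed in `known`.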

      



Another (more efficient) option uses data.table:

library(data.table)
keep <- indx %in% c("A", "D", "L")
setDT(df1)[, Category := "Other"]
df1[keep, Category := indx[keep]]

      



Use substr to get the first letter:

first <- substr(dataset.1$UNIT.NO., 1, 1)
dataset.1$Category <- ifelse(first %in% c("A", "D", "L"), first, "Other")

If you don't need "Other", just use:

dataset.1$Category <- substr(dataset.1$UNIT.NO., 1, 1)
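Since the question asked about `transform`, the same idea can be sketched with it. This is one possible arrangement, not the only one: `first_letter_category` is a hypothetical helper name, and the sample data frame is rebuilt from the question so the block runs on its own.

```r
# Sample data rebuilt from the question
dataset.1 <- data.frame(
  UNIT.NO. = c("A1", "A1", "D3", "D1", "L1", "L7", "L1"),
  USAGE..kWh.month. = c(863, 1339, 1058, 782, 1339, 1058, 782),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first letter if it is a known category, else "Other"
first_letter_category <- function(x, known = c("A", "D", "L")) {
  first <- substr(as.character(x), 1, 1)
  ifelse(first %in% known, first, "Other")
}

# transform() evaluates the expression with the columns in scope,
# so UNIT.NO. can be used without the dataset.1$ prefix
dataset.1 <- transform(dataset.1, Category = first_letter_category(UNIT.NO.))
print(dataset.1)
```

Keeping the rule in a small function means the list of known categories is changed in one place if more unit letters appear later.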

      



There are many ways:

#dummy data
dataset.1 <- read.table(text="
UNIT.NO. USAGE..kWh.month.
A1               863
A1              1339
D3              1058
D1               782
L1              1339
L7              1058
L1               782
XX1               782", header=TRUE)

# using your approach, vectorised with nested ifelse
# ("^A" anchors the match to the start of the unit number)
dataset.1$CategoryIfElse <-
  ifelse(grepl("^A", dataset.1$UNIT.NO.), "A",
         ifelse(grepl("^D", dataset.1$UNIT.NO.), "D",
                ifelse(grepl("^L", dataset.1$UNIT.NO.), "L", "Other")))

# using substr: take the first letter, then turn anything
# outside the known levels into "Other"
dataset.1$CategorySubstr <-
  substr(dataset.1$UNIT.NO., 1, 1)
dataset.1$CategorySubstr <-
  factor(dataset.1$CategorySubstr, levels = c("A", "D", "L", "Other"))
dataset.1$CategorySubstr[is.na(dataset.1$CategorySubstr)] <- "Other"

# result
dataset.1

#   UNIT.NO. USAGE..kWh.month. CategoryIfElse CategorySubstr
# 1       A1               863              A              A
# 2       A1              1339              A              A
# 3       D3              1058              D              D
# 4       D1               782              D              D
# 5       L1              1339              L              L
# 6       L7              1058              L              L
# 7       L1               782              L              L
# 8      XX1               782          Other          Other
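The warning in the question comes from `if` expecting a single TRUE/FALSE, while `grepl` returns one value per row. Here is a minimal sketch of the difference, using a made-up vector `unit_no`. Note that recent R versions (4.2.0 and later) raise an error rather than a warning when the condition has length > 1.

```r
unit_no <- c("A1", "D3", "L7")   # made-up values for illustration

is_a <- grepl("^A", unit_no)     # one logical per element
print(is_a)

# if() needs a single TRUE/FALSE, so a whole-vector test has to be
# reduced first, e.g. with any() or all()
if (any(is_a)) {
  message("at least one unit starts with A")
}

# ifelse() is vectorised: it picks a value element by element
category <- ifelse(is_a, "A", "not A")
print(category)
```

This is why replacing the nested `if`/`else` with nested `ifelse` calls gives one category per row.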

      
