Doing arithmetic on factor columns in R

I am new to R and I am trying to create a new column that is one column minus another column. For example:

price <- c("$10.00", "$7.15", "$8.75", "12.00", "9.20")
quantity <- c(5, 6, 7, 8, 9)
price <- as.factor(price)
quantity <- as.factor(quantity)
df <- data.frame(price, quantity)

      

In my actual dataset, all columns are imported as factors. When I try to create a new column, I get this:

diff <- price - quantity
In Ops.factor(price, quantity): - not meaningful for factors

      

I tried to force the data to be numeric using as.numeric(df), as.numeric(levels(df)), as.numeric(levels(df))[df], and setting stringsAsFactors to FALSE, but the data are converted to NA. data.matrix() changes the values. Is there any other way to make the above equation work? Thanks!



3 answers


Try:

 as.numeric(gsub("^\\$","", price))-as.numeric(as.character(quantity))
 #[1] 5.00 1.15 1.75 4.00 0.20

      



Or, starting from df:

 df$diff <- Reduce(`-`,lapply(df, function(x) as.numeric(gsub("^\\$","",x))))
 df$diff
 #[1] 5.00 1.15 1.75 4.00 0.20
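As a side note on why the as.character() step matters: calling as.numeric() directly on a factor returns the internal level codes rather than the labels, which is most likely why data.matrix() appeared to change the values. A minimal sketch, reusing the quantity vector from the question:

```r
quantity <- factor(c(5, 6, 7, 8, 9))

# as.numeric() on a factor gives the integer level codes, not the labels
as.numeric(quantity)
# [1] 1 2 3 4 5

# Going through character first recovers the original numbers
as.numeric(as.character(quantity))
# [1] 5 6 7 8 9

# Equivalent idiom: convert the levels once, then index by the factor
as.numeric(levels(quantity))[quantity]
# [1] 5 6 7 8 9
```

The same applies to price, except that the "$" has to be stripped first, otherwise the conversion yields NA.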

      



If you are stuck with factor columns, you can add a new column diff with within() and some coercion, like this:

> within(df, {
      diff <- as.numeric(gsub("[$]", "", price)) - 
                  as.numeric(as.character(quantity))
  })
#    price quantity diff
# 1 $10.00        5 5.00
# 2  $7.15        6 1.15
# 3  $8.75        7 1.75
# 4  12.00        8 4.00
# 5   9.20        9 0.20
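If every column in the real dataset is a factor, it may be easier to convert the whole data frame in one pass. This is only a sketch: to_number is a hypothetical helper name, and it assumes "$" is the only non-numeric character in the data.

```r
df <- data.frame(
  price    = factor(c("$10.00", "$7.15", "$8.75", "12.00", "9.20")),
  quantity = factor(c(5, 6, 7, 8, 9))
)

# Hypothetical helper: factor -> character -> strip "$" -> numeric
to_number <- function(x) {
  as.numeric(gsub("$", "", as.character(x), fixed = TRUE))
}

# df[] keeps the data.frame structure while replacing every column
df[] <- lapply(df, to_number)

df$diff <- df$price - df$quantity
```

After this, all columns are plain numeric vectors, so ordinary arithmetic works without further coercion.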

      

You could also consider going back and re-reading the data into R. It is simple and would make things easier. Here is how you could do that and get the desired result.

Create a data file (you won't need this step, since you can just read your original file again):



> write.table(df, "df.txt") 

      

Read the data into R, remove the $ sign, and calculate the difference:

> df2 <- read.table("df.txt", stringsAsFactors = FALSE)
> df2$price <- as.numeric(gsub("[$]", "", df2$price))
> with(df2, { price - quantity })
# [1] 5.00 1.15 1.75 4.00 0.20
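The same idea can be tried without touching the disk. The sketch below assumes the original file is comma-separated with a header row (the question does not say), and feeds the text to read.csv() through its text argument:

```r
# Assumed file contents; the real file layout is not shown in the question
raw <- "price,quantity
$10.00,5
$7.15,6
$8.75,7
12.00,8
9.20,9"

# stringsAsFactors = FALSE keeps price as character instead of factor
df2 <- read.csv(text = raw, stringsAsFactors = FALSE)

# Strip the literal "$" and convert to numeric
df2$price <- as.numeric(sub("$", "", df2$price, fixed = TRUE))

df2$diff <- df2$price - df2$quantity
```

Here quantity is read as a number directly, so only price needs cleaning.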

      



You should avoid the quotation marks and the $ in price, and avoid converting the columns to factors, if you want to perform mathematical operations on them:

price <- c(10.00, 7.15, 8.75, 12.00, 9.20)
quantity <- c(5, 6, 7, 8, 9)
df <- data.frame(price, quantity)

df$diff <- price - quantity

df
  price quantity diff
1 10.00        5 5.00
2  7.15        6 1.15
3  8.75        7 1.75
4 12.00        8 4.00
5  9.20        9 0.20

      
