Recursive error in dplyr mutate
I'm just learning dplyr (and R), and I don't understand why this fails or what the correct approach is. I'm looking for a general explanation rather than a fix specific to this contrived dataset.
Suppose I have 3 file sizes with multipliers and I would like to combine them into one numeric column.
require(dplyr)
m <- data.frame(
  K = 1E3,
  M = 1E6,
  G = 1E9
)
s <- data.frame(
  size = 1:3,
  mult = c('K', 'M', 'G')
)
Now I want to multiply each size by its multiplier, so I tried:
mutate(s, total = size * m[[mult]])
#Error in .subset2(x, i, exact = exact) :
# recursive indexing failed at level 2
which throws the error shown. I also tried:
mutate(s, total = size * as.numeric(m[mult]))
#1 1 K 1e+06
#2 2 M 2e+09
#3 3 G 3e+03
which is worse than an error: it silently gives the wrong answer!
I've tried many other permutations but couldn't find an answer.
Thanks in advance!
Edit:
(or maybe this should be a separate question)
akrun's answer worked great and I thought I had it figured out, but if I add a row with a missing multiplier
s <- rbind(s, c(4, NA))
and then update the mutate call to
mutate(s, total = size *
  ifelse(is.na(mult), 1,
    unlist(m[as.character(mult)])))
it falls apart again with "undefined columns selected".
The column mult is a factor. Convert it to character to subset m, unlist the result, and then multiply it by size:
mutate(s, new= size*unlist(m[as.character(mult)]))
# size mult new
#1 1 K 1e+03
#2 2 M 2e+06
#3 3 G 3e+09
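A variation on the same idea, as a minimal sketch rather than the definitive approach: flatten m into a named numeric vector first and look the multipliers up by name. The names lookup and factor_value are made up for illustration, and stringsAsFactors = TRUE is set explicitly on the assumption that the question was run on R older than 4.0, where that was the default. Indexing a named vector with NA returns NA instead of raising "undefined columns selected", so this also copes with the extra row from the question's edit.

```r
library(dplyr)

m <- data.frame(K = 1E3, M = 1E6, G = 1E9)

# stringsAsFactors = TRUE reproduces the factor column from the question
# (assumption: the question used R < 4.0, where this was the default).
s <- data.frame(size = c(1:3, 4), mult = c("K", "M", "G", NA),
                stringsAsFactors = TRUE)

# Flatten the one-row data frame into a named numeric vector (hypothetical name).
lookup <- unlist(m)

res <- mutate(s,
  # Look up by label, not by factor code; NA in mult gives NA here.
  factor_value = unname(lookup[as.character(mult)]),
  # Treat a missing multiplier as 1, as in the question's edit.
  total = size * ifelse(is.na(factor_value), 1, factor_value)
)
print(res)
```

The key point is still as.character(mult): without it, the factor's integer codes would be used as positions.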
To see why, look at how a factor column behaves when used as an index: R uses the integer codes, which follow the order of the levels, not the labels.
m[s$mult]
# M G K
#1 1e+06 1e+09 1000
We get the same order of output using match between names(m) and levels(s$mult):
m[match(names(m), levels(s$mult))]
# M G K
#1 1e+06 1e+09 1000
So this is likely why you got a different result: the columns of m came back in the order M, G, K instead of K, M, G.
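The difference between a factor's codes and its labels can be seen in isolation with a plain named vector. This is only an illustrative sketch in base R; the names f, v, by_code and by_label are made up here.

```r
f <- factor(c("K", "M", "G"))

# Levels are sorted alphabetically, so the codes do not follow input order.
print(levels(f))      # "G" "K" "M"
print(as.integer(f))  # 2 3 1

v <- c(K = 1e3, M = 1e6, G = 1e9)

# Indexing with the codes picks positions 2, 3, 1 of v.
by_code <- v[as.integer(f)]

# Indexing with the labels picks elements by name.
by_label <- v[as.character(f)]

print(by_code)
print(by_label)
```

Only the lookup by label returns the multipliers in the same order as the rows.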
If you don't mind changing the data structure of m, you can use
# reshape m into a long lookup table with columns value and mult
m <- as.data.frame(t(m))
m$mult <- rownames(m)
colnames(m)[which(colnames(m) == "V1")] <- "value"
# join on mult instead of indexing into m
s %>%
  inner_join(m, by = "mult") %>%
  mutate(total = size * value) %>%
  select(size, mult, total)
to stay closer to the dplyr style.
EDIT: While this works, be a little careful about the column types: mult may be a factor in s but is a character in m, so the join may coerce it to character, possibly with a warning.
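One way to sidestep the type question is to make the join key a character column on both sides from the start. The sketch below assumes you are free to build s and the lookup table yourself; the name lookup is hypothetical, and stringsAsFactors = FALSE is spelled out so the result does not depend on the R version.

```r
library(dplyr)

m <- data.frame(K = 1E3, M = 1E6, G = 1E9)

# Keep mult as character rather than factor.
s <- data.frame(size = 1:3, mult = c("K", "M", "G"),
                stringsAsFactors = FALSE)

# Long lookup table built from m: one row per multiplier (hypothetical name).
lookup <- data.frame(mult = names(m),
                     value = unlist(m, use.names = FALSE),
                     stringsAsFactors = FALSE)

res <- s %>%
  inner_join(lookup, by = "mult") %>%   # both keys are character
  mutate(total = size * value) %>%
  select(size, mult, total)
print(res)
```

Naming the key with by = "mult" also avoids the message dplyr prints when it has to guess the join columns.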