Recursive error in dplyr mutate
I'm just learning dplyr (and R), and I don't understand why this fails or what the correct approach is. I'm looking for a general explanation, not something specific to this contrived dataset.
Suppose I have 3 file sizes with multipliers and I would like to combine them into one numeric column.
require(dplyr)
m <- data.frame(
  K = 1E3,
  M = 1E6,
  G = 1E9
)
s <- data.frame(
  size = 1:3,
  mult = c('K', 'M', 'G'),
  stringsAsFactors = TRUE  # the default before R 4.0.0; makes mult a factor
)
Now I want to multiply each size by its multiplier, so I tried:
mutate(s, total = size * m[[mult]])
#Error in .subset2(x, i, exact = exact) :
# recursive indexing failed at level 2
which throws the error shown above. I've also tried:
mutate(s, total = size * as.numeric(m[mult]))
#  size mult total
#1    1    K 1e+06
#2    2    M 2e+09
#3    3    G 3e+03
which is worse than an error: it silently gives the wrong answer!
I've tried many other permutations but couldn't find an answer.
Thanks in advance!
Edit (or maybe this should be another question):
akrun's answer worked great and I thought I had figured it out, but if I add a row with a missing multiplier
s <- rbind(s, c(4, NA))
and then update the mutation to
mutate(s, total = size *
  ifelse(is.na(mult), 1,
    unlist(m[as.character(mult)])))
it falls apart again with "undefined columns selected".
The "mult" column is a factor. Convert it to character to subset "m", unlist the result, and then multiply it by "size":
mutate(s, new= size*unlist(m[as.character(mult)]))
# size mult new
#1 1 K 1e+03
#2 2 M 2e+06
#3 3 G 3e+09
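As a rough illustration of why the original m[[mult]] call failed and why the character lookup works, here is a minimal base R sketch. The name lookup is hypothetical (it is not in the original post), and stringsAsFactors = TRUE is assumed so that mult is a factor, as in the question.

```r
# Recreate the data from the question; mult is assumed to be a factor.
m <- data.frame(K = 1E3, M = 1E6, G = 1E9)
s <- data.frame(size = 1:3, mult = c("K", "M", "G"), stringsAsFactors = TRUE)

# `[[` extracts a single element. Given a longer index it drills down
# recursively (roughly m[[2]][[3]][[1]] here), and that fails as soon as
# it reaches a plain numeric vector.
res <- tryCatch(m[[s$mult]], error = function(e) e)
inherits(res, "error")   # TRUE

# A named numeric vector works as a simple lookup table.
# `lookup` is a hypothetical helper name.
lookup <- unlist(m)      # names K, M, G

# Index by label (character), not by the factor's integer codes.
total <- s$size * unname(lookup[as.character(s$mult)])
total
```

Inside mutate the same idea would read size * lookup[as.character(mult)]; the single bracket returns one value per row, which is what a vectorised column operation needs.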
When a factor is used as an index, R uses its integer codes (the positions of its values in the levels), not its labels:
m[s$mult]
# M G K
#1 1e+06 1e+09 1000
We get the same column order using match between names(m) and levels(s$mult):
m[match(names(m), levels(s$mult))]
# M G K
#1 1e+06 1e+09 1000
So this is likely the reason why you got a different result.
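To make the factor behaviour concrete, the following small sketch compares indexing by a factor with indexing by its character labels. It assumes the default alphabetical level ordering; the variable names by_code and by_label are only illustrative.

```r
m <- data.frame(K = 1E3, M = 1E6, G = 1E9)
mult <- factor(c("K", "M", "G"))

levels(mult)       # "G" "K" "M"  (sorted alphabetically by default)
as.integer(mult)   # 2 3 1        (position of each value in the levels)

# Indexing with the factor uses the integer codes, i.e. m[c(2, 3, 1)].
by_code <- names(m[mult])                   # "M" "G" "K"

# Indexing with the labels picks the columns by name.
by_label <- names(m[as.character(mult)])    # "K" "M" "G"
```

The codes 2, 3, 1 select the second, third, and first columns of m, which is exactly the shuffled M, G, K order seen in the wrong result.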
If you don't mind changing the data structure of m, you can use
# reshape m into a long lookup table: one row per multiplier
m = as.data.frame(t(m))
m$mult = rownames(m)
colnames(m)[which(colnames(m) == "V1")] = "value"
# join on mult instead of indexing into m
s %>%
  inner_join(m) %>%
  mutate(total = size * value) %>%
  select(size, mult, total)
This keeps the solution more in the dplyr style.
EDIT: While this works, you may need to be a little careful about the data types of the join columns: mult is a factor in s but a character column in m.
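One possible way to handle the data type caveat, and the missing-multiplier row from the question's edit, is sketched below. It is a variation on the join above, not the answerer's own code: m_long is a hypothetical name, the key is converted to character on both sides before joining, and left_join is swapped in for inner_join so that the row without a multiplier is kept. Treating a missing multiplier as 1 follows the question's ifelse attempt.

```r
library(dplyr)

m <- data.frame(K = 1E3, M = 1E6, G = 1E9)
# Fourth row has no multiplier, as in the question's edit.
s <- data.frame(size = 1:4, mult = c("K", "M", "G", NA),
                stringsAsFactors = TRUE)

# Long lookup table with a character key (`m_long` is a hypothetical name).
m_long <- data.frame(mult = names(m),
                     value = unlist(m, use.names = FALSE),
                     stringsAsFactors = FALSE)

res <- s %>%
  mutate(mult = as.character(mult)) %>%          # align key type with m_long
  left_join(m_long, by = "mult") %>%             # unmatched rows get value = NA
  mutate(total = size * coalesce(value, 1)) %>%  # missing multiplier counts as 1
  select(size, mult, total)

res
```

The join never indexes m by name, so an NA in mult cannot trigger "undefined columns selected"; it simply finds no match and falls back to the default in coalesce.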