Replace factor values in one column based on the factor values of another column
Here is an example data frame:
df <- data.frame(v1=factor(c("empty","a","empty","c","b")),
v2=factor(c("empty","z","z","y","x")))
Now I want to replace the "empty" values in v1 wherever there is a non-empty counterpart in v2. In this example, z in v2 is mapped to a in v1 in the second row, so the "empty" in the third row should also become a.
The resulting data frame should be:
df.final <- data.frame(v1=factor(c("empty","a","a","c","b")),
v2=factor(c("empty","z","z","y","x")))
How can I do this efficiently? I tried two nested loops, but it takes forever (~15 minutes for my data frame with 25,000 rows and several thousand factor levels).
For various reasons, I want to keep the factor levels and don't want to convert them to numeric values.
Here's a possible data.table solution (I'm assuming you have one unique value in v1 for each value in v2; correct me if I'm wrong). I try to reduce the problem by working only on the rows where v2 is not "empty", using a negative binary join while assigning by reference with the := operator:
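library(data.table)
# convert df to a data.table by reference and key it by v2 (this sorts the rows by v2)
setkey(setDT(df), v2)
# for every v2 group except "empty", set v1 to the group's first non-empty v1 value
df[!J("empty"), v1 := v1[v1 != "empty"][1L], by = v2]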
Edit
A variant that is probably more suitable for the real dataset (it replaces only the "empty" values and leaves the other v1 values in each group untouched):
df[!J("empty"), v1 := replace(v1, v1 == "empty", v1[v1 != "empty"][1L]), by = v2]
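For comparison, here is a minimal base-R sketch of the same per-group replacement as the edited call, in case data.table is not available. It rests on the same assumption (one non-empty v1 value per v2 value). The helper name fill_group is hypothetical, and ave() stands in for data.table's by = v2. Unlike setkey, it leaves the row order unchanged, and a v2 group without any non-empty v1 simply keeps its "empty" entries.

```r
# Example data from the question
df <- data.frame(v1 = factor(c("empty", "a", "empty", "c", "b")),
                 v2 = factor(c("empty", "z", "z", "y", "x")))

# Hypothetical helper: within one v2 group, replace "empty" entries
# by the group's first non-empty value (if the group has one)
fill_group <- function(x) {
  filled <- x[x != "empty"]
  if (length(filled) > 0L) x[x == "empty"] <- filled[1L]
  x
}

# Only rows whose v2 is non-empty take part in the replacement
keep <- df$v2 != "empty"
v1_chr <- as.character(df$v1)
v1_chr[keep] <- ave(v1_chr[keep], as.character(df$v2[keep]), FUN = fill_group)

# Rebuild the factor with the original levels, so no level is added or dropped
df$v1 <- factor(v1_chr, levels = levels(df$v1))
```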
One option is to change the "empty" strings to NA and then use na.locf (from zoo) to replace each NA with the previous non-NA value:
library(zoo)
is.na(df) <- df=='empty'
df[] <- lapply(df, na.locf, na.rm=FALSE)
Or, as @DavidArenburg suggested, if all columns are of class "character", you can apply na.locf directly to the data frame; otherwise you may need to apply it to a subset of the columns. If the original columns are of class "factor", they are converted to "character", even though the output is still a data.frame:
df[] <- na.locf(df, na.rm=FALSE)
If you want to turn the remaining NA values back into "empty" (though it is better to keep them as NA):
df[] <- lapply(df, function(x) {
  x1 <- na.locf(x, na.rm = FALSE)
  replace(x1, which(is.na(x1)), 'empty')
})
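To make the na.locf step concrete, here is a minimal base-R sketch of what na.locf(x, na.rm = FALSE) does on a single vector: last observation carried forward, with leading NAs kept as NA. The helper locf_base is hypothetical and only illustrative; zoo's na.locf has more options. Carrying the previous value forward gives the expected result here because the rows that share a v2 value (z) are adjacent.

```r
# Hypothetical helper: last observation carried forward for one vector
locf_base <- function(x) {
  # position of the most recent non-NA element at or before each index (0 if none)
  idx <- cummax(seq_along(x) * !is.na(x))
  # leading NAs have no earlier value to carry forward, so keep them NA
  idx[idx == 0L] <- NA_integer_
  x[idx]
}

df <- data.frame(v1 = factor(c("empty", "a", "empty", "c", "b")),
                 v2 = factor(c("empty", "z", "z", "y", "x")))
is.na(df) <- df == "empty"      # turn "empty" into NA, as in the answer
df[] <- lapply(df, locf_base)   # fill each column forward, keeping a data.frame
```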