Replace factor values in one column using the factor values of another column

Here is an example data frame:

df <- data.frame(v1=factor(c("empty","a","empty","c","b")),
                 v2=factor(c("empty","z","z","y","x")))

      

Now I want to replace the "empty" values in v1 if there is a non-empty analog in v2. In this example, "z" in v2 is mapped to "a" in v1 in the second row. So the "empty" in the third row should also be "a".

The final data frame should be:

df.final <- data.frame(v1=factor(c("empty","a","a","c","b")),
                       v2=factor(c("empty","z","z","y","x")))

      

What is an efficient way to do this? I tried two nested loops, but that takes forever (about 15 minutes for my data frame with 25000 rows and several thousand factor levels).

For various reasons, I want to keep the factor levels and not convert them to numeric values.

2 answers


Here's a possible data.table solution (I'm assuming you have one unique value in v1 for each value in v2; correct me if I'm wrong). Here I am trying to reduce the problem by working only on the v2 values that are not "empty", using a negated binary join while assigning by reference with the := operator:

library(data.table)
setkey(setDT(df), v2)
df[!J("empty"), v1 := v1[v1 != "empty"][1L], by = v2]

      




Edit

A variant that is probably more compatible with the real dataset would be:

df[!J("empty"), v1 := replace(v1, v1 == "empty", v1[v1 != "empty"][1L]), by = v2]
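The same lookup idea can also be sketched in base R. This is only an illustration under the assumption stated above (one v1 value per v2 value; if there are several, the first one wins), not the data.table code itself. The names known, fill, idx and ok are made up for this example. Unlike setkey, which sorts the table by v2, this sketch leaves the row order untouched and keeps the factor levels.

```r
df <- data.frame(v1 = factor(c("empty", "a", "empty", "c", "b")),
                 v2 = factor(c("empty", "z", "z", "y", "x")))

# Rows where both columns are known define the v2 -> v1 mapping.
known <- df[df$v1 != "empty" & df$v2 != "empty", ]
known <- known[!duplicated(known$v2), ]   # assumption: first match wins

# Rows that need filling: v1 is "empty" but v2 is not.
fill <- which(df$v1 == "empty" & df$v2 != "empty")
idx  <- match(df$v2[fill], known$v2)      # NA when v2 has no known v1

# Assign only where a mapping exists; v1 stays a factor with the same levels.
ok <- !is.na(idx)
df$v1[fill[ok]] <- known$v1[idx[ok]]

df
```

Because match is vectorised, the lookup is done in one pass instead of two nested loops.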

      

One option is to change the "empty" strings to NA and then use na.locf to replace the NA values with the previous non-NA value.

 library(zoo)
 is.na(df) <- df=='empty'
 df[] <- lapply(df, na.locf, na.rm=FALSE)
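To show what "carrying the last observation forward" means, here is a minimal base R sketch of the idea. The helper name locf is hypothetical and this is not how zoo implements na.locf; it only mimics the behaviour of na.locf(x, na.rm=FALSE) on a plain vector. Note that the fill works by row position, so it gives the expected result for the example only because the "empty" row directly follows the row that holds the value.

```r
# Hypothetical helper: replace each NA with the most recent non-NA value.
locf <- function(x) {
  # Position of the last non-NA element seen so far (0 = none seen yet).
  idx <- cummax(seq_along(x) * !is.na(x))
  # Leading NAs have nothing to carry forward, so they stay NA.
  x[replace(idx, idx == 0, NA)]
}

v1 <- c(NA, "a", NA, "c", "b")
locf(v1)
```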

      

Or, as @DavidArenburg suggested, if there are only "character" columns, you can apply na.locf directly to the dataset; otherwise a subset of the dataset might be required. If the initial columns are of class "factor", they are converted to "character", even though the output is still a "data.frame".



 df[] <- na.locf(df, na.rm=FALSE)

      

If you want to turn the remaining NA values back into "empty" (although it is better to keep them as NA):

 df[] <- lapply(df, function(x) {x1 <- na.locf(x, na.rm=FALSE)
              replace(x1, which(is.na(x1)), 'empty') })

      
