Change factor values in one column based on the factors in another column

Question

Here is an example data frame:

df <- data.frame(v1=factor(c("empty","a","empty","c","b")),
                 v2=factor(c("empty","z","z","y","x")))

Now I want to replace the "empty" values in v1 if there is a non-empty analog in v2. In this example, z in v2 is mapped to a in v1 in the second row, so the "empty" v1 value in the third row should also become a.

The final data frame should be:

df.final <- data.frame(v1=factor(c("empty","a","a","c","b")),
                       v2=factor(c("empty","z","z","y","x")))

What is an efficient way to do this? I tried two nested loops, but that takes forever (about 15 minutes for my data frame with 25,000 rows and several thousand factor levels).

For various reasons, I want to keep the columns as factors and don't want to convert them to numeric values.

+3

r dataframe

spore234 23 june 15 at 9:58 am

2 answers

One option is to change the "empty" strings to NA and then use na.locf to replace each NA with the previous non-NA value.

 library(zoo)
 is.na(df) <- df=='empty'
 df[] <- lapply(df, na.locf, na.rm=FALSE)
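As a small base R illustration of the first step above (no zoo required), the sketch below marks "empty" as NA on the question's example data and checks that the factor levels are left untouched, which matters if the columns have to stay factors. The names n_missing and v1_levels are only illustrative.

```r
# Example data from the question
df <- data.frame(v1 = factor(c("empty", "a", "empty", "c", "b")),
                 v2 = factor(c("empty", "z", "z", "y", "x")))

# Set every "empty" cell to NA; the columns remain factors
is.na(df) <- df == "empty"

# Number of NA cells per column
n_missing <- colSums(is.na(df))

# "empty" is still a level of v1, it is just no longer used by any row
v1_levels <- levels(df$v1)
```

Because the "empty" level survives, the NA cells can later be set back to "empty" without adding a new level.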

Or, as @DavidArenburg suggested, if there are only character columns, you can apply na.locf directly to the dataset; otherwise you may need to subset the dataset first. Note that if the initial columns are of class factor, they are converted to character, even though the output is still a data.frame.

 df[] <- na.locf(df, na.rm=FALSE)

If you want to convert the remaining NA values back to "empty" (although it is better to keep them as NA):

 df[] <- lapply(df, function(x) {
   x1 <- na.locf(x, na.rm=FALSE)
   replace(x1, which(is.na(x1)), 'empty')
 })

+4

akrun 23 june 15 at 10:05

David Arenburg · Accepted Answer · 2015-06-23T10:24:03+0000

Here's a possible data.table solution (I'm assuming you have one unique value in v1 for each value in v2; correct me if I'm wrong). Here I am trying to reduce the problem by working only on the v2 values that are not "empty", using a negated binary join while assigning by reference with the := operator.

library(data.table)
setkey(setDT(df), v2)
df[!J("empty"), v1 := v1[v1 != "empty"][1L], by = v2]

Edit

A variant that is probably more compatible with the real dataset would be:

df[!J("empty"), v1 := replace(v1, v1 == "empty", v1[v1 != "empty"][1L]), by = v2]
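For comparison, the same idea (look up the first non-empty v1 for each v2 value) can be sketched in base R with match(). This is not part of the answer above, only a minimal illustration on the question's example data; the helper names known, to_fill, idx and found are made up here, and it assumes, as above, one v1 value per v2 value.

```r
# Example data from the question
df <- data.frame(v1 = factor(c("empty", "a", "empty", "c", "b")),
                 v2 = factor(c("empty", "z", "z", "y", "x")))

# Rows where both v1 and v2 are known; these form the v2 -> v1 lookup
known <- df$v1 != "empty" & df$v2 != "empty"

# Rows to fill: v1 is "empty" but v2 is not
to_fill <- which(df$v1 == "empty" & df$v2 != "empty")

# match() gives the position of the first known row with the same v2
idx <- match(df$v2[to_fill], df$v2[known])

# Only fill rows for which a non-empty analog actually exists
found <- !is.na(idx)
df$v1[to_fill[found]] <- df$v1[known][idx[found]]
```

The columns stay factors with their original levels, and the work is vectorised, so no nested loops are needed.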
