Merge and Replace Based on Multiple Unequal Columns

I have two data frames. The first contains the original state of the image with all the data available for restoring the image from scratch (the entire set of coordinates and their color values).

Then I have a second data frame. It is smaller and contains only the differences (changes) between the updated state and the original state, similar to video encoding with keyframes.

Unfortunately I don't have a unique ID column to help me match them. I have an x column and a y column, which together can make up a unique ID.

My question is this: what is an elegant way to merge these two datasets, replacing the values in the original data frame with the values from the "difference" data frame wherever the x and y coordinates are the same?

Here are some sample data to illustrate:

original <- data.frame(x = 1:10, y = 23:32, value = 120:129)

    x  y value
1   1 23   120
2   2 24   121
3   3 25   122
4   4 26   123
5   5 27   124
6   6 28   125
7   7 29   126
8   8 30   127
9   9 31   128
10 10 32   129

      

And a data frame with updated differences:

update <- data.frame(x = c(1:4, 8), y = c(2, 24, 17, 23, 30), value = 50:54)

  x  y value
1 1  2    50
2 2 24    51
3 3 17    52
4 4 23    53
5 8 30    54

      

The desired final output should contain all rows of the original data frame. However, rows in the original whose x and y coordinates match the corresponding coordinates in the update should have their value replaced with the value from the update data frame. Here's the desired output:

original_updated <- data.frame(x = 1:10, y = 23:32, 
                               value = c(120, 51, 122:126, 54, 128:129))

    x  y value
1   1 23   120
2   2 24    51
3   3 25   122
4   4 26   123
5   5 27   124
6   6 28   125
7   7 29   126
8   8 30    54
9   9 31   128
10 10 32   129

      

I've been trying to come up with a vectorized indexing solution for some time now, but I can't seem to figure it out. I would normally use %in% if there were just one unique identifier column. But neither of these two columns is unique on its own.

One solution would be to treat them as strings or tuples, concatenate them into a single column holding the coordinate pair, and then use %in%.

But I was curious if there is any solution to this problem related to indexing with boolean vectors. Any suggestions?
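For reference, the concatenation idea mentioned above could be sketched roughly as follows. This is only a sketch: the names make_key, key_orig, key_upd, hit and result are made up for illustration, and it assumes the coordinates are whole numbers and that each (x, y) pair occurs at most once in the original.

```r
original <- data.frame(x = 1:10, y = 23:32, value = 120:129)
update <- data.frame(x = c(1:4, 8), y = c(2, 24, 17, 23, 30), value = 50:54)

# Build one string key per row from the coordinate pair.
# as.integer() is an assumption that coordinates are whole numbers; it keeps
# integer and double columns from being formatted differently by paste().
make_key <- function(df) paste(as.integer(df$x), as.integer(df$y), sep = "_")
key_orig <- make_key(original)
key_upd <- make_key(update)

# TRUE for update rows whose coordinate pair exists in the original
hit <- key_upd %in% key_orig

# Overwrite the matching rows in a copy of the original
result <- original
result$value[match(key_upd[hit], key_orig)] <- update$value[hit]
```

Here hit is the boolean vector over the update rows, and match() turns the matching keys into row positions in the original.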

+3




2 answers


First, the merge is done in such a way as to ensure that all values from the original are present:

merged = merge(original, update, by = c("x","y"), all.x = TRUE)

      



Then use dplyr to select the update value where possible and the original value otherwise:

library(dplyr)
middle = mutate(merged, value = ifelse(is.na(value.y), value.x, value.y))
final = select(middle, x, y, value)
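If dplyr is not available, the same two steps could probably be done in base R alone. The sketch below is an assumption-laden variant rather than part of the answer above: the suffixes .orig and .upd and the explicit reordering are my own choices.

```r
original <- data.frame(x = 1:10, y = 23:32, value = 120:129)
update <- data.frame(x = c(1:4, 8), y = c(2, 24, 17, 23, 30), value = 50:54)

# Left join on both coordinates; unmatched rows get NA in value.upd
merged <- merge(original, update, by = c("x", "y"), all.x = TRUE,
                suffixes = c(".orig", ".upd"))

# Prefer the update value, fall back to the original value
merged$value <- ifelse(is.na(merged$value.upd),
                       merged$value.orig, merged$value.upd)

# Keep the three original columns, ordered by coordinate
final <- merged[order(merged$x, merged$y), c("x", "y", "value")]
```

Note that an update value that is itself NA would be treated as "no update" by this check.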

      

+3




The match function is used to generate the indexes. The nomatch argument is required to prevent NAs in the row index on the left-hand side of [<-.data.frame. I don't think this is as transparent as a merge followed by a replacement, but I guess it will be faster:

original[  match(update$x, original$x)[
                                       match(update$x, original$x, nomatch=0) == 
                                       match(update$y, original$y,nomatch=0)]   ,
          "value"] <- 
  update[ which( match(update$x, original$x) == match(update$y, original$y)), 
           "value"]

      

You can see the difference:



> match(update$x, original$x)[
            match(update$x, original$x) == 
                match(update$y, original$y) ]
[1] NA  2 NA  8
> match(update$x, original$x)[
            match(update$x, original$x, nomatch=0) == 
                match(update$y, original$y,nomatch=0)]
[1] 2 8

      

The inner match calls return:

> match(update$y, original$y)
[1] NA  2 NA  1  8
> match(update$x, original$x)
[1] 1 2 3 4 8
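The same idea can be spelled out step by step, which may be easier to read. This is a sketch with made-up names (ix, iy, same_row, result). Because match only returns the first position, it assumes that no x value and no y value is repeated in the original, as in the sample data. The extra ix > 0 test is my addition to skip update rows where neither coordinate matches.

```r
original <- data.frame(x = 1:10, y = 23:32, value = 120:129)
update <- data.frame(x = c(1:4, 8), y = c(2, 24, 17, 23, 30), value = 50:54)

# Row positions in the original for each update coordinate (0 = no match)
ix <- match(update$x, original$x, nomatch = 0)
iy <- match(update$y, original$y, nomatch = 0)

# Keep only update rows whose x and y both point at the same original row.
# ix > 0 drops the case where neither coordinate matched (0 == 0).
same_row <- ix == iy & ix > 0

# Overwrite the matching rows in a copy of the original
result <- original
result$value[ix[same_row]] <- update$value[same_row]
```

Using nomatch = 0 keeps same_row free of NA, so indexing with it never produces NA row positions.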

      

+1

