Merge and Replace Based on Multiple Unequal Columns

Question

I have two data frames. The first contains the original state of the image with all the data available for restoring the image from scratch (the entire set of coordinates and their color values).

Then I have a second data frame. It is smaller and contains only the differences (changes) between the updated state and the original state, similar to video encoding with keyframes.

Unfortunately, I don't have a unique id column to help me match them. I have an x column and a y column, which together make up a unique ID.

My question is this: what is an elegant way to merge these two datasets, replacing the values in the original data frame with the values from the update data frame wherever the x and y coordinates are the same?

Here are some sample data to illustrate:

original <- data.frame(x = 1:10, y = 23:32, value = 120:129)

    x  y value
1   1 23   120
2   2 24   121
3   3 25   122
4   4 26   123
5   5 27   124
6   6 28   125
7   7 29   126
8   8 30   127
9   9 31   128
10 10 32   129

And a data frame with the updated differences:

update <- data.frame(x = c(1:4, 8), y = c(2, 24, 17, 23, 30), value = 50:54)

  x  y value
1 1  2    50
2 2 24    51
3 3 17    52
4 4 23    53
5 8 30    54

The desired final output should contain all rows of the original data frame. However, rows in the original whose x and y coordinates match the corresponding coordinates in the update should have their value replaced with the value from the update data frame. Here's the desired output:

original_updated <- data.frame(x = 1:10, y = 23:32, 
                               value = c(120, 51, 122:126, 54, 128:129))

    x  y value
1   1 23   120
2   2 24    51
3   3 25   122
4   4 26   123
5   5 27   124
6   6 28   125
7   7 29   126
8   8 30    54
9   9 31   128
10 10 32   129

I've been trying to come up with a vectorized indexing solution for some time now, but I can't seem to figure it out. I would normally use %in% if there were just one uniquely identifying column. But these two columns are not unique on their own.

One solution would be to treat them as strings or tuples, concatenate them into a single column holding the coordinate pair, and then use %in%.
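A minimal sketch of that concatenation idea, using the sample data from above. The names key_original, key_update, idx and hit are made up for illustration, and match() is used in place of %in% because it also returns the row positions needed for the replacement:

```r
original <- data.frame(x = 1:10, y = 23:32, value = 120:129)
update <- data.frame(x = c(1:4, 8), y = c(2, 24, 17, 23, 30), value = 50:54)

# Build one composite key per row. The separator keeps
# (1, 23) distinct from (12, 3).
key_original <- paste(original$x, original$y, sep = "_")
key_update   <- paste(update$x, update$y, sep = "_")

# For each update row, the matching row number in original
# (NA when the coordinate pair does not occur there).
idx <- match(key_update, key_original)
hit <- !is.na(idx)

# Overwrite only the matched rows; unmatched update rows are ignored.
original_updated <- original
original_updated$value[idx[hit]] <- update$value[hit]

print(original_updated)
```

This assumes each (x, y) pair occurs at most once in original, since match() only reports the first matching row.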

But I was curious if there is any solution to this problem related to indexing with boolean vectors. Any suggestions?

merge r data.table dplyr

Lauler 04 Apr 17 at 2:16

2 answers

The match function is used to generate the indexes. The nomatch argument is required to prevent NA in the row index on the left-hand side of [<-.data.frame. I don't think this is as transparent as a merge followed by a replacement, but I guess it will be faster:

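# Left side: rows of original where the x match and the y match land on the
# same row; nomatch = 0 keeps NA out of the logical index.
original[match(update$x, original$x)[
           match(update$x, original$x, nomatch = 0) ==
           match(update$y, original$y, nomatch = 0)],
         "value"] <-
  # Right side: the update rows meeting the same condition; which() drops NA.
  update[which(match(update$x, original$x) == match(update$y, original$y)),
         "value"]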

You can see the difference:

> match(update$x, original$x)[
            match(update$x, original$x) == 
                match(update$y, original$y) ]
[1] NA  2 NA  8
> match(update$x, original$x)[
            match(update$x, original$x, nomatch=0) == 
                match(update$y, original$y,nomatch=0)]
[1] 2 8
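To illustrate why the NA entries matter, here is a small sketch, assuming the same original data frame as in the question. The index vectors are typed in by hand from the two outputs above, and outcome is a name made up for this example. The first assignment is expected to fail because data frame subassignment does not accept missing row subscripts, while the NA-free index works:

```r
original <- data.frame(x = 1:10, y = 23:32, value = 120:129)

# Row index as produced without nomatch = 0 (contains NA).
idx_with_na <- c(NA, 2, NA, 8)

# Try the assignment and capture the error message instead of stopping.
outcome <- tryCatch({
  original[idx_with_na, "value"] <- 50:53
  "assigned"
}, error = function(e) conditionMessage(e))
print(outcome)

# Row index as produced with nomatch = 0 (no NA), which assigns cleanly.
idx_clean <- c(2, 8)
original[idx_clean, "value"] <- c(51L, 54L)
print(original[idx_clean, ])
```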

The inner match calls return:

> match(update$y, original$y)
[1] NA  2 NA  1  8
> match(update$x, original$x)
[1] 1 2 3 4 8

42- 04 Apr 17 at 4:23

lebelinoz · Accepted Answer · 2017-04-04T02:50:09+0000

First, the merge is done in such a way as to ensure that all values from the original are present:

merged = merge(original, update, by = c("x","y"), all.x = TRUE)

Then use dplyr to select the update value where possible and the original value otherwise:

library(dplyr)
middle = mutate(merged, value = ifelse(is.na(value.y), value.x, value.y))
final = select(middle, x, y, value)
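For readers without dplyr, the same two steps can be sketched in base R. This is only an illustrative variant of the approach above, run on the sample data from the question. The explicit order() call is an addition here, so that the row order does not depend on how merge sorts the keys:

```r
original <- data.frame(x = 1:10, y = 23:32, value = 120:129)
update <- data.frame(x = c(1:4, 8), y = c(2, 24, 17, 23, 30), value = 50:54)

# Left join: keep every row of original; value.y is NA where update has no match.
merged <- merge(original, update, by = c("x", "y"), all.x = TRUE)

# Prefer the update value, fall back to the original value.
merged$value <- ifelse(is.na(merged$value.y), merged$value.x, merged$value.y)

# Drop the helper columns and restore a predictable row order.
final <- merged[order(merged$x, merged$y), c("x", "y", "value")]
print(final)
```

One caveat of this fallback rule: an update row whose value is itself NA would be indistinguishable from "no update" and the original value would be kept.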
