Is there a way to create a Stata _merge indicator variable with R merge()?

Question

After a merge, Stata automatically creates a variable named "_merge" that indicates whether each observation was found in the master dataset, the using dataset, or both. Is there a way to get a variable like this from the R merge() function?

r stata

Aline May 20 '15 at 19:09

3 answers

Metrics · Answer 1 · 2015-05-20T19:23:10+0000

Possible values of _merge in Stata (note that _merge can also take the values 4 and 5):

              1       master             observation appeared in master only
              2       using              observation appeared in using only
              3       match              observation appeared in both

In R, you can keep the unmatched rows by passing the argument all = TRUE, all.x = TRUE, or all.y = TRUE, e.g.:

merge(x, y, by = intersect(names(x), names(y)), all = TRUE)
merge(x, y, by = intersect(names(x), names(y)), all.x = TRUE)
merge(x, y, by = intersect(names(x), names(y)), all.y = TRUE)
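To make the three options concrete, here is a minimal sketch with two made-up data frames (the names `master` and `using` and their contents are illustrative assumptions, not part of the answer). It only shows which rows each option keeps; none of these calls adds an indicator column on its own.

```r
# Illustrative data: ids "b" and "c" appear in both data frames
master <- data.frame(id = c("a", "b", "c"), x = 1:3)
using  <- data.frame(id = c("b", "c", "d"), y = 4:6)

full  <- merge(master, using, by = "id", all = TRUE)    # keeps a, b, c, d
left  <- merge(master, using, by = "id", all.x = TRUE)  # keeps a, b, c
right <- merge(master, using, by = "id", all.y = TRUE)  # keeps b, c, d

# Unmatched rows get NA in the columns that come from the other data frame
full
```

In `full`, the row for "a" has NA in `y` and the row for "d" has NA in `x`, which is the information a _merge-style indicator would summarize.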

Aline · Answer 2 · 2015-05-24T01:37:06+0000

I wrote the following function based on @Metrics's answer. It creates a "merge" variable in the resulting data frame that indicates where each observation came from, just like Stata does.

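stata.merge <- function(x, y, by = intersect(names(x), names(y))) {
  # Temporarily replace NA with Inf so that pre-existing missing values
  # are not mistaken for unmatched rows by complete.cases()
  x[is.na(x)] <- Inf
  y[is.na(y)] <- Inf

  # Rows with a partner in both data frames
  matched <- merge(x, y, by.x = by, by.y = by, all = TRUE)
  matched <- matched[complete.cases(matched), ]
  matched$merge <- "matched"
  # Rows found only in x (master)
  master <- merge(x, y, by.x = by, by.y = by, all.x = TRUE)
  master <- master[!complete.cases(master), ]
  master$merge <- "master"
  # Rows found only in y (using)
  using <- merge(x, y, by.x = by, by.y = by, all.y = TRUE)
  using <- using[!complete.cases(using), ]
  using$merge <- "using"

  # Stack the three pieces and restore the original NA values
  df <- rbind(matched, master, using)
  df[sapply(df, is.infinite)] <- NA
  df
}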

Test.

df1 <- data.frame(id = letters[c(1:5,8:9)], v1=c(1:5,8:9))
df1

   id v1
1  a  1
2  b  2
3  c  3
4  d  4
5  e  5
6  h  8
7  i  9

df2 <- data.frame(id = letters[1:8], v1=c(1:7,NA))
df2

  id v1
1  a  1
2  b  2
3  c  3
4  d  4
5  e  5
6  f  6
7  g  7
8  h NA

stata.merge(df1,df2, by = "id")

   id v1.x v1.y   merge
1   a    1    1 matched
2   b    2    2 matched
3   c    3    3 matched
4   d    4    4 matched
5   e    5    5 matched
6   h    8   NA matched
7   i    9   NA  master
71  f   NA    6   using
8   g   NA    7   using

rwbuie · Answer 3 · 2017-10-05T09:10:31+0000

Here is (I think) a much simpler and more efficient version of the previous stata.merge function. It assumes there are no variables named "new1" or "new2" in your data frames; if there are, change the variable names in the function. The function takes three arguments: the first data frame, the second data frame, and the value to pass as the "by =" argument of merge(). The resulting stat.merge.variable is 1 for rows found only in the first data frame, 2 for rows found only in the second, and 3 for rows found in both, which matches Stata's _merge codes.

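stata.merge <- function(x, y, name) {
  x$new1 <- 1   # marks rows that come from x
  y$new2 <- 2   # marks rows that come from y
  df <- merge(x, y, by = name, all = TRUE)
  # 1 = only in x, 2 = only in y, 3 = in both
  df$stat.merge.variable <- rowSums(df[, c("new1", "new2")], na.rm = TRUE)
  df$new1 <- NULL
  df$new2 <- NULL
  df
}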
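As a rough usage sketch, the same marker-column idea can be applied to the df1 and df2 test data from the earlier answer, with the numeric codes turned into Stata-style labels. The function name `stata_merge_labelled`, the helper columns `.in_x` and `.in_y`, and the output column `merge_status` are hypothetical names chosen for this example; it assumes neither input already has columns with those names.

```r
# Hypothetical variant of the function above: same trick, labelled output
stata_merge_labelled <- function(x, y, by) {
  x$.in_x <- 1L   # marker: row is present in x (master)
  y$.in_y <- 2L   # marker: row is present in y (using)
  out <- merge(x, y, by = by, all = TRUE)
  # Missing markers count as 0, so the sum is 1, 2, or 3
  code <- rowSums(out[, c(".in_x", ".in_y")], na.rm = TRUE)
  out$.in_x <- NULL
  out$.in_y <- NULL
  out$merge_status <- factor(code, levels = 1:3,
                             labels = c("master", "using", "match"))
  out
}

df1 <- data.frame(id = letters[c(1:5, 8:9)], v1 = c(1:5, 8:9))
df2 <- data.frame(id = letters[1:8], v1 = c(1:7, NA))

res <- stata_merge_labelled(df1, df2, by = "id")
table(res$merge_status)   # similar in spirit to Stata's tabulate _merge
```

Because the indicator is built from the marker columns rather than from NA patterns in the data, the row for "h" (whose v1 is NA in df2) is still classified as a match without any NA-to-Inf workaround.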
