Is there a way to create a Stata _merge indicator variable with R merge()?

Stata automatically creates a variable named "_merge" that indicates, for each observation, whether it was found in one or both datasets after the merge. Is there a way to get a variable like this from the R merge() function?

+3
r stata




3 answers


Possible values of _merge in Stata (note that _merge can also take the values 4 and 5):

              1       master             observation appeared in master only
              2       using              observation appeared in using only
              3       match              observation appeared in both

      

In R, you can get the corresponding rows by passing all = TRUE, all.x = TRUE, or all.y = TRUE to merge(); unmatched rows are then filled with NA. For example:

merge(x, y, by = intersect(names(x), names(y)), all = TRUE)    # keep all rows of x and y
merge(x, y, by = intersect(names(x), names(y)), all.x = TRUE)  # keep all rows of x
merge(x, y, by = intersect(names(x), names(y)), all.y = TRUE)  # keep all rows of y
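To see how these arguments expose unmatched rows, here is a minimal sketch on two toy data frames. The data frames x and y and their columns id, a and b are made up for illustration; only merge() itself comes from the answer above.

```r
# Toy data (hypothetical): ids 1-3 appear in x, ids 2-4 appear in y
x <- data.frame(id = c(1, 2, 3), a = c("x1", "x2", "x3"), stringsAsFactors = FALSE)
y <- data.frame(id = c(2, 3, 4), b = c("y2", "y3", "y4"), stringsAsFactors = FALSE)

full  <- merge(x, y, by = "id", all = TRUE)    # ids 1, 2, 3, 4
left  <- merge(x, y, by = "id", all.x = TRUE)  # ids 1, 2, 3
right <- merge(x, y, by = "id", all.y = TRUE)  # ids 2, 3, 4

# Unmatched rows carry NA in the columns of the other data frame
only_in_x <- full$id[is.na(full$b)]  # id present only in x (here: 1)
only_in_y <- full$id[is.na(full$a)]  # id present only in y (here: 4)
```

This only works as a detection trick when the original columns contain no NA values of their own, which is why the answers below add explicit marker columns or mask existing NAs first.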

      

+4




I wrote the following function based on @Metrics's answer. It creates a "merge" variable in the resulting dataset that indicates where each observation came from, just like Stata does.

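stata.merge <- function(x, y, by = intersect(names(x), names(y))) {
  # Temporarily replace existing NAs with Inf so they are not mistaken
  # for the NAs that merge() introduces for unmatched rows
  x[is.na(x)] <- Inf
  y[is.na(y)] <- Inf

  # Rows present in both data frames: no merge-induced NAs
  matched <- merge(x, y, by.x = by, by.y = by, all = TRUE)
  matched <- matched[complete.cases(matched), ]
  matched$merge <- rep("matched", nrow(matched))  # rep() also handles zero rows

  # Rows present only in x (master)
  master <- merge(x, y, by.x = by, by.y = by, all.x = TRUE)
  master <- master[!complete.cases(master), ]
  master$merge <- rep("master", nrow(master))

  # Rows present only in y (using)
  using <- merge(x, y, by.x = by, by.y = by, all.y = TRUE)
  using <- using[!complete.cases(using), ]
  using$merge <- rep("using", nrow(using))

  # Stack the three pieces and restore the original NAs
  df <- rbind(matched, master, using)
  df[sapply(df, is.infinite)] <- NA
  df
}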

      



Test.

df1 <- data.frame(id = letters[c(1:5,8:9)], v1=c(1:5,8:9))
df1

   id v1
1  a  1
2  b  2
3  c  3
4  d  4
5  e  5
6  h  8
7  i  9

df2 <- data.frame(id = letters[1:8], v1=c(1:7,NA))
df2

  id v1
1  a  1
2  b  2
3  c  3
4  d  4
5  e  5
6  f  6
7  g  7
8  h NA

stata.merge(df1,df2, by = "id")

   id v1.x v1.y   merge
1   a    1    1 matched
2   b    2    2 matched
3   c    3    3 matched
4   d    4    4 matched
5   e    5    5 matched
6   h    8   NA matched
7   i    9   NA  master
71  f   NA    6   using
8   g   NA    7   using

      

0




Here is (I think) a much simpler and more efficient version of the previous stata.merge function. It assumes there are no variables named "new1" or "new2" in your data frames. If that assumption does not hold, change the variable names in the function. The function takes three arguments: the first data frame, the second data frame, and the value to pass as the "by =" argument of merge().

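stata.merge <- function(x, y, name) {
  x$new1 <- 1  # marks rows from the first (master) data frame
  y$new2 <- 2  # marks rows from the second (using) data frame
  df <- merge(x, y, by = name, all = TRUE)
  # 1 = first data frame only, 2 = second only, 3 = both
  df$stat.merge.variable <- rowSums(df[, c("new1", "new2")], na.rm = TRUE)
  df$new1 <- NULL
  df$new2 <- NULL
  df
}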
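If you prefer Stata's value labels to bare numbers, the same marker-column idea can return a labelled factor. The following is only a sketch: the function name merge_with_indicator, the temporary columns .in_x and .in_y, and the toy data frames are all hypothetical, and it assumes your data frames have no columns with those temporary names.

```r
# Hypothetical helper: full merge plus a Stata-style labelled indicator
merge_with_indicator <- function(x, y, by = intersect(names(x), names(y))) {
  x$.in_x <- TRUE  # marks rows from the master data frame
  y$.in_y <- TRUE  # marks rows from the using data frame
  out <- merge(x, y, by = by, all = TRUE)

  in_x <- !is.na(out$.in_x)
  in_y <- !is.na(out$.in_y)
  code <- in_x * 1L + in_y * 2L  # 1 = master only, 2 = using only, 3 = both

  out$merge <- factor(code, levels = 1:3,
                      labels = c("master", "using", "match"))
  out$.in_x <- NULL
  out$.in_y <- NULL
  out
}

# Toy data (hypothetical): "a" only in df1, "d" only in df2
df1 <- data.frame(id = c("a", "b", "c"), v1 = 1:3, stringsAsFactors = FALSE)
df2 <- data.frame(id = c("b", "c", "d"), v2 = 4:6, stringsAsFactors = FALSE)

res <- merge_with_indicator(df1, df2, by = "id")
counts <- table(res$merge)  # tabulates rows per category, like Stata's merge summary
```

Using marker columns rather than checking for NA in the data columns means missing values already present in the inputs do not affect the indicator.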

      

0

