Inner join with conditions in R

I want to do an inner join of two data frames and, as part of the join, compute the absolute difference between a column from each of them.

df1 = data.frame(Term = c("T1","T2","T3"), Sec = c("s1","s2","s3"), Value =c(10,30,30))

df2 = data.frame(Term = c("T1","T2","T3"), Sec = c("s1","s3","s2"), Value =c(40,20,10))

> df1
  Term Sec Value
1   T1  s1    10
2   T2  s2    30
3   T3  s3    30

> df2
  Term Sec Value
1   T1  s1    40
2   T2  s3    20
3   T3  s2    10

      

As a result, I want

  Term  Sec Value
   T1   s1   30
   T2   s2   20
   T3   s3   10

      

Basically, I am joining the two tables, and for the Value column I take:

Value=  abs(df1$Value - df2$Value)

      

I've struggled with this but couldn't find a way to do this conditional merge in base R. If it is not possible in base R, dplyr can probably do it with inner_join(), but I don't know that package very well.

So any suggestion using base R and/or dplyr would be appreciated.

edit

I have included my raw data as requested. My data is here:

https://jsfiddle.net/6z6smk80/1/

DF1 is the first table and DF2 is the second. DF2 starts at line 168.

The logic is the same: I want to join these two tables, each 160 rows long, on ID and keep the Value column from both tables. The resulting dataset must have the same number of rows (160), plus an additional diff column.



3 answers


Here is a base R solution using merge() on the Sec column, which is shared by your source data frames df1 and df2:



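# Inner join on Sec; the clashing columns get the suffixes .x (df1) and .y (df2)
df_merged <- merge(df1, df2, by="Sec")
# Absolute difference between the two Value columns
df_merged$Value <- abs(df_merged$Value.x - df_merged$Value.y)
# Keep only the needed columns and restore the original Term name
df_merged <- df_merged[, c("Sec", "Term.x", "Value")]
names(df_merged)[2] <- "Term"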

> df_merged
  Sec Term Value
1  s1   T1    30
2  s2   T2    20
3  s3   T3    10
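
A variant of the same idea, as a minimal sketch: merge() accepts a suffixes argument, so the two Value columns can be given more descriptive names than .x and .y, and dropping df2's Term before the merge avoids the later rename. The suffix names .df1 and .df2 and the object names joined and result are illustrative choices, not part of the original answer.

```r
df1 <- data.frame(Term = c("T1", "T2", "T3"), Sec = c("s1", "s2", "s3"), Value = c(10, 30, 30))
df2 <- data.frame(Term = c("T1", "T2", "T3"), Sec = c("s1", "s3", "s2"), Value = c(40, 20, 10))

# merge() performs an inner join by default (all = FALSE).
# Only Sec and Value are taken from df2, so Term comes from df1 alone.
joined <- merge(df1, df2[, c("Sec", "Value")],
                by = "Sec", suffixes = c(".df1", ".df2"))

# Absolute difference between the two Value columns
joined$Value <- abs(joined$Value.df1 - joined$Value.df2)

# Keep the columns in the order requested in the question
result <- joined[, c("Term", "Sec", "Value")]
print(result)
```

Because merge() sorts on the by column by default, the rows come back ordered by Sec.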

      



Using a data.table binary join, you can modify columns while joining. nomatch = 0L ensures that you are doing an inner join.



library(data.table)
setkey(setDT(df2), Sec)
setkey(setDT(df1), Sec)[df2, .(Term, Sec, Value = abs(Value - i.Value)), nomatch = 0L]
#    Term Sec Value
# 1:   T1  s1    30
# 2:   T2  s2    20
# 3:   T3  s3    10
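
As a hedged alternative sketch, the same join can be written with data.table's on argument instead of setting keys first. This version works on copies made with as.data.table(), so the original data frames are not modified by reference. The names dt1, dt2 and res are illustrative. Without keys, the rows follow the order of the table in the i position (dt2), so the result is sorted explicitly.

```r
library(data.table)

df1 <- data.frame(Term = c("T1", "T2", "T3"), Sec = c("s1", "s2", "s3"), Value = c(10, 30, 30))
df2 <- data.frame(Term = c("T1", "T2", "T3"), Sec = c("s1", "s3", "s2"), Value = c(40, 20, 10))

# Copies, so df1 and df2 stay plain data frames
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)

# Ad hoc inner join on Sec: unprefixed columns come from dt1,
# columns prefixed with i. come from dt2
res <- dt1[dt2, .(Term, Sec, Value = abs(Value - i.Value)),
           on = "Sec", nomatch = 0L][order(Sec)]
print(res)
```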

      



Since the question also asks about dplyr, here is a dplyr solution: first use inner_join, and then use transmute to keep the variables you need and compute the new one.

library(dplyr)

inner_join(df1, df2, by = "Sec") %>% 
  transmute(Term = Term.x, Sec, Value = abs(Value.x - Value.y))
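
If both original values should be kept next to the difference, as the edit to the question suggests, a sketch along these lines may help. It uses mutate instead of transmute and the suffix argument of inner_join. The suffixes _df1 and _df2 and the column name diff are illustrative assumptions, and the example joins on Sec because the posted sample data has no ID column.

```r
library(dplyr)

df1 <- data.frame(Term = c("T1", "T2", "T3"), Sec = c("s1", "s2", "s3"), Value = c(10, 30, 30))
df2 <- data.frame(Term = c("T1", "T2", "T3"), Sec = c("s1", "s3", "s2"), Value = c(40, 20, 10))

result <- inner_join(df1, df2, by = "Sec", suffix = c("_df1", "_df2")) %>%
  # Add the difference while keeping both original Value columns
  mutate(diff = abs(Value_df1 - Value_df2)) %>%
  # Keep Term from df1 only and put the columns in a readable order
  select(Term = Term_df1, Sec, Value_df1, Value_df2, diff)

# Every Sec appears exactly once in each table, so no rows are lost or duplicated
stopifnot(nrow(result) == nrow(df1))
print(result)
```

The row-count check is a cheap guard: if the key had duplicates or missing matches in either table, the joined result would no longer have the same number of rows as the inputs.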

      
