Vlookup-like function in R

I am very new to R and currently I have to apply at least a little R knowledge that I have for the analytical work I have to do in order to work.

I have two dataframes: dataframe A consists of transaction data and dataframe B consists of monthly exchange rates for different currencies.

Data Frame A - Transaction Details

    TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE
1           0001              INR         305000 Mar 2014 2014-03-01
2           0002              USD          15000 Oct 2014 2014-10-31
3           0003              JPY          85000 Feb 2015 2015-02-09
4           0004              CNY        1800000 Mar 2015 2015-03-27

structure(list(TRANSACTION_ID = c("0001", "0002", "0003", "0004"), 
COLLECTION_CRNCY = c("INR", "USD", "JPY", "CNY"), COLLECTION_AMT = c(305000, 
15000, 85000, 1800000), MMYYYY = structure(c(2014.16666666667, 
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"),
LODG_DATE = structure(c(16130, 16374, 16475, 16521), class = "Date")), 
row.names = c(NA, -4L), class = "data.frame")

      

Data frame B - Currency exchange rates

    MMYYYY       Date    CNY    INR     JPY       USD
1 Mar 2014 2014-03-31 4.9444 47.726 82.0845 0.7951654
2 Oct 2014 2014-10-31 4.7552 47.749 87.2604 0.7778469
3 Feb 2015 2015-02-27 4.5990 45.222 87.7690 0.7338372
4 Mar 2015 2015-03-31 4.5179 45.383 87.5395 0.7287036

structure(list(MMYYYY = structure(c(2014.16666666667, 
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"), 
Date = structure(c(16160, 16374, 16493, 16525), class = "Date"), CNY = 
c(4.9444, 4.7552, 4.599, 4.5179), INR = c(47.726, 47.749, 45.222, 45.383), 
JPY = c(82.0845, 87.2604, 87.769, 87.5395), USD = c(0.795165394, 0.77784692, 
0.733837235, 0.728703636)), .Names = c("MMYYYY", "Date", "CNY", "INR", "JPY", 
"USD"), class = "data.frame", row.names = c(NA, -4L))

      

What I would like to do is create a new column in the dataframe, possibly named Exchange Rate

. And I would like to get this exchange rate value by looking at dataframe B, matching COLLECTION_CRNCY

and MMYYYY

in dataframe A with BIe dataframe:

TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
1           0001              INR         305000 Mar 2014 2014-03-01    47.7260000
2           0002              USD          15000 Oct 2014 2014-10-31     0.7778469
3           0003              JPY          85000 Feb 2015 2015-02-09    87.7690000
4           0004              CNY        1800000 Mar 2015 2015-03-27     4.5179000

      

I can easily do this through Excel using vlookup and match, but I would like to know how I can achieve the same results as R since the transaction data file is quite large.

+3


source to share


4 answers


An approach is possible here data.table

. Basically you need to convert df2

to long format and then just (binary) left join todf1

library(data.table)
temp <- melt(setDT(df2[-2]), "MMYYYY", variable.name = "COLLECTION_CRNCY")
setkey(setDT(df1), MMYYYY, COLLECTION_CRNCY)[temp, exchange.rate := i.value]
df1
#    TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
# 1:           0001              INR         305000 2014.167 2014-03-01    47.7260000
# 2:           0002              USD          15000 2014.750 2014-10-31     0.7778469
# 3:           0003              JPY          85000 2015.083 2015-02-09    87.7690000
# 4:           0004              CNY        1800000 2015.167 2015-03-27     4.5179000

      




Alternatively you can use a similar thing using "Hadleyverse" but dplyr

won't be able to merge on class columns zoo

(yet) so you need to uncompress them first

library(dplyr)
library(tidyr)
df2[-2] %>% 
  gather(COLLECTION_CRNCY, exchange.rate, -MMYYYY) %>%
  mutate(MMYYYY = as.numeric(MMYYYY)) %>%
  left_join(df1 %>% mutate(MMYYYY = as.numeric(MMYYYY)), .,
                           by = c("MMYYYY", "COLLECTION_CRNCY"))
#   TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
# 1           0001              INR         305000 2014.167 2014-03-01    47.7260000
# 2           0002              USD          15000 2014.750 2014-10-31     0.7778469
# 3           0003              JPY          85000 2015.083 2015-02-09    87.7690000
# 4           0004              CNY        1800000 2015.167 2015-03-27     4.5179000

      

+4


source


You can solve this problem by using base R

using apply

and merge

.

To break up the problem

  • merge two datasets together
  • extract the corresponding column

1

To combine data, just use:

merge(dfa, dfb, by="MMYYYY")

2

To retrieve the corresponding field, we can use the apply function as roles.

apply(df, 1, function(x) ...)



where df

is data.frame, 1

signals this.


Putting it all together, we can extract the exchange rate in one line like this:

dfa$exchange.rate <- apply(df, 1, function(x) x[x[['COLLECTION_CRNCY']]])

What the row does is x[x[['COLLECTION_CRNCY']]]

simply look at the column COLLECTION_CRNCY

and then use that value to query the corresponding currency column.


final code:

dfa$exchange.rate <- apply(merge(dfa, dfb, by="MMYYYY"), 1, function(x) x[x[['COLLECTION_CRNCY']]])
dfa$exchange.rate <- as.numeric(dfa$exchange.rate) # since it isn't numeric format.
#    TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY  LODG_DATE exchange.rate
#  1           0001              INR         305000 2014.167 2014-03-01    47.7260000
#  2           0002              USD          15000 2014.750 2014-10-31     0.7778469
#  3           0003              JPY          85000 2015.083 2015-02-09    87.7690000
#  4           0004              CNY        1800000 2015.167 2015-03-27     4.5179000

      

+1


source


Another way for reference purposes:

res <- numeric(nrow(dfA))
for(i in seq_len(nrow(dfA))) {
    res[i] <- dfB[match(dfA$MMYYYY[i], dfB$MMYYY), 
                  match(dfA$COLLECTION_CRNCY[i], names(dfB))]}
dfA$Exchange<- res
#   TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT   MMYYYY
# 1           0001              INR         305000 2014.167
# 2           0002              USD          15000 2014.750
# 3           0003              JPY          85000 2015.083
# 4           0004              CNY        1800000 2015.167
#    LODG_DATE   Exchange
# 1 2014-03-01 47.7260000
# 2 2014-10-31  0.7778469
# 3 2015-02-09 87.7690000
# 4 2015-03-27  4.5179000

      

+1


source


Conversion from wide format can be done with reshape()

. This may be the most annoying feature in R, but you can usually get where you want to if you play around with variations long enough. Once you have B in long format, a simple call merge()

gets the desired result.

B.id <- c('MMYYYY','Date');
B.time <- setdiff(names(B),B.id);
B.long <- reshape(B,dir='l',idvar=B.id,varying=B.time,times=B.time,timevar='COLLECTION_CRNCY',v.names='exchange.rate',new.row.names=1:(length(B.time)*nrow(B)));
B.long;
##      MMYYYY       Date COLLECTION_CRNCY exchange.rate
## 1  2014.167 2014-03-31              CNY     4.9444000
## 2  2014.750 2014-10-31              CNY     4.7552000
## 3  2015.083 2015-02-27              CNY     4.5990000
## 4  2015.167 2015-03-31              CNY     4.5179000
## 5  2014.167 2014-03-31              INR    47.7260000
## 6  2014.750 2014-10-31              INR    47.7490000
## 7  2015.083 2015-02-27              INR    45.2220000
## 8  2015.167 2015-03-31              INR    45.3830000
## 9  2014.167 2014-03-31              JPY    82.0845000
## 10 2014.750 2014-10-31              JPY    87.2604000
## 11 2015.083 2015-02-27              JPY    87.7690000
## 12 2015.167 2015-03-31              JPY    87.5395000
## 13 2014.167 2014-03-31              USD     0.7951654
## 14 2014.750 2014-10-31              USD     0.7778469
## 15 2015.083 2015-02-27              USD     0.7338372
## 16 2015.167 2015-03-31              USD     0.7287036
merge(A,B.long[c('MMYYYY','COLLECTION_CRNCY','exchange.rate')],all.x=T);
##   COLLECTION_CRNCY   MMYYYY TRANSACTION_ID COLLECTION_AMT  LODG_DATE exchange.rate
## 1              CNY 2015.167           0004        1800000 2015-03-27     4.5179000
## 2              INR 2014.167           0001         305000 2014-03-01    47.7260000
## 3              JPY 2015.083           0003          85000 2015-02-09    87.7690000
## 4              USD 2014.750           0002          15000 2014-10-31     0.7778469

      

0


source







All Articles