Vlookup-like function in R
I am very new to R and currently I have to apply at least a little R knowledge that I have for the analytical work I have to do in order to work.
I have two dataframes: dataframe A consists of transaction data and dataframe B consists of monthly exchange rates for different currencies.
Data Frame A - Transaction Details
TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE
1 0001 INR 305000 Mar 2014 2014-03-01
2 0002 USD 15000 Oct 2014 2014-10-31
3 0003 JPY 85000 Feb 2015 2015-02-09
4 0004 CNY 1800000 Mar 2015 2015-03-27
structure(list(TRANSACTION_ID = c("0001", "0002", "0003", "0004"),
COLLECTION_CRNCY = c("INR", "USD", "JPY", "CNY"), COLLECTION_AMT = c(305000,
15000, 85000, 1800000), MMYYYY = structure(c(2014.16666666667,
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"),
LODG_DATE = structure(c(16130, 16374, 16475, 16521), class = "Date")),
row.names = c(NA, -4L), class = "data.frame")
Data frame B - Currency exchange rates
MMYYYY Date CNY INR JPY USD
1 Mar 2014 2014-03-31 4.9444 47.726 82.0845 0.7951654
2 Oct 2014 2014-10-31 4.7552 47.749 87.2604 0.7778469
3 Feb 2015 2015-02-27 4.5990 45.222 87.7690 0.7338372
4 Mar 2015 2015-03-31 4.5179 45.383 87.5395 0.7287036
structure(list(MMYYYY = structure(c(2014.16666666667,
2014.75, 2015.08333333333, 2015.16666666667), class = "yearmon"),
Date = structure(c(16160, 16374, 16493, 16525), class = "Date"), CNY =
c(4.9444, 4.7552, 4.599, 4.5179), INR = c(47.726, 47.749, 45.222, 45.383),
JPY = c(82.0845, 87.2604, 87.769, 87.5395), USD = c(0.795165394, 0.77784692,
0.733837235, 0.728703636)), .Names = c("MMYYYY", "Date", "CNY", "INR", "JPY",
"USD"), class = "data.frame", row.names = c(NA, -4L))
What I would like to do is create a new column in the dataframe, possibly named Exchange Rate
. And I would like to get this exchange rate value by looking at dataframe B, matching COLLECTION_CRNCY
and MMYYYY
in dataframe A with BIe dataframe:
TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE exchange.rate
1 0001 INR 305000 Mar 2014 2014-03-01 47.7260000
2 0002 USD 15000 Oct 2014 2014-10-31 0.7778469
3 0003 JPY 85000 Feb 2015 2015-02-09 87.7690000
4 0004 CNY 1800000 Mar 2015 2015-03-27 4.5179000
I can easily do this through Excel using vlookup and match, but I would like to know how I can achieve the same results as R since the transaction data file is quite large.
source to share
An approach is possible here data.table
. Basically you need to convert df2
to long format and then just (binary) left join todf1
library(data.table)
temp <- melt(setDT(df2[-2]), "MMYYYY", variable.name = "COLLECTION_CRNCY")
setkey(setDT(df1), MMYYYY, COLLECTION_CRNCY)[temp, exchange.rate := i.value]
df1
# TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE exchange.rate
# 1: 0001 INR 305000 2014.167 2014-03-01 47.7260000
# 2: 0002 USD 15000 2014.750 2014-10-31 0.7778469
# 3: 0003 JPY 85000 2015.083 2015-02-09 87.7690000
# 4: 0004 CNY 1800000 2015.167 2015-03-27 4.5179000
Alternatively you can use a similar thing using "Hadleyverse" but dplyr
won't be able to merge on class columns zoo
(yet) so you need to uncompress them first
library(dplyr)
library(tidyr)
df2[-2] %>%
gather(COLLECTION_CRNCY, exchange.rate, -MMYYYY) %>%
mutate(MMYYYY = as.numeric(MMYYYY)) %>%
left_join(df1 %>% mutate(MMYYYY = as.numeric(MMYYYY)), .,
by = c("MMYYYY", "COLLECTION_CRNCY"))
# TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE exchange.rate
# 1 0001 INR 305000 2014.167 2014-03-01 47.7260000
# 2 0002 USD 15000 2014.750 2014-10-31 0.7778469
# 3 0003 JPY 85000 2015.083 2015-02-09 87.7690000
# 4 0004 CNY 1800000 2015.167 2015-03-27 4.5179000
source to share
You can solve this problem by using base R
using apply
and merge
.
To break up the problem
- merge two datasets together
- extract the corresponding column
1
To combine data, just use:
merge(dfa, dfb, by="MMYYYY")
2
To retrieve the corresponding field, we can use the apply function as roles.
apply(df, 1, function(x) ...)
where df
is data.frame, 1
signals this.
Putting it all together, we can extract the exchange rate in one line like this:
dfa$exchange.rate <- apply(df, 1, function(x) x[x[['COLLECTION_CRNCY']]])
What the row does is x[x[['COLLECTION_CRNCY']]]
simply look at the column COLLECTION_CRNCY
and then use that value to query the corresponding currency column.
final code:
dfa$exchange.rate <- apply(merge(dfa, dfb, by="MMYYYY"), 1, function(x) x[x[['COLLECTION_CRNCY']]])
dfa$exchange.rate <- as.numeric(dfa$exchange.rate) # since it isn't numeric format.
# TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY LODG_DATE exchange.rate
# 1 0001 INR 305000 2014.167 2014-03-01 47.7260000
# 2 0002 USD 15000 2014.750 2014-10-31 0.7778469
# 3 0003 JPY 85000 2015.083 2015-02-09 87.7690000
# 4 0004 CNY 1800000 2015.167 2015-03-27 4.5179000
source to share
Another way for reference purposes:
res <- numeric(nrow(dfA))
for(i in seq_len(nrow(dfA))) {
res[i] <- dfB[match(dfA$MMYYYY[i], dfB$MMYYY),
match(dfA$COLLECTION_CRNCY[i], names(dfB))]}
dfA$Exchange<- res
# TRANSACTION_ID COLLECTION_CRNCY COLLECTION_AMT MMYYYY
# 1 0001 INR 305000 2014.167
# 2 0002 USD 15000 2014.750
# 3 0003 JPY 85000 2015.083
# 4 0004 CNY 1800000 2015.167
# LODG_DATE Exchange
# 1 2014-03-01 47.7260000
# 2 2014-10-31 0.7778469
# 3 2015-02-09 87.7690000
# 4 2015-03-27 4.5179000
source to share
Conversion from wide format can be done with reshape()
. This may be the most annoying feature in R, but you can usually get where you want to if you play around with variations long enough. Once you have B in long format, a simple call merge()
gets the desired result.
B.id <- c('MMYYYY','Date');
B.time <- setdiff(names(B),B.id);
B.long <- reshape(B,dir='l',idvar=B.id,varying=B.time,times=B.time,timevar='COLLECTION_CRNCY',v.names='exchange.rate',new.row.names=1:(length(B.time)*nrow(B)));
B.long;
## MMYYYY Date COLLECTION_CRNCY exchange.rate
## 1 2014.167 2014-03-31 CNY 4.9444000
## 2 2014.750 2014-10-31 CNY 4.7552000
## 3 2015.083 2015-02-27 CNY 4.5990000
## 4 2015.167 2015-03-31 CNY 4.5179000
## 5 2014.167 2014-03-31 INR 47.7260000
## 6 2014.750 2014-10-31 INR 47.7490000
## 7 2015.083 2015-02-27 INR 45.2220000
## 8 2015.167 2015-03-31 INR 45.3830000
## 9 2014.167 2014-03-31 JPY 82.0845000
## 10 2014.750 2014-10-31 JPY 87.2604000
## 11 2015.083 2015-02-27 JPY 87.7690000
## 12 2015.167 2015-03-31 JPY 87.5395000
## 13 2014.167 2014-03-31 USD 0.7951654
## 14 2014.750 2014-10-31 USD 0.7778469
## 15 2015.083 2015-02-27 USD 0.7338372
## 16 2015.167 2015-03-31 USD 0.7287036
merge(A,B.long[c('MMYYYY','COLLECTION_CRNCY','exchange.rate')],all.x=T);
## COLLECTION_CRNCY MMYYYY TRANSACTION_ID COLLECTION_AMT LODG_DATE exchange.rate
## 1 CNY 2015.167 0004 1800000 2015-03-27 4.5179000
## 2 INR 2014.167 0001 305000 2014-03-01 47.7260000
## 3 JPY 2015.083 0003 85000 2015-02-09 87.7690000
## 4 USD 2014.750 0002 15000 2014-10-31 0.7778469
source to share