Finding row / column names from correlation matrix values
I have a correlation matrix that contains the correlations of stock prices. It was calculated with:
corMatrix <- cor(cl2014, use="pairwise.complete.obs")
The matrix is much larger, but it looks like this:
> corMatrix
RY.TO.Close CM.TO.Close BNS.TO.Close TD.TO.Close
RY.TO.Close 1.0000000 0.8990782 0.8700985 -0.2505789
CM.TO.Close 0.8990782 1.0000000 0.8240780 -0.4184085
BNS.TO.Close 0.8700985 0.8240780 1.0000000 -0.2141785
TD.TO.Close -0.2505789 -0.4184085 -0.2141785 1.0000000
> class(corMatrix)
[1] "matrix"
I am trying to figure out how I can get the row and column names of all values in the matrix that have a correlation greater than some value.
I can create a logical index matrix like this:
workingset <- corMatrix > 0.85
What I really want is just a list of row / column pairs, identified by row and column name, so I know which pairs to investigate next.
How can I go from this logical index matrix to row / column names?
Ideally, I would also consider only the lower or upper triangle of the matrix so as not to generate duplicate pairs, and of course the main diagonal can be ignored as it will always be 1.
Another option is to use melt from "reshape2" and subset:
library(reshape2)
subset(melt(corMatrix), value > .85)
# Var1 Var2 value
# 1 RY.TO.Close RY.TO.Close 1.0000000
# 2 CM.TO.Close RY.TO.Close 0.8990782
# 3 BNS.TO.Close RY.TO.Close 0.8700985
# 5 RY.TO.Close CM.TO.Close 0.8990782
# 6 CM.TO.Close CM.TO.Close 1.0000000
# 9 RY.TO.Close BNS.TO.Close 0.8700985
# 11 BNS.TO.Close BNS.TO.Close 1.0000000
# 16 TD.TO.Close TD.TO.Close 1.0000000
You will need melt(as.matrix(corMatrix)) if your dataset is a data.frame, as there are different melt methods for matrices and data.frames.
Update
As you noted, you are only interested in the values from the upper triangle (to avoid duplicate pairs / values), excluding the diagonal. To get those, you can do the following:
CM <- corMatrix # Make a copy of your matrix
CM[lower.tri(CM, diag = TRUE)] <- NA # lower tri and diag set to NA
subset(melt(CM, na.rm = TRUE), value > .85) # melt and subset as before
# Var1 Var2 value
# 5 RY.TO.Close CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985
You can also do this in base R. Continuing with "CM" from above, try:
subset(na.omit(data.frame(expand.grid(dimnames(CM)), value = c(CM))), value > .85)
# Var1 Var2 value
# 5 RY.TO.Close CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985
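The base R one-liner works because expand.grid varies its first variable fastest, which matches the column-major order in which c() flattens a matrix. A minimal sketch on a made-up 3 x 3 matrix (the names A, B, C and the values are illustrative assumptions, not data from the question) shows the steps one at a time:

```r
# Toy symmetric matrix; names and values are made up for illustration
m <- matrix(c(1.0, 0.9, 0.2,
              0.9, 1.0, 0.5,
              0.2, 0.5, 1.0),
            nrow = 3, dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Blank out the lower triangle and the diagonal, as with CM above
m[lower.tri(m, diag = TRUE)] <- NA

# expand.grid() varies Var1 (the row names) fastest, the same order in
# which c(m) walks down each column of the matrix
long <- data.frame(expand.grid(dimnames(m)), value = c(m))

# Drop the NA cells, then keep the pairs above the threshold
hits <- subset(na.omit(long), value > 0.85)
hits
```

Only the A / B pair survives here, since 0.9 is the only value above the diagonal that exceeds 0.85.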
You can use which with the argument arr.ind = TRUE to get a matrix of row / column index pairs. We can then map the row and column names onto the pairs and place them in a data frame with their respective values.
w <- which(corMatrix > 0.85, arr.ind = TRUE)
data.frame(row = rownames(w), col = colnames(corMatrix)[w[, "col"]],
value = corMatrix[corMatrix > 0.85])
# row col value
# 1 RY.TO.Close RY.TO.Close 1.0000000
# 2 CM.TO.Close RY.TO.Close 0.8990782
# 3 BNS.TO.Close RY.TO.Close 0.8700985
# 4 RY.TO.Close CM.TO.Close 0.8990782
# 5 CM.TO.Close CM.TO.Close 1.0000000
# 6 RY.TO.Close BNS.TO.Close 0.8700985
# 7 BNS.TO.Close BNS.TO.Close 1.0000000
# 8 TD.TO.Close TD.TO.Close 1.0000000
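The output above still contains the diagonal and both copies of each pair. One way to restrict the which approach to the upper triangle is to combine the threshold test with upper.tri. This is a sketch rather than part of the original answer; the matrix is rebuilt from the values printed in the question so that the snippet runs on its own:

```r
# Rebuild the example matrix from the values printed in the question
nms <- c("RY.TO.Close", "CM.TO.Close", "BNS.TO.Close", "TD.TO.Close")
corMatrix <- matrix(c( 1.0000000,  0.8990782,  0.8700985, -0.2505789,
                       0.8990782,  1.0000000,  0.8240780, -0.4184085,
                       0.8700985,  0.8240780,  1.0000000, -0.2141785,
                      -0.2505789, -0.4184085, -0.2141785,  1.0000000),
                    nrow = 4, dimnames = list(nms, nms))

# TRUE only for cells above the diagonal that also exceed the threshold
keep <- upper.tri(corMatrix) & corMatrix > 0.85

# Index pairs of the TRUE cells, then look the names up by index
w <- which(keep, arr.ind = TRUE)
res <- data.frame(row   = rownames(corMatrix)[w[, "row"]],
                  col   = colnames(corMatrix)[w[, "col"]],
                  value = corMatrix[w])
res
# row col value
# 1 RY.TO.Close CM.TO.Close 0.8990782
# 2 RY.TO.Close BNS.TO.Close 0.8700985
```

Looking the names up through w[, "row"] and w[, "col"] avoids relying on the row names that which attaches to its result, and indexing corMatrix with the two-column matrix w returns the values in the same order as the pairs.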