Finding row / column names from correlation matrix values

Question

I have a correlation matrix that contains the correlations of stock prices. It was calculated with:

corMatrix <- cor(cl2014, use="pairwise.complete.obs")

The matrix is much larger, but it looks like this:

> corMatrix
             RY.TO.Close CM.TO.Close BNS.TO.Close TD.TO.Close
RY.TO.Close    1.0000000   0.8990782    0.8700985  -0.2505789
CM.TO.Close    0.8990782   1.0000000    0.8240780  -0.4184085
BNS.TO.Close   0.8700985   0.8240780    1.0000000  -0.2141785
TD.TO.Close   -0.2505789  -0.4184085   -0.2141785   1.0000000

> class(corMatrix)
[1] "matrix"

I am trying to figure out how I can get the row and column names of all values in a matrix that have a correlation greater than some value.

I can compare the matrix against a threshold to create a logical index matrix like this:

workingset <- corMatrix > 0.85

What I really want is just a list of row / column pairs identified by the row and column name, so I know which pairs to navigate next.

How can I go from the index matrix to row / column names?

Ideally, I would also only consider the bottom or top of the matrix so as not to generate duplicate values, and of course the main diagonal can be ignored as it will always be 1.

matrix r correlation

chollida 31 oct. 14 at 2:14

2 answers

You can use which() with the argument arr.ind = TRUE to get a matrix of row / column pairs. We can then map the row and column names onto the pairs and place them in a data frame with their respective values.

w <- which(corMatrix > 0.85, arr.ind = TRUE)
data.frame(row = rownames(w), col = colnames(corMatrix)[w[, "col"]], 
           value = corMatrix[corMatrix > 0.85])
#            row          col     value
# 1  RY.TO.Close  RY.TO.Close 1.0000000
# 2  CM.TO.Close  RY.TO.Close 0.8990782
# 3 BNS.TO.Close  RY.TO.Close 0.8700985
# 4  RY.TO.Close  CM.TO.Close 0.8990782
# 5  CM.TO.Close  CM.TO.Close 1.0000000
# 6  RY.TO.Close BNS.TO.Close 0.8700985
# 7 BNS.TO.Close BNS.TO.Close 1.0000000
# 8  TD.TO.Close  TD.TO.Close 1.0000000
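
To drop the diagonal and the mirrored duplicates mentioned in the question, the threshold test can be combined with upper.tri() before calling which(). This is a minimal sketch, assuming the sample matrix is rebuilt by hand from the values printed in the question (the original cl2014 data is not available), and the names keep and pairs are only illustrative:

```r
# Rebuild the sample matrix from the question (values copied from the post).
banks <- c("RY.TO.Close", "CM.TO.Close", "BNS.TO.Close", "TD.TO.Close")
corMatrix <- matrix(
  c( 1.0000000,  0.8990782,  0.8700985, -0.2505789,
     0.8990782,  1.0000000,  0.8240780, -0.4184085,
     0.8700985,  0.8240780,  1.0000000, -0.2141785,
    -0.2505789, -0.4184085, -0.2141785,  1.0000000),
  nrow = 4, dimnames = list(banks, banks)
)

# Keep only cells strictly above the diagonal that also exceed the threshold.
keep <- upper.tri(corMatrix) & corMatrix > 0.85
w <- which(keep, arr.ind = TRUE)

pairs <- data.frame(
  row   = rownames(corMatrix)[w[, "row"]],
  col   = colnames(corMatrix)[w[, "col"]],
  value = corMatrix[w],   # a two-column index matrix selects one cell per row
  stringsAsFactors = FALSE
)
pairs
```

Looking the names up through rownames(corMatrix) and colnames(corMatrix) avoids relying on the row names that which() attaches to its result.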

Rich scriven 31 oct. '14 at 2:16

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2014-10-31T02:42:53+0000

Another option is to use melt from "reshape2" and subset:

library(reshape2)
subset(melt(corMatrix), value > .85)
#            Var1         Var2     value
# 1   RY.TO.Close  RY.TO.Close 1.0000000
# 2   CM.TO.Close  RY.TO.Close 0.8990782
# 3  BNS.TO.Close  RY.TO.Close 0.8700985
# 5   RY.TO.Close  CM.TO.Close 0.8990782
# 6   CM.TO.Close  CM.TO.Close 1.0000000
# 9   RY.TO.Close BNS.TO.Close 0.8700985
# 11 BNS.TO.Close BNS.TO.Close 1.0000000
# 16  TD.TO.Close  TD.TO.Close 1.0000000

You will need melt(as.matrix(corMatrix)) if your dataset is a data.frame, as there are different melt methods for matrices and data.frames.

Update

As you noted, if you are only interested in the values from the upper triangle (to avoid duplicate pairs / values) and want to exclude the diagonal, you can do the following:

CM <- corMatrix                               # Make a copy of your matrix
CM[lower.tri(CM, diag = TRUE)] <- NA          # lower tri and diag set to NA
subset(melt(CM, na.rm = TRUE), value > .85)   # melt and subset as before
#          Var1         Var2     value
# 5 RY.TO.Close  CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985

You can also do this with base R. Continuing with "CM" from above, try:

subset(na.omit(data.frame(expand.grid(dimnames(CM)), value = c(CM))), value > .85)
#          Var1         Var2     value
# 5 RY.TO.Close  CM.TO.Close 0.8990782
# 9 RY.TO.Close BNS.TO.Close 0.8700985
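
If several thresholds need to be tried, the base R approach above could be wrapped in a small function. The following is only a sketch: high_cor_pairs and its arguments are hypothetical names (not part of base R or "reshape2"), the demo matrix is rebuilt by hand from the values in the question, and sorting by value is an added convenience rather than something the question asked for.

```r
# Hypothetical helper: list upper-triangle pairs whose correlation
# exceeds `threshold`, strongest first.
high_cor_pairs <- function(m, threshold = 0.85) {
  m[lower.tri(m, diag = TRUE)] <- NA   # blank out the diagonal and mirrored half
  long <- data.frame(expand.grid(dimnames(m), stringsAsFactors = FALSE),
                     value = c(m))     # c(m) flattens column by column, matching expand.grid
  long <- long[!is.na(long$value) & long$value > threshold, ]
  long[order(-long$value), ]
}

# Demo data: the sample matrix from the question, rebuilt by hand.
banks <- c("RY.TO.Close", "CM.TO.Close", "BNS.TO.Close", "TD.TO.Close")
corMatrix <- matrix(
  c( 1.0000000,  0.8990782,  0.8700985, -0.2505789,
     0.8990782,  1.0000000,  0.8240780, -0.4184085,
     0.8700985,  0.8240780,  1.0000000, -0.2141785,
    -0.2505789, -0.4184085, -0.2141785,  1.0000000),
  nrow = 4, dimnames = list(banks, banks)
)

res <- high_cor_pairs(corMatrix, threshold = 0.8)
res
```

Because the function modifies only its local copy of the matrix, the original corMatrix is left untouched.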
