Finding the correlation between two datasets in R

I am trying to find the correlation between two separate datasets in R. The structure of my first dataset (as printed by print(matr1) in R):

        year  month  income  
 [1,]  "2000" "01"  "30000"
 [2,]  "2000" "02"  "12364"
 [3,]  "2000" "03"  "37485"
 [4,]  "2000" "04"  "2000"
 [5,]  "2000" "05"  "7573"

      

The structure of my second dataset (as printed by print(matr2) in R):

     month_year     value     
 [1,] "Jan 2000" "84737476"
 [2,] "Feb 2000" "39450334"
 [3,] "Mar 2000" "48384943"
 [4,] "Apr 2000" "12345678"
 [5,] "May 2000" "49595340"

      

Now I want to know the relationship between these two datasets, but the problem I'm having is that the month and year formats differ between them. Also, when I ran the R command cor(matr1[,"income"],matr2[,"value"]), I got this error:

Error in cor(matr1[,"income"],matr2[,"value"]) : 
  'x' must be numeric

      

So my question is:

  • How do I remove the error?
  • How do I find the correlation when the month and year formats are different?

Any guidance would be helpful to me as I am new to this.



1 answer


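Dealing with dates is a pain, IMO. The error itself comes from the fact that both matrices store their values as character strings (note the quotes in the printed output), and cor() only accepts numeric input. If you already know that your rows match (i.e., income in row i of matr1 is for the same month and year as value in the same row of matr2), you can get the correlation quite simply with: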



cor(as.numeric(matr1[,"income"]), as.numeric(matr2[,"value"]))
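If the rows are not guaranteed to line up, one option is to build a common "YYYY-MM" key for both matrices and match on it before calling cor(). The following is only a sketch under some assumptions: the matrices are rebuilt here from the sample rows in the question, month_year always has the form of a three-letter English month abbreviation, a space, and a four-digit year, and the helper names (key1, key2, idx, keep) are made up for illustration.

```r
# Sample data copied from the question (both are character matrices).
matr1 <- cbind(
  year   = c("2000", "2000", "2000", "2000", "2000"),
  month  = c("01", "02", "03", "04", "05"),
  income = c("30000", "12364", "37485", "2000", "7573")
)
matr2 <- cbind(
  month_year = c("Jan 2000", "Feb 2000", "Mar 2000", "Apr 2000", "May 2000"),
  value      = c("84737476", "39450334", "48384943", "12345678", "49595340")
)

# Key for matr1: paste year and month together, e.g. "2000-01".
key1 <- paste(matr1[, "year"], matr1[, "month"], sep = "-")

# Key for matr2: look the month abbreviation up in the built-in month.abb
# constant (avoids locale-dependent date parsing), then format the same way.
month_num <- match(substr(matr2[, "month_year"], 1, 3), month.abb)
year_chr  <- substr(matr2[, "month_year"], 5, 8)
key2 <- sprintf("%s-%02d", year_chr, month_num)

# For each row of matr1, find the row of matr2 with the same key;
# rows without a partner are dropped.
idx  <- match(key1, key2)
keep <- !is.na(idx)

# Convert the matched character columns to numeric and correlate them.
income <- as.numeric(matr1[keep, "income"])
value  <- as.numeric(matr2[idx[keep], "value"])

result <- cor(income, value)
print(result)
```

With the sample rows the keys already line up one-to-one, so this gives the same number as the one-liner above; the matching step only matters when the two datasets are ordered differently or cover different months.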

      
