Finding the correlation between two datasets in R
I am trying to find the correlation between two separate datasets in R. The structure of my first dataset (as shown by print(matr1) in R):
year month income
[1,] "2000" "01" "30000"
[2,] "2000" "02" "12364"
[3,] "2000" "03" "37485"
[4,] "2000" "04" "2000"
[5,] "2000" "05" "7573"
The structure of my second dataset (as shown by print(matr2) in R):
month_year value
[1,] "Jan 2000" "84737476"
[2,] "Feb 2000" "39450334"
[3,] "Mar 2000" "48384943"
[4,] "Apr 2000" "12345678"
[5,] "May 2000" "49595340"
Now I want to know the relationship between these two datasets, but the problem is that the month and year are formatted differently in each. Also, when I ran the R command cor(matr1[,"income"],matr2[,"value"]) I got this error:
Error in cor(matr1[,"income"],matr2[,"value"]) :
'x' must be numeric
So my question is:
- How do I remove the error?
- How do I find the correlation when the month and year formats are different?
Any guidance would be helpful to me as I am new to this.
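Dealing with dates is a pain IMO. But if you already know that your rows match (i.e., income in row i of matr1 is for the same month and year as value in the same row of matr2), the differing date formats don't matter. The error occurs because both matrices store their values as character strings (note the quotes in the printed output), and cor() needs numeric input. Convert the columns and you can get the correlation quite simply with: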
cor(as.numeric(matr1[,"income"]), as.numeric(matr2[,"value"]))
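If you can't be sure the rows line up, one option is to build a common month key for both matrices and match on it before converting to numeric. The following is only a sketch under some assumptions: the matrices are rebuilt here from the printed output in the question, the month names in matr2 are the standard English three-letter abbreviations, and the names key1, key2, idx and ok are made up for illustration.

```r
# Rebuild the two character matrices as printed in the question.
matr1 <- cbind(
  year   = c("2000", "2000", "2000", "2000", "2000"),
  month  = c("01", "02", "03", "04", "05"),
  income = c("30000", "12364", "37485", "2000", "7573")
)
matr2 <- cbind(
  month_year = c("Jan 2000", "Feb 2000", "Mar 2000", "Apr 2000", "May 2000"),
  value      = c("84737476", "39450334", "48384943", "12345678", "49595340")
)

# Key for matr1: "YYYY-MM" pasted together from the year and month columns.
key1 <- paste(matr1[, "year"], matr1[, "month"], sep = "-")

# Key for matr2: split "Jan 2000" into month name and year, then look the
# month name up in month.abb (R's built-in "Jan", "Feb", ... vector), which
# avoids any dependence on the system locale.
parts <- strsplit(matr2[, "month_year"], " ")
mon   <- match(sapply(parts, `[`, 1), month.abb)
yr    <- sapply(parts, `[`, 2)
key2  <- sprintf("%s-%02d", yr, mon)

# For each row of matr1, find the row of matr2 with the same key (NA if none),
# and keep only the months present in both.
idx <- match(key1, key2)
ok  <- !is.na(idx)

income <- as.numeric(matr1[ok, "income"])
value  <- as.numeric(matr2[idx[ok], "value"])

r <- cor(income, value)
print(r)
```

With the five rows shown in the question the keys already line up one to one, so this gives the same result as the one-liner above. The matching step only starts to matter when a month is missing from one dataset or the rows are in a different order.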