Number of unique numbers in R and SQL
I have data that I am using in both R and SQL Server. The problem is that when I search for unique numbers for a specific column in R it shows 222 and in SQL it returns 216. What is the problem that caused this difference?
Query used in SQL:
Select count(distinct ColName) from TableName
And in R:
length(unique(DataframeName$Colname))
It's hard to tell without the actual data, but R and SQL treat unique values differently. R (more specifically, unique) treats NA and strings with different amounts of whitespace as distinct values:
> unique(c("f","g","f",""," ",NA,NULL))
[1] "f" "g" "" " " NA
> length(unique(c("f","g","f",""," ",NA,NULL)))
[1] 5
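SQL Server, by contrast, ignores trailing spaces when comparing strings, so values that differ only in the number of trailing spaces count as one distinct value (count(distinct ...) also skips NULLs entirely):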
CREATE TABLE Persons (
PersonID int,
LastName varchar(255));
INSERT INTO Persons (PersonID, LastName)
VALUES (1, 'Rockwell'),(2,''),(4,'Cohen'),(5,' '),(6,' ');
Select count(distinct LastName) from Persons
This returns 3: 'Rockwell', 'Cohen', and a single value for the empty and space-only strings.
You can easily trim all leading and trailing whitespace with str_trim from the stringr library in R:
library(stringr)
a <- str_trim(c("f","g","f",""," ",NA,NULL))
unique(a)
[1] "f" "g" "" NA
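If the goal is to get a count in R that is closer to what count(distinct ...) reports, a rough sketch in base R could look like the one below. The helper name sql_like_distinct_count and its ignore_case argument are made up for illustration. It assumes a character column, that NULLs arrive in R as NA, and that only trailing spaces (not leading ones) are ignored on the SQL side. Whether letter case matters depends on the collation of your database, so that part is an optional assumption.

```r
# Hypothetical helper: approximates SQL Server's COUNT(DISTINCT col)
# for a character vector. Not an exact reimplementation.
sql_like_distinct_count <- function(x, ignore_case = FALSE) {
  x <- x[!is.na(x)]                 # COUNT(DISTINCT ...) skips NULLs (NA in R)
  x <- sub("[ ]+$", "", x)          # drop trailing spaces, which SQL ignores when comparing
  if (ignore_case) x <- tolower(x)  # assumption: only for case-insensitive collations
  length(unique(x))
}

x <- c("f", "g", "f", "", " ", NA)
length(unique(x))            # 5, as in the example above
sql_like_distinct_count(x)   # 3
```

Comparing the two counts on your own column should show whether NA values and trailing spaces account for the gap between 222 and 216.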