Number of unique numbers in R and SQL

I have data that I am using in both R and SQL Server. When I count the unique values in a specific column, R reports 222 but SQL returns 216. What causes this difference?

Query used in SQL:

Select count(distinct ColName) from TableName

      

And in R:

length(unique(DataframeName$Colname))

      



2 answers


It's hard to tell without the actual data, but R and SQL treat unique values differently. R (more precisely, unique) treats NA and whitespace strings of different lengths as distinct values:

> unique(c("f","g","f","","  ",NA,NULL))
[1] "f"  "g"  ""   "  " NA 

> length(unique(c("f","g","f","","  ",NA,NULL)))
[1] 5

      

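SQL Server ignores trailing spaces when comparing strings, so values that differ only in the number of trailing spaces count as a single value. In addition, count(distinct ...) skips NULLs, whereas unique in R keeps NA: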

CREATE TABLE Persons (
    PersonID int,
    LastName varchar(255));
INSERT INTO Persons (PersonID, LastName)
       VALUES (1, 'Rockwell'),(2,''),(4,'Cohen'),(5,' '),(6,'  ');

Select count(distinct LastName) from Persons

      



This returns 3.
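To check whether these rules explain the gap, the SQL-side behaviour can be approximated in R. The following is a minimal sketch in base R, not an exact reimplementation: count_distinct_sql is a hypothetical helper name, and it only models two things, dropping NA and ignoring trailing spaces. Other collation effects, such as case-insensitive comparison, are not handled.

```r
# Hypothetical helper: approximate SQL Server's count(distinct col) in R.
count_distinct_sql <- function(x) {
  x <- x[!is.na(x)]        # count(distinct ...) skips NULLs
  x <- sub(" +$", "", x)   # trailing spaces are ignored in comparisons
  length(unique(x))
}

x <- c("f", "g", "f", "", "  ", NA)

r_count   <- length(unique(x))      # R's view: NA and "  " are separate values
sql_count <- count_distinct_sql(x)  # SQL-like view

print(c(r = r_count, sql_like = sql_count))
```

If the SQL-like count for your column matches what SQL Server reports, NA values and trailing whitespace are the likely cause of the difference.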

You can easily trim all leading and trailing whitespace in R with str_trim from the stringr package:

library(stringr)

a <- str_trim(c("f","g","f","","  ",NA,NULL))
unique(a)
[1] "f" "g" ""  NA
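Before trimming the real column, it can help to see which raw values actually collapse into one. The sketch below uses only base R (trimws instead of str_trim) and made-up example data; the variable names are illustrative.

```r
x <- c("f", "g", "f", "", "  ", NA)

# Distinct non-missing raw values
raw <- unique(x[!is.na(x)])

# Group the raw values by their trimmed form
groups <- split(raw, trimws(raw))

# Keep only groups in which several raw values share one trimmed form
collisions <- groups[lengths(groups) > 1]

print(collisions)
print(sum(is.na(x)))  # number of NA entries, which SQL would not count
```

Each element of collisions lists the raw variants that become identical after trimming, which makes it easier to decide whether the data should be cleaned.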

      



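With the sqldf package, you can run SQL queries on data frames in R.

For example, the following query returns one row per distinct value of ColName together with how often it occurs, so the number of rows returned is the number of distinct values: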



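library(sqldf)

# TableName must be the name of a data frame in the R session.
# group by already yields one row per distinct ColName, so distinct is not needed.
sqldf('select ColName,
count(ColName) as n
from TableName
group by ColName')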

      
