Get the top three most common values from each table column

I am trying to write a query that produces a very small sample of data from each column of a table, where the sample consists of the three most common values. This is part of a larger task: writing scripts that characterize a database and its tables, check data integrity, and quickly survey the common values in a table on a per-column basis. Think of it as an automatic "analysis" of the table.

For a single column, I do this by simply counting how often each value occurs and then sorting by that frequency. If I had a column called "color" containing color names, and "blue" appeared in the most rows, then the most common value would be "blue". In SQL that's easy to compute.
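For reference, the single-column case described above might look like the following minimal sketch. The table variable @Paint and its sample rows are hypothetical and exist only to make the snippet self-contained.

```sql
-- Hypothetical sample data; the table and column names are illustrative only.
DECLARE @Paint TABLE (color varchar(20));
INSERT INTO @Paint (color)
VALUES ('blue'), ('blue'), ('blue'), ('red'), ('red'), ('green'), ('black');

-- Count each value, then keep the three with the highest counts.
SELECT TOP (3) color, COUNT(*) AS frequency
FROM @Paint
GROUP BY color
ORDER BY COUNT(*) DESC;
```

When several values tie for third place, TOP (3) returns an arbitrary one of them; TOP (3) WITH TIES would return all of them instead.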

However, I'm not sure how to do this with multiple columns.

Currently, when I compute something across all columns of a table, I use a query of the following kind:

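USE database;

DECLARE @t nvarchar(max);
SET @t = N'SELECT ';

-- Append one COUNT(DISTINCT ...) expression per column of the table.
-- QUOTENAME protects column names that contain spaces or reserved words.
SELECT @t = @t + N'COUNT(DISTINCT CAST(' + QUOTENAME(c.name) + N' AS varchar(max))) AS ' + QUOTENAME(c.name) + N','
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('table');

-- Drop the trailing comma and add the FROM clause.
SET @t = SUBSTRING(@t, 1, LEN(@t) - 1) + N' FROM table;';

EXEC sp_executesql @t;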

      

However, it is not clear to me how to extend this approach to the top-three problem.

(Side note: columns of type text, ntext and image are a problem here, as they throw errors when counting distinct values, but I am less concerned about solving that.)

The problem of getting the three most common values per column has me completely stumped.

Ideally, I would like to get something like this:

Col1     Col2              Col3       Col4     Col5
---------------------------------------------------------------------
1,2,3    red,blue,green    29,17,0    c,d,j    nevada,california,utah

      



2 answers


I hacked this together, and it seems to work, although I can't help but think I should be using RANK():



USE <DB>;

DECLARE @query  nvarchar(max);
DECLARE @column sysname;
DECLARE @table  sysname;
DECLARE @i      int = 1;
DECLARE @maxi   int;
DECLARE @target nvarchar(128) = N'<table>';  -- table name or LIKE pattern

-- One row per (column, table) pair to examine.
DECLARE @stage TABLE (i int IDENTITY(1,1), col sysname, tbl sysname);
-- Frequency of every distinct value, for every staged column.
DECLARE @results TABLE (ColumnName nvarchar(max), ColumnValue nvarchar(max), ColumnCount int, TableName nvarchar(max));

-- Stage the columns of matching user tables, skipping text, ntext and image.
INSERT INTO @stage (col, tbl)
SELECT c.name, o.name
FROM sys.columns c
JOIN sys.objects o ON o.object_id = c.object_id AND o.type = 'U'
WHERE c.system_type_id IN (SELECT system_type_id FROM sys.types WHERE [name] NOT IN ('text', 'ntext', 'image'))
  AND o.name LIKE @target;

SET @maxi = (SELECT MAX(i) FROM @stage);

WHILE @i <= @maxi
BEGIN
    SELECT @column = col, @table = tbl FROM @stage WHERE i = @i;

    -- Builds: SELECT '<col>', [<col>], COUNT([<col>]), '<tbl>' FROM [<tbl>] GROUP BY [<col>]
    -- COUNT(<col>) skips NULLs, so a NULL group is reported with a count of 0.
    SET @query = N'SELECT ' + QUOTENAME(@column, '''') + N', ' + QUOTENAME(@column)
               + N', COUNT(' + QUOTENAME(@column) + N'), ' + QUOTENAME(@table, '''')
               + N' FROM ' + QUOTENAME(@table) + N' GROUP BY ' + QUOTENAME(@column);

    --SELECT @query
    INSERT INTO @results
    EXEC sp_executesql @query;

    SET @i = @i + 1;
END;

--SELECT * FROM @results;  -- uncomment to inspect every value's frequency

-- Keep the three most frequent values per table and column.
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY TableName, ColumnName ORDER BY ColumnCount DESC) AS rn
    FROM @results
)
SELECT * FROM cte WHERE rn <= 3;
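To get closer to the comma-separated layout the question asks for, the ranked rows could be collapsed into one list per column. This is only a sketch: it assumes SQL Server 2017 or later (for STRING_AGG), and the @results table below is a hypothetical, hand-filled stand-in for the one the loop populates.

```sql
-- Hypothetical stand-in for the @results table built by the loop above.
DECLARE @results TABLE (ColumnName sysname, ColumnValue nvarchar(4000), ColumnCount int);
INSERT INTO @results (ColumnName, ColumnValue, ColumnCount)
VALUES (N'Col1', N'1', 50),   (N'Col1', N'2', 30),    (N'Col1', N'3', 10),     (N'Col1', N'4', 5),
       (N'Col2', N'red', 40), (N'Col2', N'blue', 35), (N'Col2', N'green', 20), (N'Col2', N'pink', 1);

-- Rank values within each column, keep the top three, and join them
-- into one comma-separated string ordered from most to least frequent.
WITH ranked AS (
    SELECT ColumnName, ColumnValue, ColumnCount,
           ROW_NUMBER() OVER (PARTITION BY ColumnName ORDER BY ColumnCount DESC) AS rn
    FROM @results
)
SELECT ColumnName,
       STRING_AGG(ColumnValue, ',') WITHIN GROUP (ORDER BY ColumnCount DESC) AS TopValues
FROM ranked
WHERE rn <= 3
GROUP BY ColumnName;
```

With this sample data the result is one row per column: Col1 with 1,2,3 and Col2 with red,blue,green. Turning those rows into a single wide row with one output column per source column would need a further (dynamic) PIVOT step.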

      



Start with this SQL statement generator and modify it to your liking:

EDIT: Added ORDER BY ... DESC



With ColumnSet As
(
    Select TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
    From INFORMATION_SCHEMA.COLUMNS
    Where 1=1
        And TABLE_NAME IN ('Table1')
        And COLUMN_NAME IN ('Column1', 'Column2')
)
Select 'Select Top 3 ' + COLUMN_NAME + ', Count (*) NumInstances From ' + TABLE_SCHEMA + '.'+ TABLE_NAME + ' Group By ' + COLUMN_NAME + ' Order by Count (*) Desc'
From ColumnSet
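The generator above only returns the statements as text, so they still have to be copied out and run. A possible variation, sketched here under the same placeholder names ('Table1', 'Column1', 'Column2'), concatenates the generated statements into one batch and executes it; the QUOTENAME calls and the @sql variable are additions that are not part of the original answer.

```sql
DECLARE @sql nvarchar(max) = N'';

-- Build one "top 3 by frequency" statement per selected column.
-- Variable concatenation in a SELECT is a common idiom, but the order
-- in which rows are appended is not guaranteed.
SELECT @sql = @sql
    + N'SELECT TOP (3) ' + QUOTENAME(COLUMN_NAME) + N', COUNT(*) AS NumInstances'
    + N' FROM ' + QUOTENAME(TABLE_SCHEMA) + N'.' + QUOTENAME(TABLE_NAME)
    + N' GROUP BY ' + QUOTENAME(COLUMN_NAME)
    + N' ORDER BY COUNT(*) DESC;' + NCHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME IN ('Table1')
  AND COLUMN_NAME IN ('Column1', 'Column2');

-- Run the batch only if at least one column matched.
IF @sql <> N''
    EXEC sp_executesql @sql;
```

Each generated statement returns its own result set, one per column.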

      
