Generating statistics from a table

I have a table with 20 columns of dimensions. I would like to "convert" the table to a 20 row table with Avg, Min, Max, StdDev, Count data types media columns. There is another question like this, but for the "R" language. Another question is here.

I could do the following for each column (handling the results with C ++):

    Select Count(Case When [avgZ_l1] <= 0.15 and avgZ_l1 > 0 then 1 end) as countValue1, 
Count(case when [avgZ_l1] <= 0.16 and avgZ_l1 > 0.15 then 1 end) as countValue2,
Count(case when [avgZ_l1] <= 0.18 and avgZ_l1 > 0.16 then 1 end) as countValue3, 
Count(case when [avgZ_l1] <= 0.28 and avgZ_l1 > 0.18 then 1 end) as countValue4,
Avg(avgwall_l1) as avg1, Min(avgwall_l1) as min1, Max(avgZ_l1) as max1, 
STDEV(avgZ_l1) as stddev1, count(*) as totalCount  from myProject.dbo.table1

      

But I don't want to process 50,000 records 20 times (once for each column). I thought that you would need to "expand" the table on your side and process the data at the same time. I've seen examples of "Pivot", but they all seem to pivot to an integer field of type, month number, or device ID. After converting the table, I could get each row from C ++. Perhaps it's really just "Insert into ... select ... from".
Would the fastest (runtime) just create a really long select statement that returns all the information I want for all columns? We can have 500,000 lines. I am using C ++ and SQL 2014.

Any thoughts or comments are appreciated. I just don't want my naive code to be used as a prime example of how NOT to do something ...;) ...

+3


source to share


1 answer


If your table looks like the code you sent to r, then the following query will work for you. It selects the data you requested and rotates it at the same time.

create table #temp(ID int identity(1,1),columnName nvarchar(50));
insert into #temp 
SELECT COLUMN_NAME as columnName
FROM myProject.INFORMATION_SCHEMA.COLUMNS               -- change myProject to the name of your database. Unless myProject is your database
WHERE TABLE_NAME = N'table1';                           --change table1 to your table that your looking at. Unless table1 is your table
declare @TableName nvarchar(50) = 'table1';             --change table1 to your table again 

declare @loop int = 1;
declare @query nvarchar(max) = '';
declare @columnName nvarchar(50);
declare @endQuery nvarchar(max)='';
while (@loop <= (select count(*) from #temp))
    begin
        set @columnName = (select columnName from #temp where ID = @loop);
        set @query = 'select t.columnName, avg(['+@columnName+']) as Avg ,min(['+@columnName+']) as min ,max(['+@columnName+'])as max ,stdev(['+@columnName+']) as STDEV,count(*) as totalCount from '+@tablename+' join   #temp t on t.columnName = '''+@columnName+''' group by t.columnName';
        set @loop += 1;
        set @endQuery += 'union all('+ @query + ')';
    end;
set @endQuery = stuff(@endQuery,1,9,'')
Execute(@endQuery);
drop table #temp;

      

Creates a #temp table that stores the column header values ​​next to the ID. Then it uses the id when looping, even though the number of columns you have. Then it generates a query that picks what you want and then concatenates it. This query will work on any number of columns, which means that if you add or remove more columns, it should give the correct result.

With this input:



 age   height_seca1 height_chad1 height_DL weight_alog1
1   19         1800         1797       180           70
2   19         1682         1670       167           69
3   21         1765         1765       178           80
4   21         1829         1833       181           74
5   21         1706         1705       170          103
6   18         1607         1606       160           76
7   19         1578         1576       156           50
8   19         1577         1575       156           61
9   21         1666         1665       166           52
10  17         1710         1716       172           65
11  28         1616         1619       161           66
12  22         1648         1644       165           58
13  19         1569         1570       155           55
14  19         1779         1777       177           55
15  18         1773         1772       179           70
16  18         1816         1809       181           81
17  19         1766         1765       178           77
18  19         1745         1741       174           76
19  18         1716         1714       170           71
20  21         1785         1783       179           64
21  19         1850         1854       185           71
22  31         1875         1880       188           95
23  26         1877         1877       186          106
24  19         1836         1837       185          100
25  18         1825         1823       182           85
26  19         1755         1754       174           79
27  26         1658         1658       165           69
28  20         1816         1818       183           84
29  18         1755         1755       175           67

      

It will produce this output:

                    avg     min     max     stdev   totalcount
age                 20      17      31      3.3     29 
height_seca1        1737    1569    1877    91.9    29 
height_chad1        1736    1570    1880    92.7    29 
height_DL           173     155     188     9.7     29 
weight_alog1        73      50      106     14.5    29 

      

Hope this helps and works for you. :)

+1


source







All Articles