BigQuery GROUP_CONCAT and ORDER BY

I am currently using BigQuery and GROUP_CONCAT, which work great. However, when I try to add an ORDER BY clause to the GROUP_CONCAT call, as I would in SQL, I get an error.

So, for example, something like

SELECT a, GROUP_CONCAT(b ORDER BY c) FROM test GROUP BY a

The same thing happens if I try to specify a delimiter.

Any ideas on how to approach this?


2 answers


Since BigQuery does not support an ORDER BY clause inside the GROUP_CONCAT function, you can get the same result with the analytic (window) form of GROUP_CONCAT. In BigQuery, the separator for GROUP_CONCAT is simply the second parameter of the function. Below is an example:

select key, first(grouped_value) concat_value from (
  select
    key,
    -- Window form of GROUP_CONCAT: ':' is the separator, and the
    -- window's ORDER BY controls the concatenation order
    group_concat(value, ':') over
      (partition by key
       order by value asc
       rows between unbounded preceding and unbounded following)
    grouped_value
  from (
    -- Inline sample data (in legacy SQL, a comma between tables means UNION ALL)
    select key, value from
      (select 1 as key, 'b' as value),
      (select 1 as key, 'c' as value),
      (select 1 as key, 'a' as value),
      (select 2 as key, 'y' as value),
      (select 2 as key, 'x' as value)))
group by key

This produces the following output:

Row  key  concat_value
1    1    a:b:c
2    2    x:y

NOTE on the window specification: the query uses the frame "rows between unbounded preceding and unbounded following" to ensure that all rows in a partition participate in the GROUP_CONCAT aggregation. When the window has an ORDER BY, the SQL standard's default frame is "range between unbounded preceding and current row", which stops at the current row. That is good for things like a running sum, but it won't work correctly for this problem.
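BigQuery can't be run locally, so as a rough illustration of the frame difference described in the note, here is a Python sketch that runs the same GROUP_CONCAT-over-a-window idea in SQLite through the standard-library sqlite3 module. This is SQLite, not BigQuery. It assumes SQLite 3.25 or later (the first release with window functions). SQLite's group_concat also takes the separator as its second argument. The sample rows are the ones from the query above, and the helper name `concat_per_row` is hypothetical.

```python
import sqlite3

# Same sample rows as the answer's inline subqueries: (key, value).
ROWS = [(1, "b"), (1, "c"), (1, "a"), (2, "y"), (2, "x")]


def concat_per_row(frame_clause):
    """Hypothetical helper: run group_concat as a window function in SQLite
    (assumes SQLite >= 3.25) with the given frame clause, returning
    (key, value, grouped_value) tuples ordered by key, value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (key INTEGER, value TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    sql = f"""
        SELECT key, value,
               group_concat(value, ':') OVER (
                   PARTITION BY key ORDER BY value {frame_clause}
               ) AS grouped_value
        FROM t
        ORDER BY key, value
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# No frame clause: the default frame ends at the current row (running concat).
running = concat_per_row("")
# Explicit whole-partition frame, as used in the answer's query.
full = concat_per_row("ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING")

for row in running:
    print("default frame:", row)
for row in full:
    print("full frame:   ", row)
```

With the default frame, each row only sees the rows up to itself, so key 1 yields a, a:b, a:b:c. With the whole-partition frame, every row of key 1 carries a:b:c. That is why the outer query can safely pick any one of them with FIRST.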

Performance note: even though the query appears to repeat the aggregation for every row, the BigQuery optimizer recognizes that the window does not change within a partition and so the result is the same. It therefore computes the aggregation only once per partition.


BigQuery's standard SQL mode supports an ORDER BY clause inside some aggregate functions, including STRING_AGG. For example:

#standardSQL
select string_agg(t.x order by t.y)
from unnest([struct<x STRING, y INT64>('a', 5), ('b', 1), ('c', 10)]) t

This returns:

b,a,c

The documentation is here: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#using-order-by-with-aggregate-functions
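Applied to the question's own query, the standard SQL form passes the delimiter as the second argument and puts the ORDER BY inside the call: `SELECT a, STRING_AGG(b, ':' ORDER BY c) FROM test GROUP BY a`. When no delimiter is given, STRING_AGG uses a comma, which is why the example above returns b,a,c. The Python sketch below is only a local model of what that aggregate computes: it sorts each `a` group by `c` and then joins the `b` values with the delimiter. It is not BigQuery code, and both the sample `test` rows and the `string_agg_ordered` helper are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows for the question's `test` table: (a, b, c).
TEST_ROWS = [
    (1, "x", 3),
    (1, "y", 1),
    (1, "z", 2),
    (2, "p", 2),
    (2, "q", 1),
]


def string_agg_ordered(rows, delimiter=","):
    """Hypothetical model of: SELECT a, STRING_AGG(b, delimiter ORDER BY c)
    FROM test GROUP BY a. Returns a dict mapping each a to its joined string."""
    # Sort by the group key first, then by the ORDER BY key within each group;
    # groupby needs rows with the same key to be adjacent.
    ordered = sorted(rows, key=itemgetter(0, 2))
    return {
        a: delimiter.join(b for _, b, _ in group)
        for a, group in groupby(ordered, key=itemgetter(0))
    }


print(string_agg_ordered(TEST_ROWS, ":"))
```

Calling it on the three rows from the `unnest` example, `string_agg_ordered([(0, "a", 5), (0, "b", 1), (0, "c", 10)])`, gives `{0: "b,a,c"}`. That matches the BigQuery output shown above.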
