BigQuery GROUP_CONCAT and ORDER BY
I am currently using BigQuery and GROUP_CONCAT, which works great. However, when I try to add an ORDER BY clause inside the GROUP_CONCAT call, as I would in other SQL dialects, I get an error.
For example, something like this fails:
SELECT a, GROUP_CONCAT(b ORDER BY c)
FROM test
GROUP BY a
The same thing happens if I try to specify a delimiter.
Any ideas on how to approach this?
Since BigQuery does not support an ORDER BY clause inside the GROUP_CONCAT function, the same result can be achieved with analytic (window) functions. In BigQuery, the separator for GROUP_CONCAT is simply the second argument to the function. Below is an example:
select key, first(grouped_value) concat_value from (
  select
    key,
    group_concat(value, ':') over
      (partition by key
       order by value asc
       rows between unbounded preceding and unbounded following)
      grouped_value
  from (
    select key, value from
      (select 1 as key, 'b' as value),
      (select 1 as key, 'c' as value),
      (select 1 as key, 'a' as value),
      (select 2 as key, 'y' as value),
      (select 2 as key, 'x' as value)))
group by key
This produces the following:
Row key concat_value
1 1 a:b:c
2 2 x:y
Note on the window specification: the query uses the frame "rows between unbounded preceding and unbounded following" to ensure that all rows in a partition participate in the GROUP_CONCAT aggregation. The default frame ends at the current row ("between unbounded preceding and current row"), which is good for things like a running sum but does not work correctly for this problem.
Performance note: even though the query appears to repeat the aggregation for every row, the BigQuery optimizer recognizes that the window does not change and the result will be the same, so it computes the aggregation only once per partition.
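To make the window specification note concrete, here is a small sketch that puts the two frames side by side. It reuses the inline-subquery style of the query above; the `#legacySQL` prefix and the column aliases `running_value` and `full_value` are my own additions for illustration, not part of the original answer.

```sql
#legacySQL
select
  key,
  value,
  -- No frame clause: the default frame stops at the current row.
  group_concat(value, ':') over
    (partition by key order by value asc) running_value,
  -- Explicit frame: every row sees the whole partition.
  group_concat(value, ':') over
    (partition by key order by value asc
     rows between unbounded preceding and unbounded following) full_value
from
  (select 1 as key, 'b' as value),
  (select 1 as key, 'c' as value),
  (select 1 as key, 'a' as value)
```

If the default frame behaves as described in the note, running_value should grow row by row (a, then a:b, then a:b:c), while full_value should be a:b:c on every row. That is why the outer query can safely pick any single row per key with first().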
BigQuery's standard SQL mode supports an ORDER BY clause in some aggregate functions, including STRING_AGG. For example:
#standardSQL
select string_agg(t.x order by t.y)
from unnest([struct<x STRING, y INT64>('a', 5), ('b', 1), ('c', 10)]) t
This returns:
b,a,c
The documentation is here: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#using-order-by-with-aggregate-functions
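The question also asks about a delimiter and about grouping by a column. As a sketch under those assumptions, STRING_AGG takes the delimiter as its second argument, with ORDER BY following it inside the call. The columns a, b and c mirror the question's test table; the inline rows are invented purely for illustration.

```sql
#standardSQL
select
  t.a,
  -- The delimiter comes first, then ORDER BY, both inside the call.
  string_agg(t.b, ':' order by t.c) as concat_value
from unnest([
  struct<a INT64, b STRING, c INT64>(1, 'x', 2),
  (1, 'y', 1),
  (2, 'z', 1)
]) t
group by t.a
```

With these rows, the group a = 1 should yield y:x (ordered by c rather than by b), and a = 2 should yield z.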