Avoid correlated subquery errors in BigQuery
I have a simple query to get the used exchange rate at the time of creating a transaction:
SELECT t.orderid, t.date,
(SELECT rate FROM sources.currency_rates r WHERE currencyid=1 AND
r.date>=t.date ORDER BY date LIMIT 1) rate
FROM sources.transactions t
This throws an error:
Error: Correlated subqueries that reference other tables are not
supported unless they can be de-correlated, such as by transforming
them into an efficient JOIN.'
I've tried with several types of joins and named subqueries, but none seem to work. What's the best way to do this? Seems like a very common scenario that should be simple enough in BQ Standard Sql.
+3
source to share
2 answers
Below is for standard SQLQuery SQL
#standardSQL
SELECT
t.orderid AS orderid,
t.date AS date,
ARRAY_AGG(r.rate ORDER BY r.date LIMIT 1)[SAFE_OFFSET(0)] AS rate
FROM `sources.transactions` AS t
JOIN `sources.currency_rates` AS r
ON currencyid = 1
AND r.date >= t.date
GROUP BY orderid, date
+3
source to share
I've noticed similar behavior with other correlated subqueries. They are useful, but may not always be automatically modeled for JOINs using BigQuery.
A similar case that works:
#standardSQL
SELECT name, (
SELECT AVG(temp)
FROM `bigquery-public-data.noaa_gsod.gsod2017` b
WHERE a.usaf=b.stn
) temp
FROM `bigquery-public-data.noaa_gsod.stations` a
LIMIT 10
Does not work:
#standardSQL
SELECT name, (
SELECT temp
FROM `bigquery-public-data.noaa_gsod.gsod2017` b
WHERE a.usaf=b.stn
ORDER BY da
LIMIT 1
) temp
FROM `bigquery-public-data.noaa_gsod.stations` a
LIMIT 10
Fix:
#standardSQL
SELECT name, ARRAY_AGG(temp ORDER BY da LIMIT 1) temp
FROM `bigquery-public-data.noaa_gsod.stations` a
JOIN `bigquery-public-data.noaa_gsod.gsod2017` b
ON a.usaf=b.stn
GROUP BY 1
LIMIT 10
(give me a public dataset and I'll write a query that works with your data)
+3
source to share