Avoid correlated subquery errors in BigQuery

I have a simple query to get the used exchange rate at the time of creating a transaction:

SELECT t.orderid, t.date, 
 (SELECT rate FROM sources.currency_rates r WHERE currencyid=1 AND 
r.date>=t.date  ORDER BY date LIMIT 1) rate
FROM sources.transactions t

      

This throws an error:

Error: Correlated subqueries that reference other tables are not 
supported unless they can be de-correlated, such as by transforming 
them into an efficient JOIN.' 

      

I've tried with several types of joins and named subqueries, but none seem to work. What's the best way to do this? Seems like a very common scenario that should be simple enough in BQ Standard Sql.

+3


source to share


2 answers


Below is for standard SQLQuery SQL



#standardSQL
SELECT 
  t.orderid AS orderid, 
  t.date AS date, 
  ARRAY_AGG(r.rate ORDER BY r.date LIMIT 1)[SAFE_OFFSET(0)] AS rate
FROM `sources.transactions` AS t
JOIN `sources.currency_rates` AS r
ON currencyid = 1 
AND r.date >= t.date
GROUP BY orderid, date

      

+3


source


I've noticed similar behavior with other correlated subqueries. They are useful, but may not always be automatically modeled for JOINs using BigQuery.

A similar case that works:

#standardSQL
SELECT name, (
  SELECT AVG(temp) 
  FROM `bigquery-public-data.noaa_gsod.gsod2017` b
  WHERE a.usaf=b.stn  
) temp
FROM `bigquery-public-data.noaa_gsod.stations` a
LIMIT 10

      

Does not work:



#standardSQL
SELECT name, (
  SELECT temp 
  FROM `bigquery-public-data.noaa_gsod.gsod2017` b
  WHERE a.usaf=b.stn  
  ORDER BY da 
  LIMIT 1
) temp
FROM `bigquery-public-data.noaa_gsod.stations` a
LIMIT 10

      

Fix:

#standardSQL
SELECT name, ARRAY_AGG(temp ORDER BY da LIMIT 1) temp
FROM `bigquery-public-data.noaa_gsod.stations` a
JOIN `bigquery-public-data.noaa_gsod.gsod2017` b
ON a.usaf=b.stn  
GROUP BY 1
LIMIT 10

      

(give me a public dataset and I'll write a query that works with your data)

+3


source







All Articles