Query the latest table in a BigQuery dataset
I have a dataset containing tables with similar table names ending in yyyymmdd. For example:
myproject:mydataset.Sales20140815
myproject:mydataset.Sales20140816
myproject:mydataset.Sales20140817
myproject:mydataset.Sales20140818
...
myproject:mydataset.Sales20140903
myproject:mydataset.Sales20140904
Is there a way to write a BigQuery query that targets the latest table in the dataset (for the above example, myproject:mydataset.Sales20140904)?
N.N.'s answer is good, but relying on the modification date is problematic: if an old table is re-imported, it will erroneously be picked up as the "latest". Since table_id already encodes the dates in sortable order, it is best to use that value directly.
SELECT
  *
FROM
  TABLE_QUERY(MyDATASET,
    'table_id CONTAINS "MyTable"
     AND table_id = (SELECT MAX(table_id)
                     FROM MyDATASET.__TABLES__
                     WHERE table_id CONTAINS "MyTable")'
  )
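To illustrate why MAX(table_id) is the safer choice, here is a minimal Python sketch of the two selection rules. The rows are hypothetical stand-ins for what MyDATASET.__TABLES__ might return (the datetime values are illustrative, not the real column type), and the function names are made up for this example.

```python
from datetime import datetime

# Hypothetical metadata rows, loosely shaped like MyDATASET.__TABLES__.
# Sales20140815 was re-imported on 2014-09-05, after the newer tables were loaded.
tables = [
    {"table_id": "Sales20140815", "last_modified_time": datetime(2014, 9, 5)},
    {"table_id": "Sales20140903", "last_modified_time": datetime(2014, 9, 3)},
    {"table_id": "Sales20140904", "last_modified_time": datetime(2014, 9, 4)},
]

def latest_by_table_id(rows, prefix):
    """Mirror of MAX(table_id): the lexicographically greatest matching name."""
    ids = [r["table_id"] for r in rows if prefix in r["table_id"]]
    return max(ids)

def latest_by_modified(rows, prefix):
    """Mirror of MAX(last_modified_time): the most recently written table."""
    matching = [r for r in rows if prefix in r["table_id"]]
    return max(matching, key=lambda r: r["last_modified_time"])["table_id"]

print(latest_by_table_id(tables, "Sales"))  # Sales20140904
print(latest_by_modified(tables, "Sales"))  # Sales20140815
```

The string comparison only matches date order because the yyyymmdd suffix is fixed-width and zero-padded. Note also that the substring match accepts any table containing the prefix: a table named SalesLatestDay in the same dataset would sort after every SalesYYYYMMDD name, so the filter may need to be stricter in that case.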
The only solutions I can think of involve changes to your daily ETL:
A: Update your ETL to create a copy of the latest table after loading or updating it. If you are using the bq command-line tool, it will look something like this:
bq cp mydataset.Sales20140904 mydataset.SalesLatestDay
Then you just query the SalesLatestDay table.
B: Better yet, create a view that references your most recent table ("SELECT * FROM mydataset.Sales20140904") and update it daily. For information on creating views using the REST API, see: https://developers.google.com/bigquery/docs/reference/v2/tables#resource
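As a rough sketch of how a daily ETL step might derive the strings used in options A and B, the Python helpers below build the bq cp arguments and the view query for a given day. The helper names and the default target name are assumptions for illustration; nothing here calls BigQuery, it only formats text.

```python
from datetime import date

def table_name(prefix, day):
    """Build the daily table name, e.g. Sales + 20140904."""
    return f"{prefix}{day:%Y%m%d}"

def copy_command(dataset, prefix, day, target="SalesLatestDay"):
    """Option A: argument list for the bq cp call shown above."""
    source = f"{dataset}.{table_name(prefix, day)}"
    return ["bq", "cp", source, f"{dataset}.{target}"]

def view_query(dataset, prefix, day):
    """Option B: the query text the daily-updated view would hold."""
    return f"SELECT * FROM {dataset}.{table_name(prefix, day)}"

day = date(2014, 9, 4)
print(" ".join(copy_command("mydataset", "Sales", day)))
print(view_query("mydataset", "Sales", day))
```

The argument list could be passed to a process runner, and the query string to whatever mechanism updates the view. Both are left out here because they depend on the ETL setup.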
I would use a table wildcard function. If the latest table is always today's table, use:
SELECT * FROM TABLE_DATE_RANGE(MyDATASET.MyTable, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP())
If the most recently modified table may have a past date, you can use:
SELECT
  *
FROM
  TABLE_QUERY(MyDATASET,
    'table_id CONTAINS "MyTable"
     AND last_modified_time = (SELECT MAX(last_modified_time)
                               FROM MyDATASET.__TABLES__
                               WHERE table_id CONTAINS "MyTable")'
  )
Hope it helps ...
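To make the difference between the two cases concrete, here is a small Python sketch that works out which table name a today-to-today date range would resolve to, and whether that table is present. It assumes the timestamp is evaluated in UTC and that table names are the prefix followed by yyyymmdd. The function names and the set of existing tables are hypothetical.

```python
from datetime import datetime, timezone

def todays_table(prefix, now=None):
    """Name of the daily table for the current UTC date.

    A naive datetime passed as `now` is interpreted as local time.
    """
    now = now or datetime.now(timezone.utc)
    return f"{prefix}{now.astimezone(timezone.utc):%Y%m%d}"

def has_todays_table(existing, prefix, now=None):
    """True if the today-only query would find a table to read."""
    return todays_table(prefix, now) in existing

now = datetime(2014, 9, 4, 12, 0, tzinfo=timezone.utc)
existing = {"Sales20140903", "Sales20140904"}
print(todays_table("Sales", now))                # Sales20140904
print(has_todays_table(existing, "Sales", now))  # True
```

If today's table has not been loaded yet, the first query has nothing to match, which is when the second form becomes the relevant one.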