Query the latest table in a BigQuery dataset

I have a dataset containing tables with similar table names ending in yyyymmdd. For example:

myproject:mydataset.Sales20140815
myproject:mydataset.Sales20140816
myproject:mydataset.Sales20140817
myproject:mydataset.Sales20140818
...
myproject:mydataset.Sales20140903
myproject:mydataset.Sales20140904 

      

Is there a way to write a BigQuery query that selects the latest table in the dataset (in the example above, myproject:mydataset.Sales20140904)?

+3




5 answers


N.N.'s answer is good, but relying on the modification time is problematic: if older data is reimported, that table will erroneously be picked as the "latest". Since table_id explicitly encodes the date in a sortable order, it is better to use that value directly.



SELECT
  *
FROM
  TABLE_QUERY(MyDATASET,
    'table_id CONTAINS "MyTable"
     AND table_id = (SELECT MAX(table_id)
                     FROM MyDATASET.__TABLES__
                     WHERE table_id CONTAINS "MyTable")')
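To illustrate why MAX(table_id) is the safer choice, here is a minimal Python sketch. It does not call BigQuery; it only mimics the two selection rules on made-up metadata. The table list, the timestamps, and the helper names are all hypothetical.

```python
# Hypothetical sample metadata: (table_id, last_modified_time in epoch ms).
# Sales20140816 was reimported after Sales20140904 had been loaded.
TABLES = [
    ("Sales20140815", 1408060800000),
    ("Sales20140816", 1409900000000),  # reimported late
    ("Sales20140904", 1409788800000),
    ("Other20150101", 1420070400000),  # different name, must be ignored
]


def latest_by_table_id(tables, name):
    """Mimic MAX(table_id): yyyymmdd suffixes sort lexicographically by date."""
    return max(tid for tid, _ in tables if name in tid)


def latest_by_modified_time(tables, name):
    """Mimic MAX(last_modified_time): picks whichever table was touched last."""
    matching = [(tid, ts) for tid, ts in tables if name in tid]
    return max(matching, key=lambda row: row[1])[0]


print(latest_by_table_id(TABLES, "Sales"))       # Sales20140904
print(latest_by_modified_time(TABLES, "Sales"))  # Sales20140816
```

The comparison on table_id only works because the date suffix is zero-padded and ordered year, month, day; a suffix such as ddmmyyyy would not sort chronologically.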

      

+7




The only solutions I can think of involve changes to your daily ETL:

A: Update your ETL to create a copy of the latest table after loading or updating it. If you are using the bq command-line tool, it would look something like this:

bq cp mydataset.Sales20140904 mydataset.SalesLatestDay

      



Then you just query the SalesLatestDay table.

B: Better yet, create a view that references your most recent table ("SELECT * FROM mydataset.Sales20140904") and update it daily. Information on creating views using the REST API: https://developers.google.com/bigquery/docs/reference/v2/tables#resource
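As a rough sketch of option B, the daily ETL step could generate the view's query text from the load date. The function name and the default dataset and prefix below are hypothetical, and actually creating or updating the view (through the REST API or the bq tool) is deliberately left out.

```python
from datetime import date


def latest_view_query(load_date, dataset="mydataset", prefix="Sales"):
    """Return the SELECT statement the view should hold for the given day."""
    # Table names end in yyyymmdd, e.g. Sales20140904.
    suffix = load_date.strftime("%Y%m%d")
    return "SELECT * FROM {}.{}{}".format(dataset, prefix, suffix)


print(latest_view_query(date(2014, 9, 4)))
# SELECT * FROM mydataset.Sales20140904
```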

+2




I would use the table wildcard functions. If the latest table is today's table, use:

SELECT * FROM TABLE_DATE_RANGE(MyDATASET.MyTable, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP())

      

If the most recently modified table may carry a past date, you can use:

SELECT
  *
FROM
  TABLE_QUERY(MyDATASET,
    'table_id CONTAINS "MyTable"
     AND last_modified_time = (SELECT MAX(last_modified_time)
                               FROM MyDATASET.__TABLES__
                               WHERE table_id CONTAINS "MyTable")')

      

Hope it helps ...

+2




SELECT *
FROM TABLE_QUERY(myproject:mydataset,
  "table_id IN (
     SELECT table_id FROM myproject:mydataset.__TABLES__
     WHERE REGEXP_MATCH(table_id, r'^Sales.*')
     ORDER BY creation_time DESC LIMIT 1)")

      

+2




If your table is definitely updated daily, here's my trick.

SELECT * FROM TABLE_DATE_RANGE(myproject:mydataset.Sales, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP())
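This only works if a table for the current day already exists. A small Python sketch of the table name such a query would target, assuming the suffix is derived from the current date in UTC (the helper name is hypothetical):

```python
from datetime import datetime, timezone


def todays_table(prefix, now=None):
    """Name of the daily table for 'now' (defaults to the current UTC time)."""
    # Normalize to UTC so the date suffix does not depend on the local zone.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return prefix + now.strftime("%Y%m%d")


print(todays_table("Sales", datetime(2014, 9, 4, 23, 30, tzinfo=timezone.utc)))
# Sales20140904
```

If the day's load has not finished yet, no table with that name exists, so the MAX(table_id) approach is the more robust fallback.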

      

0

