BigQuery: partitioning data beyond the 2000-partition limit (update: the limit is now 4000)

From the BigQuery page on partitioned tables:

Each table can have up to 2000 partitions.

We plan to partition our data by day. Most of our queries will filter on dates, but we already have data from the last 5 years and we collect more every day. With only 2000 partitions, 2000/365 gives us about 5.5 years of data.
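To make the arithmetic concrete, here is a minimal sketch of how far a partition limit stretches at different granularities. The function name and the per-year counts are my own illustration, not anything from the BigQuery documentation.

```python
# Rough estimate of how many years of data fit under a partition limit.
# PARTITIONS_PER_YEAR values are simple calendar approximations (assumption).
PARTITIONS_PER_YEAR = {"day": 365, "week": 52, "month": 12}


def years_covered(partition_limit: int, granularity: str = "day") -> float:
    """Return the number of years that fit in `partition_limit` partitions."""
    return partition_limit / PARTITIONS_PER_YEAR[granularity]


if __name__ == "__main__":
    for limit in (2000, 4000):
        for granularity in ("day", "week", "month"):
            years = years_covered(limit, granularity)
            print(f"{limit} partitions by {granularity}: {years:.1f} years")
```

With daily partitions the 2000 limit is reached in under 5.5 years, while monthly partitions would last well over a century.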

What's the best practice for tables that require more than 2000 partitions?

  • Create a separate table per year and merge the tables when necessary?
  • Partition by week or month instead?
  • Can the 2000-partition limit be increased by contacting support?

Update: The table limit is now 4000 partitions.

+5


3 answers


We are still in the process of enforcing the 2000-partition limit (we documented it a little early to give our users advance notice), so it is a soft limit for now.

Creating a large number of partitions has performance implications, so we suggest limiting tables to 2000 partitions. We have some leeway here, depending on the table schema, so it is worth asking support whether an increase is possible. We will review the request based on how many resources we think will be needed to operate on that table.



We hope to support more partitions in the future (up to 10K), but we are still working on the design and implementation changes this requires (we do not have an ETA at the moment).

+5




Regarding your question about partitioning by week or month instead: there is a feature request for more flexibility in the partition type: https://issuetracker.google.com/issues/35905817



If INT were also available as a partition type, it would be easy to define "monthly partitions" using YYYYMM values.
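As an illustration of the YYYYMM idea, this sketch derives an integer month key from a date. The helper name `month_key` is hypothetical, and it only shows how the key would be computed; whether BigQuery accepts such a column for partitioning is exactly what the feature request is about.

```python
from datetime import date


def month_key(d: date) -> int:
    """Encode a date as an integer in YYYYMM form, e.g. 2017-03-15 -> 201703."""
    return d.year * 100 + d.month


if __name__ == "__main__":
    # All days in the same month map to the same key.
    print(month_key(date(2017, 3, 1)))
    print(month_key(date(2017, 3, 31)))
    print(month_key(date(2017, 12, 25)))
```

Because the keys sort in calendar order, a range filter on the key corresponds to a contiguous range of months.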

+2




The current limit is 4000 partitions, which is just under 11 years of daily data. If you have more data than that and still want it partitioned by day, one approach we have used is to split the table into one table per decade and then create a view on top that concatenates the decade tables.

When you query the view and filter on the date partitioning field in the WHERE clause, BigQuery knows to process only the required partitions, whether they span several of the underlying tables or sit within a single one.

We used this approach to ensure that business users (data analysts and report designers) only need to worry about one table while still gaining access to the performance and cost benefits of partitioned tables.
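A rough sketch of the decade-tables-plus-view approach, written as a small generator for the view definition. The dataset, table, and view names (`mydataset.events_2000s` and so on) are placeholders I made up, and the sketch assumes all decade tables share the same schema.

```python
from datetime import date
from typing import Iterable


def decade_table(base: str, d: date) -> str:
    """Return the (hypothetical) decade table a row with date `d` belongs to."""
    decade_start = d.year // 10 * 10
    return f"{base}_{decade_start}s"


def union_view_sql(view: str, tables: Iterable[str]) -> str:
    """Build a CREATE VIEW statement that concatenates the decade tables."""
    selects = "\nUNION ALL\n".join(f"SELECT * FROM `{t}`" for t in tables)
    return f"CREATE VIEW `{view}` AS\n{selects}"


if __name__ == "__main__":
    tables = ["mydataset.events_2000s", "mydataset.events_2010s"]
    print(decade_table("mydataset.events", date(2017, 3, 15)))
    print(union_view_sql("mydataset.events_all", tables))
```

Analysts would then query only the view and filter on the date column; the loading job uses something like `decade_table` to decide where each row goes.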

0

