BigQuery table is much larger in Capacitor than in Hadoop (ORC)

During my first tests with BigQuery, I noticed that a table imported into BigQuery is much larger than its original Hadoop version.

Here are the numbers I get:

  • Original Hadoop table (ORC): 2 GB
  • Avro compressed version used for loading the data into BigQuery: 6.4 GB
  • (test: Avro uncompressed: 45.8 GB)
  • Size in BigQuery (Capacitor format): 47.1 GB

This table contains 11 million rows and 366 columns (most of which are strings).
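To put these numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It only restates the figures above as ratios and per-row or per-cell averages. The function names are my own, and I am assuming that "GB" means 2^30 bytes and that the row count is exactly 11 million.

```python
GB = 1024 ** 3  # assumption: the sizes above are binary gigabytes

# Sizes as reported in the list above (in GB).
sizes_gb = {
    "ORC (Hadoop)": 2.0,
    "Avro compressed": 6.4,
    "Avro uncompressed": 45.8,
    "BigQuery (Capacitor)": 47.1,
}
rows = 11_000_000   # assumption: "11 million" taken as an exact count
columns = 366


def expansion_ratio(size_gb, baseline_gb):
    """How many times larger a format is than the baseline."""
    return size_gb / baseline_gb


def bytes_per_row(size_gb, row_count):
    """Average number of bytes stored per row."""
    return size_gb * GB / row_count


def bytes_per_cell(size_gb, row_count, column_count):
    """Average number of bytes stored per individual value."""
    return size_gb * GB / (row_count * column_count)


baseline = sizes_gb["ORC (Hadoop)"]
for name, size in sizes_gb.items():
    print(
        f"{name}: {expansion_ratio(size, baseline):.2f}x ORC, "
        f"{bytes_per_row(size, rows):.0f} B/row, "
        f"{bytes_per_cell(size, rows, columns):.2f} B/cell"
    )
```

The size reported by BigQuery is roughly 23 times the ORC size and almost the same as the uncompressed Avro size, which is what surprises me.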

Is this normal BigQuery behavior? I thought Capacitor stored data very efficiently.

Is there a way to inspect the internal structure of my data in BigQuery to understand what is going wrong and what is taking up this much space?
