BigQuery table is much larger in Capacitor than in Hadoop (ORC)
During my first tests with BigQuery, I noticed that a table imported into BigQuery is much larger than its original Hadoop version.
Here are the numbers I get:
- Original Hadoop table (ORC): 2 GB
- Compressed Avro version used to load the data into BigQuery: 6.4 GB
- (test: uncompressed Avro: 45.8 GB)
- Size in BigQuery (Capacitor format): 47.1 GB
This table contains 11 million rows with 366 columns (most of which are strings).
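To put the numbers above in perspective, here is a rough Python sketch of the size ratios and the average bytes per cell. It assumes decimal gigabytes (1 GB = 10**9 bytes) and treats "11 million" as an exact row count; the names `SIZES_GB`, `ratio_to_bigquery` and `bytes_per_cell` are made up for illustration and do not come from any BigQuery API.

```python
# Sizes reported in the question, in GB.
# Assumption: decimal gigabytes (1 GB = 10**9 bytes).
SIZES_GB = {
    "orc": 2.0,
    "avro_compressed": 6.4,
    "avro_uncompressed": 45.8,
    "bigquery": 47.1,
}

ROWS = 11_000_000  # "11 million rows", treated as exact for illustration
COLUMNS = 366


def ratio_to_bigquery(fmt: str) -> float:
    """How many times larger the BigQuery figure is than the given format."""
    return SIZES_GB["bigquery"] / SIZES_GB[fmt]


def bytes_per_cell(fmt: str) -> float:
    """Average number of bytes per (row, column) cell for the given format."""
    return SIZES_GB[fmt] * 10**9 / (ROWS * COLUMNS)


if __name__ == "__main__":
    for fmt in SIZES_GB:
        print(
            f"{fmt:>18}: BigQuery is {ratio_to_bigquery(fmt):6.2f}x this size, "
            f"{bytes_per_cell(fmt):6.2f} bytes per cell"
        )
```

Run as a script, this prints one line per format. The point of interest is that the BigQuery figure is within a few percent of the uncompressed Avro size, while it is more than twenty times the ORC size.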
Is this normal BigQuery behavior? I thought Capacitor optimized the data in a very efficient way.
Is there a way to inspect the internal structure of my data in BigQuery, to understand what is going wrong and what is taking up this much space?