BigQuery table is much larger in Capacitor than in Hadoop (ORC)

During my first tests with BigQuery, I noticed that a table imported into BigQuery is much larger than its original Hadoop version.

Here are the numbers I get:

Original Hadoop table (ORC): 2 GB
Compressed Avro version used to load the data into BigQuery: 6.4 GB
(test: uncompressed Avro: 45.8 GB)
Size in BigQuery (Capacitor format): 47.1 GB

This table contains 11 million rows and 366 columns (most of which are strings).
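To put the sizes listed above in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes 1 GB means 10^9 bytes and treats "11 million rows" as an exact count, and the names `SIZES_GB`, `ratio_to_bigquery`, `bytes_per_row` and `bytes_per_cell` are made up for this illustration.

```python
# Sizes as reported in the question, in GB (assumed here to mean 10**9 bytes).
SIZES_GB = {
    "orc": 2.0,
    "avro_compressed": 6.4,
    "avro_uncompressed": 45.8,
    "bigquery": 47.1,
}
ROWS = 11_000_000  # "11 million rows", taken as an exact count for illustration
COLUMNS = 366


def ratio_to_bigquery(fmt: str) -> float:
    """How many times larger the BigQuery size is than the given format."""
    return SIZES_GB["bigquery"] / SIZES_GB[fmt]


def bytes_per_row(size_gb: float) -> float:
    """Average number of bytes per row for a table of the given size."""
    return size_gb * 1e9 / ROWS


def bytes_per_cell(size_gb: float) -> float:
    """Average number of bytes per column value (row size / column count)."""
    return bytes_per_row(size_gb) / COLUMNS


if __name__ == "__main__":
    for fmt in ("orc", "avro_compressed", "avro_uncompressed"):
        print(f"BigQuery / {fmt}: {ratio_to_bigquery(fmt):.2f}x")
    bq = SIZES_GB["bigquery"]
    print(f"BigQuery bytes per row:  {bytes_per_row(bq):.0f}")
    print(f"BigQuery bytes per cell: {bytes_per_cell(bq):.1f}")
```

With these inputs, the BigQuery figure is roughly 23.5 times the ORC size but only about 3% above the uncompressed Avro size. That works out to around 4.3 KB per row, or about 12 bytes per cell.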

Is this normal BigQuery behavior? I thought Capacitor optimized the data in a very efficient way.

Is there a way to inspect the internal structure of my data in BigQuery to understand what is going wrong and what is taking up so much space?


hadoop google-bigquery

Sourygna May 29 '17 at 9:22


