Delta between query execution time and the time the Java query call takes to complete
Context
- Our container cluster is located in us-east1-c
- We are using the following Java library: google-cloud-bigquery, 0.9.2-beta
- Our dataset has about 26M rows and is roughly 10 GB in size
- All of our queries return fewer than 100 rows, as we always group by a specific column.
Question
We analyzed the last 100 queries executed in BigQuery; they all took about 2-3 seconds (measured by calling bq --format=prettyjson show -j JOBID and taking the end time minus the creation time).
In our Java logs, however, most bigquery.query calls block for 5-6 seconds (and 10 seconds is not unusual). What could explain the systematic gap between the query completing in the BigQuery cluster and the results being available in Java? I know 5-6 seconds is not astronomical, but I'm curious whether this is normal when using the BigQuery Java Cloud Library.
I have not gone as far as analyzing the outgoing calls with Wireshark. All our tests were performed on our container cluster (Kubernetes).
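Code
import com.google.cloud.bigquery.QueryRequest;
import com.google.cloud.bigquery.QueryResponse;

// bigquery is an already-initialized BigQuery client; sql holds the query text.
QueryRequest request = QueryRequest.newBuilder(sql)
    .setMaxWaitTime(30000L)      // wait up to 30 s for the query to complete
    .setUseLegacySql(false)      // use standard SQL
    .setUseQueryCache(false)     // always execute, never serve cached results
    .build();
QueryResponse response = bigquery.query(request);  // the blocking call being timed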
Thanks
Just by looking at the code briefly: https://github.com/GoogleCloudPlatform/google-cloud-java/blob/master/google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/BigQueryImpl.java
There seem to be several potential sources of latency:
- Retrieving the query results
- Retries (there are some automatic retries that may explain the latency spikes)
- The frequency at which the client checks for new results
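To illustrate how the last point alone could stretch a 2-3 second job into a longer blocking call, here is a minimal simulation. It is only a sketch: the class name, the method, and the poll intervals and backoff factors are hypothetical values chosen for illustration, not the schedule the library actually uses.

```java
/** Toy model of a client that polls for job completion (illustrative only). */
public final class PollingDelay {

    /**
     * Returns the time in ms (since submission) at which a polling client first
     * sees a job that finished at jobDoneMs. The first poll happens after
     * initialIntervalMs; every later interval is multiplied by backoff.
     * All parameters are assumptions, not values taken from google-cloud-bigquery.
     */
    public static long observedLatencyMs(long jobDoneMs, long initialIntervalMs, double backoff) {
        if (initialIntervalMs <= 0 || backoff < 1.0) {
            throw new IllegalArgumentException("interval must be > 0 and backoff >= 1");
        }
        long now = 0;
        double interval = initialIntervalMs;
        while (true) {
            now += Math.round(interval);   // time of the next poll
            if (now >= jobDoneMs) {
                return now;                // first poll at or after completion
            }
            interval *= backoff;           // wait longer before the next poll
        }
    }

    public static void main(String[] args) {
        // A job that really takes 3.1 s, seen through different polling schedules.
        System.out.println(observedLatencyMs(3100, 500, 1.0));   // fixed 500 ms polls
        System.out.println(observedLatencyMs(3100, 1000, 2.0));  // 1 s polls, doubling
    }
}
```

In this model, fixed 500 ms polling reports the 3.1 s job at 3.5 s, while a doubling schedule starting at 1 s polls at 1 s, 3 s and 7 s and so reports it at 7 s. A gap of that kind would be consistent with the numbers in the question, but only a trace of the real calls can confirm it.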
A Wireshark capture of the outgoing calls should give you an exact answer as to what's going on.
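Short of a packet capture, a coarser first step is to time the blocking call on the client and compare that figure with the job statistics reported by bq. The helper below is a hypothetical sketch (CallTimer and Timed are made-up names); the Supplier stands in for the real call, for example () -> bigquery.query(request).

```java
import java.util.function.Supplier;

/** Measures the wall-clock time of a blocking call (illustrative helper). */
public final class CallTimer {

    /** Result of a timed call: the returned value plus the elapsed time. */
    public static final class Timed<T> {
        public final T value;
        public final long elapsedMs;

        Timed(T value, long elapsedMs) {
            this.value = value;
            this.elapsedMs = elapsedMs;
        }
    }

    /** Runs the call once and records how long it blocked, in milliseconds. */
    public static <T> Timed<T> time(Supplier<T> call) {
        long start = System.nanoTime();            // monotonic clock, safe for intervals
        T value = call.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, elapsedMs);
    }

    public static void main(String[] args) {
        // Stand-in for the real query call: sleeps briefly, then returns a value.
        Timed<String> timed = time(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "done";
        });
        System.out.println(timed.value + " after " + timed.elapsedMs + " ms");
    }
}
```

Logging elapsedMs next to the job ID makes it possible to line up each client-side measurement with the corresponding job's creation and end times.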