Delta between query execution time and the time for the Java query call to complete

Context

  • Our container cluster is located @ us-east1-c
  • We are using the following Java library: google-cloud-bigquery, 0.9.2-beta
  • Our dataset has about 26M rows and is roughly 10 GB
  • All of our queries return less than 100 rows as we are always grouping by a specific column.

Question

We analyzed the last 100 queries executed in BigQuery; they all took about 2-3 seconds (measured by calling bq --format=prettyjson show -j JOBID and taking the end time minus the creation time).

In our Java logs, however, most bigquery.query calls block for 5-6 seconds (and 10 seconds is not unusual). What could explain the systematic gap between the time a query takes to complete in the BigQuery cluster and the time until the results are available in Java? I know 5-6 seconds is not astronomical, but I'm curious whether this is normal when using the BigQuery Java Cloud Library.

I have not yet dug as far as analyzing the outgoing calls with Wireshark. All our tests were performed on our container cluster (Kubernetes).

Code

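// Build a standard-SQL query request with the query cache disabled.
QueryRequest request = QueryRequest.newBuilder(sql)
    .setMaxWaitTime(30000L)      // wait up to 30 s for the query to complete
    .setUseLegacySql(false)
    .setUseQueryCache(false)
    .build();

// Blocking call; this is where the 5-6 seconds are observed.
QueryResponse response = bigquery.query(request);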

      

Thanks



1 answer


Just by looking at the code briefly: https://github.com/GoogleCloudPlatform/google-cloud-java/blob/master/google-cloud-bigquery/src/main/java/com/google/cloud/bigquery/BigQueryImpl.java

There seem to be several potential sources of latency:

  • Retrieving the query results
  • Retries (there are some automatic retries that may explain the latency spikes)
  • The polling frequency when checking for new results
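To illustrate how polling alone can stretch the observed time, here is a minimal sketch of a simplified model. It is not the logic in BigQueryImpl; the class name, method name, and all numeric values (poll interval, round-trip time) are assumptions chosen for illustration only.

```java
public class PollingLatency {

    /**
     * Simplified model: the client sends a status request, each request costs
     * roundTripMs, and between requests the client sleeps pollIntervalMs.
     * Once a response arrives at or after jobDurationMs, one more round trip
     * is spent fetching the results.
     *
     * All parameters are hypothetical; they are not the library's real values.
     */
    public static long observedLatencyMs(long jobDurationMs,
                                         long pollIntervalMs,
                                         long roundTripMs) {
        if (pollIntervalMs <= 0 || roundTripMs < 0) {
            throw new IllegalArgumentException(
                "pollIntervalMs must be > 0 and roundTripMs must be >= 0");
        }
        long now = 0;
        while (true) {
            now += roundTripMs;             // the status request itself takes time
            if (now >= jobDurationMs) {     // job was finished when the response came back
                return now + roundTripMs;   // one more round trip to fetch the rows
            }
            now += pollIntervalMs;          // sleep before asking again
        }
    }

    public static void main(String[] args) {
        // A 2.5 s job polled every second with 200 ms round trips.
        System.out.println(observedLatencyMs(2500, 1000, 200)); // prints 2800
        // The same job polled every 5 s: the caller waits much longer than the job ran.
        System.out.println(observedLatencyMs(2500, 5000, 200)); // prints 5600
    }
}
```

In this model a job that finishes just after a poll is not noticed until the next poll, so a coarse poll interval can account for several seconds of extra wall-clock time even when the job itself is fast.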

Looking at the traffic with Wireshark should give you an exact answer as to what's going on.
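Before capturing packets, it may be worth confirming the client-side number with a plain wall-clock measurement around the blocking call, so it can be compared with the job's own statistics. The sketch below is one way to do that; CallTimer and Timed are hypothetical helper names, and the sleeping lambda in main is only a stand-in for bigquery.query(request).

```java
import java.util.function.Supplier;

public class CallTimer {

    /** Holds the value returned by a call and the elapsed wall-clock time. */
    public static final class Timed<T> {
        public final T value;
        public final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs the supplied call and measures how long it blocks, using a monotonic clock. */
    public static <T> Timed<T> time(Supplier<T> call) {
        long start = System.nanoTime();
        T value = call.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in for the real call, e.g. time(() -> bigquery.query(request)).
        Timed<String> timed = time(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "done";
        });
        System.out.println("call returned '" + timed.value
            + "' after " + timed.elapsedMillis + " ms");
    }
}
```

Logging this value next to the job ID makes it possible to line up each client-side duration with the creation and end times reported by bq show -j for the same job.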
