Google Dataflow stopped after BigQuery was disabled
I have a Google Dataflow job. The job reads messages from Pub/Sub, enriches them, and writes the enriched data to BigQuery.
The job processes about 5000 messages per second, and I run it on 20 workers.
BigQuery appears to have had an outage yesterday, so the step that writes to BigQuery failed. After a while, the whole pipeline stopped processing. I see 1000 errors like the one below:
(7dd47a65ad656a43): Exception: java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "The project xx-xxxxxx-xxxxxx has not enabled BigQuery.",
"reason" : "invalid"
} ],
"message" : "The project xx-xxxxxx-xxxxxx has not enabled BigQuery.",
"status" : "INVALID_ARGUMENT"
}
com.google.cloud.dataflow.sdk.util.BigQueryTableInserter.insertAll(BigQueryTableInserter.java:285)
com.google.cloud.dataflow.sdk.util.BigQueryTableInserter.insertAll(BigQueryTableInserter.java:175)
com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.flushRows(BigQueryIO.java:2728)
com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.finishBundle(BigQueryIO.java:2685)
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.finishBundle(DoFnRunnerBase.java:159)
com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:194)
com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.finishBundle(ForwardingParDoFn.java:47)
com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.finish(ParDoOperation.java:65)
com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
com.google.cloud.dataflow.sdk.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:719)
Stack trace truncated. Please see Cloud Logging for the entire trace.
Note that the Dataflow job did not recover even after BigQuery started working again. I had to restart the job to get it working.
This results in data loss, not only during the outage itself but also for the whole period until I noticed the error and restarted the job. Is there a way to tune the retry settings so that the Dataflow job does not get stuck in cases like this?
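To make the question concrete, this is roughly the behavior I would like from the BigQuery write step: keep retrying with a capped exponential backoff and resume once the service is back. It is only a minimal sketch in plain Java. The class `RetryWithBackoff`, its `call` method, and all parameters are hypothetical names for illustration and are not part of the Dataflow SDK.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not a Dataflow SDK class: retries an action with
// capped exponential backoff until it succeeds or attempts run out.
public class RetryWithBackoff {

    /** Thrown when every attempt has failed; wraps the last failure. */
    public static class RetriesExhaustedException extends Exception {
        public RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    public static <T> T call(Callable<T> action, int maxAttempts,
                             long initialDelayMs, long maxDelayMs)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not swallow interruption; let the caller handle it.
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break;
                }
                Thread.sleep(delayMs);
                // Double the wait each time, but never exceed the cap.
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
        throw new RetriesExhaustedException(
                "gave up after " + maxAttempts + " attempts", last);
    }
}
```

With something like this, a write that fails while BigQuery is unavailable would be attempted again after 1 s, 2 s, 4 s and so on up to the cap, for example `RetryWithBackoff.call(() -> writeBatch(rows), 10, 1000, 60000)`, where `writeBatch` stands in for whatever performs the insert.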