Google Dataflow freezes without logs
When I run the sample WordCount job from the Dataflow docs using the DataflowPipelineRunner, it starts the workers and then just hangs in the Running state.
The last two status messages are:
Jan 29, 2016, 22:05:50
S02: (b959a12901787f4d): Executing operation ReadLines+WordCount.CountWords/ParDo(ExtractWords)+WordCount.CountWords/Count.PerElement/Init+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey+WordCount.CountWords/Count.PerElement/Count.PerKey/Combine.GroupedValues/Partial+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey/Reify+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey/Write
Jan 29, 2016, 22:06:42
(c3fc1276c0229a41): Workers have started successfully.
That is all. When I click Work Logs, the log view is completely empty. The job stays in this state for at least 20 minutes.
It works fine with DirectPipelineRunner (it completes in a few seconds and creates an output file under my gs://... bucket).
What should I be looking at?
Command line parameters:
--project=my-project
--stagingLocation=gs://my-project/dataflow/staging
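For reference, a complete Dataflow invocation usually also passes an explicit runner and output location. A rough sketch (the jar name, main class, and output path here are placeholders based on the Dataflow SDK 1.x examples, not taken from the question):

```shell
# Placeholder jar/class/output — adjust to your build and bucket.
java -cp target/dataflow-examples.jar \
  com.google.cloud.dataflow.examples.WordCount \
  --project=my-project \
  --stagingLocation=gs://my-project/dataflow/staging \
  --runner=DataflowPipelineRunner \
  --output=gs://my-project/dataflow/output
```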
A common reason for missing logs is that the Cloud Logging API is not enabled. If any of the APIs listed in the getting started guide are not enabled, it can lead to both of the problems you described (no logs and hung workers).
Please go through the getting started guide again and enable all relevant APIs.
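If you are not sure which APIs are enabled, you can check and enable them from the command line. This is a sketch: it assumes a current Cloud SDK with the `gcloud services` commands, and the exact set of API names may differ for your project:

```shell
# List the APIs currently enabled for the active project.
gcloud services list --enabled

# Enable the APIs Dataflow needs; Cloud Logging is the one that
# usually explains missing worker logs.
gcloud services enable dataflow.googleapis.com
gcloud services enable logging.googleapis.com
gcloud services enable storage-component.googleapis.com
```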
If all of the APIs are enabled, check your authorization:
gcloud auth login
and
gcloud auth application-default login
Also, make sure you run these commands as a user who has Owner or Editor access to the project.
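To confirm which account and project the commands above actually used, something like this helps (assumes the Cloud SDK is installed and configured):

```shell
# Show the active (and any other) authorized accounts.
gcloud auth list

# Show the project the SDK is configured for — it must match --project.
gcloud config list project
```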
Alternatively, you can use a service account in your job by setting the credentials from code, as below:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '<creds.json>'