Google Dataflow freezes without logs

When I run the sample WordCount job from the Dataflow docs with DataflowPipelineRunner, it starts the workers and then just hangs in the Running state.

The last two status messages are:

Jan 29, 2016, 22:05:50
S02: (b959a12901787f4d): Executing operation ReadLines+WordCount.CountWords/ParDo(ExtractWords)+WordCount.CountWords/Count.PerElement/Init+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey+WordCount.CountWords/Count.PerElement/Count.PerKey/Combine.GroupedValues/Partial+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey/Reify+WordCount.CountWords/Count.PerElement/Count.PerKey/GroupByKey/Write

Jan 29, 2016, 22:06:42
(c3fc1276c0229a41): Workers have started successfully.

      

and that's all. When I click Work Logs, it is completely empty. It stays like that for at least 20 minutes.

It works fine with DirectPipelineRunner (it completes in a few seconds and creates an output file on my gs://... bucket).

What should I be looking at?

Command line parameters:

--project=my-project
--stagingLocation=gs://my-project/dataflow/staging
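
For context, a complete invocation of the WordCount sample with the Dataflow SDK for Java 1.x also needs a runner and an output location; the sketch below is illustrative only (the main class, --runner and --output values are assumptions, not copied from my actual run):

# sketch only; adjust the class, project and paths to your setup
mvn compile exec:java -Dexec.mainClass=com.google.cloud.dataflow.examples.WordCount \
    -Dexec.args="--project=my-project \
                 --stagingLocation=gs://my-project/dataflow/staging \
                 --runner=DataflowPipelineRunner \
                 --output=gs://my-project/dataflow/output"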

      



2 answers


A common reason for missing logs is that the Cloud Logging API is not enabled. If not all of the APIs listed in the getting started guide are enabled, that can lead to both of the problems you describe (no logs and hanging workers).



Please go through the getting started guide again and enable all of the relevant APIs.
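
If you prefer the command line, a recent Cloud SDK can enable the services directly; this is a sketch, and the getting started guide lists the full set of APIs to turn on:

# enable the Dataflow, Logging and Storage APIs for the project
gcloud services enable dataflow.googleapis.com logging.googleapis.com storage-api.googleapis.com --project=my-project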



If all of the APIs are enabled, check your authorization:

gcloud auth login

and



gcloud auth application-default login

Also, make sure you run these commands as a user who has owner or editor access to the project.
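
To verify which account is active and what role it has on the project, something like this works (output formats vary by gcloud version):

# show the account gcloud is currently authenticated as
gcloud auth list

# list the project's IAM bindings and look for roles/owner or roles/editor on that account
gcloud projects get-iam-policy my-project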

Alternatively, you can point your job at a service account key from inside your code, for example:

import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '<creds.json>'
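
If you don't have a key file yet, one way to create one is sketched below (the service account name dataflow-runner is just an example; any account with the right roles works):

# create and download a JSON key for an existing service account
gcloud iam service-accounts keys create creds.json \
    --iam-account=dataflow-runner@my-project.iam.gserviceaccount.com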
