"Failed to update work status" Exception in Python Cloud Dataflow
I have a Python Cloud Dataflow job that works fine on small subsets of my data, but fails for no obvious reason on the full dataset.
The only error I get in the Dataflow frontend is the standard error message:
A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service.
Parsing the Stackdriver logs only shows this error:
Runtime Exception: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 736, in run
    deferred_exception_details=deferred_exception_details)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 590, in do_work
    exception_details=exception_details)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 167, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 454, in report_completion_status
    exception_details=exception_details)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 266, in report_status
    work_executor=self._work_executor)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 364, in report_status
    response = self._client.projects_jobs_workItems.ReportStatus(request)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/clients/dataflow/dataflow_v1b3_client.py", line 210, in ReportStatus
    config, request, global_params=global_params)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 723, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 729, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 599, in __ProcessHttpResponse
    http_response.request_url, method_config, request)
HttpError: HttpError accessing https://dataflow.googleapis.com/v1b3/projects//jobs/2017-05-03_03_33_40-3860129055041750274/ response: <{'status': '400', 'content-length': '360', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Wed, 03 May 2017 16:46:11 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>, content <{"error": {"code": 400, "message": "(2a7b20b33659c46e): Failed to publish the result of the work update. Causes: (2a7b20b33659c523): Failed to update work status. Causes: (8a8b13f5c3a944ba): Failed to update work status., (8a8b13f5c3a945d9): Work \"4047499437681669251\" not leased (or the lease was lost).", "status": "INVALID_ARGUMENT"}}>
I'm guessing this "Failed to update work status" error is related to the Cloud Runner? Since I couldn't find any information about it online, I was wondering whether someone else has run into it and has a better explanation.
I am using Google Cloud Dataflow SDK for Python 0.5.5.
One of the main causes of lease expiration is memory pressure on the VM. Try running your job on machines with more memory; in particular, a highmem machine type should do the trick.
For more information on machine types, please see the GCE documentation.
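As a rough sketch of how the machine type could be requested: the Python SDK's worker options accept a --worker_machine_type flag, so one way is to add it to the arguments you pass to the pipeline. The helper name with_highmem_workers and the default n1-highmem-4 below are only illustrative assumptions, not part of the SDK; pick whatever highmem size fits your job.

```python
def with_highmem_workers(argv, machine_type="n1-highmem-4"):
    """Return a copy of argv that requests the given worker machine type.

    Hypothetical helper: any existing --worker_machine_type flag is dropped
    first, so the value passed here is the one the job ends up with.
    """
    flag = "--worker_machine_type"
    cleaned = []
    skip_next = False
    for arg in argv:
        if skip_next:
            # Value belonging to a "--worker_machine_type VALUE" pair.
            skip_next = False
            continue
        if arg == flag:
            skip_next = True
            continue
        if arg.startswith(flag + "="):
            continue
        cleaned.append(arg)
    return cleaned + [flag + "=" + machine_type]


if __name__ == "__main__":
    # Placeholder arguments; substitute your own project and locations.
    print(with_highmem_workers(["--project=my-project", "--job_name=my-job"]))
```

The resulting list can then be handed to the SDK's PipelineOptions when constructing the pipeline (the import path of PipelineOptions differs between SDK versions, so check the one you have installed).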
The next version of Dataflow (2.0.0) should handle these cases better.