AssertionError while reading rows with the BigQuery reader in Python Google Cloud Dataflow

I have a seemingly simple scenario where I am using a Python Dataflow pipeline to query data from BigQuery.

I am hitting an AssertionError when the BigQuery query returns NULL strings; the script and the assertion error are shown below. Is this a bug, or is there a recommended way to handle NULL strings from the BigQuery reader in Python Dataflow?

Dataflow Script:

import apache_beam as beam
from apache_beam.io import WriteToText
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions
from apache_beam.typehints import Any, Dict

# pipeline_args and known_args come from argparse parsing (not shown)
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True
p = beam.Pipeline(options=pipeline_options)
BIGQUERY_ROW_TYPE = Dict[str, Any]

# construct a BigQuery SQL query
query_sql = Query().build_sql()
lines = p \
        | 'read from bigquery' >> beam.io.Read(beam.io.BigQuerySource(query=query_sql, validate=True)).with_output_types(BIGQUERY_ROW_TYPE) \
        | 'write to test' >> WriteToText(known_args.output)

result = p.run()


The error I see when the query returns NULL strings:

(98b5a6e4c0cd002e): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 581, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 166, in execute
    op.start()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/native_operations.py", line 48, in start
    for value in reader:
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativefileio.py", line 186, in __iter__
    for eof, record, delta_offset in self.read_records():
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativeavroio.py", line 102, in read_records
    assert block.num_records() > 0
AssertionError

2017-06-27 (13:55:58) Workflow failed. Causes: (7390b72dc5ceedb6): S04:read from bigquery+write to
test/Write/WriteImpl/WriteBundles/Do+write to test/Write/WriteImpl/Pair+write to
test/Write/WriteImpl/WindowInto(WindowIntoFn)+write to test/Write/WriteImpl/GroupByKey/Reify+write to
test/Write/WriteImpl/GroupByKey/Write failed.


1 answer


This is a bug and it has been fixed (by @jkff). The fix will be available in the next release of Dataflow, due in 3-5 weeks.
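Until that release ships, one possible interim workaround is to coerce nullable STRING columns inside the SQL itself, so the query result never contains NULL strings. This is only a sketch and is not part of the fix described above; the table and column names are placeholders for your own schema, and whether it actually avoids the AssertionError depends on the underlying cause.

# Hypothetical workaround sketch: wrap nullable STRING columns in IFNULL()
# so the rows handed to the reader contain no NULL strings.
# Table and column names below are placeholders, not from the question.
query_sql = """
SELECT
  id,
  IFNULL(name, '') AS name,
  IFNULL(description, '') AS description
FROM [my_project:my_dataset.my_table]
"""

lines = p \
        | 'read from bigquery' >> beam.io.Read(beam.io.BigQuerySource(query=query_sql, validate=True)) \
        | 'write to test' >> WriteToText(known_args.output)

Treat this as a stopgap only; the proper resolution is the fixed Dataflow release mentioned above.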