Apache Beam: PubsubReader not working with NPE

I have a ray pipeline that reads from PubSub and writes to BigQuery after some transform has been applied. The pipeline runs in series with NPE. I am using SDK version 0.6.0. Any idea on what I might be doing wrong? I am trying to start a pipeline using DirectRunner.

java.lang.NullPointerException
at org.apache.beam.sdk.io.PubsubUnboundedSource$PubsubReader.ackBatch(PubsubUnboundedSource.java:640)
at org.apache.beam.sdk.io.PubsubUnboundedSource$PubsubCheckpoint.finalizeCheckpoint(PubsubUnboundedSource.java:313)
at org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.getReader(UnboundedReadEvaluatorFactory.java:174)
at org.apache.beam.runners.direct.UnboundedReadEvaluatorFactory$UnboundedReadEvaluator.processElement(UnboundedReadEvaluatorFactory.java:127)
at org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:139)
at org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:107)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

      

0


source to share


1 answer


This issue exists due to a bug ( BEAM-1656 ) in DirectRunner and a prerequisite in PubsubCheckpoint. The bug in DirectRunner was fixed in pull request 2237 , which is merged into the master Github branch, but after the 0.6.0 release.

Upgrading to 0.7.0 nightly build or building from github HEAD will fix this issue when using DirectRunner.



To update your current nightly build, you will have to add the following repos to your project pom.xml

. The earliest version of the module beam-runners-direct-java

containing the fix, 0.7.0-20170316.070901-9

but not all modules are built with that specific version, so you may need to either specify individually compatible versions or use0.7.0-SNAPSHOT

    <repositories>
      <repository>
        <id>apache.snapshots</id>
        <name>Apache Development Snapshot Repository</name>

 <url>https://repository.apache.org/content/repositories/snapshots/</url>
        <releases>
          <enabled>false</enabled>
        </releases>
        <snapshots>
          <enabled>true</enabled>
        </snapshots>
      </repository>

    </repositories>

      

+1


source







All Articles