Working with SDK version 0.4.150414

We pulled the latest SDK (0.4.150414) from Maven, and our jobs no longer work.

We traced the problem to the deserialization of a HashMap that is a field in one of our classes and is referenced by the ParDo transform.

Remarks:

  • It breaks on startup both locally and on the CDF service in the cloud.
  • The HashMap is populated correctly before processElement is called.
  • A breakpoint in processElement shows that the HashMap has a different object identity (as expected for a deserialized copy of the original HashMap), but it is now empty: all elements are lost.
  • We reverted to version 0.3.150326, and everything works with that version.

Did anything change in the serialization / deserialization behavior in the latest SDK?

We can email our code if you need it.



1 answer


In the latest version, ParDo.of was changed to clone the DoFn that is passed to it. This gives more predictable behavior when the same DoFn instance is used multiple times and modified between uses.

The problem you describe would occur if the HashMap field is populated after the DoFn has been passed to ParDo.of: the clone is taken while the map is still empty, and later changes to the original instance never reach it.

You can confirm this by setting a breakpoint in ParDo.of and checking the state of the DoFn there. To fix it, populate the field before calling ParDo.of.
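The effect can be reproduced without the SDK. The sketch below is only an illustration: LookupFn is a hypothetical stand-in for your DoFn, and cloneViaSerialization is a hypothetical helper that assumes the clone is a plain Java serialization round trip (the SDK's actual cloning code may differ). It shows that a map populated after the clone is taken stays empty in the clone, while a map populated beforehand is carried over.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class CloneDemo {

    // Hypothetical stand-in for a DoFn that carries a lookup table as a field.
    static class LookupFn implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, Integer> lookup = new HashMap<>();
    }

    // Assumption: the clone is a serialize/deserialize round trip.
    // This is NOT the SDK's code, only a model of the behavior described above.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T cloneViaSerialization(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("clone failed", e);
        }
    }

    public static void main(String[] args) {
        // Problematic order: the clone is taken first, the map is filled afterwards.
        LookupFn late = new LookupFn();
        LookupFn lateClone = cloneViaSerialization(late);
        late.lookup.put("a", 1);
        System.out.println("populated after clone:  " + lateClone.lookup.size() + " entries");

        // Fixed order: the map is filled before the clone is taken.
        LookupFn early = new LookupFn();
        early.lookup.put("a", 1);
        LookupFn earlyClone = cloneViaSerialization(early);
        System.out.println("populated before clone: " + earlyClone.lookup.size() + " entries");
    }
}
```

In a real pipeline, the equivalent of the second case is to finish filling the HashMap (for example in the DoFn's constructor) before the DoFn is handed to ParDo.of.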

Hope this helps!
