Combining windowed (groupBy) and mapGroupsWithState (groupByKey) in Spark Structured Streaming

I am using Structured Streaming in Spark 2.2.0.

Given a watermarked stream of timestamped data, is there a way to combine (1) a groupBy that produces a window over a timestamp field plus other grouping criteria with (2) a groupByKey that applies mapGroupsWithState to the groups for a user session?

Or do I have to embed the windowing and the other grouping logic in groupByKey somehow?

For context:

  • groupBy, which supports windowing on a Dataset, returns a RelationalGroupedDataset, which does not have mapGroupsWithState.

  • groupByKey, which supports mapGroupsWithState, returns a KeyValueGroupedDataset, which has no support for windows.
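
To make the second option concrete, here is a minimal sketch of what embedding the window in the groupByKey key might look like. It is only an illustration under assumptions: the Event, WindowKey and WindowTotal case classes, the 10-minute tumbling window, the 5-minute watermark and the running-sum logic are all hypothetical and not taken from my actual job. The window start is computed by hand inside the key function, because the window function is not available on the typed groupByKey path.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

// Hypothetical record types, used only for illustration.
case class Event(userId: String, eventTime: Timestamp, value: Long)
case class WindowKey(userId: String, windowStart: Timestamp)
case class WindowTotal(userId: String, windowStart: Timestamp, total: Long)

object WindowedState {
  // Assumed tumbling window length: 10 minutes, in milliseconds.
  val windowMs: Long = 10 * 60 * 1000L

  // Start of the tumbling window that contains `t`.
  def windowStart(t: Timestamp): Timestamp =
    new Timestamp(t.getTime - Math.floorMod(t.getTime, windowMs))

  // Keeps a running sum per (user, window) key. The state is dropped once
  // the event-time timeout set below has fired for that key.
  def updateTotal(
      key: WindowKey,
      events: Iterator[Event],
      state: GroupState[Long]): WindowTotal = {
    if (state.hasTimedOut) {
      val finalTotal = state.get
      state.remove()
      WindowTotal(key.userId, key.windowStart, finalTotal)
    } else {
      val total = state.getOption.getOrElse(0L) + events.map(_.value).sum
      state.update(total)
      // Ask for a timeout at the end of this window (event time).
      state.setTimeoutTimestamp(key.windowStart.getTime + windowMs)
      WindowTotal(key.userId, key.windowStart, total)
    }
  }

  def totals(events: Dataset[Event]): Dataset[WindowTotal] = {
    import events.sparkSession.implicits._
    events
      .withWatermark("eventTime", "5 minutes") // assumed lateness bound
      // The window becomes part of the grouping key.
      .groupByKey(e => WindowKey(e.userId, windowStart(e.eventTime)))
      .mapGroupsWithState(GroupStateTimeout.EventTimeTimeout)(updateTotal _)
  }
}
```

With this approach every (user, window) pair gets its own state, so sliding windows or session logic would need a different key or extra bookkeeping inside the state function.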
