Combining windowed (groupBy) and mapGroupsWithState (groupByKey) in Spark Structured Streaming

I am using Spark 2.2.0 Structured Streaming.

Given a watermarked stream of timestamped data, is there a way to combine (1) a groupBy operation that builds windows from a timestamp field plus other grouping criteria with (2) a groupByKey operation, so that mapGroupsWithState can be applied to the resulting groups to track user sessions?

Or do I have to embed the windowing and the other grouping logic in groupByKey somehow?

For context:

  • groupBy, which supports windowing, returns a RelationalGroupedDataset when called on a Dataset, and that type does not offer mapGroupsWithState.

  • groupByKey, which supports mapGroupsWithState, returns a KeyValueGroupedDataset, but that type has no support for windows.
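
To make the second option concrete, here is a minimal, Spark-free sketch in Python of what "embedding the window in the key" might mean: each event's timestamp is floored to the start of a tumbling window, and that window start is combined with the other grouping field into one composite key. The field names (timestamp, user_id), the 10-minute window size, and the helper names are all assumptions for illustration. In Spark, a function like composite_key would presumably be what gets passed to groupByKey. The sketch covers tumbling windows only.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 600  # hypothetical 10-minute tumbling window


def window_start(ts, size_s=WINDOW_SECONDS):
    """Floor a timezone-aware timestamp to the start of its tumbling window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size_s, tz=timezone.utc)


def composite_key(event):
    """Combine the window start with the other grouping field (assumed: user_id)."""
    return (window_start(event["timestamp"]), event["user_id"])


def group_events(events):
    """Group events by composite key, mimicking what a keyed grouping would see."""
    groups = defaultdict(list)
    for event in events:
        groups[composite_key(event)].append(event)
    return dict(groups)
```

With this kind of key, every (window, user) pair becomes its own group, so any per-group state would be scoped to a single window rather than to a whole user session. That is part of why I am unsure this is the right approach.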

apache-spark structured-streaming




No one has answered this question yet
