Flink checkpoint end-to-end duration keeps increasing because of stream alignment

I have a Flink job that reads custom events, uses session windows, and writes back to Kafka.

The state backend I'm using writes to S3 (no HDFS cluster, just using the libs).

The problem is that the checkpoint completion time continues to increase until the checkpoints are reset, and most of the time is spent on "Align".

The question is why? How can I solve this problem without setting the checkpoint mode to AT_LEAST_ONCE?

As you can see, the duration of the checkpoints continues to grow.



1 answer


After digging deeper into the issue, it turned out to be caused by long GC pauses (which often occur during checkpoints). We were using the FS state backend. Although its name contains "FS", that refers only to where the checkpoints are written; all the working state is still kept in memory on the JVM heap (as opposed to the RocksDB state backend).
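For anyone who wants to check whether GC is eating their checkpoint time too, here is a minimal sketch that samples cumulative GC time using the standard `java.lang.management` API. It is not what we ran in production and it is not Flink-specific; the class name `GcTimeSampler` and the helper `gcFraction` are made up for illustration. In a real job you would more likely look at the TaskManager GC metrics or GC logs, but the idea is the same: compare GC time against wall-clock time over an interval.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, for illustration only.
class GcTimeSampler {

    // Cumulative time (ms) spent in GC by all collectors of this JVM so far.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Share of a wall-clock interval that was spent in GC (0.25 means 25%).
    static double gcFraction(long gcBeforeMillis, long gcAfterMillis, long wallMillis) {
        if (wallMillis <= 0) {
            return 0.0;
        }
        return (double) (gcAfterMillis - gcBeforeMillis) / wallMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long gcBefore = totalGcTimeMillis();
        long start = System.nanoTime();
        Thread.sleep(1000); // stand-in for the interval you care about
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        double fraction = gcFraction(gcBefore, totalGcTimeMillis(), wallMillis);
        System.out.printf("GC share of the last %d ms: %.1f%%%n", wallMillis, fraction * 100);
    }
}
```

If the GC share climbs at the same pace as the checkpoint duration, heap-resident state is a likely suspect.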



However, we are still using the FS state backend, because the RocksDB state backend has higher latency, which we cannot tolerate in this application.
