Cleaning up Spark RDD checkpoints

We have Spark Streaming reading from Kafka and creating checkpoints on an HDFS server, and they are never cleaned up; we now have millions of checkpoints in HDFS. Is there a way to have Spark clean them up automatically?

Spark version 1.6, HDFS 2.7.0

There are also other random directories, besides the checkpoints, that have not been cleaned up.

+3




1 answer


import org.apache.spark.SparkConf
val conf = new SparkConf().set("spark.cleaner.referenceTracking.cleanCheckpoints", "true")

      



Cleanup should not happen automatically for every checkpoint, because some of them have to be kept around across Spark invocations: Spark Streaming stores intermediate state data as checkpoints and relies on them to recover from driver failures.
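To illustrate why streaming checkpoints need to survive a restart, here is a minimal sketch of the usual recovery pattern with `StreamingContext.getOrCreate`. The checkpoint path, application name, batch interval, and the socket source on `localhost:9999` are all placeholders I made up for illustration (the question uses Kafka); only the config key comes from the answer above. As far as I know, that setting affects RDD checkpoint files whose references go out of scope, so treat its effect on streaming checkpoint directories as something to verify on your cluster.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointRecoverySketch {
  // Hypothetical checkpoint location; replace with your own HDFS path.
  val checkpointDir = "hdfs:///tmp/example-checkpoints"

  // Builds a fresh context. Only called when no usable checkpoint exists.
  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("checkpoint-recovery-sketch") // placeholder name
      .set("spark.cleaner.referenceTracking.cleanCheckpoints", "true")
    val ssc = new StreamingContext(conf, Seconds(10)) // placeholder interval
    ssc.checkpoint(checkpointDir)

    // Placeholder source and output operation so the context can start;
    // in the setup from the question this would be the Kafka stream.
    ssc.socketTextStream("localhost", 9999).count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Restores the context from checkpointDir if checkpoint data is there,
    // otherwise falls back to createContext().
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the checkpoint directory were wiped between runs, `getOrCreate` would simply build a new context and the previously saved state would be lost, which is why blanket automatic deletion is risky here.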

+2

