Deleting files older than 10 days on HDFS

Is there a way to delete files older than 10 days on HDFS?

On Linux I would use:

find /path/to/directory/ -type f -mtime +10 -name '*.txt' -execdir rm -- {} \;

      

Is there a way to do this on HDFS? (Deletion is performed based on file creation date)

+3


source to share


3 answers


Solution 1: Using multiple commands as requested by daemon12

hdfs dfs -ls /file/Path    |   tr -s " "    |    cut -d' ' -f6-8    |     grep "^[0-9]"    |    awk 'BEGIN{ MIN=14400; LAST=60*MIN; "date +%s" | getline NOW } { cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN; DIFF=NOW-WHEN; if(DIFF > LAST){ print "Deleting: "$3; system("hdfs dfs -rm -r "$3) }}'

      



Solution 2:   Using a shell script

today=`date +'%s'`
hdfs dfs -ls /file/Path/ | grep "^d" | while read line ; do
dir_date=$(echo ${line} | awk '{print $6}')
difference=$(( ( ${today} - $(date -d ${dir_date} +%s) ) / ( 24*60*60 ) ))
filePath=$(echo ${line} | awk '{print $8}')

if [ ${difference} -gt 10 ]; then
    hdfs dfs -rm -r $filePath
fi
done

      

0


source


How about this:

hdfs dfs -ls /tmp    |   tr -s " "    |    cut -d' ' -f6-8    |     grep "^[0-9]"    |    awk 'BEGIN{ MIN=14400; LAST=60*MIN; "date +%s" | getline NOW } { cmd="date -d'\''"$1" "$2"'\'' +%s"; cmd | getline WHEN; DIFF=NOW-WHEN; if(DIFF > LAST){ print "Deleting: "$3; system("hdfs dfs -rm -r "$3) }}'

      



Detailed description here .

+3


source


Yes, you can try with HdfsFindTool :

hadoop jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-job.jar org.apache.solr.hadoop.HdfsFindTool -find /pathhodir -mtime +10 -name ^.*\.txt$ | xargs hdfs dfs -rm -r -skipTrash

      

0


source







All Articles