How do I compare the overlap of file sizes between duplicate directories?

I need to compare two directories to validate a backup.

Say my directory looks like this:

Filename        Filesize      Filename        Filesize
user@main_server:~/mydir/     user@backup_server:~/mydir/
file1000.txt    4182410737    file1000.txt    4182410737
file1001.txt    8241410737    -                          <-- missing on backup_server!
...                           ...
file9999.txt    2410418737    file9999.txt    1111111111 <-- size != main_server

      

Is there a quick one-liner that would get me closer to a conclusion, like this:

Invalid Backup Files:
file1001.txt
file9999.txt

      

(for the purpose of instructing the backup script to update these files)

I have tried various options for the following command, to no avail:

[main_server] $ rsync -n ~/mydir/ user@backup_server:~/mydir

      

I cannot use rsync to back up the directories themselves because it takes too long (8-24 hours). Instead, I run multiple scp threads to transfer the files in batches. This regularly completes in under an hour. However, I sometimes find that several files have been missed somehow (the connection may have dropped).

Speed is a priority, so comparing file sizes should be adequate. But I'm open to including a checksum if it doesn't slow down the process, as I find it does with rsync.

Here's my test process:

# Generate Large Files (1GB)
for i in {1..100}; do head -c 1073741824 </dev/urandom >~/mydir/foo-$i ; done

# SCP them from src to dest
for i in {1..100}; do ( scp ~/mydir/foo-$i user@backup_server:~/mydir/ & ) ; sleep 0.1 ; done

# Confirm destination has everything from source
# This is the point of the question. I've tried:

rsync -Sa ~/mydir/ user@backup_server:~/mydir
# Way too slow

      

What do you recommend?


1 answer


By default, rsync uses the quick check method, which only transfers files that differ in size or last-modified time. Since you report that the sizes have not changed, this suggests that the timestamps differ. There are two options:



  • Use -p (with scp) to preserve timestamps when transferring files.

  • Use --size-only (with rsync) to ignore timestamps and transfer only files that differ in size.
