How can I check if a local file is the same as an S3 object without downloading it with boto3?

How can I check whether a local file is identical to a file stored on S3 without downloading it? I want to avoid downloading large files over and over again. S3 objects have ETags, but they are hard to compute locally when the file was uploaded in multiple parts, and the solution from this question doesn't seem to work. Is there an easier way to avoid unnecessary downloads?


3 answers


I would just compare the last-modified times and download only if they differ. Alternatively, you can compare the sizes before downloading. Given a `bucket`, a `key`, and a local file `fname`:



import boto3
import os.path

def isModified(bucket, key, fname):
    s3 = boto3.resource('s3')
    obj = s3.Object(bucket, key)
    # last_modified is a timezone-aware datetime; compare epoch seconds.
    # (strftime('%s') is platform-dependent; timestamp() is portable.)
    return int(obj.last_modified.timestamp()) != int(os.path.getmtime(fname))
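Since the size comes back in the same metadata request as the timestamp, checking both is essentially free. A minimal sketch of the comparison as a pure helper (the name `differs` and its parameters are illustrative, not from boto3; the caller would pass in `obj.content_length` and `obj.last_modified.timestamp()`):

```python
import os

def differs(remote_size, remote_mtime, fname):
    """Compare remote metadata (e.g. obj.content_length and
    obj.last_modified.timestamp() from boto3) with a local file.
    Kept as a pure comparison; fetching the metadata is left to boto3."""
    st = os.stat(fname)
    return remote_size != st.st_size or int(remote_mtime) != int(st.st_mtime)
```

If either the size or the truncated mtime disagrees, the file is treated as modified and re-downloaded.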

      


You can use a small local database, e.g. a text file:

  • Download the S3 object once and note its ETag.
  • Compute any signature you like over the file.
  • Store the pair (ETag, signature) in the "database".


Next time, look up the ETag in the database before downloading. If it is there, compute the signature of your existing file and compare it with the signature stored for that ETag. If they match, the remote file is the same as yours.
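The lookup flow above can be sketched with a JSON file standing in for the "database". All names here (`CACHE_PATH`, `signature`, `should_download`, `record`) are illustrative, and SHA-256 is an arbitrary choice of signature:

```python
import hashlib
import json
import os

CACHE_PATH = 'etag_cache.json'  # the ETag -> signature "database"

def signature(fname):
    # Any digest works as the local signature; SHA-256 here.
    h = hashlib.sha256()
    with open(fname, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()

def load_cache():
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f)
    return {}

def should_download(etag, fname):
    """True if the remote object (identified by etag) may differ from fname."""
    known = load_cache().get(etag)
    if known is not None and os.path.exists(fname):
        return signature(fname) != known
    return True  # unknown ETag: download, then record the pair

def record(etag, fname):
    """Call after downloading to remember this (ETag, signature) pair."""
    cache = load_cache()
    cache[etag] = signature(fname)
    with open(CACHE_PATH, 'w') as f:
        json.dump(cache, f)
```

Fetching the ETag itself without a download is cheap: a HEAD request (boto3's `head_object`) returns it in the object metadata.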

There is a chance that the same content gets re-uploaded with a different part size, which changes the ETag. If that is unlikely in your setup, you can accept the occasional false negative and re-download the file in that rare case.



If you don't need the answer immediately, you can generate an S3 Inventory report and import it into your database for later lookups.

Compute the local file's ETag as shown here for a regular file and for a large multipart file.
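For reference, a sketch of that computation: for a single-part upload the ETag is simply the MD5 hex digest of the content, while for a multipart upload it is the MD5 of the concatenated per-part MD5 digests, suffixed with `-<part count>`. The helper name and the 8 MiB default are assumptions; the part size must match whatever chunk size was used at upload time:

```python
import hashlib
import os

def local_etag(fname, part_size=8 * 1024 * 1024):
    """Compute the ETag S3 would assign to fname.

    part_size must match the multipart chunk size used at upload
    time (8 MiB is a common default, but that is an assumption).
    """
    if os.path.getsize(fname) <= part_size:
        # Single-part upload: ETag is the plain MD5 of the content.
        with open(fname, 'rb') as f:
            return hashlib.md5(f.read()).hexdigest()
    # Multipart upload: MD5 each part, then MD5 the concatenated digests.
    part_digests = []
    with open(fname, 'rb') as f:
        while chunk := f.read(part_size):
            part_digests.append(hashlib.md5(chunk).digest())
    combined = hashlib.md5(b''.join(part_digests))
    return '{}-{}'.format(combined.hexdigest(), len(part_digests))
```

Comparing `local_etag(fname)` against the ETag from a HEAD request then avoids the download entirely, as long as the part size matches.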
