Download Large Number of Files Using the Amazon S3 Bucket Java SDK

I have a large number of files to download from the S3 bucket. My problem is similar to this article , except I'm trying to run it in Java.

public static void main(String args[]) {
        AWSCredentials myCredentials = new BasicAWSCredentials("key","secret");
        TransferManager tx = new TransferManager(myCredentials);
        File file = <thefile>
        try{
        MultipleFileDownload myDownload = tx.downloadDirectory("<bucket>", null, file);
        System.out.println("Transfer: " + myDownload.getDescription());
        System.out.println("  - State: " + myDownload.getState());
        System.out.println("  - Progress: " + myDownload.getProgress().getBytesTransfered());

        while (myDownload.isDone() == false) {
           System.out.println("Transfer: " + myDownload.getDescription());
           System.out.println("  - State: " + myDownload.getState());
            System.out.println("  - Progress: " + myDownload.getProgress().getBytesTransfered());
            try {
                // Do work while we wait for our upload to complete...
                Thread.sleep(500);
            } catch (InterruptedException ex) {
                ex.printStackTrace();
            }
         }
         } catch(Exception e){
          e.printStackTrace();
         }

      }

      

This has been adapted from the TransferManager class example for multiple uploads. This bucket contains over 100,000 objects. Any help would be great.

+3


source to share


2 answers


Please use the list () method to get a list of your files and then use the get () method to get each file.



class S3 extends AmazonS3Client {

    final String bucket;


    S3(String u, String p, String Bucket) {
        super(new BasicAWSCredentials(u, p));
        bucket = Bucket;
    }


    String get(String k) {
        try {
            final S3Object f = getObject(bucket, k);
            final BufferedInputStream i = new BufferedInputStream(f.getObjectContent());
            final StringBuilder s = new StringBuilder();
            final byte[] b = new byte[1024];
            for (int n = i.read(b); n != -1; n = i.read(b)) {
                s.append(new String(b, 0, n));
            }
            return s.toString();
        } catch (Exception e) {
            log("Cannot get " + bucket + "/" + k + " from S3 because " + e);
        }
        return null;
    }


    String[] list(String d) {
        try {
            final ObjectListing l = listObjects(bucket, d);
            final List<S3ObjectSummary> L = l.getObjectSummaries();
            final int n = L.size();
            final String[] s = new String[n];
            for (int i = 0; i < n; ++i) {
                final S3ObjectSummary k = L.get(i);
                s[i] = k.getKey();
            }
            return s;
        } catch (Exception e) {
            log("Cannot list " + bucket + "/" + d + " on S3 because " + e);
        }
        return new String[]{};
    }
}

      

+3


source


The TransferManager internally uses the countdownlatch function, which leads me to believe it is a concurrent download (which seems like the right way to do it). Does it make more sense to use it rather than getting one file after another in sequence?



+1


source







All Articles