How to download GZip file from S3?

I have looked at both "AWS S3 Java SDK - Download File Help" and "Working with Zip and GZip Files in Java".

While they show how to download files from S3 and how to process GZipped files, respectively, neither covers a GZipped file stored on S3. How should I handle that?

I currently have:

try {
    AmazonS3 s3Client = new AmazonS3Client(
            new ProfileCredentialsProvider());
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));
    BufferedReader fileIn = new BufferedReader(new InputStreamReader(
            fileObj.getObjectContent()));
    String fileContent = "";
    String line = fileIn.readLine();
    while (line != null){
        fileContent += line + "\n";
        line = fileIn.readLine();
    }
    fileObj.close();
    return fileContent;
} catch (IOException e) {
    e.printStackTrace();
    return "ERROR IOEXCEPTION";
}

      

Obviously I am not handling the compressed nature of the file, but my output is:

    sU 3204 50 5010 20 24  L,(   O V M-.NLOU R U     <s  <# ^ .wߐX %w         }C= % J3  .     ė‘š S įœ‘   ZQ T e  #sr cdN#ē˜:& 
S BĒ”J    P <  

      

However, I cannot apply the example from the second question, because the file is not stored locally; it has to be downloaded from S3.

What should I do?

+5




4 answers


I solved the problem by using a Scanner instead of reading the InputStream directly.

The Scanner takes a GZIPInputStream and reads the decompressed file line by line:



fileObj = s3Client.getObject(new GetObjectRequest(oSummary.getBucketName(), oSummary.getKey()));
fileIn = new Scanner(new GZIPInputStream(fileObj.getObjectContent()));
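The two lines above leave out declarations and cleanup, so here is a minimal, self-contained sketch of the same idea. It is only an illustration: the class name `GzipScannerSketch` and the `gzip` helper are hypothetical, UTF-8 is an assumed encoding, and an in-memory byte array stands in for `fileObj.getObjectContent()` so that the example runs without AWS credentials.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipScannerSketch {

    // Reads every line from a gzip-compressed stream.
    // In the S3 case, `compressed` would be fileObj.getObjectContent().
    static List<String> readLines(InputStream compressed) throws IOException {
        List<String> lines = new ArrayList<>();
        // Closing the Scanner also closes the wrapped streams.
        try (Scanner in = new Scanner(new GZIPInputStream(compressed),
                StandardCharsets.UTF_8.name())) { // UTF-8 is an assumption
            while (in.hasNextLine()) {
                lines.add(in.nextLine());
            }
        }
        return lines;
    }

    // Hypothetical helper: gzips a string in memory to stand in for an S3 object.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeObject = gzip("first line\nsecond line\n");
        for (String line : readLines(new ByteArrayInputStream(fakeObject))) {
            System.out.println(line);
        }
    }
}
```

With a real bucket, `readLines(fileObj.getObjectContent())` should behave the same way, provided the object really is gzip-compressed.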

      

+7




You should use GZIPInputStream to read the GZIP file:

    AmazonS3 s3Client = new AmazonS3Client(
            new ProfileCredentialsProvider());
    String URL = downloadURL.getPrimitiveJavaObject(arg0[0].get());
    S3Object fileObj = s3Client.getObject(getBucket(URL), getFile(URL));

    byte[] buffer = new byte[1024];
    int n;
    FileOutputStream fileOutputStream = new FileOutputStream("temp.gz");
    // Decompress the S3 object content as it is read.
    BufferedInputStream bufferedInputStream = new BufferedInputStream(new GZIPInputStream(fileObj.getObjectContent()));

    // Recompress the data while writing it to temp.gz.
    GZIPOutputStream gzipOutputStream = new GZIPOutputStream(fileOutputStream);
    while ((n = bufferedInputStream.read(buffer)) != -1) {
        // Write only the n bytes actually read, not the whole buffer.
        gzipOutputStream.write(buffer, 0, n);
    }
    gzipOutputStream.flush();
    gzipOutputStream.close();
    bufferedInputStream.close();

      



Try this approach to download a GZip file from S3.
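Note that the loop above decompresses the object and then compresses it again. If the goal is just a local file, a shorter route may be `Files.copy`. The following is a rough sketch under stated assumptions: the class and method names are hypothetical, and an in-memory byte array stands in for `fileObj.getObjectContent()` so the example runs without S3 access.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipToFileSketch {

    // Stores the object bytes unchanged, so the local file is still a gzip archive.
    static void saveCompressed(InputStream objectContent, Path target) throws IOException {
        try (InputStream in = objectContent) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Decompresses while copying, so the local file holds the plain content.
    static void saveDecompressed(InputStream objectContent, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(objectContent)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Hypothetical helper: gzips a string in memory to stand in for an S3 object.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeObject = gzip("some text that was stored compressed\n");

        Path raw = Files.createTempFile("download", ".gz");
        saveCompressed(new ByteArrayInputStream(fakeObject), raw);

        Path plain = Files.createTempFile("download", ".txt");
        saveDecompressed(new ByteArrayInputStream(fakeObject), plain);

        System.out.println("compressed copy:   " + Files.size(raw) + " bytes");
        System.out.println("decompressed copy: " + Files.size(plain) + " bytes");
    }
}
```

Which variant fits depends on whether the local copy should stay compressed; both close the source stream when the copy finishes.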

+3




Try this:

    BasicAWSCredentials creds = new BasicAWSCredentials("accessKey", "secretKey");
    // Regions is left as in the original; replace it with the constant for your bucket's region.
    AmazonS3 s3 = AmazonS3ClientBuilder.standard().withCredentials(new AWSStaticCredentialsProvider(creds))
            .withRegion(Regions).build();
    String bucketName = "bucketName";
    String keyName = "keyName";
    S3Object fileObj = s3.getObject(new GetObjectRequest(bucketName, keyName));
    // GZIPInputStream decompresses the object content before the Scanner reads it.
    Scanner fileIn = new Scanner(new GZIPInputStream(fileObj.getObjectContent()));
    // Check hasNextLine() so the test matches the nextLine() call below.
    while (fileIn.hasNextLine()) {
        System.out.println("Line: " + fileIn.nextLine());
    }
    fileIn.close();

      

0




I was not looking into this problem myself, but I want to improve the quality of this thread by explaining why the solution already provided works.

No, it is not because of the Scanner, as suggested. It works because fileObj.getObjectContent() is wrapped in a GZIPInputStream, which decompresses the content.

Remove the Scanner but keep the GZIPInputStream, and everything will still work.
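To illustrate that point, here is a small sketch that keeps the BufferedReader from the question and only adds the GZIPInputStream wrapper. The class name `GzipReaderSketch` and the `gzip` helper are hypothetical, UTF-8 is an assumed encoding, and a byte array stands in for the S3 object content so the code runs on its own.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipReaderSketch {

    // Same reader chain as in the question, with GZIPInputStream inserted
    // between the raw stream and the InputStreamReader. No Scanner involved.
    static String readAll(InputStream compressed) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            // Joins the lines with "\n"; a trailing newline is not preserved.
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    // Hypothetical helper: gzips a string in memory to stand in for an S3 object.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeObject = gzip("alpha\nbeta\n");
        System.out.println(readAll(new ByteArrayInputStream(fakeObject)));
    }
}
```

Collecting the lines with a joining collector also avoids the repeated string concatenation in the question's loop, which can become slow for large files.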

-1

