ETag algorithm for S3 multipart uploads in Java?

Question

I understand, in theory, the algorithm for generating the ETag of an S3 multipart upload, but I am not getting the expected results. Can anyone please help?

ETag theory for multipart uploads (at least as I understand it):

Take the MD5 of each uploaded part and concatenate them. Then take the MD5 of the concatenated MD5s. Finally, append "-" and the number of parts uploaded.

NOTE: The example below uses placeholder MD5 values. The resulting MD5 is not the actual MD5 of the part MD5s.

E.g.:

283771245d05b26c35768d1f182fbac0 - MD5 of file part 1
673c3f1ad03d60ea0f64315095ad7131 - MD5 of file part 2
11c68be603cbe39357a0f93be6ab9e2c - MD5 of file part 3

Concatenated MD5s: 283771245d05b26c35768d1f182fbac0673c3f1ad03d60ea0f64315095ad713111c68be603cbe39357a0f93be6ab9e2c

MD5 of the concatenated value above, followed by a dash and the number of file parts:
115671880dfdfe8860d6aabd09139708-3
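
The combining step described above can be sketched in Java as follows. This is a minimal illustration, not an official S3 routine: the class and method names (`EtagFromPartDigests`, `combine`, `fromHex`, `toHex`) are hypothetical, and it assumes, as multipart ETags are commonly described, that the part digests are concatenated as raw 16-byte values rather than as hex text. Because the part values in the example are placeholders, the output is not expected to match the ETag shown above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical helper: combines already-computed part MD5s into a multipart-style ETag.
class EtagFromPartDigests {

    /** Decodes a hex string such as "0aff" into raw bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Encodes raw bytes as lowercase hex. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** MD5 over the concatenated raw part digests, then "-" and the part count. */
    static String combine(List<String> partMd5Hex) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String hex : partMd5Hex) {
                md5.update(fromHex(hex)); // raw 16 bytes, not the 32 hex characters
            }
            return toHex(md5.digest()) + "-" + partMd5Hex.size();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(combine(List.of(
                "283771245d05b26c35768d1f182fbac0",
                "673c3f1ad03d60ea0f64315095ad7131",
                "11c68be603cbe39357a0f93be6ab9e2c")));
    }
}
```

Hashing the hex string instead of the decoded bytes gives a different result, which is one way the two readings of "concatenate the MD5s" diverge.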

To do this in Java I tried two methods, neither of which returns the correct ETag value:

int MB = 1048576;
int bufferSize = 5 * MB;
byte[] buffer = new byte[ bufferSize ];

try {  // String method
    FileInputStream fis = new FileInputStream( new File( fileName ) );

    int bytesRead;
    String md5s = "";

    do {
        bytesRead = fis.read( buffer );
        String md5 =  org.apache.commons.codec.digest.DigestUtils.md5Hex( new String( buffer ) );
        md5s += md5;
    }  while ( bytesRead == bufferSize );

    System.out.println( org.apache.commons.codec.digest.DigestUtils.md5Hex( md5s ) );
    fis.close();

}
catch( Exception e ) {
    System.out.println( e );
}


try {  //  Byte array method
    FileInputStream fis = new FileInputStream( new File( fileName ) );

    int bytesRead;
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();

    do {
        bytesRead = fis.read( buffer );
        byteArrayOutputStream.write( org.apache.commons.codec.digest.DigestUtils.md5( buffer ) );
    }  while ( bytesRead == bufferSize );

    System.out.println( org.apache.commons.codec.digest.DigestUtils.md5Hex( byteArrayOutputStream.toByteArray() ) );
    fis.close();
}
catch( Exception e ) {
    System.out.println( e );
}

Can anyone see why neither algorithm works?

java algorithm amazon-s3 amazon-web-services hash

Todd 21 oct. 14 at 20:41

1 answer

Maarten bodewes · Answer 1 · 2015-11-07T12:38:33+0000

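You should use the byte-oriented method: the part MD5s have to be concatenated as raw 16-byte digests, not as hex strings, and new String( buffer ) in the string method decodes binary data with the platform charset before hashing it.

The byte array method still fails because:

}  while ( bytesRead == bufferSize );

fails if the file consists of exactly x full parts. After the last full part the loop runs once more, read() returns -1, and the stale buffer contents are hashed as an extra part.

Also, this line doesn't work:

byteArrayOutputStream.write( org.apache.commons.codec.digest.DigestUtils.md5( buffer ) );

if the buffer is not completely filled with bytes, i.e. when the file does not consist of exactly x full parts. The whole buffer is hashed, including leftover bytes from the previous read, instead of only the bytesRead bytes of the final part.

In other words, it always fails.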
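
A corrected byte-oriented version might look like the sketch below. It is an illustration under stated assumptions rather than a definitive implementation: the names `MultipartEtagSketch`, `etag`, and `readPart` are hypothetical, the part size passed in must equal the part size used for the actual upload (5 MB in the question), and objects uploaded in a single request are not covered. It uses `java.security.MessageDigest` instead of commons-codec so that only the bytes actually read are hashed, and it stops as soon as a read returns no data.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: multipart-style ETag computed from a stream and a part size.
class MultipartEtagSketch {

    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Fills the buffer as far as possible; returns the byte count (0 at end of stream). */
    static int readPart(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    static String etag(InputStream in, int partSize) throws IOException {
        MessageDigest partMd5 = newMd5();
        MessageDigest combined = newMd5();
        byte[] buffer = new byte[partSize];
        int parts = 0;
        int n;
        // Loop ends on an empty read, so an exact multiple of partSize adds no extra part.
        while ((n = readPart(in, buffer)) > 0) {
            partMd5.update(buffer, 0, n);      // hash only the bytes read for this part
            combined.update(partMd5.digest()); // digest() returns 16 raw bytes and resets partMd5
            parts++;
        }
        return toHex(combined.digest()) + "-" + parts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java MultipartEtagSketch <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(etag(in, 5 * 1048576)); // assumes 5 MB parts, as in the question
        }
    }
}
```

Filling the buffer in `readPart` matters because a single `read` call is allowed to return fewer bytes than requested even before the end of the stream, which would otherwise shift the part boundaries.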
