ETag algorithm for S3 multipart uploads in Java?
I understand in theory the algorithm for generating the ETag of an S3 multipart upload, but I am not getting the expected results. Can anyone please help?
ETag theory for multipart uploads (at least my understanding):
Take the md5 of each uploaded part and concatenate them. Then take the md5 of the concatenated md5s. Finally, append "-" and the number of parts uploaded.
NOTE: The example below uses made-up md5 values. The resulting md5 is not the actual md5 of the part md5s.
e.g.
- 283771245d05b26c35768d1f182fbac0 - md5 of file part 1
- 673c3f1ad03d60ea0f64315095ad7131 - md5 of file part 2
- 11c68be603cbe39357a0f93be6ab9e2c - md5 of file part 3
Concatenated md5: 283771245d05b26c35768d1f182fbac0673c3f1ad03d60ea0f64315095ad713111c68be603cbe39357a0f93be6ab9e2c
md5 of the concatenated string above, followed by a dash and the number of file parts:
115671880dfdfe8860d6aabd09139708-3
To do this in Java I tried two methods. Neither of them returns the correct ETag value:
int MB = 1048576;
int bufferSize = 5 * MB;
byte[] buffer = new byte[ bufferSize ];
try { // String method
    FileInputStream fis = new FileInputStream( new File( fileName ) );
    int bytesRead;
    String md5s = "";
    do {
        bytesRead = fis.read( buffer );
        String md5 = org.apache.commons.codec.digest.DigestUtils.md5Hex( new String( buffer ) );
        md5s += md5;
    } while ( bytesRead == bufferSize );
    System.out.println( org.apache.commons.codec.digest.DigestUtils.md5Hex( md5s ) );
    fis.close();
}
catch( Exception e ) {
    System.out.println( e );
}
try { // Byte array method
    FileInputStream fis = new FileInputStream( new File( fileName ) );
    int bytesRead;
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    do {
        bytesRead = fis.read( buffer );
        byteArrayOutputStream.write( org.apache.commons.codec.digest.DigestUtils.md5( buffer ) );
    } while ( bytesRead == bufferSize );
    System.out.println( org.apache.commons.codec.digest.DigestUtils.md5Hex( byteArrayOutputStream.toByteArray() ) );
    fis.close();
}
catch( Exception e ) {
    System.out.println( e );
}
Can anyone see why neither of these approaches works?
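You should use the byte-oriented method, because the final hash has to be computed over the raw part digests, not over their hex strings.
As written, though, it fails because the loop condition:
} while ( bytesRead == bufferSize );
fails if the file consists of exactly x full parts: the loop then runs one more time, read() returns -1, and the stale buffer is hashed as an extra part.
It also fails because of:
byteArrayOutputStream.write( org.apache.commons.codec.digest.DigestUtils.md5( buffer ) );
which hashes the whole buffer even if the block is not completely filled with bytes, i.e. when the file does not consist of exactly x full parts. Only the first bytesRead bytes should be hashed.
In other words, it always fails.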
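A minimal sketch of a byte-oriented version that avoids both problems is shown here. It is not the asker's code: the class name MultipartETag and the helpers readPart, toHex and of are hypothetical, and it uses the JDK's java.security.MessageDigest instead of commons-codec so that it has no dependencies. It assumes the object was uploaded in fixed-size parts of partSize bytes (5 MB in the question) and that the ETag really is the md5 of the concatenated raw part digests, which is commonly observed for unencrypted multipart uploads.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class MultipartETag {

    // Fills the buffer as far as possible; returns the number of bytes read (0 at end of stream).
    static int readPart(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    // Lower-case hex encoding of a byte array.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Computes "<md5 of concatenated raw part md5s>-<part count>".
    // Assumption: every part except the last is exactly partSize bytes long.
    // An empty stream yields a "-0" suffix; that case is not handled specially.
    static String of(InputStream in, int partSize) throws IOException {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        ByteArrayOutputStream partDigests = new ByteArrayOutputStream();
        byte[] buffer = new byte[partSize];
        int parts = 0;
        int n;
        while ((n = readPart(in, buffer)) > 0) {
            md5.update(buffer, 0, n);        // hash only the bytes actually read
            partDigests.write(md5.digest()); // 16 raw bytes; digest() also resets md5
            parts++;
        }
        return toHex(md5.digest(partDigests.toByteArray())) + "-" + parts;
    }
}
```

With the question's variables, this could be called as MultipartETag.of( new FileInputStream( fileName ), 5 * 1048576 ). The loop stops when a read returns no bytes rather than when a read comes back short, so a file that is an exact multiple of the part size does not produce an extra part. The result only matches S3 if the part size equals the one used for the upload.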