ETag algorithm for S3 multipart uploads in Java?

I understand in theory the algorithm for generating the ETag of an S3 multipart upload, but I am not getting the expected results. Can anyone please help?

ETag theory for multipart uploads (at least my understanding) :

Take the md5 of each uploaded part and concatenate them. Then take the md5 of the concatenated md5s. Finally, append "-" and the number of parts uploaded.

NOTE: The example below uses made-up md5 values. The resulting md5 is not the actual md5 of the part md5s.

e.g.

  • 283771245d05b26c35768d1f182fbac0 - md5 of file part 1
  • 673c3f1ad03d60ea0f64315095ad7131 - md5 of file part 2
  • 11c68be603cbe39357a0f93be6ab9e2c - md5 of file part 3

Concatenated md5: 283771245d05b26c35768d1f182fbac0673c3f1ad03d60ea0f64315095ad713111c68be603cbe39357a0f93be6ab9e2c

md5 of the concatenated string above, followed by a dash and the number of file parts:
115671880dfdfe8860d6aabd09139708-3

To do this in Java I tried two methods; neither of them returns the correct ETag value.

int MB = 1048576;
int bufferSize = 5 * MB;
byte[] buffer = new byte[ bufferSize ];

try {  // String method
    FileInputStream fis = new FileInputStream( new File( fileName ) );

    int bytesRead;
    String md5s = "";

    do {
        bytesRead = fis.read( buffer );
        String md5 =  org.apache.commons.codec.digest.DigestUtils.md5Hex( new String( buffer ) );
        md5s += md5;
    }  while ( bytesRead == bufferSize );

    System.out.println( org.apache.commons.codec.digest.DigestUtils.md5Hex( md5s ) );
    fis.close();

}
catch( Exception e ) {
    System.out.println( e );
}


try {  //  Byte array method
    FileInputStream fis = new FileInputStream( new File( fileName ) );

    int bytesRead;
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();

    do {
        bytesRead = fis.read( buffer );
        byteArrayOutputStream.write( org.apache.commons.codec.digest.DigestUtils.md5( buffer ) );
    }  while ( bytesRead == bufferSize );

    System.out.println( org.apache.commons.codec.digest.DigestUtils.md5Hex( byteArrayOutputStream.toByteArray() ) );
    fis.close();
}
catch( Exception e ) {
    System.out.println( e );
}

      

Can anyone see why neither of the algorithms works?



1 answer


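You should use the byte-oriented method.

As written, it fails because the loop condition

}  while ( bytesRead == bufferSize );

keeps looping after a completely filled final block. If the file size is an exact multiple of the part size, the loop runs one more time, read() returns -1, and the stale buffer contents are hashed as an extra part.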



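It also fails because

byteArrayOutputStream.write( org.apache.commons.codec.digest.DigestUtils.md5( buffer ) );

hashes the whole buffer rather than only the bytesRead bytes that were actually read. When the last block is not completely filled, i.e. when the file size is not an exact multiple of the part size, the digest includes stale bytes left over from the previous read.

In other words, one of the two bugs always applies, so it always fails.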
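
A minimal sketch of a corrected byte-oriented version follows. It hashes only the bytes actually read for each part and stops as soon as no data is left. It uses only the JDK's MessageDigest instead of commons-codec. The class name MultipartETag and the helpers compute and toHex are made up for illustration, and the sketch assumes the upload used one fixed part size for every part except the last (5 MB, as in the question). If the object was uploaded with a different part size, the result will not match.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of any AWS SDK.
final class MultipartETag {

    // Computes "<md5 of concatenated raw part md5s>-<part count>".
    // partSize is assumed to equal the part size used for the upload.
    public static String compute(InputStream in, int partSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest partDigest = MessageDigest.getInstance("MD5");
        MessageDigest combined = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[partSize];
        int parts = 0;

        while (true) {
            // read() may return fewer bytes than requested even before
            // end of stream, so keep reading until the part is full.
            int filled = 0;
            while (filled < partSize) {
                int n = in.read(buffer, filled, partSize - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            }
            if (filled == 0) {
                break; // nothing left: do not hash an empty extra part
            }
            // Hash only the bytes that were read, never the stale tail.
            partDigest.update(buffer, 0, filled);
            // Feed the raw 16-byte digest (not its hex text) into the
            // combined digest; digest() also resets partDigest.
            combined.update(partDigest.digest());
            parts++;
            if (filled < partSize) {
                break; // short part means this was the last one
            }
        }
        return toHex(combined.digest()) + "-" + parts;
    }

    // Lower-case hex encoding of a byte array.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java MultipartETag <file>
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(compute(in, 5 * 1024 * 1024));
        }
    }
}
```

The string method in the question has the same two loop bugs and, in addition, converts the bytes to a String before hashing and concatenates hex text instead of raw digest bytes, which is why the byte-oriented approach is the one to fix. Note that this sketch returns a "-0" suffix for an empty stream, a case it does not try to handle.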
