Iterative hashing returns different values ​​in Python and Java

I am trying to port a python (2.7) script to Java. It repeats the sha256 hash file several times, but they have different results. I noticed that the first time they return the same result, but it is different from there.

Here is the Python implementation:

import hashlib

def to_hex(s):
  print " ".join(hex(ord(i)) for i in s)

d = hashlib.sha256()

print "Entry:"
r = chr(1)
to_hex(r)

for i in range(2):
  print "Loop", i
  d.update(r)
  r = d.digest()
  to_hex(r)

      

And in Java:

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LoopTest {

  public static void main(String[] args) {
    MessageDigest d;
    try {
      d = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      System.out.println("NoSuchAlgorithmException");
      return;
    }

    System.out.println("Entry:");
    byte[] r = new byte[] {1};
    System.out.println(toHex(r));

    for(int i = 0; i < 2; i++) {
      System.out.printf("Loop %d\n", i);
      d.update(r);
      r = d.digest();
      System.out.println(toHex(r));
    }
  }

  private static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length);
    for (byte b: bytes) {
       sb.append(String.format("0x%02X ", b));
    }
    return sb.toString();
  }
}

      

Outputs for python:

$ python looptest.py
Entry:
0x1
Loop 0
0x4b 0xf5 0x12 0x2f 0x34 0x45 0x54 0xc5 0x3b 0xde 0x2e 0xbb 0x8c 0xd2 0xb7 0xe3 0xd1 0x60 0xa 0xd6 0x31 0xc3 0x85 0xa5 0xd7 0xcc 0xe2 0x3c 0x77 0x85 0x45 0x9a
Loop 1
0x98 0x1f 0xc8 0xd4 0x71 0xa8 0xb0 0x19 0x32 0xe3 0x84 0xac 0x1c 0xd0 0xa0 0x62 0xc4 0xdb 0x2c 0xe 0x13 0x58 0x61 0x9a 0x83 0xd1 0x67 0xf5 0xe8 0x4e 0x6a 0x17

      

And for java:

$ java LoopTest
Entry:
0x01
Loop 0
0x4B 0xF5 0x12 0x2F 0x34 0x45 0x54 0xC5 0x3B 0xDE 0x2E 0xBB 0x8C 0xD2 0xB7 0xE3 0xD1 0x60 0x0A 0xD6 0x31 0xC3 0x85 0xA5 0xD7 0xCC 0xE2 0x3C 0x77 0x85 0x45 0x9A
Loop 1
0x9C 0x12 0xCF 0xDC 0x04 0xC7 0x45 0x84 0xD7 0x87 0xAC 0x3D 0x23 0x77 0x21 0x32 0xC1 0x85 0x24 0xBC 0x7A 0xB2 0x8D 0xEC 0x42 0x19 0xB8 0xFC 0x5B 0x42 0x5F 0x70

      

What is the reason for this difference?

Edit:

Thanks for the replies @dcsohl and @Alik I understand the reason now. Since I am porting a Python script to Java, I had to keep Python the way it is, so I changed my Java program like this:

byte[] r2 = new byte[]{};
for(int i = 0; i < 2; i++) {
  System.out.printf("Loop %d\n", i);
  d.update(r);
  r2 = d.digest();
  System.out.println(toHex(r2));
  byte[] c = new byte[r.length + r2.length];
  System.arraycopy(r, 0, c, 0, r.length);
  System.arraycopy(r2, 0, c, r.length, r2.length);
  r = c;
}

      

+3


source to share


2 answers


Two languages work update()

and digest()

in different ways.

The python documentation for update()

says

Update the hash object with the arg string. Repeated calls are equivalent to one call with all arguments concatenated: m.update(a); m.update(b)

equivalent m.update(a+b)

.

I tested this with the command sha256sum

.

echo -n '\0x01\0x4b\0xf5\0x12\0x2f\0x34\0x45\0x54\0xc5\0x3b\0xde\0x2e\0xbb\0x8c\0xd2\0xb7\0xe3\0xd1\0x60\0xa\0xd6\0x31\0xc3\0x85\0xa5\0xd7\0xcc\0xe2\0x3c\0x77\0x85\0x45\0x9a' | sha256sum
981fc8d471a8b01932e384ac1cd0a062c4db2c0e1358619a83d167f5e84e6a17 *-

      

You start with \ 0x01 so the first byte and then the rest of the bytes hash 0x01. The resulting hash matches your Python output.



Now let's look at this - I dropped the initial \ 0x01 and got the hash back - it matches your Java output.

> echo -n '\0x4b\0xf5\0x12\0x2f\0x34\0x45\0x54\0xc5\0x3b\0xde\0x2e\0xbb\0x8c\0xd2\0xb7\0xe3\0xd1\0x60\0xa\0xd6\0x31\0xc3\0x85\0xa5\0xd7\0xcc\0xe2\0x3c\0x77\0x85\0x45\0x9a' | sha256sum
9c12cfdc04c74584d787ac3d23772132c18524bc7ab28dec4219b8fc5b425f70 *-

      

But why? Shouldn't the initial \ 0x01 be included? It would, except that the javadoc for digest()

says:

Completes the hash computation by performing final operations such as padding. After this call, the digest is reset.

So your initial \ 0x01 gets discarded when you call digest()

in java and you just digest the old digest without the initial \ 0x01 entry.

+2


source


In Java, d.digest

returns the message digest and flushes the digest at the end.

There is d.digest

no reset digest in Python . This way, the repeated calls are d.update

actually merged with what was transferred in the previous calls

You can just put d = hashlib.sha256()

inside the loop



import hashlib

def to_hex(s):
  print " ".join(hex(ord(i)) for i in s)



print "Entry:"
r = chr(1)
to_hex(r)

for i in range(2):
  print "Loop", i
  d = hashlib.sha256()
  d.update(r)
  r = d.digest()
  to_hex(r)

      

to get the same results as your java program

Entry:
0x1
Loop 0
0x4b 0xf5 0x12 0x2f 0x34 0x45 0x54 0xc5 0x3b 0xde 0x2e 0xbb 0x8c 0xd2 0xb7 0xe3 0xd1 0x60 0xa 0xd6 0x31 0xc3 0x85 0xa5 0xd7 0xcc 0xe2 0x3c 0x77 0x85 0x45 0x9a
Loop 1
0x9c 0x12 0xcf 0xdc 0x4 0xc7 0x45 0x84 0xd7 0x87 0xac 0x3d 0x23 0x77 0x21 0x32 0xc1 0x85 0x24 0xbc 0x7a 0xb2 0x8d 0xec 0x42 0x19 0xb8 0xfc 0x5b 0x42 0x5f 0x70

      

+1


source







All Articles